AI agents are systems that use LLMs to autonomously plan and execute tasks. Unlike simple chat interactions, agents can use tools, maintain state, and work towards goals over multiple steps.
Core Concepts
Agent Loop
The fundamental pattern:
```
while goal not achieved:
    1. Observe current state
    2. Reason about next action
    3. Execute action (often using tools)
    4. Update state with results
```
Components
- LLM — The reasoning engine
- Tools — External capabilities the agent can invoke
- Memory — Short-term (conversation) and long-term (persistent storage)
- Planning — Strategy for achieving goals
Reasoning Patterns
ReAct (Reasoning + Acting)
Interleaves thinking and action:
```
Thought: I need to find the current weather in London.
Action: get_weather(location="London")
Observation: Partly cloudy, 15°C
Thought: Now I can answer the user's question.
Answer: The weather in London is partly cloudy at 15°C.
```
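A minimal sketch of this loop, assuming a hypothetical `llm.complete(prompt, stop=...)` text-completion call and a small registry of tool functions:

```python
import re

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS = {"get_weather": lambda location: "Partly cloudy, 15°C"}

def react(llm, question, max_steps=10):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model emits a Thought and either an Action or a final Answer;
        # stop before it hallucinates its own Observation.
        step = llm.complete(transcript, stop=["Observation:"])
        transcript += step
        answer = re.search(r"Answer:\s*(.*)", step)
        if answer:
            return answer.group(1)
        # Parse e.g.  Action: get_weather(location="London")
        action = re.search(r'Action:\s*(\w+)\((.*)\)', step)
        if action:
            name = action.group(1)
            args = dict(re.findall(r'(\w+)="([^"]*)"', action.group(2)))
            transcript += f"\nObservation: {TOOLS[name](**args)}\n"
    raise RuntimeError("Step budget exhausted without an answer")
```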
Chain-of-Thought (CoT)
Step-by-step reasoning before answering. Can be triggered with “Let’s think step by step” or an explicit reasoning structure.
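For instance, using the same hypothetical `llm.complete` call as above:

```python
prompt = (
    "Q: A train covers 60 km in 45 minutes. What is its average speed in km/h?\n"
    "Let's think step by step."
)
# The model should reason first (45 min = 0.75 h; 60 / 0.75 = 80 km/h)
# and only then state the final answer.
answer = llm.complete(prompt)
```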
Tree-of-Thought (ToT)
Explores multiple reasoning branches:
- Generate several possible next steps
- Evaluate each branch
- Select most promising
- Backtrack if needed
Useful for planning and problem-solving where the first approach may fail.
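A breadth-first sketch, assuming hypothetical `llm.propose_steps` and `llm.evaluate` helpers that generate candidate next steps and score a partial solution between 0 and 1:

```python
import heapq

def tree_of_thought(llm, problem, breadth=3, depth=4):
    frontier = [(0.0, "")]  # (negated score, partial reasoning path)
    for _ in range(depth):
        candidates = []
        for _, path in frontier:
            # Generate several possible next steps from this branch.
            for step in llm.propose_steps(problem, path, n=breadth):
                extended = path + "\n" + step
                # Evaluate how promising the extended branch is.
                candidates.append((-llm.evaluate(problem, extended), extended))
        # Keep only the best branches; abandoning weak ones is the backtracking.
        frontier = heapq.nsmallest(breadth, candidates)
    return min(frontier)[1]  # reasoning path of the highest-scoring branch
```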
Reflexion
Self-evaluation and improvement:
- Attempt task
- Evaluate outcome
- Reflect on failures
- Retry with learned insights
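A sketch of the retry loop, with hypothetical `llm.attempt`, `llm.reflect`, and task-specific `evaluate` helpers:

```python
def reflexion(llm, task, evaluate, max_attempts=3):
    reflections = []  # lessons accumulated across failed attempts
    attempt = None
    for _ in range(max_attempts):
        # Condition each new attempt on reflections from earlier failures.
        attempt = llm.attempt(task, reflections)
        success, feedback = evaluate(attempt)
        if success:
            return attempt
        # Ask the model what went wrong and what to do differently.
        reflections.append(llm.reflect(task, attempt, feedback))
    return attempt  # best effort after exhausting the budget
```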
Tool Use
Function Calling
Modern LLMs support structured tool definitions:
```json
{
  "name": "search_web",
  "description": "Search the web for information",
  "parameters": {
    "query": {"type": "string", "description": "Search query"}
  }
}
```
The model outputs structured calls that the system executes.
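On the system side, executing a call is typically a dispatch-table lookup. A sketch, assuming the call object carries a `name` and JSON-encoded `arguments` as in common SDKs:

```python
import json

def search_web(query: str) -> str:
    ...  # call a real search API here

# Dispatch table mapping tool names to implementations.
TOOL_IMPLS = {"search_web": search_web}

def execute_tool(call):
    func = TOOL_IMPLS[call.name]
    args = json.loads(call.arguments)
    try:
        return func(**args)
    except Exception as exc:
        # Surface the error as the observation so the model can adapt.
        return f"Error: {exc}"
```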
Common Tool Categories
- Information retrieval — Web search, database queries, file reading
- Code execution — Run code, shell commands
- External APIs — Send emails, create tickets, update systems
- Human interaction — Ask for clarification, approval
Tool Design Principles
- Clear, specific descriptions
- Well-defined parameter schemas
- Appropriate granularity (not too broad or narrow)
- Error handling and feedback
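A hypothetical definition that follows these principles: a specific description, a constrained enum instead of free text, and a single well-scoped action:

```python
create_ticket_tool = {
    "name": "create_ticket",
    "description": "Create a support ticket. Use only after confirming "
                   "the issue is not already tracked.",
    "parameters": {
        "title": {"type": "string", "description": "One-line summary"},
        "priority": {
            "type": "string",
            "enum": ["low", "medium", "high"],  # constrain, don't free-type
            "description": "Ticket priority",
        },
    },
}
```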
Memory
Short-term Memory
Conversation history within context window. Strategies for long conversations:
- Summarisation
- Sliding window
- Relevant message retrieval
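These strategies combine naturally: keep a sliding window of recent messages and summarise the rest. A sketch, assuming a hypothetical `llm.summarise` call:

```python
def trim_history(llm, messages, max_messages=40, keep_recent=20):
    if len(messages) <= max_messages:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Compress everything outside the window into one summary message.
    summary = llm.summarise(old)
    return [{"role": "system", "content": f"Summary so far: {summary}"}] + recent
```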
Long-term Memory
Persistent storage across sessions:
- Vector Databases for semantic retrieval
- Structured databases for facts
- Knowledge graphs for relationships
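A toy semantic store, assuming an `embed` function that maps text to a vector (any embedding model works):

```python
import numpy as np

class VectorMemory:
    def __init__(self, embed):
        self.embed = embed  # text -> np.ndarray
        self.texts, self.vectors = [], []

    def add(self, text):
        self.texts.append(text)
        self.vectors.append(self.embed(text))

    def search(self, query, k=3):
        # Rank stored memories by cosine similarity to the query.
        q = self.embed(query)
        sims = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
                for v in self.vectors]
        top = np.argsort(sims)[::-1][:k]
        return [self.texts[i] for i in top]
```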
Working Memory
Scratchpad for intermediate results during complex tasks.
Multi-Agent Systems
Architectures
Hierarchical
```
Orchestrator Agent
├── Research Agent
├── Coding Agent
└── Review Agent
```
Peer-to-peer
Agents communicate directly, no central controller.
Debate
Multiple agents argue different perspectives, then synthesise a conclusion.
Coordination Patterns
- Handoff — Pass task to specialist agent
- Collaboration — Agents work on subtasks in parallel
- Critique — One agent reviews another’s work (sketched below)
- Voting — Multiple agents vote on decisions
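A sketch of the critique pattern, assuming two agent objects with a hypothetical `run` method:

```python
def write_with_critique(author, reviewer, task, max_rounds=3):
    draft = author.run(task)
    for _ in range(max_rounds):
        review = reviewer.run(
            f"Review this work. Reply APPROVED if acceptable:\n{draft}"
        )
        if "APPROVED" in review:
            break
        # Feed the critique back to the author for another pass.
        draft = author.run(f"Revise based on this feedback:\n{review}\n\n{draft}")
    return draft
```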
Agentic Coding
Agents specialised for software development tasks.
Capabilities
- Navigate and understand codebases
- Write, edit, and refactor code
- Run tests and fix failures
- Execute shell commands
- Interact with version control
Products
- Claude Code — Anthropic’s terminal-based coding agent
- OpenAI Codex — OpenAI’s coding agent
- GitHub Copilot — agent mode in editors and on GitHub
- Cursor — AI-native code editor
- Aider — open-source pair programming in the terminal
Best Practices
- Provide clear project context (AGENTS.md, CLAUDE.md)
- Use version control for safety
- Review changes before committing
- Start with smaller, well-defined tasks
Human-in-the-Loop
Approval Gates
Require human approval for:
- Destructive actions (delete, overwrite)
- External communications (emails, API calls)
- Expensive operations
- Uncertain decisions
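A sketch of a gate wrapped around tool execution, reusing the `execute_tool` dispatcher sketched earlier (the `DESTRUCTIVE` set is an illustrative assumption):

```python
DESTRUCTIVE = {"delete_file", "send_email", "overwrite_record"}

def execute_with_approval(call):
    # Pause and ask a human before any risky action runs.
    if call.name in DESTRUCTIVE:
        print(f"Agent wants to run {call.name}({call.arguments})")
        if input("Approve? [y/N] ").strip().lower() != "y":
            return "Action rejected by the user."
    return execute_tool(call)
```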
Feedback Integration
- Corrections improve future behaviour
- Preferences guide decision-making
- Escalation when confidence is low
Building Agents
Frameworks
- LangChain — Popular, batteries-included
- LangGraph — Graph-based agent workflows
- LlamaIndex — Data-focused agents
- CrewAI — Multi-agent orchestration
- AutoGen — Microsoft’s multi-agent framework
- Semantic Kernel — Microsoft’s AI orchestration SDK
- Pydantic AI — Type-safe agent framework
Patterns
Minimal agent:
```python
def run_agent(llm, messages, tools):
    while True:
        response = llm.chat(messages, tools=tools)
        if response.tool_calls:
            messages.append(response)  # keep the tool-call turn in history
            for call in response.tool_calls:
                result = execute_tool(call)
                messages.append(tool_result(result))
        else:
            return response.content
```
With planning:
```python
plan = llm.generate_plan(task)
for step in plan:
    result = execute_step(step)
    if not successful(result):
        plan = llm.replan(task, result, plan)
```
Evaluation
Metrics
- Task completion — Did it achieve the goal?
- Efficiency — Steps taken, tokens used
- Safety — Did it avoid harmful actions?
- Cost — API costs, compute time
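These can be tracked with a simple per-run record (a sketch; the field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class RunMetrics:
    task_completed: bool   # did the run achieve its goal?
    steps: int             # actions taken
    tokens_used: int
    cost_usd: float
    unsafe_actions: int = 0

def summarise(runs):
    n = len(runs)
    return {
        "completion_rate": sum(r.task_completed for r in runs) / n,
        "avg_steps": sum(r.steps for r in runs) / n,
        "total_cost_usd": sum(r.cost_usd for r in runs),
    }
```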
Benchmarks
- SWE-bench — Real GitHub issues
- WebArena — Web task automation
- AgentBench — Diverse agent tasks
- GAIA — General AI assistants
Challenges
- Reliability — Agents can get stuck, loop, or make errors
- Cost — Many LLM calls add up
- Latency — Sequential actions are slow
- Safety — Autonomous actions need guardrails
- Debugging — Hard to trace multi-step failures