AI agents are systems that use LLMs to autonomously plan and execute tasks. Unlike simple chat interactions, agents can use tools, maintain state, and work towards goals over multiple steps.

Core Concepts

Agent Loop

The fundamental pattern:

while goal not achieved:
    1. Observe current state
    2. Reason about next action
    3. Execute action (often using tools)
    4. Update state with results

Components

  • LLM — The reasoning engine
  • Tools — External capabilities the agent can invoke
  • Memory — Short-term (conversation) and long-term (persistent storage)
  • Planning — Strategy for achieving goals

Reasoning Patterns

ReAct (Reasoning + Acting)

Interleaves thinking and action:

Thought: I need to find the current weather in London.
Action: get_weather(location="London")
Observation: Partly cloudy, 15°C
Thought: Now I can answer the user's question.
Answer: The weather in London is partly cloudy at 15°C.
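
A minimal sketch of one ReAct iteration; llm.complete and run_tool are hypothetical helpers standing in for your LLM client and tool dispatcher:

import re

ACTION_RE = re.compile(r'Action:\s*(\w+)\((.*)\)')

def react_step(transcript):
    # One loop iteration: the model emits a Thought plus either an Action or a final Answer.
    output = llm.complete(transcript)             # hypothetical LLM call
    match = ACTION_RE.search(output)
    if match is None:
        return transcript + output, True          # "Answer:" reached, we are done
    name, args = match.groups()
    observation = run_tool(name, args)            # hypothetical tool dispatcher
    return transcript + output + f"\nObservation: {observation}\n", False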

Chain-of-Thought (CoT)

Step-by-step reasoning before answering. Can be triggered with a prompt such as “Let’s think step by step” or an explicit reasoning structure.
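
For example, a minimal prompt (the message format here is an assumption; any chat API works):

question = "A train leaves at 09:40 and the journey takes 2 h 35 min. When does it arrive?"
messages = [
    {"role": "system", "content": "Reason step by step before giving a final answer."},
    {"role": "user", "content": question + "\n\nLet's think step by step."},
]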

Tree-of-Thought (ToT)

Explores multiple reasoning branches:

  1. Generate several possible next steps
  2. Evaluate each branch
  3. Select most promising
  4. Backtrack if needed

Useful for planning and problem-solving where the first approach may fail.
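
A beam-search flavoured sketch of this loop; llm.propose_steps and llm.score are hypothetical helpers:

def tree_of_thought(task, depth=3, branching=3, beam=2):
    frontier = [[]]                                # each entry is a partial reasoning chain
    for _ in range(depth):
        candidates = []
        for chain in frontier:
            # 1. Generate several possible next steps for every live branch.
            for step in llm.propose_steps(task, chain, n=branching):
                candidates.append(chain + [step])
        # 2-3. Evaluate each branch and keep only the most promising ones;
        # dropping the rest is the implicit backtracking of step 4.
        candidates.sort(key=lambda c: llm.score(task, c), reverse=True)
        frontier = candidates[:beam]
    return frontier[0]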

Reflexion

Self-evaluation and improvement:

  1. Attempt task
  2. Evaluate outcome
  3. Reflect on failures
  4. Retry with learned insights
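
A sketch of the retry loop, assuming hypothetical llm.attempt / llm.reflect helpers and an evaluate function (e.g. a test runner):

def reflexion(task, max_tries=3):
    reflections = []                                  # lessons carried between attempts
    for _ in range(max_tries):
        attempt = llm.attempt(task, reflections)      # hypothetical: attempt with past insights
        ok, feedback = evaluate(attempt)              # e.g. run the test suite
        if ok:
            return attempt
        reflections.append(llm.reflect(task, attempt, feedback))
    return attempt                                    # best effort after max_tries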

Tool Use

Function Calling

Modern LLMs support structured tool definitions:

{
  "name": "search_web",
  "description": "Search the web for information",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string", "description": "Search query"}
    },
    "required": ["query"]
  }
}

The model outputs structured calls that the system executes.
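
A sketch of the executing side, assuming an OpenAI-style call object with name and arguments fields (search_web here stands for a hypothetical local function):

import json

TOOL_REGISTRY = {"search_web": search_web}            # tool name -> local function

def execute_tool(call):
    fn = TOOL_REGISTRY.get(call.name)
    if fn is None:
        return {"error": f"unknown tool: {call.name}"}
    try:
        return fn(**json.loads(call.arguments))       # arguments arrive as a JSON string
    except Exception as exc:
        return {"error": str(exc)}                    # feed errors back so the model can recover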

Common Tool Categories

  • Information retrieval — Web search, database queries, file reading
  • Code execution — Run code, shell commands
  • External APIs — Send emails, create tickets, update systems
  • Human interaction — Ask for clarification, approval

Tool Design Principles

  • Clear, specific descriptions
  • Well-defined parameter schemas
  • Appropriate granularity (not too broad or narrow)
  • Error handling and feedback

Memory

Short-term Memory

Conversation history held within the context window. Strategies for handling long conversations:

  • Summarisation
  • Sliding window
  • Relevant message retrieval
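
A sliding-window sketch; count_tokens is a hypothetical tokeniser helper:

def trim_history(messages, budget=8000):
    # Keep the system prompt, then the most recent messages that fit the token budget.
    system, rest = messages[0], messages[1:]
    kept, used = [], 0
    for msg in reversed(rest):
        used += count_tokens(msg)                     # hypothetical token counter
        if used > budget:
            break
        kept.append(msg)
    return [system] + kept[::-1]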

Long-term Memory

Persistent storage across sessions:

  • Vector databases for semantic retrieval
  • Structured databases for facts
  • Knowledge graphs for relationships
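
A sketch of vector-database storage and retrieval; embed and store are hypothetical (any embedding model and vector store would do):

def remember(store, embed, text):
    store.add(vector=embed(text), payload=text)       # hypothetical vector-store API

def recall(store, embed, query, k=3):
    hits = store.search(vector=embed(query), top_k=k)
    return [hit.payload for hit in hits]              # most semantically similar memories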

Working Memory

Scratchpad for intermediate results during complex tasks.

Multi-Agent Systems

Architectures

Hierarchical

Orchestrator Agent
    ├── Research Agent
    ├── Coding Agent
    └── Review Agent

Peer-to-peer
Agents communicate directly, with no central controller.

Debate
Multiple agents argue different perspectives, then synthesise a conclusion.

Coordination Patterns

  • Handoff — Pass task to specialist agent
  • Collaboration — Agents work on subtasks in parallel
  • Critique — One agent reviews another’s work
  • Voting — Multiple agents vote on decisions
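
A sketch of the handoff pattern in a hierarchical setup; route, llm.decompose, llm.synthesise, and the specialist agents are all hypothetical:

SPECIALISTS = {"research": research_agent, "coding": coding_agent, "review": review_agent}

def orchestrate(task):
    results = []
    for subtask in llm.decompose(task):               # hypothetical: split task into subtasks
        agent = SPECIALISTS[route(subtask)]           # hypothetical classifier picks a specialist
        results.append(agent.run(subtask))            # handoff to the specialist
    return llm.synthesise(task, results)              # combine subtask results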

Agentic Coding

Agents specialised for software development tasks.

Capabilities

  • Navigate and understand codebases
  • Write, edit, and refactor code
  • Run tests and fix failures
  • Execute shell commands
  • Interact with version control

Best Practices

  • Provide clear project context (AGENTS.md, CLAUDE.md)
  • Use version control for safety
  • Review changes before committing
  • Start with smaller, well-defined tasks

Human-in-the-Loop

Approval Gates

Require human approval for:

  • Destructive actions (delete, overwrite)
  • External communications (emails, API calls)
  • Expensive operations
  • Uncertain decisions
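
A sketch of a simple gate wrapped around tool execution (the RISKY_TOOLS set and execute_tool are assumptions):

RISKY_TOOLS = {"delete_file", "send_email", "update_ticket"}

def execute_with_gate(call):
    if call.name in RISKY_TOOLS:
        answer = input(f"Approve {call.name}({call.arguments})? [y/N] ")
        if answer.strip().lower() != "y":
            return {"error": "action rejected by human reviewer"}
    return execute_tool(call)                         # proceed as normal once approved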

Feedback Integration

  • Corrections improve future behaviour
  • Preferences guide decision-making
  • Escalation when confidence is low

Building Agents

Patterns

Minimal agent:

while True:
    response = llm.chat(messages, tools=tools)
    messages.append(response.message)            # keep the assistant turn, including its tool calls
    if response.tool_calls:
        for call in response.tool_calls:
            result = execute_tool(call)
            messages.append(tool_result(call, result))   # feed each result back to the model
    else:
        return response.content

With planning:

plan = llm.generate_plan(task)
step = 0
while step < len(plan):
    result = execute_step(plan[step])
    if successful(result):
        step += 1
    else:
        plan = llm.replan(task, result, plan)   # fresh plan for the remaining work
        step = 0                                # restart on the new plan

Evaluation

Metrics

  • Task completion — Did it achieve the goal?
  • Efficiency — Steps taken, tokens used
  • Safety — Did it avoid harmful actions?
  • Cost — API costs, compute time
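
A sketch of a harness collecting these metrics; the run/trace interface is an assumption:

def evaluate_agent(agent, tasks):
    totals = {"completed": 0, "steps": 0, "tokens": 0}
    for task in tasks:
        trace = agent.run(task)                       # hypothetical trace object
        totals["completed"] += int(trace.goal_met)    # task completion
        totals["steps"] += trace.num_steps            # efficiency
        totals["tokens"] += trace.tokens_used         # cost proxy
    totals["completion_rate"] = totals["completed"] / len(tasks)
    return totals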

Challenges

  • Reliability — Agents can get stuck, loop, or make errors
  • Cost — Many LLM calls add up
  • Latency — Sequential actions are slow
  • Safety — Autonomous actions need guardrails
  • Debugging — Hard to trace multi-step failures
