
GenAI: What is an AI agent?

This working memo aims to understand what is meant by "AI agent", how the term is defined in the wild, and how AI agents can be classified and categorised.

Defining AI agent

Looking up definitions of an AI agent, one finds similar yet slightly different descriptions. An article from IBM describes an AI agent as "a system or program that is capable of autonomously performing tasks on behalf of a user or another system by designing its workflow and utilizing available tools" [1]. IBM also distinguishes between agentic and non-agentic chatbots: agentic AI chatbots learn to adapt to user expectations over time, while non-agentic AI chatbots are ones without available tools, memory or reasoning.

AWS says an AI agent is a software program that can interact with its environment, collect data, and use the data to perform self-determined tasks to meet predetermined goals. Humans set goals, but an AI agent independently chooses the best actions it needs to perform to achieve those goals [2].

Google puts it as follows: AI agents are software systems that use AI to pursue goals and complete tasks on behalf of users. They show reasoning, planning, and memory and have a level of autonomy to make decisions, learn, and adapt [4].

Anthropic goes a bit deeper and is more practical in its article and the associated YouTube video https://youtu.be/LP5OCa20Zpg. It makes an architectural distinction between workflows and agents: "workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents are systems where LLMs dynamically direct their own process and tool usage, maintaining control over how they accomplish tasks" [8].
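The workflow/agent distinction can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two toy tools, the `TOOLS` table, and `scripted_model`, which is a deterministic stand-in for an LLM so the example runs offline. It is not taken from Anthropic's article. In the workflow the order of steps is fixed in code; in the agent the "model" decides the next step at each iteration.

```python
from typing import Callable, Dict

# Hypothetical stand-in type for an LLM call: takes the current state,
# returns the name of the next tool to use (or "done").
Model = Callable[[str], str]


def summarize_tool(text: str) -> str:
    """Toy tool: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


def shout_tool(text: str) -> str:
    """Toy tool: upper-case the text."""
    return text.upper()


TOOLS: Dict[str, Callable[[str], str]] = {
    "summarize": summarize_tool,
    "shout": shout_tool,
}


def run_workflow(text: str) -> str:
    """Workflow: the code path is predefined (summarize, then shout)."""
    return shout_tool(summarize_tool(text))


def run_agent(text: str, model: Model, max_steps: int = 5) -> str:
    """Agent: the model chooses the next tool at every step."""
    state = text
    for _ in range(max_steps):  # hard cap so the loop always terminates
        choice = model(state).strip()
        if choice == "done" or choice not in TOOLS:
            break
        state = TOOLS[choice](state)
    return state


def scripted_model(state: str) -> str:
    """Deterministic fake model, used only so the sketch runs without an API."""
    if "." in state[:-1]:  # more than one sentence remains
        return "summarize"
    if state != state.upper():
        return "shout"
    return "done"


text = "Agents pick tools. Workflows follow scripts."
print(run_workflow(text))
print(run_agent(text, scripted_model))
```

Both calls happen to reach the same result here, but only because the scripted model chooses the same steps the workflow hard-codes. Swapping in a different model changes the agent's path without touching `run_agent`, whereas changing the workflow means editing its code.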

Consolidating the description

Consolidated characteristics of an AI agent:

What I think is missing from the descriptions:

My own running definition of an LLM-based AI agent:

An AI agent in the context of LLMs is a program that achieves predetermined goals by defining its own path from prompt to prompt, incorporating additional logic. It can be invoked immediately, on a schedule, or from an event.

Marketed benefits

Risks and limitations

Adding knowledge / tools

An agent often requires information about the world in order to process the task at hand. This is possible by providing the agent with a set of tools that can include external data, web search, APIs or other agents (and their tools). Once the needed information is retrieved, the agent's "knowledge" can be updated [1].
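As a rough sketch of this idea, the snippet below registers a tool with an agent and records each tool result in the agent's working "knowledge". The `Tool` and `Agent` classes, the `weather_lookup` function and its fixed data are all hypothetical names invented for illustration; a real tool would call an external API, and real vendor tool-calling interfaces differ in their details.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str  # what a model would read when deciding to use the tool
    func: Callable[[str], str]


@dataclass
class Agent:
    tools: Dict[str, Tool] = field(default_factory=dict)
    knowledge: List[str] = field(default_factory=list)  # retrieved context

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def use_tool(self, name: str, query: str) -> str:
        """Call a tool and append its result to the agent's knowledge."""
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        result = self.tools[name].func(query)
        self.knowledge.append(f"{name}({query!r}) -> {result}")
        return result


# Hypothetical tool backed by a fixed dictionary instead of a real API.
FAKE_WEATHER = {"Helsinki": "-3 C, snow"}


def weather_lookup(city: str) -> str:
    return FAKE_WEATHER.get(city, "unknown")


agent = Agent()
agent.register(Tool("weather", "Look up current weather for a city", weather_lookup))
print(agent.use_tool("weather", "Helsinki"))
print(agent.knowledge)
```

In an LLM-based agent, the accumulated `knowledge` entries would typically be fed back into the next prompt so the model can use what the tools returned.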

Note: at the time of writing, not all vendors and APIs support tool calling.

Types

Architectures

No single standard architecture exists for building AI agents [1]. Some paradigms:

Best practices

Anthropic states that "the most successful implementations weren't using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns" [8]. Their recommendation is to start with the simplest solution possible and iterate from there.

References