GOAP
I've spent some time this week thinking about how Goal-Oriented Action Planning (GOAP) might be used in Anjin's systems, so I wanted to give a quick run-down of what it is and how it works.
GOAP Explained
GOAP is a technique for managing the decision-making process of an AI agent. Most often seen in games, it allows designers to build complex behaviours by giving a set of high-level Goals to the agent, along with a set of Actions that the agent can take. A Planner then takes the current World State and the agent's goals, and returns a list of actions that the agent can take to achieve those goals.
So the main parts of a GOAP system are:
- World State / Blackboard
- Goals
- Actions
- Planner
Let's take a look at each of these in more detail.
World State / Blackboard
When building an AI system, the agent needs some kind of representation of the world it's in. This subset of the full world, along with the agent's internal state, is often called a Blackboard. It might hold facts about the world, like "The door is locked", or about the agent itself, like "hunger level". The blackboard is the base state that the system uses to choose goals and that the planner uses to determine what actions the agent can take to satisfy them.
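To make that concrete, here is a minimal sketch of a blackboard as a plain Python dictionary. The key names, the values, and the `is_hungry` helper are all made up for illustration; a real system could just as easily use a typed class or a database.

```python
# A blackboard as a plain dictionary of facts (all keys are hypothetical).
blackboard = {
    "door_locked": True,   # a fact about the world
    "has_berries": False,  # a fact about what the agent is carrying
    "hunger_level": 7,     # the agent's internal state
}


def is_hungry(state, threshold=5):
    """Return True when the hunger level is above the (assumed) threshold."""
    return state["hunger_level"] > threshold
```

Anything that needs to make a decision, like goal selection or the planner, reads from this one structure rather than querying the full world.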
Goals
Goals are high-level objectives that the agent is trying to achieve. Goals are defined by some subset of the world state, a set of conditions that must be true for the goal to be considered achieved. For example, an agent might have the goal of "Eat Food", which would be triggered when the agent's "hunger level" is above a certain threshold.
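As a rough sketch, a goal could be a small record holding a trigger, a "done" condition, and a priority for ranking. The `Goal` class, the `pick_goal` helper, the threshold of 5, and the priority value are my own assumptions, not part of any particular GOAP library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]

HUNGER_THRESHOLD = 5  # assumed value, purely for illustration


@dataclass
class Goal:
    name: str
    is_relevant: Callable[[State], bool]   # should the agent pursue this now?
    is_satisfied: Callable[[State], bool]  # conditions that mark it as achieved
    priority: int = 0


eat_food = Goal(
    name="Eat Food",
    is_relevant=lambda s: s["hunger_level"] > HUNGER_THRESHOLD,
    is_satisfied=lambda s: s["hunger_level"] <= HUNGER_THRESHOLD,
    priority=10,
)


def pick_goal(goals, state):
    """Return the highest-priority goal that is triggered and unmet, or None."""
    candidates = [g for g in goals if g.is_relevant(state) and not g.is_satisfied(state)]
    return max(candidates, key=lambda g: g.priority, default=None)
```

With a hunger level of 7, `pick_goal([eat_food], {"hunger_level": 7})` returns the "Eat Food" goal; once the level is at or below the threshold it returns `None`.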
Actions
Actions are things that the agent can do. Each action has a set of preconditions and effects. Preconditions are conditions that must be true for the action to be taken, and effects are the changes that the action makes to the world state. For example, an action might be "Eat Berries", which would have a precondition of "Has Berries" and the effects "Decrease hunger level by 3" and "Remove Berries".
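Here is one way the "Eat Berries" example might look in code. The `Action` class and its field names are hypothetical, and clamping hunger at zero is an assumption I've added so the number can't go negative.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]


@dataclass(frozen=True)
class Action:
    name: str
    precondition: Callable[[State], bool]  # must be true before the action runs
    effect: Callable[[State], State]       # returns the changed world state
    cost: float = 1.0


def eat_berries_effect(state):
    new_state = dict(state)  # copy, so the original state is left untouched
    # Decrease hunger by 3 (clamped at 0, an assumption) and remove the berries.
    new_state["hunger_level"] = max(0, state["hunger_level"] - 3)
    new_state["has_berries"] = False
    return new_state


eat_berries = Action(
    name="Eat Berries",
    precondition=lambda s: s["has_berries"],
    effect=eat_berries_effect,
)

before = {"has_berries": True, "hunger_level": 7}
after = eat_berries.effect(before) if eat_berries.precondition(before) else before
```

Returning a new state instead of mutating the old one is a deliberate choice here: a planner needs to try out many hypothetical futures without disturbing the real blackboard.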
Planner
The Planner brings all of the other pieces together, taking the current world state, the agent's goals, and the available actions, and returning a Plan. It does this in several steps:
- Pick a goal: Goals are ranked based on some criteria and the planner picks the highest ranked goal to work on
- Start at the end: The planner starts at the state that the goal represents and works backwards
- Build Paths: Actions are tested and their effects are applied backwards to create chains of actions
- Look for the current state: The planner continues to apply actions backwards until it reaches the current world state, meaning this goal is achievable by following that path
- Pick a plan: The planner then picks the lowest-cost path of actions to achieve the goal and returns it
In graph terminology, the possible world states are nodes and the actions are edges, allowing the planner to travel between states to find the best path to the goal. You can use different graph search algorithms to find the paths, but A-Star (A*) is a common choice: the action costs act as the edge weights, and a heuristic, such as the number of goal conditions still unmet, steers the search towards promising states.
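The steps above can be sketched as a small backward A* search. This is a simplified illustration, not a reference implementation: it assumes the world state is just a set of true/false facts (so "fed" stands in for a numeric hunger level), and the `Action` fields, the `plan` function, and the example actions and costs are all invented. The heuristic counts the facts still missing, which only guarantees the cheapest plan when every action adds a single fact and costs at least 1, as is the case here.

```python
import heapq
from itertools import count
from typing import FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]
    adds: FrozenSet[str]      # facts the action makes true
    deletes: FrozenSet[str]   # facts the action makes false
    cost: float


def plan(current: FrozenSet[str], goal: FrozenSet[str],
         actions: List[Action]) -> Optional[List[str]]:
    """Search backwards from the goal facts until the current state covers them."""
    def heuristic(needed):
        return len(needed - current)  # facts still missing from the current state

    tie = count()  # tie-breaker so the heap never has to compare sets
    frontier = [(heuristic(goal), next(tie), 0.0, goal, [])]
    best_cost = {goal: 0.0}
    while frontier:
        _, _, cost, needed, steps = heapq.heappop(frontier)
        if needed <= current:
            return steps  # every needed fact already holds: plan found
        if cost > best_cost.get(needed, float("inf")):
            continue  # a cheaper route to this set of facts was already found
        for action in actions:
            # Useful only if it supplies a needed fact and destroys none of them.
            if not (action.adds & needed) or (action.deletes & needed):
                continue
            # Apply the action backwards: swap what it adds for what it requires.
            regressed = (needed - action.adds) | action.preconditions
            new_cost = cost + action.cost
            if new_cost < best_cost.get(regressed, float("inf")):
                best_cost[regressed] = new_cost
                heapq.heappush(frontier, (new_cost + heuristic(regressed), next(tie),
                                          new_cost, regressed, [action.name] + steps))
    return None  # no chain of actions reaches the goal


F = frozenset
ACTIONS = [
    Action("Walk To Bush", F(), F({"at_bush"}), F(), 2),
    Action("Pick Berries", F({"at_bush"}), F({"has_berries"}), F(), 1),
    Action("Eat Berries", F({"has_berries"}), F({"fed"}), F({"has_berries"}), 1),
    Action("Earn Money", F(), F({"has_money"}), F(), 4),
    Action("Buy Meal", F({"has_money"}), F({"fed"}), F({"has_money"}), 5),
]

print(plan(F(), F({"fed"}), ACTIONS))
# prints ['Walk To Bush', 'Pick Berries', 'Eat Berries']
```

Starting with nothing, the berry route costs 4 in total while earning money and buying a meal costs 9, so the planner returns the berry route. Because each backward step prepends its action, the returned list is already in execution order.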
Connection to Anjin
Anjin is concerned with a set of high-level goals and the actions the user can take to make progress towards them. It's not a perfect mapping to GOAP, but there are enough similarities that I think it's worth exploring more.
The main issue with GOAP is that the world state, goals, and actions are all fairly tightly coupled. In Anjin, the user can set their own goals, so the system would need a way of representing those goals and any available actions in a way that the planner can understand. The open-ended nature of the goals makes it difficult to represent them as a set of conditions, but it might be possible to use some kind of reward function to guide the planner, as you might see in reinforcement learning.
I've been revisiting the "Practice" concept I discussed last week. Instead of having a strict structure, I'm thinking about allowing a planner to take in a goal and a time limit (e.g. 45 minutes) and having it return a set of actions that the user can take to make progress towards that goal. I'm tentatively calling the concept a "Session", as it's a focused period of time where the user can work towards a goal.
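To get a feel for what a Session planner might do, here is a very rough sketch that treats it as a knapsack problem: pick the set of actions with the highest total reward that fits in the time limit. None of this exists in Anjin. The `Task` class, the `plan_session` function, the guitar-themed tasks, and the reward numbers are all invented, and where the rewards would actually come from is exactly the open question.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    name: str
    minutes: int
    reward: float  # hypothetical estimate of progress towards the goal


def plan_session(tasks: List[Task], time_limit: int) -> List[Task]:
    """0/1 knapsack: choose tasks maximising total reward within time_limit minutes."""
    # best[t] holds (total reward, chosen tasks) using at most t minutes.
    best = [(0.0, []) for _ in range(time_limit + 1)]
    for task in tasks:
        # Walk the budget downwards so each task is used at most once.
        for t in range(time_limit, task.minutes - 1, -1):
            reward, chosen = best[t - task.minutes]
            if reward + task.reward > best[t][0]:
                best[t] = (reward + task.reward, chosen + [task])
    return best[time_limit][1]


# Made-up candidate actions for a made-up "learn guitar" goal.
candidates = [
    Task("Warm-up scales", 10, 2),
    Task("Chord changes", 15, 4),
    Task("Learn new song", 30, 6),
    Task("Ear training", 20, 3),
]

session = plan_session(candidates, time_limit=45)
print([task.name for task in session])
# prints ['Chord changes', 'Learn new song']
```

Unlike GOAP proper, there are no preconditions or ordering here, just a time budget, so this is closer to scheduling than planning. A fuller version would probably need both.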
Wrapping Up
I've never implemented a GOAP system before, so the next step here is to get some practical experience with it. All the examples I've found for GOAP are tied to game engines, like Unity or Godot, but the idea does crop up in other places. For example, OpenAI's "function calling" system operates on a similar principle, where the agent is given a set of external functions it can call to achieve a goal. In that case, the planner is the LLM, and there is no real "planning" step; the agent just calls the functions that it thinks will help it achieve the goal.
I'll have to give more thought to how Anjin might represent goals and actions, but I think it's a promising direction to explore. Ideally, the user would be able to tell Anjin that they want to work on a particular goal for a set amount of time, and Anjin would return a set of actions that the user could take to make progress towards that goal. Adding a Pomodoro-style timer to the interface could make it easier for a user to work through a session of several actions too, but that's a topic for another week.
See you next week!