# Multi-Agent Intention Progression with Black-Box Agents

Michael Dann (1), Yuan Yao (2), Brian Logan (3) and John Thangarajah (1)
(1) RMIT University, (2) Zhejiang University of Technology, (3) Utrecht University
{michael.dann, john.thangarajah}@rmit.edu.au, yaoyuan@zjut.edu.cn, b.s.logan@uu.nl

Full source code and an extended version of this paper containing further experimental details are available at: https://github.com/mchldann/pGPT_IJCAI

We propose a new approach to intention progression in multi-agent settings where other agents are effectively black boxes. That is, while their goals are known, the precise programs used to achieve these goals are not known. In our approach, agents use an abstraction of their own program called a partially-ordered goal-plan tree (pGPT) to schedule their intentions and predict the actions of other agents. We show how a pGPT can be derived from the program of a BDI agent, and present an approach based on Monte Carlo Tree Search (MCTS) for scheduling an agent's intentions using pGPTs. We evaluate our pGPT-based approach in cooperative, selfish and adversarial multi-agent settings, and show that it out-performs MCTS-based scheduling where agents assume that other agents have the same program as themselves.

1 Introduction

A key problem for an autonomous intelligent agent with multiple goals is "what to do next": which goal the agent should be trying to achieve, and which means it should use to achieve it. In the popular Belief-Desire-Intention (BDI) approach to agents [Rao and Georgeff, 1992], this problem is termed the intention progression problem (IPP) [Logan et al., 2017]. BDI agents are characterised by the concepts of beliefs, goals and plans. Beliefs represent the information an agent has about itself, the environment and other agents. Goals are states the agent would like to bring about. Plans are recipes for achieving goals, and are composed of primitive actions that directly change the state of the environment, and subgoals which are achieved by their own plans. An intention is formed when the agent commits to achieving a (top-level) goal utilising a particular plan.

A key feature of BDI agents is their ability to simultaneously pursue multiple intentions. In order to do so, at each deliberation cycle, the agent must select which of its multiple intentions it should progress (i.e., intention selection) and, if the next step within the selected intention is a subgoal, select the best plan to achieve it (i.e., plan selection). These two choices together form the intention progression problem.

A number of approaches to various aspects of the intention progression problem have been proposed in the literature, including summary-information-based (SI) [Thangarajah et al., 2003; Thangarajah and Padgham, 2011], coverage-based (CB) [Waters et al., 2014; Waters et al., 2015] and Monte-Carlo Tree Search-based (MCTS) [Yao et al., 2014; Yao and Logan, 2016; Yao et al., 2016c] approaches. Much of this work has focussed on the single-agent setting, where the key challenge is the interleaving of steps in plans in different intentions to avoid conflicts, i.e., when the execution of a step in one plan makes the execution of a step in another concurrently executing plan impossible. Recently, Dann et al. [2020] extended the MCTS-based approach in [Yao and Logan, 2016] to a multi-agent setting.
In the multi-agent setting, how an agent progresses its intentions has implications both for the achievement of its own goals and for the achievement of the goals of other agents, e.g., if the agent selects a plan that consumes a resource necessary for another agent to achieve its goal. While the intention-aware approach in [Dann et al., 2020] was shown to out-perform non-intention-aware scheduling such as [Yao and Logan, 2016], it assumes that agents have access to the plans comprising the other agents' programs for achieving their goals. This is reasonable in multi-agent systems where the agents are co-designed, but is less plausible for agents that are not co-designed, i.e., where each agent may have no information about the plans used by other agents. One way of overcoming this limitation is for an agent to assume that the programs of other agents are the same as its own when progressing its intentions. However, such egocentric scheduling can give rise to conflicts if the plans used by other agents achieve subgoals in a different order, or order primitive actions differently. For example, an agent a1 whose plan to achieve the goal of doing the household chores contains subgoals to do the laundry, vacuum, and wash the dishes (in that order) may assume that another agent, a2, tasked with doing the chores will wash the dishes last. This may result in a conflict when a1 tries to use the sink to prepare lunch, if the program of a2 has washing the dishes as the first subgoal. A more reasonable assumption is that the other agents use a similar program, but may order some subtasks differently.

In this paper, we propose a new approach to intention progression in a multi-agent setting, in which agents schedule their intentions and predict the actions of other agents based on an abstraction of their own program called a partially-ordered goal-plan tree (pGPT). A partially-ordered goal-plan tree is an alternating and-or tree that captures only the execution order implied by the dependency relationships inherent in the agent's program. We show how a pGPT can be derived from the program of a BDI agent, and how the MCTS-based approach in [Dann et al., 2020] can be extended and adapted to schedule an agent's intentions using pGPTs in multi-agent settings. As in [Dann et al., 2020], we evaluate our approach in cooperative, selfish and adversarial multi-agent settings. Our results indicate that pGPT-based scheduling out-performs MCTS-based scheduling where agents assume that other agents behave in the same way as themselves.

2 Preliminaries

In this section, we introduce and define the basic elements of our approach to intention progression, including beliefs, goals, actions and plans.

Beliefs and goals. The agent's beliefs represent the information an agent has about itself, the environment and other agents. The agent's belief base B is a finite set of ground literals (a proposition p or its negation ¬p): B = {b1, . . . , bn}. B is updated at each cycle to incorporate the agent's current percepts. We assume that B is consistent, i.e., there is no p such that p, ¬p ∈ B. The agent's top-level goals represent states of affairs that the agent wants to bring about. The agent's goal base G is a finite set of ground literals: G = {g1, . . . , gm}. G does not need to be consistent, e.g., conflicting goals may be achieved at different times.
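As a concrete (if minimal) illustration of these definitions, ground literals, the belief base and the goal base can be represented as plain sets. The sketch below is ours rather than code from the paper, and the string encoding of negation is an arbitrary choice.

```python
def neg(lit: str) -> str:
    """Complement of a ground literal encoded as a string: 'p' <-> '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def consistent(beliefs: set) -> bool:
    """B is consistent iff it never contains both p and ~p."""
    return all(neg(b) not in beliefs for b in beliefs)

# Belief base B and goal base G are finite sets of ground literals.
B = {"sink_free", "~dishes_clean"}       # must be consistent
G = {"dishes_clean", "laundry_done"}     # need not be consistent
assert consistent(B)
```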
Actions and plans. An agent can perform a set of primitive actions in the environment, A = {a1, . . . , ak}. The preconditions of an action ai are a set of literals φ = pre(ai) that must be true before the execution of the action, and the postconditions of the action are a set of literals ψ = post(ai) that are true after the execution of the action. We assume that actions are deterministic: if the preconditions of an action hold, then the postconditions of the action hold after executing the action. An action is executable given the agent's beliefs B if B |= φ. If the agent attempts to execute an action whose preconditions do not hold, the action fails (cannot be executed).

To achieve the agent's goals, actions are organised into plans. Each goal g is associated with a set of plans π1, . . . , πn that achieve g. Each plan πi is of the form g : χ ← s1; . . . ; sm, where χ = pre(πi) is a set of literals specifying the context condition which must be true for πi to begin execution, and s1; . . . ; sm is a sequence of steps which are either actions or subgoals. A plan can be executed if its context condition holds, the precondition of each of its action steps holds when the step is reached, and each of its subgoal steps has an executable plan when the subgoal is reached. Note that, for many agent plans, it is only necessary that B |= χ for the plan to be executable, as the postcondition of an action step or the achievement of a subgoal si may establish the precondition of a later action step sj, or the context condition of plans for a later subgoal step sj, i < j. This is termed a p-effect (preparatory effect) in [Thangarajah et al., 2003]. A goal g is considered achieved (and any intention with g as top-level goal is dropped) if (and only if) all the steps in a plan πi for g are successfully executed. We abuse notation slightly, and define the preconditions of a goal g as the union of the context conditions of the plans to achieve g, i.e., pre(g) = ∪πi pre(πi) (this is a technical device; only one of the plans for g must be executable for g to be "executable"). The postcondition of a goal g is g itself, i.e., post(g) = g. We denote by P the set of plans comprising the agent's program.

3 Partially-Ordered Goal-Plan Trees

In much of the work on intention progression in BDI agents, the relationships between the plans, actions and subgoals that can be used to achieve a goal are represented by a hierarchical structure termed a goal-plan tree (GPT) [Thangarajah et al., 2003; Thangarajah and Padgham, 2011; Yao et al., 2016a]. The root of a goal-plan tree is a goal node representing a top-level goal, and its children are plan nodes representing the potential plans to achieve the top-level goal. The agent only needs to execute one of these plans to achieve the goal; hence, goal nodes are viewed as or-nodes. The children of a plan node are the action and subgoal nodes corresponding to the steps in the plan body. The agent needs to execute all of the child nodes for the plan to achieve its goal. Thus, plan nodes are viewed as ordered and-nodes. Each subgoal node has its associated plans as children, giving rise to a tree structure representing all possible ways an agent can achieve the top-level goal. While goal-plan trees have been shown to be effective for single-agent scheduling [Yao et al., 2016c; Yao and Logan, 2016] and multi-agent scheduling where the program(s) of the other agents are known [Dann et al., 2020], they are less appropriate in multi-agent settings where the agents are not co-designed.
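To make the tree structure just described concrete, here is a rough sketch of GPT nodes together with the action semantics of Section 2, under our own illustrative naming (Action, Goal, Plan and the helper functions are not identifiers from the paper): goals are or-nodes over alternative plans, plans are ordered and-nodes over steps, and actions carry pre- and postconditions.

```python
from dataclasses import dataclass, field
from typing import List, Union

def neg(lit: str) -> str:
    """Complement of a ground literal encoded as a string: 'p' <-> '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

@dataclass
class Action:
    name: str
    pre: frozenset    # literals that must hold before execution (phi)
    post: frozenset   # literals that hold after execution (psi)

@dataclass
class Goal:
    name: str                                            # the goal literal g
    plans: List["Plan"] = field(default_factory=list)    # or-node: alternative plans

@dataclass
class Plan:
    goal: str                                             # the goal this plan achieves
    context: frozenset                                    # context condition chi
    steps: List[Union[Action, Goal]] = field(default_factory=list)  # ordered and-node

def executable(a: Action, beliefs: set) -> bool:
    """An action is executable iff B |= pre(a)."""
    return a.pre <= beliefs

def execute(a: Action, beliefs: set) -> set:
    """Deterministic action execution: the postconditions hold afterwards."""
    if not executable(a, beliefs):
        raise RuntimeError(f"action {a.name} failed: preconditions not met")
    return (beliefs - {neg(l) for l in a.post}) | set(a.post)

# Example (echoing the household chores scenario from the Introduction):
wash = Action("wash_dishes", pre=frozenset({"sink_free"}), post=frozenset({"dishes_clean"}))
plan = Plan(goal="dishes_clean", context=frozenset(), steps=[wash])
goal = Goal("dishes_clean", plans=[plan])
print(execute(wash, {"sink_free"}))   # set containing 'sink_free' and 'dishes_clean'
```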
We therefore propose a new approach to multi-agent intention progression in which agents schedule their intentions based on abstractions of their own programs, which we call partially-ordered goal-plan trees (pGPTs). A partially-ordered goal-plan tree is an alternating and-or tree that captures only the execution order implied by the p-effects in the agent's program, rather than strictly adhering to the textual order of steps specified by the developer. More precisely:

Definition 1 (Partially-Ordered Goal-Plan Tree) A pGPT T = (G, P, A, g0, children, {≺π : π ∈ P}) is an alternating and-or tree where:
- elements of G ∪ A are or-nodes (goals and actions) and elements of P are and-nodes (plans) (viewing action nodes as or-nodes without children is a technical device to simplify the definition);
- g0 ∈ G is the root (top-level goal);
- children : G ∪ P → 2^(G ∪ P ∪ A) is a function assigning a (non-empty) set of children to (non-leaf) nodes in the tree;
- or-nodes only have children in P: for g ∈ G, children(g) ⊆ P;
- and-nodes only have children in G ∪ A: for π ∈ P, children(π) ⊆ G ∪ A; and
- the set children(π) for each π ∈ P is partially ordered by ≺π.

An execution of a pGPT is a traversal of the tree starting at the root, g0. When visiting an or-node, the next step in the traversal is to pick any child and visit it; when visiting an and-node π, visit each child in an order consistent with ≺π. A pGPT T is executable if there is a traversal of T that is executable (given the pre- and postconditions of plans and actions) from some initial state of the environment, i.e., if we start in this initial state and update it with the postconditions of each visited action, then the preconditions of the next action or the context condition of the next plan in the traversal hold in the updated state.

A pGPT T for a top-level goal g0 and agent program Π = {π1, . . . , πn} can be built recursively as follows. Set g0 to be the root of the tree (an or-node) and add as children(g0) and-nodes p_i for each plan πi = gi : χi ← s^i_1; . . . ; s^i_m with gi = g0. For each plan and-node p_i, add as children(p_i) or-nodes for each of the steps of πi, s^i_1; . . . ; s^i_m. To establish the ordering ≺πi of the or-nodes representing the steps in πi, we proceed as follows: for each pair of steps s^i_j, s^i_k, 1 ≤ j < k ≤ m, if a literal l ∈ post(s^i_j) ∩ pre(s^i_k), then s^i_j ≺πi s^i_k, i.e., if s^i_j establishes a precondition for s^i_k, we add an ordering constraint that s^i_j must be executed before s^i_k (if s^i_k is a subgoal, before any step in whichever plan is selected to achieve s^i_k); if there is a step s^i_c, 1 ≤ c < j (or k < c ≤ m) where the complementary literal ¬l ∈ post(s^i_c), we add s^i_c ≺πi s^i_j (respectively s^i_k ≺πi s^i_c) to ≺πi to ensure that the execution of s^i_c does not clobber l. Finally, for each or-node whose corresponding step sj is a (sub)goal gj, we recurse, i.e., we add as children(gj) and-nodes for each plan to achieve gj, and so on.

For example, given the agent program Π = {π0, π1, π2}:

π0 : g0 : χ0 ← g1; a0; a1
π1 : g0 : χ1 ← a3; a4; a5
π2 : g1 : χ2 ← a2

where χ0 = c0 ∧ c2, χ1 = c1, χ2 = c2, pre(a0) = c0, pre(a1) = c3 ∧ c4, pre(a2) = c2, pre(a3) = c1, pre(a4) = c1, pre(a5) = c6, post(a0) = c3, post(a1) = c5, post(a2) = c4, post(a3) = c6, post(a4) = c7, post(a5) = c8, we can generate the pGPT shown in Figure 1.

Figure 1: An example of a pGPT.

In Figure 1, solid arrows connect goal nodes to the plan nodes that achieve the goal, e.g., p0 and p1 are the two plan nodes that achieve the goal node g0; dashed arrows connect plan nodes to their execution steps, e.g., g1, a0, a1 are the execution steps of plan node p0; and double arrows indicate the partial ordering relationship between steps in a plan, e.g., action a0 must be executed before action a1, as one of the preconditions of a1 (c3) is established by the postcondition of a0.
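As an illustration of the construction above (our sketch, not the authors' code), the ordering constraints can be derived mechanically from the listed pre- and postconditions. Applying the two rules to plans π0 and π1 recovers exactly the double arrow described for Figure 1 (a0 before a1, since a0 establishes c3 for a1), plus a3 before a5 in π1. Literals are encoded as strings with '~' for negation, and a subgoal step's postcondition is the goal literal itself.

```python
def neg(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit

def plan_ordering(steps, pre, post):
    """Derive the ordering constraints (before, after) for one plan's steps:
    s_j comes before s_k if post(s_j) establishes part of pre(s_k); any step
    whose postconditions would clobber such a literal is pushed before s_j
    (if it occurs earlier) or after s_k (if it occurs later)."""
    order = set()
    for j, sj in enumerate(steps):
        for k in range(j + 1, len(steps)):
            sk = steps[k]
            established = post[sj] & pre[sk]
            if not established:
                continue
            order.add((sj, sk))
            for c, sc in enumerate(steps):
                if any(neg(l) in post[sc] for l in established):
                    if c < j:
                        order.add((sc, sj))
                    elif c > k:
                        order.add((sk, sc))
    return order

# Pre/postconditions from the example program (a subgoal's postcondition is the goal itself).
pre  = {"g1": {"c2"}, "a0": {"c0"}, "a1": {"c3", "c4"},
        "a3": {"c1"}, "a4": {"c1"}, "a5": {"c6"}}
post = {"g1": {"g1"}, "a0": {"c3"}, "a1": {"c5"},
        "a3": {"c6"}, "a4": {"c7"}, "a5": {"c8"}}

print(plan_ordering(["g1", "a0", "a1"], pre, post))   # {('a0', 'a1')}
print(plan_ordering(["a3", "a4", "a5"], pre, post))   # {('a3', 'a5')}
```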
4 Scheduling Approach

We now present a new approach to multi-agent intention scheduling based on pGPTs, which we call IB. We adapt and extend Dann et al.'s [2020] Monte-Carlo Tree Search-based approach, IA, which was shown to outperform several competing methods in a multi-agent setting. We first briefly recall the approach of Dann et al. before explaining how we adapt it to handle partial ordering.

4.1 Intention Scheduling with MCTS

The Monte Carlo Tree Search (MCTS) algorithm is a well-known heuristic search algorithm which has been shown to be successful in many multiplayer games [Browne et al., 2012]. The algorithm works by building a search tree incrementally through simulation. Starting at the root node, a tree policy is used to navigate to a leaf node. The tree policy strives to achieve a balance between exploration (traversing nodes that have rarely been visited) and exploitation (favouring steps that previously led to strong returns). The leaf node is then expanded, and a rollout policy is used to simulate steps until a terminal state is reached. Each node visited during the tree policy phase then has its average return updated based on the outcome of the simulation.

Early work on applying MCTS to intention scheduling [Yao et al., 2014; Yao et al., 2016c; Yao and Logan, 2016] focussed on the single-agent setting. More recently, Dann et al. [2020] extended the approach to multi-agent scheduling by introducing a payoff matrix P, where each entry Pij captures the assumed payoff that agent i receives upon the completion of agent j's goals. For example, allied agents are modelled by defining positive payoffs for both the agent's own goals and those of its allies, while adversarial agents are modelled by defining positive payoffs for the agent's own goals but negative payoffs for the goals of other agents. In the scheduler's underlying MCTS algorithm, the payoff matrix is used to calculate the return for each agent at the end of a rollout.
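Under our reading of this scheme (the function names and values below are illustrative, not taken from the paper), agent i's return at the end of a rollout is the sum over agents j of Pij multiplied by the number of agent j's goals completed in the simulated trajectory; the three settings used later in the paper then differ only in the matrix.

```python
def rollout_returns(P, goals_achieved):
    """Return for each agent i: sum_j P[i][j] * (number of agent j's goals achieved)."""
    return [sum(P[i][j] * goals_achieved[j] for j in range(len(goals_achieved)))
            for i in range(len(P))]

# Illustrative two-agent payoff matrices (rows: receiving agent i, columns: goal owner j).
P_allied      = [[1,  1], [ 1, 1]]   # value the ally's goals like one's own
P_neutral     = [[1,  0], [ 0, 1]]   # disregard the other agent's goals
P_adversarial = [[1, -1], [-1, 1]]   # penalise the other agent's goal completions

# Example rollout: agent 0 completed 4 of its goals, agent 1 completed 2 of its goals.
print(rollout_returns(P_adversarial, [4, 2]))   # [2, -2]
```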
4.2 IB: Multi-Agent Scheduling with pGPTs

The scheduling approach taken in this work builds directly upon the IA scheduler of Dann et al. [2020]. However, in contrast to IA, we do not assume that the order in which other agents will execute their plans is known. Rather, we assume that other agents may act in any way that is consistent with the ordering implied by the p-effects in the agent's own program. Consequently, we model agents' intentions using pGPTs, rather than GPTs. In the MCTS algorithm, this change impacts both node expansion and the rollout policy. Node expansion now adds branches for all steps that are executable according to the pGPT's ordering constraints. Similarly, the rollout policy chooses uniformly from amongst all such steps. This growth in the branching factor is illustrated in Figure 2, which shows a pGPT (left) and one possible linearisation of it into a GPT (right). Suppose that nodes g0 and p0 have so far been traversed. In the (totally ordered) GPT, the only step now executable is a0 (provided its preconditions hold). However, in the partially ordered pGPT, both a0 and g1 are executable.

Figure 2: A pGPT (left) and one possible linearisation (right).

An important consideration here is the amount of time taken to compute the set of executable steps, since this calculation is performed repeatedly during simulation. In Dann et al.'s [2020] GPT-based approach this is straightforward: since GPTs are totally ordered, one can simply store a pointer to the next node in each GPT. However, in our pGPT-based approach, it is necessary to calculate the set of all steps that are executable according to the ordering constraints. This can be achieved by recursing through the tree structure, though such recursion is relatively expensive. Instead, we propose a more efficient method. First, for each node in each pGPT, we store a list of its unmet temporal dependencies, i.e., the set of steps that, according to ≺, must be executed before the node in question. Then, each time a node is executed, we loop through its temporal dependents (i.e., the nodes that must come after it according to ≺) and update their lists of unmet dependencies. If all dependencies of a node are now met, it is added to a list of candidate steps. Finally, the candidate steps are filtered to ensure that all other logical requirements are met. In particular, to be executable, a candidate step must have its preconditions met. In addition, if the agent previously executed a plan or goal node, we force it to continue executing children of that node until it executes an action node. The rationale here is that it makes little sense for an agent to commit to a plan or goal until it has executed an underlying action. Pseudocode for the computation of candidate steps can be found in the extended version of the paper.
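The dependency bookkeeping described above can be sketched as follows. This is a minimal illustration of the idea rather than the authors' implementation; the remaining filtering by preconditions and by the commit-to-the-current-plan rule is omitted.

```python
from collections import defaultdict

class OrderTracker:
    """Incremental bookkeeping for the partial order: for every node, keep the
    set of predecessors that have not yet been executed."""

    def __init__(self, order_pairs):
        # order_pairs: iterable of (before, after) ordering constraints.
        self.unmet = defaultdict(set)        # node -> predecessors still outstanding
        self.dependents = defaultdict(set)   # node -> nodes that must come after it
        for before, after in order_pairs:
            self.unmet[after].add(before)
            self.dependents[before].add(after)

    def ready(self, nodes):
        """Nodes whose ordering constraints are all satisfied (candidate steps)."""
        return [n for n in nodes if not self.unmet[n]]

    def mark_executed(self, node):
        """Update dependents when `node` executes; return newly unblocked nodes."""
        unblocked = []
        for dep in self.dependents[node]:
            self.unmet[dep].discard(node)
            if not self.unmet[dep]:
                unblocked.append(dep)
        return unblocked

# Using the ordering derived for plan p0 in Figure 1 (a0 must come before a1):
tracker = OrderTracker([("a0", "a1")])
print(tracker.ready(["g1", "a0", "a1"]))   # ['g1', 'a0']  (a1 is still blocked)
tracker.mark_executed("a0")
print(tracker.ready(["g1", "a1"]))         # ['g1', 'a1']
```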
5 Evaluation

In this section, we evaluate our approach to multi-agent scheduling. We compare the performance of the IB pGPT-based scheduler with IA, the GPT-based approach presented in [Dann et al., 2020], and random pGPT-based and GPT-based schedulers. As in [Dann et al., 2020], we consider three settings: fully aware, where agents know all the top-level goals of other agents (termed "full vision" in [Dann et al., 2020]); partially aware, where agents know half of the other agents' top-level goals ("partial vision"); and unaware, where agents know none of the other agents' top-level goals ("naïve"). In each setting, the agents are assumed to have plans for the top-level goals they are aware of. However, the IA agents schedule on the assumption that the other agents have the same plans (and hence the same GPTs) for these goals as themselves. That is, unlike [Dann et al., 2020], the plans/GPTs of the other agents are not available to the agents; the other agents are essentially "black boxes". As in [Dann et al., 2020], we assume that the agents share the same model of actions (i.e., an action has the same pre- and postconditions for each agent). We also assume that each agent has the same set of actions available. While this is not true in all cases, it holds for a large class of applications, e.g., where the agents interact with their environment via an API.

While pGPTs can be derived from agent programs as explained in Section 3, in the interests of generality and simplicity, we implemented a pGPT generator that is capable of generating synthetic pGPTs corresponding to agent programs of varying complexity (and hence multi-agent scheduling problems of varying difficulty). The pGPT generator is based on the GPT generator developed for the Intention Progression Competition (available from intentionprogression.org). A detailed description of the pGPT generator can be found in the extended version of the paper. To generate the GPTs used by the IA agents, for each IA agent in each trial we generated a random linearisation of the pGPT used by the pGPT-based agent, similarly to Figure 2. Different linearisations will typically achieve subgoals and execute steps in a different order, but each ordering is consistent with ≺. Each linearisation thus corresponds to a GPT derived from a valid program for the task, e.g., one written by a different agent developer. All agents are therefore effectively using different programs for a given top-level goal, but while the pGPT agents abstract their programs into pGPTs, the IA agents schedule and predict the next steps of other agents using the GPTs corresponding to their own programs.

The random pGPT-based and GPT-based schedulers behave identically to the MCTS rollout policies used by IB and IA, respectively. That is, they select uniformly from the set of all executable steps, E. The only difference between the two is that, for the random GPT-based scheduler, E is calculated via a linearisation of the agent's pGPTs, similarly to IA, and thus its behaviour is more restricted.

The generated pGPTs had a depth of 5. Each subgoal had two corresponding plans, with each plan containing three actions and one subgoal (except the lowest-level plans, which contained three actions only). For each trial, 12 GPTs were generated, with 6 assigned to each agent. Thus, for example, the partially-aware IB schedulers had access to 9 pGPTs (6 for their own goals and 3 for the goals of the other agent). The partially-aware IA schedulers likewise had access to 9 GPTs, but the 3 corresponding to the goals of the other agent were derived from their own programs for those goals. The full set of pGPTs contained 80 environment variables, where this figure was chosen to yield enough scheduling clashes that the tasks were non-trivial, yet not excessively difficult.

We evaluated our approach under the three multi-agent settings considered by Dann et al. [2020]: allied, neutral and adversarial. In each of these settings there are two agents. The aim in the allied setting is to maximise the total number of team goals achieved. In the neutral setting, the aim is to maximise the agent's own goal attainment and disregard the number of goals achieved by the external agent, i.e., to act selfishly. In the adversarial setting, the aim is to maximise the agent's own goal attainment while minimising the number of goals achieved by the external agent. Accordingly, we use the following scoring methodology:

- Allied: we report own goals + ally goals. Each agent is assigned 6 pGPTs, so the possible score range is [0, 12].
- Neutral: we report own goals only, so the possible score range is [0, 6].
- Adversarial: we report own goals - opponent goals, so the possible score range is [-6, 6].
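Expressed as code (a trivial sketch with our own function name), the per-trial score for one agent in each setting is:

```python
def trial_score(setting: str, own_goals: int, other_goals: int) -> int:
    """Score reported for a single agent in one trial, per the methodology above."""
    if setting == "allied":          # own + ally goals, range [0, 12]
        return own_goals + other_goals
    if setting == "neutral":         # own goals only, range [0, 6]
        return own_goals
    if setting == "adversarial":     # own minus opponent goals, range [-6, 6]
        return own_goals - other_goals
    raise ValueError(f"unknown setting: {setting}")

print(trial_score("adversarial", own_goals=5, other_goals=2))   # 3
```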
The results of these experiments are provided in Tables 1, 2 and 3, with scores averaged over 500 randomly generated pGPT forests. Cell values indicate the score achieved by the agent from that row when partnered with the agent named in the column heading. The numeric suffixes of the MCTS-based agents' names indicate their level of awareness of the other agent's goals (100 = fully aware, 50 = partially aware, 0 = unaware).

|            | rand pGPT | rand GPT | IB 100 | IB 50 | IB 0 | IA 100 | IA 50 | IA 0 |
|------------|-----------|----------|--------|-------|------|--------|-------|------|
| rand pGPT  | 4.023 | 3.771 | 9.103 | 7.743 | 5.766 | 7.744 | 6.864 | 5.631 |
| rand GPT   | 3.771 | 3.634 | 8.327 | 7.400 | 5.850 | 7.292 | 6.649 | 5.416 |
| IB 100     | **9.103** | **8.327** | **11.475** | **11.158** | **10.779** | **10.904** | **10.661** | **10.247** |
| IB 50      | 7.743 | 7.400 | 11.158 | 10.446 | 9.342 | 10.342 | 9.895 | 9.204 |
| IB 0       | 5.766 | 5.850 | 10.779 | 9.342 | 7.180 | 9.522 | 8.673 | 7.097 |
| IA 100     | 7.744 | 7.292 | 10.904 | 10.342 | 9.522 | 10.157 | 9.928 | 9.357 |
| IA 50      | 6.864 | 6.649 | 10.661 | 9.895 | 8.673 | 9.928 | 9.343 | 8.466 |
| IA 0       | 5.631 | 5.416 | 10.247 | 9.204 | 7.097 | 9.357 | 8.466 | 6.897 |

Table 1: Allied setting (cell values are total team scores). The best total team score with each ally type is bolded.

| Other scheduler → | rand pGPT | rand GPT | IB 100 | IB 50 | IB 0 | IA 100 | IA 50 | IA 0 |
|-------------------|-----------|----------|--------|-------|------|--------|-------|------|
| rand pGPT  | 2.001 | 2.070 | 2.034 | 1.989 | 2.017 | 2.029 | 2.000 | 2.102 |
| rand GPT   | 1.723 | 1.866 | 1.793 | 1.781 | 1.830 | 1.840 | 1.883 | 1.863 |
| IB 100     | **5.537** | **5.211** | **5.339** | **5.299** | **5.548** | **5.098** | **5.098** | **5.282** |
| IB 50      | 4.677 | 4.631 | 4.510 | 4.419 | 4.621 | 4.452 | 4.468 | 4.568 |
| IB 0       | 3.738 | 3.907 | 3.510 | 3.475 | 3.585 | 3.621 | 3.592 | 3.787 |
| IA 100     | 4.577 | 4.538 | 4.418 | 4.355 | 4.643 | 4.455 | 4.418 | 4.562 |
| IA 50      | 4.007 | 4.171 | 3.898 | 3.795 | 4.025 | 3.945 | 3.852 | 4.175 |
| IA 0       | 3.457 | 3.669 | 3.319 | 3.115 | 3.235 | 3.282 | 3.296 | 3.479 |

Table 2: Neutral setting. The best results achieved with respect to the other scheduler type are bolded.

|            | rand pGPT | rand GPT | IB 100 | IB 50 | IB 0 | IA 100 | IA 50 | IA 0 |
|------------|-----------|----------|--------|-------|------|--------|-------|------|
| rand pGPT  | 0.000 | 0.224 | -4.839 | -3.001 | -1.767 | -3.429 | -2.274 | -1.270 |
| rand GPT   | -0.224 | 0.000 | -4.205 | -2.951 | -2.094 | -3.395 | -2.605 | -1.690 |
| IB 100     | **4.839** | **4.205** | **0.000** | **2.280** | **4.317** | **1.666** | **2.542** | **3.735** |
| IB 50      | 3.001 | 2.951 | -2.280 | 0.000 | 2.050 | -0.471 | 0.746 | 2.058 |
| IB 0       | 1.767 | 2.094 | -4.317 | -2.050 | 0.000 | -2.604 | -0.977 | 0.652 |
| IA 100     | 3.429 | 3.395 | -1.666 | 0.471 | 2.604 | 0.000 | 1.280 | 2.672 |
| IA 50      | 2.274 | 2.605 | -2.542 | -0.746 | 0.977 | -1.280 | 0.000 | 1.489 |
| IA 0       | 1.270 | 1.690 | -3.735 | -2.058 | -0.652 | -2.672 | -1.489 | 0.000 |

Table 3: Adversarial setting. The best net score (own goals minus opponent goals) achieved against each opponent type is bolded.

A striking feature of the results is their consistency: in each setting and for each partner agent type, the fully aware IB scheduler performed best. Given that IB is more flexible than IA, both in terms of its ability to act and its ability to model the behaviour of other agents, this may seem unsurprising. However, recall from Section 4.2 that the MCTS search tree has a greater branching factor under IB than under IA. Since IA and IB were afforded the same number of simulated rollouts, this means that the rollouts were spread more thinly under IB, with nodes' average returns calculated from fewer samples. IB's consistent edge over IA shows that this trade-off was worthwhile, i.e., IB's greater uncertainty about the return was more than compensated for by its extra flexibility.

Note that the significance of certain results is best appreciated by considering the number of goals left unachieved. For example, in the allied setting, the (IA 100, IA 100) partnership averaged 10.157 goals, while (IB 100, IB 100) averaged 11.475, an increase of only 1.3 goals. However, since the best possible score in this setting is 12, this represents a 72% reduction in unachieved goals.

The unaware agents IA 0 and IB 0 have no knowledge of other agents' goals or the plans used to achieve those goals, and so are unable to make any predictions about the next action of the other agent. As such, they are essentially doing single-agent scheduling with no consideration of the other agents' intentions, which is the baseline case in [Dann et al., 2020]. The small improvement in performance of IB 0 over IA 0 in the allied and neutral settings (a 4% increase in goals achieved by two IB 0 agents over two IA 0 agents in the allied setting, and 3% in the neutral setting) is attributable to the IB agents having more flexibility in the order in which actions are executed.
This allows each IB agent to avoid more conflicts between its own intentions. However, neither the IB 0 nor the IA 0 agents can effectively avoid conflicts with the intentions of the other agent (since these are unknown). As the results show, as the agents' knowledge of the other agent's goals increases, the performance of both IB and IA increases, but IB is better able to exploit the additional information. For example, in the neutral setting, two IB 100 agents achieve 20% more goals than two IA 100 agents because they are better at avoiding conflicts with the other agent's intentions. Indeed, the total number of goals achieved by two IB 100 agents in the neutral setting is greater than that achieved by two IA 100 agents in the allied setting, i.e., two selfish IB 100 agents are more effective than two cooperating IA 100 agents. A similar pattern is seen when IB is paired with other schedulers. The experiments also show that a key result from [Dann et al., 2020] holds in the black-box setting; namely, both IA and IB improve as they are afforded more knowledge of the other agent's goals, demonstrating that they truly are intention-aware. In the extended version of the paper, we break the allied setting scores down into own goals and ally goals. From this, it is evident that the fully aware IB scheduler was the most effective at helping its partner achieve goals, similar to the findings in [Dann et al., 2020].

6 Related Work

The first MCTS-based approach to intention scheduling with BDI agents was that of Yao et al. [2014], who used a variant of MCTS called Single-Player MCTS [Schadd et al., 2012] to schedule the intentions of a single agent at the level of plans rather than actions. The work was later extended to scheduling intentions at the action level [Yao and Logan, 2016], with deadlines [Yao et al., 2016b], and for exploiting synergies [Yao et al., 2016c]. This work on single-agent scheduling formed the basis for the work by Dann et al. [2020] on MCTS-based scheduling in multi-agent settings.

There have been other approaches to scheduling intentions. For example, the early work by Thangarajah et al. [2002; 2003; 2011] and Clement et al. [1999; 2000; 2007] used the notion of summary information to look ahead over intention structures and schedule accordingly. Whilst Thangarajah et al. focussed on scheduling the intentions of a single agent in a BDI setting using GPTs, Clement et al. focussed on centrally scheduling the intentions of multiple planning agents using hierarchical task networks (HTNs).
Another line of work considers incorporating an HTN planner into a BDI agent language, thus providing the ability to look ahead and find solutions that ensure the successful execution of an intention when necessary [Sardiña et al., 2006; de Silva, 2017]. Whilst these methods focus on finding solutions to achieve a single intention and do not take into account interactions with other concurrent intentions, it may be possible to combine them with the work of Clement et al., mentioned above, to centrally schedule the intentions of BDI agents. Conversely, de Silva [2018] considers incorporating ideas from the BDI paradigm into HTN planning, so as to support interleaved deliberation, acting, and failure recovery. Again, however, this work focuses only on the pursuit of single goals. Finally, we note the coverage-based approach used in [Thangarajah et al., 2012; Waters et al., 2014; Waters et al., 2015], where the probability of the plans to achieve a goal being executable in the different possible states of the environment is used to schedule intentions. The intuition is to progress the intention that has the highest probability of becoming non-executable in future states. In contrast to our work, this approach has so far been applied only to single-agent settings. Beyond the differences already mentioned, unlike our approach, all of the above methods require full knowledge of the agents' intention structures.

7 Conclusion

In this paper we proposed an approach to multi-agent intention progression in environments where the other agents are not co-designed, and thus their precise programs are unknown. We evaluated our approach in allied, neutral and adversarial settings, and showed that it outperformed MCTS-based scheduling where agents assume that other agents will execute plans in the same order as themselves. A possible direction for future work is to further relax the assumptions regarding other agents, e.g., by assuming that other agents will pursue some subset of all known goals in the environment, but where this subset must be inferred at runtime.

Acknowledgments

Yuan Yao was supported by the National Natural Science Foundation of China (61906169) and the Zhejiang Provincial Natural Science Foundation of China (LQ19F030009).

References

[Browne et al., 2012] Cameron B. Browne, Edward Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1-43, 2012.

[Clement and Durfee, 1999] Bradley J. Clement and Edmund H. Durfee. Theory for coordinating concurrent hierarchical planning agents using summary information. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI 1999), pages 495-502, 1999.

[Clement and Durfee, 2000] Bradley J. Clement and Edmund H. Durfee. Performance of coordinating concurrent hierarchical planning agents using summary information. In Proceedings of the 4th International Conference on Multi-Agent Systems, pages 373-374, 2000.

[Clement et al., 2007] Bradley J. Clement, Edmund H. Durfee, and Anthony C. Barrett. Abstract reasoning for planning and coordination. Journal of Artificial Intelligence Research, 28:453-515, 2007.

[Dann et al., 2020] Michael Dann, John Thangarajah, Yuan Yao, and Brian Logan. Intention-aware multiagent scheduling. In B. An, N. Yorke-Smith, A. El Fallah Seghrouchni, and G. Sukthankar, editors, Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), pages 285-293, 2020.
[de Silva, 2017] Lavindra de Silva. BDI agent reasoning with guidance from HTN recipes. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017), pages 759-767, 2017.

[de Silva, 2018] Lavindra de Silva. HTN acting: A formalism and an algorithm. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), pages 363-371, 2018.

[Logan et al., 2017] Brian Logan, John Thangarajah, and Neil Yorke-Smith. Progressing intention progression: A call for a goal-plan tree contest. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017), pages 768-772, 2017.

[Rao and Georgeff, 1992] Anand S. Rao and Michael P. Georgeff. An abstract architecture for rational agents. In Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning (KR 1992), pages 439-449, 1992.

[Sardiña et al., 2006] Sebastian Sardiña, Lavindra de Silva, and Lin Padgham. Hierarchical planning in BDI agent programming languages: A formal approach. In Proceedings of the 5th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2006), pages 1001-1008, 2006.

[Schadd et al., 2012] Maarten P. D. Schadd, Mark H. M. Winands, Mandy J. W. Tak, and Jos W. H. M. Uiterwijk. Single-player Monte-Carlo tree search for SameGame. Knowledge-Based Systems, 34:3-11, 2012.

[Thangarajah and Padgham, 2011] John Thangarajah and Lin Padgham. Computationally effective reasoning about goal interactions. Journal of Automated Reasoning, 47(1):17-56, 2011.

[Thangarajah et al., 2002] John Thangarajah, Michael Winikoff, Lin Padgham, and Klaus Fischer. Avoiding resource conflicts in intelligent agents. In Proceedings of the 15th European Conference on Artificial Intelligence (ECAI 2002), pages 18-22, 2002.

[Thangarajah et al., 2003] John Thangarajah, Lin Padgham, and Michael Winikoff. Detecting & avoiding interference between goals in intelligent agents. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI 2003), pages 721-726, 2003.

[Thangarajah et al., 2012] John Thangarajah, Sebastian Sardiña, and Lin Padgham. Measuring plan coverage and overlap for agent reasoning. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), pages 1049-1056, 2012.

[Waters et al., 2014] Max Waters, Lin Padgham, and Sebastian Sardiña. Evaluating coverage based intention selection. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014), pages 957-964, 2014.

[Waters et al., 2015] Max Waters, Lin Padgham, and Sebastian Sardiña. Improving domain-independent intention selection in BDI systems. Autonomous Agents and Multi-Agent Systems, 29(4):683-717, 2015.

[Yao and Logan, 2016] Yuan Yao and Brian Logan. Action-level intention selection for BDI agents. In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), pages 1227-1236, 2016.

[Yao et al., 2014] Yuan Yao, Brian Logan, and John Thangarajah. SP-MCTS-based intention scheduling for BDI agents. In Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014), 2014.

[Yao et al., 2016a] Yuan Yao, Lavindra de Silva, and Brian Logan. Reasoning about the executability of goal-plan trees. In Proceedings of the 4th International Workshop on Engineering Multi-Agent Systems (EMAS 2016), pages 181-196, Singapore, May 2016.
[Yao et al., 2016b] Yuan Yao, Brian Logan, and John Thangarajah. Intention selection with deadlines. In Proceedings of the 22nd European Conference on Artificial Intelligence (ECAI 2016), pages 1700-1701, 2016.

[Yao et al., 2016c] Yuan Yao, Brian Logan, and John Thangarajah. Robust execution of BDI agent programs by exploiting synergies between intentions. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI 2016), pages 2558-2565, 2016.