# Budget-Constrained Coalition Strategies with Discounting

Lia Bozzone¹ and Pavel Naumov²
¹Vassar College ²King's College
lbbozzone@gmail.com, pgn2@cornell.edu

Discounting future costs and rewards is a common practice in accounting, game theory, and machine learning. In spite of this, existing logics for reasoning about strategies with cost and resource constraints do not account for discounting. The paper proposes a sound and complete logical system for reasoning about budget-constrained strategic abilities that incorporates discounting into its semantics.

## 1 Introduction

Several logical systems for reasoning about agent and coalition power in game-like settings have been previously proposed. Among them are coalition logics [Pauly, 2001; 2002], ATL [Alur et al., 2002], ATEL [van der Hoek and Wooldridge, 2003], ATLES [Walther et al., 2007], know-how logics [Ågotnes and Alechina, 2019; Wang, 2018; Naumov and Tao, 2017; Fervari et al., 2017; Naumov and Tao, 2018b; 2018a; 2019], and STIT [Belnap and Perloff, 1990; Xu, 1995; Horty, 2001]. Some of these systems have been extended to incorporate resources and costs of actions [Alechina et al., 2011; Cao and Naumov, 2020; Cao and Naumov, 2017; Alechina and Logan, 2018; Alechina et al., 2016; Della Monica et al., 2011; Alechina et al., 2017; 2015].

Even in the case of multi-step actions, these systems treat current and future costs equally. At the same time, in game theory, accounting, and machine learning, costs of multi-step transitions are often discounted to reflect the fact that future costs and earnings have lesser present values. Thus, there is a gap between the way resources and costs are currently treated in logic and the way they are accounted for in other fields. To address this gap, in this paper we propose a sound and complete logic of coalition power whose semantics incorporates discounting. Although we formulate our work in terms of cost, it could be applied to any other resource measured in real numbers. It can also be straightforwardly extended to vectors of real numbers to incorporate multiple resources.

As an example, consider the single-player game depicted in Figure 1. This game has four game states $w$, $u$, $v$, and $s$ and a single terminal state $t$. Propositional variable $p$ is true in game states $w$, $u$, and $v$ and is false in game state $s$. We assume that the values of propositional variables are not defined in the terminal state $t$. The agent $a$ has multiple actions in each game state. These actions are depicted in Figure 1 using directed edges. The cost of each action to agent $a$ is shown as a label on the directed edge. For instance, the directed edge from state $w$ to state $u$ with label 2 means that agent $a$ has an action with cost 2 to transition the game from state $w$ to state $u$. Transitioning to the terminal state $t$ represents the termination of the game.

Figure 1: A game.

Note that in state $w$ the agent has two strategies to maintain condition $p$ indefinitely. The first strategy consists of transitioning the game to state $u$ at cost 2 and then repeatedly applying the action with cost 2 to keep the game in state $u$. Without discounting, the cost of this strategy is $2 + 2 + 2 + \cdots = +\infty$. The agent also has another strategy to maintain condition $p$, which consists in transitioning to state $v$ at cost 1 and then keeping the game in state $v$ at recurrent cost 1.

Intuitively, the second strategy is less expensive than the first because each step costs half as much. However, formally, the cost of the second strategy without discounting is the same as that of the first: $1 + 1 + 1 + \cdots = +\infty$. The problem that we observe here is not specific to costs of strategies. A similar situation also appears in repeated games, accounting, and reinforcement learning algorithms based on Markov decision processes. The solution commonly used to resolve this problem is discounting. It consists of counting the cost on the first step at nominal value, the cost on the second step with a discount factor $\gamma \in (0, 1)$, the cost on the third step with discount factor $\gamma^2$, etc. With discounting, the total cost of our first strategy is $2 + 2\gamma + 2\gamma^2 + 2\gamma^3 + \cdots = \frac{2}{1-\gamma}$, while the cost of our second strategy is $1 + 1\gamma + 1\gamma^2 + 1\gamma^3 + \cdots = \frac{1}{1-\gamma}$. Since $\frac{1}{1-\gamma} < \frac{2}{1-\gamma}$, we can say that with discounting the second strategy is less expensive than the first. In the rest of this paper, we assume a fixed discount factor $\gamma \in (0, 1)$.
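As a quick numerical illustration (ours, not part of the paper), the following Python snippet evaluates the discounted costs of both strategies, both via the closed-form geometric series and via a truncated partial sum; the discount factor chosen here is arbitrary.

```python
# Compare the two strategies from Figure 1 under discounting.
# Without discounting both cost streams diverge; with a discount
# factor gamma in (0, 1) they converge to c / (1 - gamma).

def closed_form(c: float, gamma: float) -> float:
    """Total discounted cost of paying c at every step: c + c*gamma + ..."""
    return c / (1.0 - gamma)

def truncated(c: float, gamma: float, steps: int) -> float:
    """Partial sum c + c*gamma + ... + c*gamma**(steps - 1)."""
    return sum(c * gamma**i for i in range(steps))

gamma = 0.9  # any value in (0, 1) illustrates the point
print(closed_form(2.0, gamma))      # first strategy:  2 / (1 - 0.9) = 20.0
print(closed_form(1.0, gamma))      # second strategy: 1 / (1 - 0.9) = 10.0
print(truncated(1.0, gamma, 500))   # ~10.0: the partial sums converge
```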
The rest of this paper is structured as follows. In the next section, we introduce the class of games that will later be used to define the semantics of our logical system. Section 4 defines the language of our system. Section 5 gives the discounting-based semantics of this language. Section 6 shows that the properties of strategies with discounting depend on whether we consider strategies with or without perfect recall. Section 7 lists and discusses the axioms of our logical system for strategies with perfect recall. Section 8 proves the completeness of our system. Section 9 concludes. Additionally, the proof of soundness can be found in the full version of this paper [Bozzone and Naumov, 2021].

## 3 Game Definition

Throughout the paper, we assume a fixed nonempty set of propositional variables and a fixed set of agents $\mathcal{A}$. By a coalition we mean any subset of $\mathcal{A}$. By $X^A$ we mean the set of all functions from a set $A$ to a set $X$. The class of games that we consider is specified below.

**Definition 1.** A game is a tuple $(W, t, \Delta, \varepsilon, M, \pi)$, where

1. $W$ is a set of game states,
2. $t \notin W$ is a terminal state; by $W^+$ we denote the set of all states $W \cup \{t\}$,
3. $\Delta$ is an arbitrary set called the domain of actions,
4. $\varepsilon \in \Delta$ is a zero-cost action,
5. $M \subseteq W \times \Delta^{\mathcal{A}} \times [0, \infty)^{\mathcal{A}} \times W^+$ is a relation called the mechanism, such that
   (a) for each tuple $(w, \delta, u, w') \in M$ and each agent $a \in \mathcal{A}$, if $\delta(a) = \varepsilon$, then $u(a) = 0$,
   (b) for each state $w \in W$ and each complete action profile $\delta \in \Delta^{\mathcal{A}}$, there is a function $u \in [0, +\infty)^{\mathcal{A}}$ and a state $w' \in W^+$ such that $(w, \delta, u, w') \in M$,
6. $\pi$ is a valuation function that maps propositional variables into subsets of $W$.

Intuitively, the mechanism is the set of all quadruples $(w, \delta, u, v)$ such that the game might transition from state $w$ to state $v$ under action profile $\delta$ at costs to the individual agents specified by function $u$.
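As an illustration only, here is one way the mechanism of Definition 1 could be rendered as plain Python data; the encoding and names are our own, and the fragment covers a few edges of the Figure 1 game for the single agent `a`.

```python
# A hedged sketch of Definition 1's mechanism as plain Python data.
# Action profiles and cost functions are encoded as tuples of pairs so
# that mechanism entries are hashable set elements.

EPSILON = "eps"  # the zero-cost action of Definition 1, item 4

mechanism = {
    # (state, action profile, cost function, next state)
    ("w", (("a", "to_u"),), (("a", 2.0),), "u"),
    ("w", (("a", "to_v"),), (("a", 1.0),), "v"),
    ("u", (("a", "loop"),), (("a", 2.0),), "u"),
    ("v", (("a", "loop"),), (("a", 1.0),), "v"),
    ("u", (("a", EPSILON),), (("a", 0.0),), "u"),
}

def satisfies_5a(mech) -> bool:
    """Condition 5(a): an agent playing the zero-cost action is never charged."""
    for _, profile, costs, _ in mech:
        cost = dict(costs)
        for agent, action in profile:
            if action == EPSILON and cost[agent] != 0.0:
                return False
    return True

print(satisfies_5a(mechanism))  # True for this fragment
```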
The games defined above are similar to resource-bounded action frames, which are the semantics of Resource-Bounded Coalition Logic (RBCL) [Alechina et al., 2011]. In particular, both of them have a zero-cost action. However, there are several differences between these two classes of models. Unlike RBCL frames, our games have only one resource, which we call "cost". We do this for the sake of presentation simplicity. Multiple resources could be incorporated into our system without any significant changes to the results in this paper. RBCL allows only non-negative integer resource requirements, while costs in our games are non-negative real numbers. RBCL assigns a unique cost to each action, while our games assign a cost to each transition. As a result, the cost to an agent in our setting depends not only on the action of that agent but also on the actions of the other agents. This is similar to how the utility function of an agent in game theory is a function of the complete action profile, not just of the action of that agent. We achieve this by including the cost of the transition for each agent as the third component of a tuple from the mechanism relation. Furthermore, RBCL frames are deterministic, while our games are not, because we represent the mechanism as a relation, not a function. Unlike RBCL frames, our games can be terminated by the agents. In order to make our semantics more general, the games are terminated through a transition to a terminal state. Such transitions allow the agents to be charged upon the termination of a game. This ability is significant for our proof of completeness.

Figure 2: A game. The unreachable terminal state $t$ is not shown in the diagram.

The game depicted in Figure 1 has only one player. Figure 2 depicts a two-player game. Note that our general notion of a game, captured in Definition 1, allows each agent to influence the outcome of each transition and imposes a cost of each transition on each agent. For the sake of simplicity, the game depicted in Figure 2 designates a single "dictator" agent in each state. For example, in state $w$, the dictator is agent $a$. The dictator is solely responsible for the choice of the next state and bears all the costs associated with the transition. In the diagram, the dictator is specified inside each state's circle. Note that in state $w$ agent $a$ has a strategy to maintain condition $p$ at cost $100 + 100\gamma + 100\gamma^2 + \cdots = \frac{100}{1-\gamma}$ to herself and cost $0 + 0\gamma + 0\gamma^2 + \cdots = 0$ to agent $b$.

**Definition 2.** A play in game $(W, t, \Delta, \varepsilon, M, \pi)$ is a finite sequence $w_0, \delta_0, u_0, w_1, \ldots, \delta_{n-1}, u_{n-1}, w_n$ such that

1. $w_i \in W$ for $0 \le i < n$ and $w_n \in W^+$,
2. $\delta_i \in \Delta^{\mathcal{A}}$, where $0 \le i < n$,
3. $u_i \in [0, \infty)^{\mathcal{A}}$ is a cost function, where $0 \le i < n$,
4. $(w_{i-1}, \delta_{i-1}, u_{i-1}, w_i) \in M$, where $1 \le i \le n$.

The set of all plays of a given game is denoted by $Play$.

## 4 Syntax

The language $\Phi$ of our system is defined by the grammar
$$\varphi := p \mid \neg\varphi \mid \varphi \to \varphi \mid [C]_x\varphi,$$
where $p$ is a propositional variable, $C$ is a coalition, and $x$ is a constraint function from set $C$ to $[0, +\infty)$. We read $[C]_x\varphi$ as "coalition $C$ has a strategy to maintain condition $\varphi$ at individual cost no more than $x(a)$ to each member $a \in C$". If $C$ is a coalition $\{a_1, \ldots, a_n\}$ and $x$ is a function from set $C$ to $[0, +\infty)$ such that $x(a_i) = x_i$ for each $i \le n$, then we will use the shorthand notation $[a_1, \ldots, a_n]_{x_1, \ldots, x_n}\varphi$ to refer to formula $[C]_x\varphi$.

**Definition 3.** For any real $\mu > 0$ and any formula $\varphi \in \Phi$, formula $\varphi/\mu$ is defined recursively as follows:

1. $p/\mu \equiv p$, for any propositional variable $p$,
2. $(\neg\varphi)/\mu \equiv \neg(\varphi/\mu)$,
3. $(\varphi \to \psi)/\mu \equiv (\varphi/\mu) \to (\psi/\mu)$,
4. $([C]_x\varphi)/\mu \equiv [C]_{x/\mu}(\varphi/\mu)$.

For example, $([a, b]_{4,6}[b, c]_{8,2}\,p)/2 = [a, b]_{2,3}[b, c]_{4,1}\,p$.
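The scaling operation of Definition 3 is purely syntactic, so it is easy to prototype. Below is a small hedged sketch in Python, with an ad hoc formula representation of our own design; it reproduces the example above.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Minimal formula AST: propositional variables, negation, implication,
# and the coalition modality [C]_x with constraint function x: C -> [0, inf).
@dataclass
class Var:
    name: str

@dataclass
class Not:
    sub: "Formula"

@dataclass
class Implies:
    left: "Formula"
    right: "Formula"

@dataclass
class Box:                     # [C]_x phi, with C = x.keys()
    x: Dict[str, float]
    sub: "Formula"

Formula = Union[Var, Not, Implies, Box]

def scale(phi: Formula, mu: float) -> Formula:
    """Definition 3: phi / mu divides every cost constraint by mu."""
    if isinstance(phi, Var):
        return phi
    if isinstance(phi, Not):
        return Not(scale(phi.sub, mu))
    if isinstance(phi, Implies):
        return Implies(scale(phi.left, mu), scale(phi.right, mu))
    return Box({a: c / mu for a, c in phi.x.items()}, scale(phi.sub, mu))

# ([a,b]_{4,6} [b,c]_{8,2} p) / 2  =  [a,b]_{2,3} [b,c]_{4,1} p
example = Box({"a": 4, "b": 6}, Box({"b": 8, "c": 2}, Var("p")))
print(scale(example, 2))
```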
## 5 Semantics

In this section we define the semantics of our logical system.

**Definition 4.** An action profile of a coalition $C$ is a function from set $C$ to set $\Delta$.

**Definition 5.** A strategy of a coalition $C$ is a function from set $C \times Play$ to set $\Delta$.

Note that each strategy takes into account not just the current state but the whole play. Thus, the strategies that we consider are perfect recall strategies. We will discuss this in detail in the next section.

**Definition 6.** A play $w_0, \delta_0, u_0, w_1, \ldots, u_{n-1}, w_n \in Play$ satisfies strategy $s$ of a coalition $C$ if for each $i$ such that $0 \le i < n$ and each agent $a \in C$,
$$\delta_i(a) = s(a, (w_0, \delta_0, u_0, w_1, \ldots, u_{i-1}, w_i)).$$

For any functions $x$ and $y$, we write $x \le_C y$ if $x(a) \le y(a)$ for each $a \in C$. We define notation $x =_C y$ similarly.

**Definition 7.** For each formula $\varphi \in \Phi$ and each state $w \in W$ of a game $(W, t, \Delta, \varepsilon, M, \pi)$, the satisfaction relation $w \Vdash \varphi$ is defined recursively as follows:

1. $w \Vdash p$, if $w \in \pi(p)$,
2. $w \Vdash \neg\varphi$, if $w \nVdash \varphi$,
3. $w \Vdash \varphi \to \psi$, if $w \nVdash \varphi$ or $w \Vdash \psi$,
4. $w \Vdash [C]_x\varphi$ if there is a strategy $s$ of coalition $C$ such that for any play $w_0, \delta_0, u_0, w_1, \ldots, u_{n-1}, w_n \in Play$ that satisfies strategy $s$, if $w = w_0$, then
   (a) $\sum_{i=0}^{n-1} u_i\gamma^i \le_C x$ and
   (b) if $w_n \ne t$, then $w_n \Vdash \varphi/\gamma^n$.

To understand why item 4(b) of the above definition uses formula $\varphi/\gamma^n$ instead of formula $\varphi$, let us consider an example of a formula $\varphi \equiv [D]_y\psi$. Note that formula $[C]_x[D]_y\psi$ states that coalition $C$ can maintain, at cost $x$, the ability of coalition $D$ to maintain $\psi$ at cost $y$. Consider the hypothetical case where $C$, at cost $x$ to $C$, will be maintaining this ability of $D$ for, say, 10 transitions. The formula $[C]_x[D]_y\psi$ states that after 10 moves coalition $D$ should be able to take over and maintain condition $\psi$ at cost $y$ to $D$. Given that in our setting the costs are discounted, an important question is whether $y$ is measured in today's money or future money. Note that $y$ in future money is $y\gamma^{10}$ in today's money. On the other hand, $y$ in today's money is $y/\gamma^{10}$ in future money. In this paper we decided to measure all costs in today's money. Thus, cost $y$ in $[C]_x[D]_y\psi$ refers to costs in today's money (in state $w_0$ of Definition 7). In future money (in state $w_n$), the same cost is $y/\gamma^n$. As a result, item 4(b) of Definition 7 uses formula $\varphi/\gamma^n$ instead of just $\varphi$.

Consider again the game depicted in Figure 2. As discussed earlier, in state $w$, the single-agent coalition $\{a\}$ has a strategy to maintain condition $p$ by looping in state $w$ at recurrent cost 100. The total cost of this strategy is $100 + 100\gamma + 100\gamma^2 + \cdots = \frac{100}{1-\gamma}$. Thus, $w \Vdash [a]_{100/(1-\gamma)}\,p$. In the same game, the single-agent coalition $\{b\}$ also has a strategy to maintain condition $p$. The strategy consists in pushing the game back to state $w$ each time agent $a$ transitions the game out of state $w$, either into state $u$ or state $v$. The costs of the pushing-back action from states $u$ and $v$ are 1 and 200, respectively. Hence, the total cost to agent $b$ could be no more than $0 + 200\gamma + 0 + 200\gamma^3 + 0 + \cdots = \frac{200\gamma}{1-\gamma^2}$. Then, $w \Vdash [b]_{200\gamma/(1-\gamma^2)}\,p$. Finally, note that if agents $a$ and $b$ decide to cooperate, then maintaining condition $p$ becomes significantly less expensive for both of them, because they can alternate the state of the game between states $w$ and $u$. The total cost of the joint strategy to agent $a$ is $1 + 0 + \gamma^2 + 0 + \gamma^4 + \cdots = \frac{1}{1-\gamma^2}$ and to agent $b$ is $0 + \gamma + 0 + \gamma^3 + \cdots = \frac{\gamma}{1-\gamma^2}$. Therefore, $w \Vdash [a, b]_{1/(1-\gamma^2),\,\gamma/(1-\gamma^2)}\,p$.
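As a sanity check on these closed forms (our own illustration, not from the paper), the sketch below sums truncated discounted cost streams for the three strategies just described and compares them with the formulas.

```python
gamma = 0.8   # an arbitrary discount factor in (0, 1)
N = 10_000    # truncation horizon; the series converge geometrically

def discounted(costs):
    """Sum of costs[i] * gamma**i over a per-step cost stream."""
    return sum(c * gamma**i for i, c in enumerate(costs))

# {a} loops in w, paying 100 at every step.
loop_a = discounted([100] * N)
# {b} in the worst case pushes back from v (cost 200) at every odd step.
push_b = discounted([0 if i % 2 == 0 else 200 for i in range(N)])
# Joint strategy alternating w and u: a pays 1 at even steps, b at odd steps.
joint_a = discounted([1 if i % 2 == 0 else 0 for i in range(N)])
joint_b = discounted([0 if i % 2 == 0 else 1 for i in range(N)])

assert abs(loop_a - 100 / (1 - gamma)) < 1e-6
assert abs(push_b - 200 * gamma / (1 - gamma**2)) < 1e-6
assert abs(joint_a - 1 / (1 - gamma**2)) < 1e-6
assert abs(joint_b - gamma / (1 - gamma**2)) < 1e-6
print(loop_a, push_b, joint_a, joint_b)
```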
## 6 Perfect Recall Assumption

Definition 5 specifies a strategy of a coalition as a function that assigns an action to each member of the coalition based on a play of the game. In other words, a strategy has access to the whole history of the game rather than just to the current state. Such strategies are often referred to as perfect recall strategies. As the next example shows, perfect recall strategies might have different discounted costs than memoryless strategies for the same condition to maintain in the same game.

Consider the game depicted in Figure 3 and assume, for this example only, that $\gamma = 2/3$. Suppose that coalition $\{a, b, d\}$ wants to maintain condition $p$ starting from state $w_1$. Since agent $c$ is not a member of the coalition, the coalition has no control over whether the system transitions from state $w_1$ to state $w_2$ or $w_3$.

Figure 3: A game. The unreachable terminal state $t$ is not shown in the diagram.

Once the system is in either state $w_2$ or state $w_3$, in order to maintain condition $p$, agent $a$ or agent $b$, respectively, will have to transition the game to state $w_4$ at cost $4/9$ to the agent. In state $w_4$, the coalition faces a choice between (i) transitioning the game into state $w_5$, in which agent $a$ encounters cost $1\cdot\gamma^3 + 1\cdot\gamma^4 + \cdots = \frac{\gamma^3}{1-\gamma} = \frac{(2/3)^3}{1-2/3} = \frac{8}{9}$ to maintain $p$, and (ii) transitioning the game into state $w_6$, in which agent $b$ encounters the same cost $1\cdot\gamma^3 + 1\cdot\gamma^4 + \cdots = \frac{8}{9}$ to maintain condition $p$. If agent $d$ has perfect recall, then she can balance the costs between agents $a$ and $b$ by transitioning to state $w_6$ if the game transitioned to $w_4$ from state $w_2$ and transitioning to state $w_5$ if the game transitioned to $w_4$ from state $w_3$. This way, agents $a$ and $b$ encounter the same total costs $8/9$: $w_1 \Vdash [a, b, d]_{8/9,\,8/9,\,0}\,p$. At the same time, if agent $d$ does not have memory of the previous states of the game, then either agent $a$ or $b$ might encounter a total cost as high as $8/9 + 8/9 = 16/9$ while executing the coalition strategy to maintain condition $p$: $w_1 \Vdash [a, b, d]_{16/9,\,16/9,\,0}\,p$. In this paper, we consider discounted costs under the perfect recall assumption for all agents.
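To illustrate the type of object Definition 5 describes, here is a hedged Python sketch (the encoding and names are ours) of agent $d$'s history-dependent choice in this example: $d$ inspects the play prefix and sends the remaining loop cost to whichever of $a$, $b$ has not yet paid for the transition into $w_4$.

```python
# A strategy (Definition 5) maps (agent, play-so-far) to an action.
# Here a play is encoded simply as the list of states visited so far.

def d_strategy(play):
    """Agent d's perfect recall choice at w4 (Figure 3 example): send the
    game to w6 if it came through w2 (so b absorbs the loop cost), and to
    w5 if it came through w3 (so a absorbs it)."""
    assert play[-1] == "w4"
    return "go_w6" if "w2" in play else "go_w5"

def d_memoryless(state):
    """A memoryless agent d sees only the current state, so it must fix
    one choice at w4; the same agent can then be charged on every play."""
    return "go_w5"  # the fixed choice cannot depend on the history

print(d_strategy(["w1", "w2", "w4"]))  # go_w6: b pays the loop cost
print(d_strategy(["w1", "w3", "w4"]))  # go_w5: a pays the loop cost
```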
## 7 Axioms

In this section, we introduce a logical system describing the properties of the coalition power modality $[C]_x\varphi$. In addition to propositional tautologies in language $\Phi$, the system contains the following axioms:

1. Reflexivity: $[C]_x\varphi \to \varphi$,
2. Cooperation: if $C \cap D = \varnothing$, then $[C]_x(\varphi \to \psi) \to ([D]_y\varphi \to [C \cup D]_{x \cup y}\psi)$,
3. Monotonicity: $[C]_x\varphi \to [C]_y\varphi$, where $x \le_C y$,
4. Transitivity: $[C]_x\varphi \to [C]_x[C]_x\varphi$.

Recall that the value of the discount factor $\gamma$ was fixed at the end of Section 1. It is worth noting that this factor does not appear explicitly in any of the above axioms.

The Reflexivity axiom says that if coalition $C$ can maintain condition $\varphi$ at discounted cost $x$ starting from the current state, then condition $\varphi$ must be true in the current state. The Cooperation axiom states that if coalitions $C$ and $D$ are disjoint, coalition $C$ can maintain condition $\varphi \to \psi$ at cost $x$, and $D$ can maintain condition $\varphi$ at cost $y$, then together they can maintain condition $\psi$ at cost $x \cup y$. Here, by $x \cup y$ we mean the union of two functions with disjoint domains. The Monotonicity axiom states that if a coalition can maintain a condition at some cost, then it can maintain the same condition at any larger cost. The assumption of the Transitivity axiom states that coalition $C$ has a strategy, say $s$, to maintain condition $\varphi$ at cost $x$ in perpetuity. The conclusion states that the same coalition can, at cost $x$, maintain its own ability to maintain $\varphi$ at cost $x$. To achieve this, coalition $C$ can use the same strategy $s$. Indeed, assume that coalition $C$ used strategy $s$ for some number of steps at cost $x' \le_C x$ in today's money. Then it should be able to keep using $s$ at cost $x - x' \le_C x$ in today's money to maintain $\varphi$. Note that it is crucial for this argument that all costs are computed in today's money. Furthermore, the Transitivity axiom is not sound if the cost in the internal modality is measured in future money.

A non-trivial proof of soundness of the Transitivity axiom, as well as the proofs of soundness of all other axioms, can be found in [Bozzone and Naumov, 2021].

We write $\vdash \varphi$ and say that formula $\varphi$ is a theorem if $\varphi$ is derivable from the above axioms using the Modus Ponens and Necessitation inference rules:
$$\frac{\varphi \quad\quad \varphi \to \psi}{\psi}, \qquad \frac{\varphi}{[C]_x\varphi}.$$

In addition to the unary relation $\vdash \varphi$, we also consider the binary relation $X \vdash \varphi$. Let $X \vdash \varphi$ if formula $\varphi$ is provable from the theorems of our logical system and the set of additional assumptions $X$ using only the Modus Ponens inference rule. The proofs of the next three auxiliary lemmas can be found in [Bozzone and Naumov, 2021].

**Lemma 1.** If $\varphi_1, \ldots, \varphi_n \vdash \psi$ and sets $C_1, \ldots, C_n$ are pairwise disjoint, then $[C_1]_{x_1}\varphi_1, \ldots, [C_n]_{x_n}\varphi_n \vdash [C_1 \cup \cdots \cup C_n]_{x_1 \cup \cdots \cup x_n}\psi$.

**Lemma 2.** If $\varphi_1/\gamma, \ldots, \varphi_n/\gamma \vdash \psi/\gamma$, then $\varphi_1, \ldots, \varphi_n \vdash \psi$.

**Lemma 3.** $\vdash [C]_x\varphi \to [D]_y\varphi$, where $C \subseteq D$ and $x \le_C y$.

## 8 Completeness

In this section, we prove the completeness of our logical system. We start the proof by defining the canonical game $(W, t, \Delta, \varepsilon, M, \pi)$. The set $W$ is the set of all maximal consistent sets of formulae in language $\Phi$, and $t$ is an arbitrary element such that $t \notin W$. Let $\varepsilon$ be an arbitrary element such that $\varepsilon \notin \Phi$, and let the set of actions be $\Delta = \Phi \cup \{\varepsilon\}$.

**Definition 8.** Mechanism $M$ is the set of all quadruples $(w, \delta, u, w') \in W \times \Delta^{\mathcal{A}} \times [0, \infty)^{\mathcal{A}} \times W^+$ such that if $[C]_x\varphi \in w$ and $\delta(a) = [C]_x\varphi$ for each agent $a \in C$, then

1. $u \le_C x$ and
2. if $w' \ne t$, then $([C]_{x-u}\varphi)/\gamma \in w'$.

Informally, action $\delta(a) = [C]_x\varphi$ of an agent $a \in C$ means "as a part of coalition $C$, I request to maintain condition $\varphi$ at individual cost $x(b)$ to each member $b \in C$". In order for the request to be valid, it should be submitted by all members of coalition $C$. Even if all members of coalition $C$ submit the request, it is enforced by the mechanism only if formula $[C]_x\varphi$ belongs to the current state $w$. Condition 1 of Definition 8 stipulates that although the mechanism is free to set the cost $u$ of the transition below what the members of the coalition offered to pay, the mechanism cannot overcharge them. If the mechanism decides to charge the members of the coalition the amount $u$ for the transition to state $w'$, then it also must provide the opportunity for the members to continue to maintain the condition $\varphi$ at cost $x - u$. The latter is captured by condition 2 of Definition 8.

**Definition 9.** $\pi(p) = \{w \in W \mid p \in w\}$.

This concludes the definition of the canonical game $(W, t, \Delta, \varepsilon, M, \pi)$.
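The two conditions of Definition 8 are simple enough to state as a predicate. The sketch below is our own rendering: a coalition request $[C]_x\varphi$ is encoded as a triple, states are sets of such triples, and the inner formula is left opaque, so its own $/\gamma$ scaling (Definition 3) is elided.

```python
# A coalition request [C]_x phi is encoded as (C, x, phi) with C a frozenset
# of agents, x a tuple of (agent, budget) pairs, and phi an opaque string.
# w_next holding the request with budget (x - u) / gamma mirrors
# ([C]_{x-u} phi) / gamma in condition 2 of Definition 8.

def honors(w, delta, u, w_next, terminal, request, gamma):
    C, x, phi = request
    budget = dict(x)
    if request not in w or any(delta.get(a) != request for a in C):
        return True   # Definition 8 constrains nothing for this request
    if any(u[a] > budget[a] for a in C):
        return False  # condition 1: mechanism may undercharge, never overcharge
    if w_next == terminal:
        return True   # no continuation obligation at the terminal state
    remainder = (C, tuple(sorted((a, (budget[a] - u[a]) / gamma) for a in C)), phi)
    return remainder in w_next  # condition 2

# Tiny usage example with gamma = 1/2 and a two-agent coalition:
req = (frozenset({"a", "b"}), (("a", 4.0), ("b", 6.0)), "p")
w = {req}
delta = {"a": req, "b": req}
u = {"a": 1.0, "b": 2.0}
succ = {(frozenset({"a", "b"}), (("a", 6.0), ("b", 8.0)), "p")}
print(honors(w, delta, u, succ, "t", req, 0.5))  # True: (4-1)/0.5=6, (6-2)/0.5=8
```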
As usual, the key step in proving the completeness theorem is an induction (or "truth") lemma, which in our case is Lemma 6. Lemma 4 and Lemma 5 below are two auxiliary lemmas that capture the two directions of the induction lemma in the case when formula $\varphi$ has the form $[C]_x\psi$.

**Lemma 4.** For each state $w \in W^+$ and each formula $[C]_x\varphi \in w$, there is a strategy $s$ of coalition $C$ such that, for each play $w_0, \delta_0, u_0, w_1, \ldots, u_{n-1}, w_n$ satisfying strategy $s$, if $w = w_0$, then

1. $\sum_{i=0}^{n-1} u_i\gamma^i \le_C x$ and
2. if $w_n \ne t$, then $\varphi/\gamma^n \in w_n$.

PROOF. Let action $s(a, \lambda)$ for any agent $a \in C$ and any play $\lambda = w_0, \delta_0, u_0, w_1, \ldots, u_{n-1}, w_n$ be defined¹ as follows:
$$s(a, \lambda) = \begin{cases} ([C]_{x-z}\,\varphi)/\gamma^n, & \text{if } z \le_C x,\\ \varepsilon, & \text{otherwise,} \end{cases} \tag{1}$$
where $z = \sum_{i=0}^{n-1} u_i\gamma^i$. Consider an arbitrary play $w_0, \delta_0, u_0, w_1, \ldots, u_{n-1}, w_n$ satisfying strategy $s$ such that $w = w_0$. It will be sufficient to show that conditions 1 and 2 of the lemma hold for this play.

¹Informally, strategy $s$ always requests to maintain condition $\varphi$ using the remaining budget $x - z$.

**Claim 1.** For each $a \in C$ and each $k$ such that $0 \le k \le n$,

1. $\sum_{i=0}^{k-1} u_i\gamma^i \le_C x$,
2. if $w_k \ne t$, then $\bigl([C]_{x - \sum_{i=0}^{k-1} u_i\gamma^i}\,\varphi\bigr)/\gamma^k \in w_k$.

PROOF OF CLAIM. We prove the claim by induction on integer $k$. If $k = 0$, then $\sum_{i=0}^{k-1} u_i\gamma^i = 0 \le_C x$ by the definition of language $\Phi$, because $[C]_x\varphi$ is a formula. Also, $\bigl([C]_{x - \sum_{i=0}^{k-1} u_i\gamma^i}\,\varphi\bigr)/\gamma^k = ([C]_{x-0}\,\varphi)/\gamma^0 = [C]_x\varphi \in w_0$ by the assumption $[C]_x\varphi \in w$ of the lemma and the assumption $w = w_0$.

Suppose $k > 0$. Then $(w_{k-1}, \delta_{k-1}, u_{k-1}, w_k) \in M$ by Definition 2, the assumption of the lemma that $w_0, \delta_0, u_0, w_1, \ldots, u_{n-1}, w_n$ is a play, and the assumption of the claim that $k \le n$. Thus, $w_{k-1} \ne t$ by item 5 of Definition 1. Hence, by the induction hypothesis,
$$\sum_{i=0}^{k-2} u_i\gamma^i \le_C x, \tag{2}$$
$$\bigl([C]_{x - \sum_{i=0}^{k-2} u_i\gamma^i}\,\varphi\bigr)/\gamma^{k-1} \in w_{k-1}. \tag{3}$$

By Definition 6 (step i), equations (2) and (1) (step ii), and item 4 of Definition 3 (step iii),
$$\delta_{k-1}(a) \overset{i}{=} s(a, (w_0, \delta_0, u_0, w_1, \ldots, u_{k-2}, w_{k-1})) \overset{ii}{=} \bigl([C]_{x - \sum_{i=0}^{k-2} u_i\gamma^i}\,\varphi\bigr)/\gamma^{k-1} \overset{iii}{=} [C]_{(x - \sum_{i=0}^{k-2} u_i\gamma^i)/\gamma^{k-1}}(\varphi/\gamma^{k-1}). \tag{4}$$

At the same time, by item 4 of Definition 3 (step iv) and equation (3) (step v),
$$[C]_{(x - \sum_{i=0}^{k-2} u_i\gamma^i)/\gamma^{k-1}}(\varphi/\gamma^{k-1}) \overset{iv}{=} \bigl([C]_{x - \sum_{i=0}^{k-2} u_i\gamma^i}\,\varphi\bigr)/\gamma^{k-1} \overset{v}{\in} w_{k-1}. \tag{5}$$

Also, $(w_{k-1}, \delta_{k-1}, u_{k-1}, w_k) \in M$ by Definition 2 and the assumption that $w_0, \delta_0, u_0, w_1, \ldots, u_{n-1}, w_n$ is a play. Thus, by Definition 8 and statements (5) and (4),

1. $u_{k-1} \le_C \bigl(x - \sum_{i=0}^{k-2} u_i\gamma^i\bigr)/\gamma^{k-1}$ and
2. if $w_k \ne t$, then $\bigl([C]_{(x - \sum_{i=0}^{k-2} u_i\gamma^i)/\gamma^{k-1} - u_{k-1}}(\varphi/\gamma^{k-1})\bigr)/\gamma \in w_k$.

Thus, by the laws of algebra and item 4 of Definition 3,

1. $u_{k-1}\gamma^{k-1} \le_C x - \sum_{i=0}^{k-2} u_i\gamma^i$ and
2. if $w_k \ne t$, then $\bigl([C]_{x - \sum_{i=0}^{k-2} u_i\gamma^i - u_{k-1}\gamma^{k-1}}\,\varphi\bigr)/\gamma^k \in w_k$.

The last two statements imply, respectively, parts 1 and 2 of the claim.

The statement of the lemma follows from the above claim when $k = n$. The first part follows immediately. To show the second part, note that by Definition 3, item 2 of the claim implies $[C]_{(x - \sum_{i=0}^{n-1} u_i\gamma^i)/\gamma^n}(\varphi/\gamma^n) \in w_n$. Thus, $w_n \vdash \varphi/\gamma^n$ by the Reflexivity axiom and the Modus Ponens inference rule. Therefore, $\varphi/\gamma^n \in w_n$ because set $w_n$ is maximal.
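The bookkeeping behind the canonical strategy (1) amounts to tracking the accumulated discounted spend $z$ and rescaling the remaining budget. A hedged sketch of that bookkeeping (ours; formulas are represented only by their budget annotation):

```python
# Accumulate z = sum_i u_i * gamma**i along a play and form the request
# ([C]_{x-z} phi) / gamma**n of the canonical strategy (1).

def next_request(x, spent, gamma):
    """x: agent -> budget; spent: list of per-step cost dicts u_0..u_{n-1}."""
    n = len(spent)
    z = {a: sum(u[a] * gamma**i for i, u in enumerate(spent)) for a in x}
    if any(z[a] > x[a] for a in x):
        return "epsilon"                     # budget exceeded: play epsilon
    # scale the remaining budget by 1/gamma**n, as in Definition 3
    return {a: (x[a] - z[a]) / gamma**n for a in x}

x = {"a": 10.0}
print(next_request(x, [], 0.5))              # {'a': 10.0}: the full budget
print(next_request(x, [{"a": 4.0}], 0.5))    # {'a': 12.0}: (10 - 4) / 0.5
print(next_request(x, [{"a": 8.0}, {"a": 6.0}], 0.5))  # 8 + 3 > 10 -> 'epsilon'
```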
**Lemma 5.** For each state $w \in W$, each formula $[C]_x\varphi \notin w$, and each action profile $\alpha$ of coalition $C$, there is a complete action profile $\delta$, a cost function $u \in [0, +\infty)^{\mathcal{A}}$, and a state $w' \in W^+$ such that $\alpha =_C \delta$, $(w, \delta, u, w') \in M$, and either (i) $u \not\le_C x$ or (ii) $w' \ne t$ and $\varphi/\gamma \notin w'$.

PROOF. Define the complete action profile
$$\delta(a) = \begin{cases} \alpha(a), & \text{if } a \in C,\\ \varepsilon, & \text{otherwise,} \end{cases} \tag{6}$$
and the cost function²
$$u(a) = \begin{cases} y(a), & \text{if } a \in C \text{ and } \alpha(a) = [D]_y\psi \text{ for some } [D]_y\psi \in \Phi \text{ such that } a \in D \text{ and } y(a) > x(a),\\ 0, & \text{otherwise.} \end{cases}$$
Note that $\alpha =_C \delta$. We consider the following two cases:

²The choice of function $u$ is perhaps the most unexpected step in our proof. Informally, if agent $a$ is "bluffing" and is offering to pay more than $x(a)$, then function $u$ charges the agent the amount she offered to pay, $y(a)$. If the agent makes a modest offer of no more than $x(a)$, then she is not charged at all.

Case I: $u(a) = 0$ for each agent $a \in C$. Define set $X$ to be
$$X = \{\neg(\varphi/\gamma)\} \cup \{([D]_y\psi)/\gamma \mid [D]_y\psi \in w,\ D \subseteq C,\ \forall a \in D\,(\alpha(a) = [D]_y\psi)\}.$$

**Claim 2.** Set $X$ is consistent.

PROOF OF CLAIM. Suppose set $X$ is not consistent. Thus, there are formulae
$$[D_1]_{y_1}\psi_1, \ldots, [D_n]_{y_n}\psi_n \in w \tag{7}$$
such that
$$D_1, \ldots, D_n \subseteq C, \tag{8}$$
$$\alpha(a) = [D_i]_{y_i}\psi_i \quad \text{for each } i \le n \text{ and each } a \in D_i, \tag{9}$$
and
$$([D_1]_{y_1}\psi_1)/\gamma, \ldots, ([D_n]_{y_n}\psi_n)/\gamma \vdash \varphi/\gamma. \tag{10}$$
Without loss of generality, we can assume that formulae $([D_1]_{y_1}\psi_1)/\gamma, \ldots, ([D_n]_{y_n}\psi_n)/\gamma$ are distinct. Thus, formulae $[D_1]_{y_1}\psi_1, \ldots, [D_n]_{y_n}\psi_n$ are also distinct by Definition 3. Hence, sets $D_1, \ldots, D_n$ are pairwise disjoint due to assumption (9). By Lemma 2, statement (10) implies that $[D_1]_{y_1}\psi_1, \ldots, [D_n]_{y_n}\psi_n \vdash \varphi$. Then, by Lemma 1 and because sets $D_1, \ldots, D_n$ are pairwise disjoint, $[D_1]_{y_1}[D_1]_{y_1}\psi_1, \ldots, [D_n]_{y_n}[D_n]_{y_n}\psi_n \vdash [D_1 \cup \cdots \cup D_n]_{y_1 \cup \cdots \cup y_n}\varphi$. Thus, by the Transitivity axiom and the Modus Ponens inference rule applied $n$ times, $[D_1]_{y_1}\psi_1, \ldots, [D_n]_{y_n}\psi_n \vdash [D_1 \cup \cdots \cup D_n]_{y_1 \cup \cdots \cup y_n}\varphi$.

Notice that $y_i(a) \le x(a)$ for any $i \le n$ and any agent $a \in D_i$. Indeed, suppose that $y_i(a) > x(a)$. Hence $u(a) = y_i(a)$ by the choice of cost function $u$ and statements (8) and (9). Thus, $u(a) > x(a)$. Then, $u(a) > 0$ because function $x$ is non-negative by the assumption $[C]_x\varphi \in \Phi$, which contradicts the assumption $u(a) = 0$ of the case. Hence, by Lemma 3 and the Modus Ponens inference rule, $[D_1]_{y_1}\psi_1, \ldots, [D_n]_{y_n}\psi_n \vdash [C]_x\varphi$. Then, $w \vdash [C]_x\varphi$ by assumption (7). Thus, $[C]_x\varphi \in w$ because set $w$ is maximal, which contradicts the assumption $[C]_x\varphi \notin w$ of the lemma.

Let $w'$ be any maximal consistent extension of set $X$. Note that $\neg(\varphi/\gamma) \in X \subseteq w'$ by the choice of sets $X$ and $w'$. Thus, $\varphi/\gamma \notin w'$ because set $w'$ is consistent.

**Claim 3.** $(w, \delta, u, w') \in M$.

PROOF OF CLAIM. Consider any formula $[D]_y\psi \in w$ such that
$$\delta(a) = [D]_y\psi \text{ for each agent } a \in D. \tag{11}$$
By Definition 8, it suffices to show that $u \le_D y$ and $([D]_{y-u}\psi)/\gamma \in w'$. We consider the following two cases:

Case Ia: $D \subseteq C$. Thus, $\alpha(a) = \delta(a) = [D]_y\psi$ for each agent $a \in D$ by equation (6) and assumption (11). Hence, $([D]_y\psi)/\gamma \in X$ by the choice of set $X$. Then, $([D]_{y-u}\psi)/\gamma \in X$ by the assumption of Case I that $u =_C 0$ and the assumption $D \subseteq C$ of Case Ia. Therefore, $([D]_{y-u}\psi)/\gamma \in w'$ by the choice of set $w'$. Additionally, $u =_D 0 \le_D y$ because $0 \le_D y$ by the definition of $\Phi$.

Case Ib: There is an agent $a \in D \setminus C$. Hence, $\varepsilon = \delta(a) = [D]_y\psi$ by equation (6) and assumption (11). Therefore, formula $[D]_y\psi$ is identical to $\varepsilon$, which is a contradiction because $\varepsilon \notin \Phi$.

As noted above, $\varphi/\gamma \notin w'$ because set $w'$ is consistent. Hence, alternative (ii) of the lemma holds in Case I.

Case II: $u(a) \ne 0$ for at least one agent $a \in C$. Thus, $u(a) = y(a) > x(a)$ by the choice of function $u$. Therefore, $u \not\le_C x$. Choose $w'$ to be the terminal state $t$.

**Claim 4.** $(w, \delta, u, w') \in M$.

PROOF OF CLAIM. Consider any formula $[D]_y\psi \in w$ such that $\delta(a) = [D]_y\psi$ for each agent $a \in D$. By Definition 8 and because $w' = t$, it suffices to show that $u \le_D y$. Recall that $0 \le_D y$ because $[D]_y\psi$ is a formula. Therefore, $u \le_D y$ by the choice of function $u$.

This concludes the proof of the lemma.

The next lemma is usually referred to as an induction or truth lemma. It is proven by induction on the structural complexity of formula $\varphi$, using Lemma 4 and Lemma 5 in the case where formula $\varphi$ has the form $[C]_x\psi$. The proof of this lemma can be found in [Bozzone and Naumov, 2021].

**Lemma 6.** $w \Vdash \varphi$ iff $\varphi \in w$, for each state $w \in W$ and each formula $\varphi \in \Phi$.

**Theorem 1.** If $X \nvdash \varphi$, then there is a state $w$ of a game such that $w \Vdash \chi$ for each $\chi \in X$ and $w \nVdash \varphi$.

PROOF. Suppose that $X \nvdash \varphi$. Let $w$ be any maximal consistent extension of set $X \cup \{\neg\varphi\}$. Note that $w$ is a state of the canonical game. Then, $w \Vdash \chi$ for each $\chi \in X$ and $w \Vdash \neg\varphi$ by Lemma 6. Therefore, $w \nVdash \varphi$ by Definition 7.

## 9 Conclusion

In this paper we proposed a coalition power logic whose semantics incorporates discounting. The main technical result is a strongly sound and strongly complete logical system for coalition strategies with perfect recall.

## References

[Ågotnes and Alechina, 2019] Thomas Ågotnes and Natasha Alechina. Coalition logic with individual, distributed and common knowledge. Journal of Logic and Computation, 29:1041–1069, 11 2019.

[Alechina and Logan, 2018] Natasha Alechina and Brian Logan. Resource logics with a diminishing resource. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, pages 1847–1849. International Foundation for Autonomous Agents and Multiagent Systems, 2018.
[Alechina et al., 2011] Natasha Alechina, Brian Logan, Hoang Nga Nguyen, and Abdur Rakib. Logic for coalitions with bounded resources. Journal of Logic and Computation, 21(6):907–937, December 2011.

[Alechina et al., 2015] Natasha Alechina, Brian Logan, Hoang Nga Nguyen, Franco Raimondi, and Leonardo Mostarda. Symbolic model-checking for resource-bounded ATL. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pages 1809–1810. International Foundation for Autonomous Agents and Multiagent Systems, 2015.

[Alechina et al., 2016] Natasha Alechina, Mehdi Dastani, and Brian Logan. Verifying existence of resource-bounded coalition uniform strategies. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 24–30. AAAI Press, 2016.

[Alechina et al., 2017] Natasha Alechina, Brian Logan, Hoang Nga Nguyen, and Franco Raimondi. Model-checking for resource-bounded ATL with production and consumption of resources. Journal of Computer and System Sciences, 88:126–144, 2017.

[Alur et al., 2002] Rajeev Alur, Thomas A. Henzinger, and Orna Kupferman. Alternating-time temporal logic. Journal of the ACM, 49(5):672–713, 2002.

[Belnap and Perloff, 1990] Nuel Belnap and Michael Perloff. Seeing to it that: A canonical form for agentives. In Knowledge Representation and Defeasible Reasoning, pages 167–190. Springer, 1990.

[Bozzone and Naumov, 2021] Lia Bozzone and Pavel Naumov. Budget-constrained coalition strategies with discounting. arXiv:2105.04692, 2021.

[Cao and Naumov, 2017] Rui Cao and Pavel Naumov. Budget-constrained dynamics in multiagent systems. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pages 915–921, 2017.

[Cao and Naumov, 2020] Rui Cao and Pavel Naumov. Knowing the price of success. Artificial Intelligence, page 103287, 2020.

[Della Monica et al., 2011] Dario Della Monica, Margherita Napoli, and Mimmo Parente. On a logic for coalitional games with priced-resource agents. Electronic Notes in Theoretical Computer Science, 278:215–228, 2011.

[Fervari et al., 2017] Raul Fervari, Andreas Herzig, Yanjun Li, and Yanjing Wang. Strategically knowing how. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 1031–1038, 2017.

[Horty, 2001] John F. Horty. Agency and Deontic Logic. Oxford University Press, 2001.

[Naumov and Tao, 2017] Pavel Naumov and Jia Tao. Coalition power in epistemic transition systems. In Proceedings of the 2017 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 723–731, 2017.

[Naumov and Tao, 2018a] Pavel Naumov and Jia Tao. Strategic coalitions with perfect recall. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[Naumov and Tao, 2018b] Pavel Naumov and Jia Tao. Together we know how to achieve: An epistemic logic of know-how. Artificial Intelligence, 262:279–300, 2018.

[Naumov and Tao, 2019] Pavel Naumov and Jia Tao. Knowing-how under uncertainty. Artificial Intelligence, 276:41–56, 2019.

[Pauly, 2001] Marc Pauly. Logic for Social Software. PhD thesis, Institute for Logic, Language, and Computation, 2001.

[Pauly, 2002] Marc Pauly. A modal logic for coalitional power in games. Journal of Logic and Computation, 12(1):149–166, 2002.

[van der Hoek and Wooldridge, 2003] Wiebe van der Hoek and Michael Wooldridge.
Cooperation, knowledge, and time: Alternating-time temporal epistemic logic and its applications. Studia Logica, 75(1):125–157, 2003.

[Walther et al., 2007] Dirk Walther, Wiebe van der Hoek, and Michael Wooldridge. Alternating-time temporal logic with explicit strategies. In Proceedings of the 11th Conference on Theoretical Aspects of Rationality and Knowledge, pages 269–278. ACM, 2007.

[Wang, 2018] Yanjing Wang. A logic of goal-directed knowing how. Synthese, 195(10):4419–4439, 2018.

[Xu, 1995] Ming Xu. On the basic logic of STIT with a single agent. The Journal of Symbolic Logic, 60(2):459–483, 1995.