# Constraint-Based Symmetry Detection in General Game Playing

Frédéric Koriche, Sylvain Lagrue, Eric Piette and Sébastien Tabary
CRIL, CNRS UMR 8188, Université d'Artois, France
{koriche, lagrue, epiette, tabary}@cril.fr

## Abstract

Symmetry detection is a promising approach for reducing the search tree of games. In General Game Playing (GGP), where any game is compactly represented by a set of rules in the Game Description Language (GDL), the state-of-the-art methods for symmetry detection rely on a rule graph associated with the GDL description of the game. Although such rule-based symmetry detection methods can be applied to various tree search algorithms, they cover only a limited number of symmetries, namely those which are apparent in the GDL description. In this paper, we develop an alternative approach to symmetry detection in stochastic games that exploits constraint programming techniques. The minimax optimization problem in a GDL game is cast as a stochastic constraint satisfaction problem (SCSP), which can be viewed as a sequence of one-stage SCSPs. Minimax symmetries are inferred from the microstructure complement of these one-stage constraint networks. Based on a theoretical analysis of this approach, we experimentally show on various games that the recent stochastic constraint solver MAC-UCB, coupled with constraint-based symmetry detection, significantly outperforms standard Monte Carlo Tree Search algorithms coupled with rule-based symmetry detection. This constraint-driven approach is also validated by the excellent results obtained by our player during the last GGP competition.

## 1 Introduction

The goal of General Game Playing (GGP) is to develop artificial agents capable of playing a wide variety of games without any human intervention [Genesereth et al., 2005]. In a GGP tournament, the rules of an unknown game are supplied to the agent and, after a short period of deliberation, the agent must be able to play this game effectively against other players. The game rules are represented in the Game Description Language (GDL) [Love et al., 2008]. Basically, a GDL program is a set of first-order logical clauses specifying the initial state of the game, the legal moves of each player, the effects of these moves, and the scores obtained at terminal states. Recent versions of GDL [Schiffel and Thielscher, 2014; Thielscher, 2016] are expressive enough to cover various classes of finite-horizon games including, in particular, stochastic games in which a distinguished chance player is used to simulate the probabilistic effects of joint actions.

Conceptually, any turn-based two-player zero-sum (deterministic or stochastic) game has an optimal, minimax value function that specifies the expected outcome of the game, for every possible state, under perfect play by all players. In theory, such games may be solved by recursively computing the value function in a search tree of size $b^d$, where $b$ is the number of legal moves per state and $d$ is the number of moves needed to reach a terminal state. Yet, even for games of moderate size, exhaustive search is infeasible in GGP tournaments, due to the short deliberation time allowed to players. To this point, symmetry detection is a popular inference method for reducing exploration in a search space, by transferring knowledge between equivalent regions of this space.
As games typically involve many equivalent states and moves, such similarities can be exploited for transferring the value function between nodes of the search tree. Furthermore, since most symmetry detection methods are implemented using graph automorphism algorithms [McKay and Piperno, 2014], the main component for inferring game symmetries is a graphical structure from which a permutation group is induced. Ideally, this permutation group should be as close as possible to the group of minimax symmetries, which preserve the optimal strategies of the game.

Most approaches to symmetry detection in GGP rely on some rule graph associated with the GDL representation [Kuhlmann and Stone, 2007; Schiffel, 2010; Zhang et al., 2015]. Basically, a rule graph contains nodes for the terms, atoms, and rules of the GDL representation, together with edges connecting these symbols. Any automorphism of this rule graph is a symmetry that preserves the game semantics. However, only a restricted subset of game symmetries can be recognized from such automorphisms, due to the syntactic bias imposed by the GDL representation.

In this paper, we propose an alternative approach to symmetry detection in stochastic games, inspired by constraint programming techniques. A Constraint Satisfaction Problem (CSP) consists of a set of variables, each associated with a domain of values, and a set of constraints specifying allowed combinations of values for subsets of variables. A CSP solution is an assignment of variables to values that satisfies all constraints. A common approach for detecting solution symmetries in CSPs is to use the microstructure complement of the network [Jégou, 1993; Cohen et al., 2006]. This structure is a hypergraph whose nodes are associated with variable-value pairs, and whose hyperedges join sets of nodes which are disallowed by some constraint. Any automorphism of this hypergraph, called a constraint symmetry, is a solution symmetry.

Since stochastic games are sequential decision problems, they generally cannot be reduced to CSPs. Instead, our approach is to encode a GDL program and its minimax objective function into a Stochastic Constraint Satisfaction Problem (SCSP) [Walsh, 2002; Tarim et al., 2006; Hnich et al., 2012]. This problem involves variables, domains and constraints, but the variables are partitioned into decision variables, encoding state descriptions and players' actions, and stochastic variables, encoding the actions of the chance player. Any solution of the SCSP is a minimax strategy of the GDL game. By construction, such a network can be viewed as a sequence of one-stage SCSPs, each specifying a game turn. Our symmetry detection method uses the microstructure complement of these one-stage SCSPs, whose constraint symmetries form a subgroup of the group of minimax symmetries. Importantly, these microstructure complements can also be exploited for detecting symmetries between approximate minimax strategies, which are found by combining constraint propagation and bandit exploration.

We provide an experimental evaluation of this approach using the MAC-UCB algorithm for solving SCSPs [Koriche et al., 2016]. We show on various games that MAC-UCB, coupled with constraint-based symmetry detection, significantly outperforms Monte Carlo Tree Search algorithms coupled with rule-based symmetry detection.
We also emphasize that this approach was implemented in the Woodstock player, which won the 2016 International GGP Competition.

## 2 Games and Symmetries

### 2.1 Stochastic Games

The problems under consideration in this study are finite-horizon combinatorial games that incorporate probabilistic choices. A game signature consists of a tuple $(I, F, A)$, where $I = \{1, \ldots, k\}$ is the set of players, $F$ is a finite set of fluents used to describe game states, and $A$ is a finite set of actions or moves. A stochastic game over the signature $(I, F, A)$ is a tuple $G = (s_1, S_\star, L, P, u)$, where:

- $s_1 \in 2^F$ is the initial state, and $S_\star \subseteq 2^F$ is the set of terminal states;
- $L : I \times 2^F \to 2^A$ maps each player $i$ and each state $s$ to the set $L_i(s)$ of legal moves of $i$ at $s$;
- $P : 2^F \times A^k \times 2^F \to [0, 1]$ maps each state $s$ and joint action $\boldsymbol{a} \in L(s)$ to a probability distribution $P(\cdot \mid s, \boldsymbol{a})$ over $2^F$, where $L(s) = L_1(s) \times \cdots \times L_k(s)$;
- $u : I \times S_\star \to \mathbb{R}$ maps each player $i$ and each terminal state $s$ to the utility $u_i(s)$ of $i$ at $s$.

A history (or finite play) of length $T$ in $G$ is a sequence of the form $(s_1, \boldsymbol{a}_1, s_2, \ldots, s_{T-1}, \boldsymbol{a}_{T-1}, s_T)$ where $\boldsymbol{a}_t \in L(s_t)$ and $P(s_{t+1} \mid s_t, \boldsymbol{a}_t) > 0$, for $t \in \{1, \ldots, T-1\}$. $G$ is called a $T$-horizon game if every history of length $T$ in $G$ includes a terminal state in $S_\star$, and conversely, every terminal state in $S_\star$ is included in some history of length $T$ in $G$. As usual, a pure stationary strategy is a map from states to actions. Of particular interest in this study is the minimax strategy, whose value function (for player $i$) is

$$V^*_i(s) = \begin{cases} \max_{a_i} \min_{a_{-i}} Q^*_i(s, \boldsymbol{a}) & \text{if } s \notin S_\star \\ u_i(s) & \text{otherwise} \end{cases} \qquad\text{where}\qquad Q^*_i(s, \boldsymbol{a}) = \sum_{s'} P(s' \mid s, \boldsymbol{a})\, V^*_i(s') \tag{1}$$

and where $a_i$ ranges over $L_i(s)$, and $a_{-i}$ ranges over the projection of $L(s)$ onto $I \setminus \{i\}$. This strategy is optimal for the important class of turn-based two-player zero-sum stochastic games [Condon, 1992], given by $k = 2$, $u_1(s) = -u_2(s)$ for $s \in S_\star$, and $L_1(s) = \{\mathit{noop}\}$ or $L_2(s) = \{\mathit{noop}\}$ for every state $s \in S_G$, where $S_G$ denotes the set of states occurring in the histories of $G$, and $\mathit{noop}$ is a distinguished action with no effects.

Evaluating the minimax value function in combinatorial games is generally intractable. A natural approach to alleviate this issue is to approximate $V^*_i$ by a depth-bounded minimax value that limits the recursion to some fixed depth $d$ and applies a heuristic evaluation function when $d$ is reached [Lanctot et al., 2013]. Given a map $\hat V_i : S_G \to \mathbb{R}$ such that $\hat V_i(s) = u_i(s)$ when $s \in S_\star$, the depth-$d$ minimax strategy is

$$V^d_i(s) = \begin{cases} \max_{a_i} \min_{a_{-i}} Q^d_i(s, \boldsymbol{a}) & \text{if } d > 0 \text{ and } s \notin S_\star \\ \hat V_i(s) & \text{otherwise} \end{cases} \qquad\text{where}\qquad Q^d_i(s, \boldsymbol{a}) = \sum_{s'} P(s' \mid s, \boldsymbol{a})\, V^{d-1}_i(s'). \tag{2}$$

### 2.2 Game Symmetries

Intuitively, a symmetry in a set $O$ of objects is a permutation over $O$ that leaves some property of those objects unchanged. The particular group of symmetries that we obtain depends on the property we choose to preserve. In this paper, two main types of game symmetries are distinguished: structural symmetries, which preserve the game structure, and minimax symmetries, which preserve the minimax strategies of the game.

Given a stochastic game $G$ with signature $(I, F, A)$, a structural symmetry of $G$ is a bijective function $\sigma$ on $I \cup F \cup A$ such that, for all $i \in I$, $f \in F$, $a \in A$, $\boldsymbol{a} \in A^k$, and $s, s' \in S_G$:

$$\sigma(i) = i, \quad \sigma(a) \in A, \quad \sigma(f) \in F \tag{3a}$$
$$\sigma(f) \in s_1 \iff f \in s_1 \tag{3b}$$
$$\sigma(s) \in S_\star \iff s \in S_\star \tag{3c}$$
$$\sigma(a) \in L_i(\sigma(s)) \iff a \in L_i(s) \tag{3d}$$
$$P(\sigma(s') \mid \sigma(s), \sigma(\boldsymbol{a})) = P(s' \mid s, \boldsymbol{a}) \tag{3e}$$
$$u_i(\sigma(s)) = u_i(s) \tag{3f}$$

where $\sigma(s) = \{\sigma(f) \mid f \in s\}$ and $\sigma(\boldsymbol{a}) = (\sigma(a_1), \ldots, \sigma(a_k))$. A minimax symmetry is a bijective function $\sigma$ over $I \cup F \cup A$ satisfying conditions (3a)-(3c) together with

$$V^*_i(\sigma(s)) = V^*_i(s) \tag{4}$$

for all $i \in I$ and $s \in S_G$.
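To make the recursion in (1) and condition (4) concrete, here is a minimal Python sketch on a small hypothetical turn-based two-player stochastic game; the state names, transition probabilities, and the permutation `sigma` are invented for illustration and are not part of the paper. It computes the minimax value by the max-min-expectation recursion and then checks condition (4) for a candidate state symmetry.

```python
from functools import lru_cache

# A tiny hypothetical two-player stochastic game, encoded explicitly. States are strings,
# the "end_*" states are terminal. LEGAL gives L_i(s); TRANS gives P(. | s, a) for joint actions a.
LEGAL = {
    ("root", 1): ("left", "right"), ("root", 2): ("noop",),
    ("L", 1): ("noop",), ("L", 2): ("left", "right"),
    ("R", 1): ("noop",), ("R", 2): ("left", "right"),
}
TRANS = {
    ("root", ("left", "noop")):  {"L": 0.8, "R": 0.2},   # chance may flip player 1's choice
    ("root", ("right", "noop")): {"R": 0.8, "L": 0.2},
    ("L", ("noop", "left")):  {"end_match": 1.0}, ("L", ("noop", "right")): {"end_miss": 1.0},
    ("R", ("noop", "left")):  {"end_miss": 1.0},  ("R", ("noop", "right")): {"end_match": 1.0},
}
UTILITY = {"end_match": {1: 0, 2: 100}, "end_miss": {1: 100, 2: 0}}   # constant-sum scores

@lru_cache(maxsize=None)
def V(i, s):
    """Equation (1): V*_i(s) = max_{a_i} min_{a_-i} sum_{s'} P(s' | s, a) V*_i(s')."""
    if s in UTILITY:                                  # terminal state: return u_i(s)
        return UTILITY[s][i]
    j = 3 - i                                         # the single adversary of player i
    def Q(a):                                         # Q*_i(s, a): expectation over the outcomes of a
        return sum(p * V(i, s2) for s2, p in TRANS[(s, a)].items())
    return max(min(Q(tuple(ai if k == i else aj for k in (1, 2)))
                   for aj in LEGAL[(s, j)])
               for ai in LEGAL[(s, i)])

# Condition (4): the permutation swapping states L and R (induced by swapping the
# actions left and right) leaves the minimax values unchanged.
sigma = {"root": "root", "L": "R", "R": "L", "end_match": "end_match", "end_miss": "end_miss"}
assert all(V(i, s) == V(i, sigma[s]) for i in (1, 2) for s in sigma)
print({s: V(1, s) for s in sigma})
```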
By extension, depth-$d$ minimax symmetries are defined by replacing (4) with

$$V^d_i(\sigma(s)) = V^d_i(s) \tag{5}$$

Any structural symmetry is a minimax symmetry, since conditions (3d)-(3f) imply that two states with equivalent subgames must have the same minimax value. Yet, structural symmetries and depth-$d$ minimax symmetries are generally incomparable, except for $d = T$ in $T$-horizon games.

## 3 Rule-Based Symmetry Detection

### 3.1 GDL Representations

In the GGP setting, games are represented by first-order stratified logic programs, using the GDL language, which includes several keywords specifying the game dynamics. Stochastic games are captured by an extension of GDL that describes the chance player `random` [Schiffel and Thielscher, 2014].

**Example 1.** In the standard Tic-Tac-Toe game, two players 1 and 2 take turns marking the cells of a grid until a row, column or diagonal is filled by a player. We focus here on a randomized variant of this game for the 2×2 grid: players 1 and 2 can only choose a row or a column, and the exact position in the chosen line is drawn at random by the chance player. For instance, the rules

```
next(row(X, I))     ⇐ does(I, chooseRow(X))
next(col(Y, I))     ⇐ does(I, chooseCol(Y))
next(cell(X, Y, I)) ⇐ does(random, mark(X, Y, I))
```

specify the effects of players' actions. Namely, if a player I selects a row X (resp. a column Y) in the current state, then the fluent row(X, I) (resp. col(Y, I)) is true in the next state, and if the chance player random marks a position (X, Y) with player I in the current state, then the fluent cell(X, Y, I) is true in the next state. Alternatively, the rules

```
legal(random, mark(X, Y, I)) ⇐ true(row(X, I)), true(cell(X, Y, b))     (6a)
legal(random, mark(X, Y, I)) ⇐ true(col(Y, I)), true(cell(X, Y, b))     (6b)
```

capture the legal moves of the chance player: marking a cell (X, Y) with I is possible if the position (X, Y) is available (i.e., filled with the blank marker b), and the row X or the column Y has been chosen by player I.

For a set R of GDL rules, the dependency graph of R associates a node with each literal L in R, and an arc (L, L') if L occurs in the body of some rule with head L'. Because GDL is Turing-complete [Saffidine, 2014], even for valid programs [Love et al., 2008], we focus here on a propositional fragment of this language (including random), denoted P-GDL, satisfying two properties: (7a) the number of ground instances of functional terms and the size of clauses are bounded, and (7b) the dependency graph of the program is acyclic.

Any P-GDL program R can be rewritten into an equivalent ground instance $R_g$ of size polynomial in |R|. The ground terms in $R_g$ are partitioned into a set $I^+ = \{0, 1, \ldots, k\}$ of players, where 0 denotes the chance player, a set F of fluent terms, a set A of action terms, and a set U of utility constants. Similarly, the rules in $R_g$ can be partitioned into six categories:

```
role(i)
init(f)
terminal     ⇐ L1, …, Lq
legal(i, a)  ⇐ L1, …, Lq
next(f)      ⇐ does(0, a0), …, does(k, ak)
goal(i, ν)   ⇐ L1, …, Lq
```

where $i \in I^+$, $\nu \in U$, $f \in F$, and $a, a_0, \ldots, a_k \in A$. Furthermore, each $L_j$ is an atom $A_j$ or its negation not $A_j$, where $A_j$ is an expression of the form true(f) or $f_1 = f_2$.

The Clark's completion [Clark, 1978] of $R_g$ is the set of propositional formulas $R_c$ obtained as follows: (i) each symbol `not` in $R_g$ is replaced with $\neg$; (ii) each atom A occurring as the head of some rule is associated with its set $B_A$ of bodies, i.e.

$$B_A = \{L_1 \wedge \cdots \wedge L_q \mid A \Leftarrow L_1, \ldots, L_q \in R_g\};$$

and (iii) for each atom A, all rules with head A are replaced with a single formula $A \leftrightarrow \bigvee B_A$.
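As a concrete illustration of steps (ii) and (iii), here is a small Python sketch that groups ground rules by head and emits the completed formulas as strings. The encoding of rules as (head, body) pairs is our own choice, not the paper's data structure, and the ground atoms shown are a hypothetical fragment of the randomized Tic-Tac-Toe description.

```python
from collections import defaultdict

# Ground rules as (head, body) pairs; a fact has an empty body.
rules = [
    ("legal(random,mark(x1,y2,1))", ["true(row(x1,1))", "true(cell(x1,y2,b))"]),
    ("legal(random,mark(x1,y2,1))", ["true(col(y2,1))", "true(cell(x1,y2,b))"]),
    ("next(cell(x1,y2,1))",         ["does(random,mark(x1,y2,1))"]),
]

def clark_completion(rules):
    """Steps (ii)-(iii): collect the bodies B_A of every head A, then emit A <-> \\/ B_A.
    Step (i), replacing `not L` by a negated literal, is assumed to be done upstream."""
    bodies = defaultdict(list)                      # B_A, indexed by the head atom A
    for head, body in rules:
        bodies[head].append(" & ".join(body) if body else "true")
    return {head: f"{head} <-> " + " | ".join(f"({b})" for b in bs)
            for head, bs in bodies.items()}

for formula in clark_completion(rules).values():
    print(formula)
# The first line printed is the merged legal formula used later in Example 4.
```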
For an atom A, we write $R_g \vdash A$ to denote that A is true in the (unique) stable model of $R_g$, and we write $R_c \models A$ to denote that $R_c \wedge \neg A$ is unsatisfiable. By Corollary 4.21 in [Ben-Eliyahu and Dechter, 1994], if $R_g$ satisfies (7b), we must have $R_g \vdash A$ iff $R_c \models A$, that is, $R_g$ and $R_c$ are logically equivalent.

Let $G(R) = (s_1, S_\star, L, P, u)$ be the stochastic game over the signature $(I, F, A)$, where $I = I^+ \setminus \{0\}$, and such that:

- $f \in s_1$ iff $R_g \vdash \mathtt{init}(f)$;
- $s \in S_\star$ iff $R_g \cup \mathtt{true}(s) \vdash \mathtt{terminal}$;
- $a \in L_i(s)$ iff $R_g \cup \mathtt{true}(s) \vdash \mathtt{legal}(i, a)$;
- $u_i(s) = \nu$ iff $R_g \cup \mathtt{true}(s) \vdash \mathtt{goal}(i, \nu)$; and
- $P(s' \mid s, a) = \frac{1}{|L_0(s)|}$ if $s' \in \rho(s, a)$, and $0$ otherwise.

Here, $\mathtt{true}(s) = \{\mathtt{true}(f) \mid f \in s\}$, and $\rho(s, a)$ is the set of states $s'$ for which there is an action $a_0 \in L_0(s)$ such that, for all $f' \in s'$, $R_g \cup \mathtt{true}(s) \cup \{\mathtt{does}(i, a_i)\}_{i=0}^{k} \vdash \mathtt{next}(f')$. Thus, $P(\cdot \mid s, a)$ is a uniform distribution over the states updated from $s$ using the moves $(a_0, a)$ coupling the action profile $a$ of the players with all possible actions $a_0 \in L_0(s)$.

**Example 2.** In the ground instance $R_g$ of the GDL program R describing the randomized Tic-Tac-Toe game for the 2×2 grid, the rules (6a) and (6b) are each replaced with 8 ground formulas, obtained by instantiating the variables X ∈ {x1, x2}, Y ∈ {y1, y2}, and I ∈ {1, 2}. Notably, in the ground formulas

```
legal(random, mark(x1, y2, 1)) ⇐ true(row(x1, 1)), true(cell(x1, y2, b))     (8a)
legal(random, mark(x1, y2, 1)) ⇐ true(col(y2, 1)), true(cell(x1, y2, b))     (8b)
```

row(x1, 1), col(y2, 1) and cell(x1, y2, b) are fluent terms, and mark(x1, y2, 1) is an action term. If we assume that, in the current state s, the first player has chosen row x1 and the number of free positions in x1 is y, then $P(\cdot \mid s, a)$ assigns probability 1/y to the successor state obtained when the chance player plays mark(x1, y2, 1).

*Figure 1: The ground rule graph of formulas (8a) and (8b).*

### 3.2 Rule Graphs

For the propositional fragment of GDL examined in this study, we use a simplified version of the (enhanced) rule graph specified in [Schiffel, 2010]. Formally, the ground rule graph of a P-GDL program R is the colored graph $G = (N, E, \chi)$ over $R_g$, such that:

- each ground term $t \in I^+ \cup F \cup A \cup U$ is mapped to a distinct node $n_t$; if t is a player $i \in I^+$, then $\chi(n_t) = i$; if t is a utility value $\nu \in U$, then $\chi(n_t) = \nu$; if t is a fluent term in F, then $\chi(n_t) = \mathit{fluent}$; and if t is an action term in A, then $\chi(n_t) = \mathit{action}$;
- each atom A of the form $p(t_1, \ldots, t_q)$ is mapped to a distinct node $n_A$ with color $\chi(n_A) = p$, and a clique over $\{n_A, n_{t_1}, \ldots, n_{t_q}\}$;
- each literal L of the form not A is mapped to a distinct node $n_L$ with $\chi(n_L) = \mathit{not}$, and an edge $(n_L, n_A)$;
- each rule r of the form $A \Leftarrow L_1, \ldots, L_q$ is mapped to a distinct node $n_r$ with color $\chi(n_r) = \mathit{rule}$, and a clique over $\{n_r, n_A, n_{L_1}, \ldots, n_{L_q}\}$.

**Proposition 1.** Let R be a P-GDL program and G be its ground rule graph. Then any automorphism of G is a structural symmetry of G(R).

**Example 3.** As depicted in Figure 1, the ground rule graph of the ground formulas (8a) and (8b) includes two similar four-cliques. An automorphism of this graph can be obtained by permuting the nodes $r_1$, $\mathit{true}_2$ and row(x1, 1) with $r_2$, $\mathit{true}_3$ and col(y2, 1), respectively.
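To make this construction tangible, the following Python sketch builds the ground rule graph of (8a) and (8b) and verifies the automorphism of Example 3 by brute force. The shortened node names (such as `true_row`) and the color labels are shorthand we introduce here, not the paper's notation.

```python
from itertools import combinations

# Colored ground rule graph of the two ground rules (8a) and (8b).
# Term nodes, atom nodes and rule nodes, each with its color, following Section 3.2.
color = {
    "random": "player0", "mark(x1,y2,1)": "action",
    "row(x1,1)": "fluent", "col(y2,1)": "fluent", "cell(x1,y2,b)": "fluent",
    "legal_head": "legal", "true_row": "true", "true_col": "true", "true_cell": "true",
    "r1": "rule", "r2": "rule",
}
cliques = [
    {"legal_head", "random", "mark(x1,y2,1)"},      # atom legal(random, mark(x1,y2,1))
    {"true_row", "row(x1,1)"},                      # atom true(row(x1,1))
    {"true_col", "col(y2,1)"},                      # atom true(col(y2,1))
    {"true_cell", "cell(x1,y2,b)"},                 # atom true(cell(x1,y2,b))
    {"r1", "legal_head", "true_row", "true_cell"},  # rule (8a): a four-clique
    {"r2", "legal_head", "true_col", "true_cell"},  # rule (8b): a four-clique
]
edges = {frozenset(e) for c in cliques for e in combinations(c, 2)}

def is_automorphism(perm):
    """Color-preserving bijection on nodes that maps the edge set onto itself."""
    same_colors = all(color[n] == color[perm[n]] for n in color)
    same_edges = {frozenset(perm[a] for a in e) for e in edges} == edges
    return same_colors and same_edges

# The permutation of Example 3: swap the two rule nodes, the two `true` atoms, and the row/col fluents.
sigma = {n: n for n in color}
sigma.update({"r1": "r2", "r2": "r1", "true_row": "true_col", "true_col": "true_row",
              "row(x1,1)": "col(y2,1)", "col(y2,1)": "row(x1,1)"})
print(is_automorphism(sigma))   # True
```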
Note that the group of automorphisms of G depends on how the game is described by $R_g$. For example, suppose that two actions a and a′ have the same preconditions and effects, but $R_g$ uses two rules legal(i, a) ⇐ A and legal(i, a) ⇐ not A for action a, and one fact legal(i, a′) for a′. Then no automorphism of G can permute a and a′, because the neighborhoods of $n_a$ and $n_{a'}$ are not equivalent.

## 4 Constraint-Based Symmetry Detection

The key idea of this study is to translate GDL programs into stochastic constraint networks, from which the microstructure complement can be used for detecting various types of symmetries. We focus on a slight generalization of the original SCSP model [Walsh, 2002] that incorporates conditional probability distributions over stochastic variables.

Formally, a Stochastic Constraint Satisfaction Problem is a tuple $N = (X, Y, D, C, P, \theta)$, where $X = (x_1, \ldots, x_n)$ is an ordered finite set of variables, $Y \subseteq X$ is the set of stochastic variables, $D = \{D(x_i)\}_{i=1}^n$ is a set of finite domains, each associated with a variable in X, $C$ is a set of constraints, $P = \{P_y\}_{y \in Y}$ is a set of conditional probability tables, each associated with a variable in Y, and $\theta \in [0, 1]$ is a threshold value. Any variable in $X \setminus Y$ is called a decision variable.

Given an ordered subset $Z = (z_1, \ldots, z_m)$ of variables in X, we denote by $D(Z)$ the relation $D(z_1) \times \cdots \times D(z_m)$. An instantiation on Z is a tuple $v \in D(Z)$, also written as a set of variable-value pairs $\{(z_1, v_1), \ldots, (z_m, v_m)\}$. The instantiation v is complete if $Z = X$. For a subset $Z' \subseteq Z$, we use $v|_{Z'}$ to denote the projection $\{(z_i, v_i) : z_i \in Z'\}$ of v onto Z′. As usual, each constraint $c \in C$ is defined over a set of variables $\mathit{scp}_c \subseteq X$, called the scope of c, and consists of a mapping from $D(\mathit{scp}_c)$ to $\{0, 1\}$. Any instantiation v on $\mathit{scp}_c$ for which $c(v) = 1$ (resp. $c(v) = 0$) is called allowed (resp. disallowed). Each conditional probability table $P_y$ is defined over a set of variables $\mathit{scp}_y \subseteq X$ occurring before y in X, and maps every instantiation v on $\mathit{scp}_y$ to a probability distribution $P_y(\cdot \mid v)$ over $D(y)$.

The probability $P(v)$ and the consistency $C(v)$ of a complete instantiation v are respectively given by

$$P(v) = \prod_{y \in Y} P_y\big(v|_{\{y\}} \mid v|_{\mathit{scp}_y}\big) \quad\text{and}\quad C(v) = \prod_{c \in C} c\big(v|_{\mathit{scp}_c}\big).$$

An instantiation v is globally consistent (GC) if it can be extended to a complete instantiation v′ such that $C(v') = 1$.

A policy π for N is a tree in which nodes are labeled according to the ordering X: the root is labeled with $x_1$, and each child of a node $x_i$ is labeled with $x_{i+1}$. Each outgoing edge of $x_i$ is labeled with a value in $D(x_i)$; decision nodes have a unique child, and stochastic nodes $y_i$ have $|D(y_i)|$ children. Each leaf is labeled with the value $C(v)$ of the complete instantiation v connecting that leaf to the root. The expected consistency of a policy π is given by

$$C(\pi) = \sum_{v} P(v)\, C(v)$$

where v ranges over the root-to-leaf paths of π. A solution of N is a policy π such that $C(\pi) \geq \theta$. Note that, in the particular case where $\theta = 1$, π is a solution of N iff every instantiation in π is GC. Finally, N is a T-stage SCSP if X is partitioned into T stages, i.e. $X = (Z_1, Y_1, \ldots, Z_T, Y_T)$, where $(Z_1, \ldots, Z_T)$ is a partition of $X \setminus Y$, and $(Y_1, \ldots, Y_T)$ is a partition of Y.
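The following minimal Python sketch evaluates the expected consistency of the two possible policies of a one-stage SCSP with one decision variable, one stochastic variable, and a single constraint. All numbers, variable names, and the threshold are invented for illustration.

```python
# A tiny one-stage SCSP: decision variable x, stochastic variable y whose distribution
# depends on x, and a single constraint requiring x == y. theta is the acceptance threshold.
D = {"x": [0, 1], "y": [0, 1]}
P_y = {0: {0: 0.5, 1: 0.5},          # P_y(. | x = 0)
       1: {0: 0.3, 1: 0.7}}          # P_y(. | x = 1)
constraints = [lambda v: int(v["x"] == v["y"])]
theta = 0.6

def expected_consistency(decision_x):
    """C(pi) = sum over root-to-leaf paths v of P(v) * C(v), for the policy that plays decision_x."""
    total = 0.0
    for y_val in D["y"]:
        v = {"x": decision_x, "y": y_val}
        P_v = P_y[decision_x][y_val]            # product over stochastic variables (here, just one)
        C_v = 1
        for c in constraints:                   # product over constraints
            C_v *= c(v)
        total += P_v * C_v
    return total

for x_val in D["x"]:
    score = expected_consistency(x_val)
    print(f"policy x={x_val}: C(pi)={score:.2f}, solution={score >= theta}")
# policy x=0: C(pi)=0.50, solution=False
# policy x=1: C(pi)=0.70, solution=True
```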
### 4.1 From GDL to SCSP

Based on the framework suggested in [Koriche et al., 2016], any P-GDL program R can be encoded into a multi-stage SCSP, where each stage simulates a game turn. Formally, let $R_g$ be the ground instance of R, where $I^+ = \{0, 1, \ldots, k\}$ is the set of players, A is the set of actions, F is the set of fluents, and U is the set of utility values. Let $R_c$ be the Clark's completion of $R_g$, and T be a positive integer. Then the T-stage SCSP of R is the tuple $N_{R,T} = (X, Y, D, C, P, \theta)$ such that $\theta = 1$ and:

- $X = (Z_1, Y_1, \ldots, Z_T, Y_T)$, where each $Z_t$ includes a Boolean variable $\omega_t$ indicating the presence of a terminal state, a set of Boolean variables $\{f_{j,t}\}_{j=1}^{n}$, each associated with a distinct fluent in F, a set of variables $\{a_{i,t}\}_{i=1}^{k}$ with domain A, each associated with a distinct player i, and a set of variables $\{u_{i,t}\}_{i=1}^{k}$ with domain U, each associated with a player i. Each $Y_t$ includes a single variable $a_{0,t}$ with domain A, associated with random;
- $C = C_1 \cup \cdots \cup C_T$, where $C_1$ includes constraints encoding the unary clauses over init in $R_c$, and, for every stage t, $C_t$ includes a constraint $c_{A,t}$ encoding each rule $A \leftrightarrow \bigvee B_A$ in $R_c$. Constraints over legal for the same player i are merged into a single constraint $\mathit{legal}_{i,t}$;
- $P = \{P_1, \ldots, P_T\}$, where $P_t$ encodes the conditional probability distribution of $a_{0,t}$ at stage t. The scope of $P_t$ is given by the fluent scope $\{f_{j,t}\}_{j=1}^{m}$ of the constraint $\mathit{legal}_{0,t}$ (minus $a_{0,t}$), and, for each instantiation v on this scope, $P_t(\cdot \mid v)$ is the uniform distribution over the set of legal moves $\{a \in A \mid \mathit{legal}_{0,t}(a, v) = 1\}$.

Based on this construction, we can observe that, for each stage $X_t = (Z_t, Y_t)$, the scopes of the constraints $\mathit{terminal}_t$, $\mathit{legal}_{i,t}$ and $\mathit{goal}_{i,t}$, together with the scope of the table $P_t$, are all covered by the variables in $X_t$. Thus, the only component that connects a stage to its successor is the constraint $\mathit{next}_t$, which determines the values of the fluents at turn $t+1$, given the actions simultaneously played at turn t.

**Example 4.** Consider again the randomized Tic-Tac-Toe game for the 2×2 grid. By Clark's completion, the ground formulas (8a) and (8b) are merged into a single formula:

```
legal(random, mark(x1, y2, 1)) ↔ [true(row(x1, 1)) ∧ true(cell(x1, y2, b))]
                                 ∨ [true(col(y2, 1)) ∧ true(cell(x1, y2, b))]
```

At each stage t, this formula is encoded into a $\mathit{legal}_{0,t}$ constraint with scope $\mathit{mark}_{0,t}$, $\mathit{row}(x_1,1)_t$, $\mathit{col}(y_2,1)_t$ and $\mathit{cell}(x_1,y_2,b)_t$. The relation of this constraint states that the value (x1, y2, 1) is allowed for the action variable $\mathit{mark}_{0,t}$ iff the fluent variable $\mathit{cell}(x_1,y_2,b)_t$ is true and at least one of the fluent variables $\mathit{row}(x_1,1)_t$ and $\mathit{col}(y_2,1)_t$ is true. Similarly, by Clark's completion, the ground rule

```
next(cell(x1, y2, 1)) ⇐ does(random, mark(x1, y2, 1))
```

is replaced with the equivalence next(cell(x1, y2, 1)) ↔ does(random, mark(x1, y2, 1)), and then encoded at each stage t into a $\mathit{next}_{0,t}$ constraint which states that the fluent variable $\mathit{cell}(x_1,y_2,1)_{t+1}$ is true at turn $t+1$ iff the value of $\mathit{mark}_{0,t}$ at turn t is (x1, y2, 1).

### 4.2 Microstructure Complements

We now have all the ingredients in hand to extend the notion of microstructure complement [Cohen et al., 2006] to stochastic constraint networks. Given an SCSP N, the microstructure complement of N is the hypergraph H whose set of nodes is the collection of all variable-value pairs $\{(x, v) \mid x \in X, v \in D(x)\}$, and whose set of hyperedges is the union, over all constraints $c \in C$, of the instantiations on $\mathit{scp}_c$ which are disallowed by c. Thus, $v = \{(z_1, v_1), \ldots, (z_k, v_k)\}$ is a hyperedge of H iff $\{z_1, \ldots, z_k\}$ is the scope of some constraint $c \in C$ for which $c(v) = 0$.
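As a minimal sketch of this definition, the following Python snippet builds the microstructure complement of a tiny network and checks that a permutation of variable-value pairs maps the hyperedge set onto itself, i.e., that it is a constraint symmetry. For brevity the network is an ordinary CSP (no stochastic variables), and the variable and value names are invented.

```python
from itertools import product

# A tiny CSP: two variables with the same domain and one "not-equal" constraint.
domains = {"x1": ["a", "b"], "x2": ["a", "b"]}
constraints = [(("x1", "x2"), lambda t: t[0] != t[1])]

# Microstructure complement: one node per variable-value pair, one hyperedge per
# disallowed instantiation of a constraint scope.
nodes = {(x, v) for x, dom in domains.items() for v in dom}
hyperedges = set()
for scope, check in constraints:
    for tup in product(*(domains[x] for x in scope)):
        if not check(tup):
            hyperedges.add(frozenset(zip(scope, tup)))

print(sorted(sorted(e) for e in hyperedges))
# [[('x1', 'a'), ('x2', 'a')], [('x1', 'b'), ('x2', 'b')]]

def is_constraint_symmetry(perm):
    """A bijection on variable-value pairs that maps the hyperedge set onto itself."""
    return {frozenset(perm[p] for p in e) for e in hyperedges} == hyperedges

# Swapping the two values on both variables preserves the hyperedges, hence the solutions.
swap_values = {(x, v): (x, "b" if v == "a" else "a") for (x, v) in nodes}
print(is_constraint_symmetry(swap_values))   # True
```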
By extension, the microstructure complement of $N_{R,T}$ is the colored hypergraph $H_{R,T}$, wherein nodes and hyperedges are defined as above, and the coloring function χ of $H_{R,T}$ maps each node (x, v) to a triple, as follows: if x is a fluent variable $f_{j,t}$, then $\chi(x, v) = (t, \bullet, \bullet)$; if x is a terminal variable $\omega_t$, then $\chi(x, v) = (t, \bullet, v)$; if x is an action variable $a_{i,t}$, then $\chi(x, v) = (t, i, \bullet)$; and if x is a utility variable $u_{i,t}$, then $\chi(x, v) = (t, i, v)$, where $\bullet$ is an arbitrary symbol disjoint from $I^+ \cup U$. By construction, different colors are assigned to variables of different types, and to variables occurring at different stages. Any color-preserving automorphism of $H_{R,T}$ is called a constraint symmetry of $N_{R,T}$.

**Proposition 2.** Let R be the P-GDL program of a T-horizon game. Then any constraint symmetry of $N_{R,T}$ is a structural symmetry of G(R).

It is important to keep in mind that any solution policy of $N_{R,T}$ is just a legal strategy that satisfies all game rules. In order to encode minimax policies, let $N^*_{R,T}$ be the extension of $N_{R,T}$ to the constraints $C^* = (C^*_1, \ldots, C^*_T)$, where $C^*_t = C_t \cup \{V^*_{i,t}, Q^*_{i,t}\}_{i=1}^{k}$. Here, $V^*_{i,t}$ and $Q^*_{i,t}$ are used to encode the minimax strategy (1). Namely, the scopes of $V^*_{i,t}$ and $Q^*_{i,t}$ are respectively given by $\{f_{j,t}\}_{j=1}^{n} \cup \{u_{i,t}\}$ and $\{f_{j,t}\}_{j=1}^{n} \cup \{a_{i,t}\}_{i=1}^{k} \cup \{u_{i,t}\}$, where $u_{i,t}$ is the utility variable of player i. The state-value constraint $V^*_{i,t}$ maps any state s to

$$u_{i,t} = \begin{cases} \max_{a_i} \min_{a_{-i}} Q^*_{i,t}(s, a) & \text{if } \omega_t = 0 \\ \mathit{goal}_{i,t}(s) & \text{otherwise} \end{cases}$$

Correspondingly, the action-value constraint $Q^*_{i,t}$ maps any state s and action profile a to

$$Q^*_{i,t}(s, a) = \sum_{a_0} P_t(a_0 \mid s)\, V^*_{i,t+1}(s')$$

where $a_0$ ranges over all values in $D(a_{0,t})$ such that $\mathit{legal}_{0,t}(a_0, s) = 1$, and $s'$ is the state at stage $t+1$ determined by the constraint $\mathit{next}_t$ from s and the joint action $(a_0, a)$. By enforcing global consistency over $N^*_{R,T}$, we must have $Q^*_{i,t}(s, a) = V^*_{i,t}(s)$, which in turn implies that only optimal actions for player i are kept in the instantiations (s, a) which are GC. The network $N^d_{R,T}$ is defined analogously, using the depth-d minimax strategy (2).

**Proposition 3.** Let R be the P-GDL program of a T-horizon game. Then any constraint symmetry of $N^*_{R,T}$ (resp. $N^d_{R,T}$) is a minimax (resp. depth-d minimax) symmetry of G(R).

*Figure 2: A sub-hypergraph of the microstructure complement associated with the one-stage SCSP of Example 5.*

Though theoretically interesting, the above result can be difficult to exploit in practice, due to the large size of the microstructure complement of a T-stage SCSP. Fortunately, $N^*_{R,T}$ can be viewed as a sequence of one-stage SCSPs, each associated with a restricted microstructure complement. Concretely, the t-th one-stage SCSP of $N^*_{R,T}$ is the restriction $N^*_{R,t}$ of $N^*_{R,T}$ to the stage $X_t$, that is, $N^*_{R,t} = (X_t, Y_t, D, C^*_t, P_t, \theta)$. The corresponding microstructure complement is the projection $H^*_{R,t}$ of $H_{R,T}$ onto the variables $X_t$, satisfying the following backward consistency condition: if $v_t$ is a tuple on $X_t$ which does not include any hyperedge of $H^*_{R,t}$, then there is an extension v of $v_t$ on $(X_1, \ldots, X_t)$ which does not include any hyperedge of $H^*_{R,1} \cup \cdots \cup H^*_{R,t}$. For the stochastic constraint network $N^d_{R,T}$, the t-th one-stage SCSP $N^d_{R,t}$ and its microstructure complement $H^d_{R,t}$ are defined in a similar way.

**Proposition 4.** Let R be the P-GDL program of a T-horizon game. Then, for every $t \in \{1, \ldots, T\}$, any constraint symmetry of $N^*_{R,t}$ (resp. $N^d_{R,t}$) preserves the minimax (resp. depth-d minimax) strategies of G(R).
From a practical viewpoint, constraint symmetries of one-stage SCSPs can be exploited for reducing the search for minimax strategies: once an instantiation $v_t$ on the stage $X_t$ has been proved GC, automorphisms of $H^d_{R,t}$ can be applied to derive symmetric GC instantiations at this stage. Equivalently, if every extension of $v_t$ has reached a dead end, then all symmetric instantiations of $v_t$ at stage $X_t$ can be ignored.

**Example 5.** Consider the randomized 2×2 Tic-Tac-Toe game after a couple of turns. During the first stage, player 1 selected chooseRow(x1) and the chance player marked (x1, y1) with 1. Next, player 2 selected chooseCol(y2) and the chance player marked (x2, y2) with 2. Figure 2 depicts the microstructure complement of the third one-stage SCSP; for the sake of clarity, only the action variables of players 0 (random) and 1 are represented. We can observe that all legal moves for player 1 are symmetrical. Indeed, the automorphism obtained by permuting the symbols 1 and 2 in the values of the action variables yields a symmetry between rows x1 and x2, and a symmetry between columns y1 and y2. Analogously, the automorphism obtained by permuting the symbols x and y in those values gives rise to a symmetry between row x1 (resp. x2) and column y1 (resp. y2).

## 5 Experiments

Based on our framework, we now present a series of experiments conducted on a cluster of Intel Xeon E5-2643 3.3 GHz CPUs with 64 GB of RAM and four threads, under Linux. All problems used in the experiments are turn-based two-player zero-sum games. Specifically, we focus on 25 representative games: 20 deterministic GDL games and 5 stochastic GDL games (with random). For all games, we used the 2015 Tiltyard Open (International GGP Competition) setup: 180 s for the start clock (deliberation time before the first turn) and 15 s for the play clock (deliberation time per turn).

**MAC-UCB-SYM.** In order to implement our constraint-based symmetry detection method, we used (a variant of) the MAC-UCB algorithm developed in [Koriche et al., 2016]. Any GDL description R is first translated into the Clark's completion $R_c$ of its ground instance, which is then encoded into a T-stage SCSP $N_{R,T}$. The horizon T was fixed to 200. The goal of MAC-UCB is to find depth-d minimax strategies, that is, solution policies of the SCSP $N^d_{R,T}$. The MAC (Maintaining Arc Consistency) technique searches for solution policies up to depth d, and the UCB (Upper Confidence Bounds) stochastic bandit method estimates the value $\hat V(s)$ of states s at depth d. In what follows, MAC-UCB-SYM is the upgrade of MAC-UCB that incorporates symmetry detection using the microstructure complement of one-stage SCSPs. The choice of the depth d was derived by partitioning the deliberation time into three ratios $(r_{\mathrm{MAC}}, r_{\mathrm{UCB}}, r_{\mathrm{SYM}})$, which correspond to the proportions of runtime allocated to MAC, FLAT-UCB, and symmetry detection, respectively. Based on a sensitivity analysis of MAC-UCB-SYM, we used the ratios (45%, 30%, 25%).

MAC-UCB-SYM satisfies the backward consistency condition by iteratively expanding the partial instantiations of a policy tree rooted at the initial state $s_1$. For a partial instantiation v on $(X_1, \ldots, X_{t-1})$ which is consistent with all constraints in $(C^d_1, \ldots, C^d_{t-1})$, the algorithm stores in $H^d_{R,t}$ all hyperedges $e_t$ such that $v \cup e_t$ is inconsistent with $N^d_{R,t}$. Hence, for any independent set $v_t$ of $H^d_{R,t}$, the instantiation $v \cup v_t$ is guaranteed to be consistent with $(C^d_1, \ldots, C^d_t)$.
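The consistency test behind this independent-set argument reduces to a simple containment check, sketched below in Python. The data structures and the hypothetical nogoods are ours, chosen only to illustrate the idea.

```python
# Hyperedges collected for the current stage: sets of variable-value pairs that are
# incompatible with the already-expanded prefix. A candidate stage instantiation v_t is
# safe to extend the prefix iff it contains none of them (i.e., it is an independent set).
stored_hyperedges = [
    frozenset({("a_1,t", "chooseRow(x1)"), ("omega_t", 1)}),                    # hypothetical nogood
    frozenset({("a_1,t", "chooseCol(y1)"), ("a_0,t", "mark(x1,y1,1)")}),        # hypothetical nogood
]

def extends_consistently(v_t):
    """True iff the stage instantiation v_t (a set of variable-value pairs)
    includes no stored hyperedge, and can therefore extend the current prefix."""
    pairs = set(v_t)
    return not any(edge <= pairs for edge in stored_hyperedges)

candidate = {("a_1,t", "chooseRow(x1)"), ("omega_t", 0), ("a_0,t", "mark(x1,y2,1)")}
print(extends_consistently(candidate))   # True: no stored hyperedge is contained in it
```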
We used the NAUTY algorithm [McKay and Piperno, 2014] for detecting automorphisms of microstructure complements. Concretely, MAC-UCB-SYM uses a 32 GB hash table with the hash function specified in [Zobrist, 1990]. For each one-stage SCSP $N^d_{R,t}$, we store in the hash table the tuples $v_t$ on $X_t$ for which the value $Q^d(v_t)$ has been estimated by MAC-UCB, where $Q^d(v_t) = Q^d_{i,t}(s, a)$ and (s, a) is the projection of $v_t$ onto the fluent and action variables $\{f_{j,t}\}_{j=1}^{n} \cup \{a_{i,t}\}_{i=1}^{k}$ of $X_t$. When the algorithm expands a partial instantiation v on $(X_1, \ldots, X_{t-1})$ with an independent set $v_t$ of $H^d_{R,t}$, we detect whether $v_t$ is symmetric to some tuple $v'_t$ in the hash table. If so, the value $Q^d(v'_t)$ is directly mapped to $Q^d(v_t)$, without exploring the whole subtree up to depth d.
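One simple way to organize such a symmetry-aware cache is sketched below in Python: each variable-value pair receives a random 64-bit key, an instantiation is hashed by XOR-ing the keys of its pairs in the spirit of [Zobrist, 1990], and symmetric instantiations are given the same cache key by hashing a canonical image under the detected automorphisms. The canonicalization trick and all names here are ours, not the authors' exact scheme.

```python
import random

random.seed(0)

# One random 64-bit key per variable-value pair (a toy slice of a one-stage SCSP).
PAIRS = [("a_1,t", m) for m in ("chooseRow(x1)", "chooseRow(x2)", "chooseCol(y1)", "chooseCol(y2)")]
KEYS = {p: random.getrandbits(64) for p in PAIRS}

def zobrist(instantiation):
    """Zobrist-style hash: XOR of the keys of the variable-value pairs."""
    h = 0
    for pair in instantiation:
        h ^= KEYS[pair]
    return h

# Automorphisms given as permutations of variable-value pairs: here, the identity plus a
# single toy symmetry swapping the two row moves.
IDENTITY = {p: p for p in PAIRS}
SWAP_ROWS = {**IDENTITY,
             ("a_1,t", "chooseRow(x1)"): ("a_1,t", "chooseRow(x2)"),
             ("a_1,t", "chooseRow(x2)"): ("a_1,t", "chooseRow(x1)")}
AUTOMORPHISMS = [IDENTITY, SWAP_ROWS]

def canonical_hash(instantiation):
    """Smallest Zobrist hash among the images of the instantiation under the group:
    two symmetric instantiations get the same key, so their estimated values are shared."""
    images = [frozenset(sigma[p] for p in instantiation) for sigma in AUTOMORPHISMS]
    return min(zobrist(img) for img in images)

cache = {}
v1 = frozenset({("a_1,t", "chooseRow(x1)")})
v2 = frozenset({("a_1,t", "chooseRow(x2)")})     # symmetric to v1 under SWAP_ROWS
cache[canonical_hash(v1)] = 75.0                 # value estimated by the bandit phase
print(cache.get(canonical_hash(v2)))             # 75.0: reused without re-exploring
```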
**GGP Competitors.** MAC-UCB-SYM was compared with several General Game Players. The first player is the (multi-agent version of the) UCT algorithm [Sturtevant, 2008], which is the state-of-the-art method for deterministic games. The second player is Cazenave's GRAVE algorithm [Cazenave, 2015], which implements the Generalized Rapid Action Value Estimation technique, a generalization of RAVE [Gelly and Silver, 2007]. The last player is SANCHO (version 1.61c), a Monte Carlo Tree Search player developed by S. Draper and A. Rose, which won the 2014 GGP Competition. Rule-based symmetry detection methods were implemented for UCT and GRAVE, yielding the UCT-SYM and GRAVE-SYM players, respectively. By analogy with MAC-UCB-SYM, UCT-SYM and GRAVE-SYM used NAUTY for detecting automorphisms in the ground rule graph, together with a hash table for exploiting symmetries. Based on a sensitivity analysis of these players, UCT-SYM (resp. GRAVE-SYM) used 20% (resp. 25%) of its deliberation time for symmetry detection.

Experiments were conducted on 66,000 game contests, namely 300 matches for each deterministic game and 1,000 matches for each stochastic game against each adversary, in order to account for the probabilistic effect of outcomes. For the sake of fairness, the roles of the players were exchanged during each match.

| Game | MAC-UCB | UCT | UCT-SYM | GRAVE | GRAVE-SYM | SANCHO |
|---|---|---|---|---|---|---|
| **Deterministic GDL games** | | | | | | |
| Amazons torus 10×10 | 84.2 (±1.2%) | 96.0 (±2.2%) | 98.1 (±1.7%) | 78.4 (±3.4%) | 86.7 (±2.7%) | 86.2 (±3.1%) |
| Breakthrough suicide | 93.0 (±2.3%) | 93.1 (±1.6%) | 81.9 (±3.7%) | 59.4 (±0.9%) | 73.2 (±2.9%) | 77.8 (±4.0%) |
| Chess | 76.4 (±2.5%) | 95.8 (±1.8%) | 95.3 (±2.1%) | 88.1 (±4.2%) | 95.4 (±2.5%) | 87.9 (±2.1%) |
| Connect Four 20×20 | 87.5 (±3.5%) | 97.0 (±1.2%) | 100.0 (±0.0%) | 65.1 (±3.1%) | 88.5 (±2.2%) | 96.0 (±0.9%) |
| Connect Four Simultaneous | 73.7 (±2.8%) | 87.8 (±1.7%) | 96.1 (±0.9%) | 77.2 (±3.4%) | 93.2 (±3.6%) | 82.0 (±2.6%) |
| Copolymer with pie | 73.9 (±1.5%) | 87.2 (±2.3%) | 93.3 (±0.5%) | 80.2 (±1.6%) | 91.6 (±1.8%) | 77.9 (±3.6%) |
| Dots and boxes suicide | 65.4 (±1.7%) | 95.2 (±0.9%) | 83.7 (±2.4%) | 80.5 (±1.6%) | 70.3 (±2.7%) | 88.0 (±1.5%) |
| English Draughts | 85.1 (±2.8%) | 97.9 (±1.3%) | 97.4 (±1.3%) | 70.1 (±4.1%) | 71.2 (±3.1%) | 59.3 (±1.5%) |
| Free For All 2P | 53.4 (±0.7%) | 81.5 (±2.4%) | 84.8 (±1.9%) | 61.2 (±0.7%) | 72.3 (±1.6%) | 71.2 (±2.3%) |
| Hex | 84.0 (±1.4%) | 100.0 (±0.0%) | 100.0 (±0.0%) | 74.2 (±3.2%) | 89.8 (±2.9%) | 78.1 (±1.5%) |
| Knight Through | 81.2 (±2.4%) | 96.0 (±1.3%) | 90.6 (±2.6%) | 82.1 (±3.5%) | 81.2 (±2.3%) | 88.1 (±2.6%) |
| Majorities | 84.4 (±2.6%) | 95.7 (±1.6%) | 96.8 (±1.4%) | 81.2 (±2.6%) | 92.3 (±2.3%) | 87.2 (±3.2%) |
| Pentago | 53.1 (±1.5%) | 84.8 (±3.1%) | 66.2 (±2.8%) | 72.1 (±2.3%) | 58.4 (±2.8%) | 54.3 (±0.9%) |
| Quarto | 54.9 (±1.6%) | 87.1 (±3.4%) | 65.6 (±2.6%) | 65.2 (±3.2%) | 63.8 (±1.6%) | 57.9 (±2.3%) |
| Sheep and Wolf | 74.8 (±3.2%) | 92.5 (±2.4%) | 94.6 (±0.9%) | 64.4 (±3.7%) | 63.2 (±3.6%) | 62.1 (±1.5%) |
| Shmup | 58.0 (±1.7%) | 88.2 (±2.6%) | 63.7 (±2.2%) | 55.3 (±1.5%) | 52.1 (±0.2%) | 53.0 (±0.6%) |
| Skirmish zero-sum | 85.4 (±2.5%) | 98.7 (±0.4%) | 100.0 (±0.0%) | 77.7 (±1.6%) | 95.8 (±2.3%) | 67.4 (±4.2%) |
| Tic Tac Chess 2P | 94.9 (±3.4%) | 97.6 (±0.7%) | 96.5 (±0.4%) | 76.3 (±1.3%) | 93.2 (±2.3%) | 86.1 (±3.3%) |
| TTCC4 2P | 84.4 (±2.3%) | 97.3 (±1.2%) | 97.2 (±2.1%) | 62.4 (±2.3%) | 85.7 (±3.1%) | 65.8 (±4.1%) |
| Reversi Suicide | 72.2 (±3.2%) | 100.0 (±0.0%) | 100.0 (±0.0%) | 66.1 (±2.7%) | 78.7 (±2.2%) | 58.2 (±2.2%) |
| **Stochastic GDL games** | | | | | | |
| Backgammon | 92.1 (±2.7%) | 98.0 (±0.6%) | 96.1 (±1.4%) | 70.2 (±4.5%) | 86.8 (±3.9%) | 100.0 (±0.0%) |
| Can't Stop | 88.2 (±1.7%) | 94.1 (±2.3%) | 96.8 (±1.7%) | 93.4 (±1.3%) | 93.7 (±3.2%) | 100.0 (±0.0%) |
| Kaseklau | 73.5 (±3.6%) | 73.4 (±1.3%) | 72.1 (±0.9%) | 67.4 (±2.8%) | 60.2 (±3.2%) | 88.1 (±2.6%) |
| Pickomino | 75.4 (±1.8%) | 74.3 (±2.4%) | 82.4 (±2.8%) | 87.1 (±3.0%) | 95.6 (±1.0%) | 92.1 (±2.9%) |
| Yahtzee | 87.4 (±1.6%) | 88.2 (±2.3%) | 83.1 (±3.3%) | 64.1 (±3.2%) | 60.9 (±2.5%) | 91.8 (±3.3%) |

*Table 1: Results of MAC-UCB-SYM against each GGP player (percentage of wins, ± standard deviation).*

**Results.** In Table 1, each column reports the proportion of wins of MAC-UCB-SYM against the selected adversary. For example, the Chess row of the UCT column indicates that MAC-UCB-SYM wins 95.8% of its contests against UCT, with a standard deviation of 1.8%. In light of these results, the following observations can be made: (i) MAC-UCB-SYM outperforms MAC-UCB, especially on large games (e.g., Amazons torus 10×10), which indicates that constraint-based symmetry detection is an effective method for reducing exploration in large search trees; (ii) MAC-UCB-SYM also dominates UCT, GRAVE, and SANCHO, even on deterministic games, which are the class targeted by these algorithms; and (iii) MAC-UCB-SYM performs even better against UCT-SYM and GRAVE-SYM on most games. The last point clearly indicates that constraint-based symmetry methods, used to recognize (approximate) minimax symmetries, are more effective than rule-based symmetry detection methods, used to recognize structural symmetries.

## 6 Conclusion

In this paper, we have shown that constraint-based symmetry detection can prove effective for solving GDL games. Beyond the conclusive results obtained by coupling the stochastic constraint solver MAC-UCB with constraint-based symmetry detection, our framework emerges as a flexible approach for recognizing various types of game symmetries. Notably, a key perspective for further research is to detect equilibrium symmetries, which preserve (mixed) Nash equilibria or correlated equilibria, using a constraint-based approach for modeling these solution concepts.

## References

- [Ben-Eliyahu and Dechter, 1994] Rachel Ben-Eliyahu and Rina Dechter. Propositional semantics for disjunctive logic programs. *Annals of Mathematics and Artificial Intelligence*, 12(1-2):53-87, 1994.
- [Cazenave, 2015] Tristan Cazenave. Generalized rapid action value estimation. In *Proceedings of IJCAI*, pages 754-760, 2015.
- [Clark, 1978] Keith L. Clark. Negation as failure. In *Logic and Databases*, pages 293-322. Plenum Press, 1978.
- [Cohen et al., 2006] David Cohen, Peter Jeavons, Christopher Jefferson, Karen E. Petrie, and Barbara M. Smith. Symmetry definitions for constraint satisfaction problems. *Constraints*, 11(2-3):115-137, 2006.
- [Condon, 1992] Anne Condon. The complexity of stochastic games. *Information and Computation*, 96:203-224, 1992.
- [Gelly and Silver, 2007] Sylvain Gelly and David Silver. Combining online and offline knowledge in UCT. In *Proceedings of ICML*, pages 273-280, 2007.
- [Genesereth et al., 2005] Michael Genesereth, Nathaniel Love, and Barney Pell. General game playing: Overview of the AAAI competition. *AI Magazine*, 26(2):62-72, 2005.
- [Hnich et al., 2012] Brahim Hnich, Roberto Rossi, S. Armagan Tarim, and Steven Prestwich. Filtering algorithms for global chance constraints. *Artificial Intelligence*, 189:69-94, 2012.
- [Jégou, 1993] Philippe Jégou. Decomposition of domains based on the micro-structure of finite constraint-satisfaction problems. In *Proceedings of AAAI*, pages 731-736, 1993.
- [Koriche et al., 2016] Frédéric Koriche, Sylvain Lagrue, Eric Piette, and Sébastien Tabary. General game playing with stochastic CSP. *Constraints*, 21(1):95-114, 2016.
- [Kuhlmann and Stone, 2007] Gregory Kuhlmann and Peter Stone. Graph-based domain mapping for transfer learning in general games. In *Proceedings of ECML*, pages 188-200, 2007.
- [Lanctot et al., 2013] Marc Lanctot, Abdallah Saffidine, Joel Veness, Christopher Archibald, and Mark H.M. Winands. Monte Carlo *-minimax search. In *Proceedings of IJCAI*, pages 580-586, 2013.
- [Love et al., 2008] Nathaniel Love, Timothy Hinrichs, David Haley, Eric Schkufza, and Michael Genesereth. General game playing: Game description language specification. Technical Report LG-2006-01, Stanford University, 2008.
- [McKay and Piperno, 2014] Brendan D. McKay and Adolfo Piperno. Practical graph isomorphism, II. *Journal of Symbolic Computation*, 60(0):94-112, 2014.
- [Saffidine, 2014] Abdallah Saffidine. The game description language is Turing complete. *IEEE Transactions on Computational Intelligence and AI in Games*, 6(4):320-324, 2014.
- [Schiffel and Thielscher, 2014] Stephan Schiffel and Michael Thielscher. Representing and reasoning about the rules of general games with imperfect information. *Journal of Artificial Intelligence Research*, 49:171-206, 2014.
- [Schiffel, 2010] Stephan Schiffel. Symmetry detection in general game playing. In *Proceedings of AAAI*, 2010.
- [Sturtevant, 2008] Nathan R. Sturtevant. An analysis of UCT in multi-player games. *International Computer Games Association Journal*, 31(4):195-208, 2008.
- [Tarim et al., 2006] S. Armagan Tarim, Suresh Manandhar, and Toby Walsh. Stochastic constraint programming: A scenario-based approach. *Constraints*, 11(1):53-80, 2006.
- [Thielscher, 2016] Michael Thielscher. GDL-III: A proposal to extend the game description language to general epistemic games. In *Proceedings of ECAI*, pages 1630-1631, 2016.
- [Walsh, 2002] Toby Walsh. Stochastic constraint programming. In *Proceedings of ECAI*, pages 111-115, 2002.
- [Zhang et al., 2015] Haifeng Zhang, Dangyi Liu, and Wenxin Li. Space-consistent game equivalence detection in general game playing. In *Proceedings of CGW*, pages 165-177, 2015.
- [Zobrist, 1990] Albert L. Zobrist. A new hashing method with application for game playing. *International Computer Games Association Journal*, 13(2):69-73, 1990.