# factored_probabilistic_belief_tracking__90e655ea.pdf

Factored Probabilistic Belief Tracking

Blai Bonet Universidad Sim on Bol ıvar

Caracas, Venezuela

bonet@ldc.usb.ve

Hector Geffner ICREA & Universitat Pompeu Fabra

Barcelona, SPAIN hector.geffner@upf.edu

The problem of belief tracking in the presence of stochastic actions and observations is pervasive and yet computationally intractable. In this work we show however that probabilistic beliefs can be maintained in factored form exactly and efﬁciently across a number of causally closed beams, when the state variables that appear in more than one beam obey a form of backward determinism. Since computing marginals from the factors is still computationally intractable in general, and variables appearing in several beams are not always backward-deterministic, the basic formulation is extended with two approximations: forms of belief propagation for computing marginals from factors, and sampling of non-backward-deterministic variables for making such variables backwarddeterministic given their sampled history. Unlike, Rao-Blackwellized particle-ﬁltering, the sampling is not used for making inference tractable but for making the factorization sound. The resulting algorithm involves sampling and belief propagation or just one of them as determined by the structure of the model.

1 Introduction Keeping track of beliefs when actions and sensors are probabilistic is crucial and yet computationally intractable, with exact algorithms running in time that is exponential in the number of state variables in the worst case. Current approaches rely on samples for approximating probabilistic beliefs by sets of particles [Kanazawa et al., 1995; Thrun et al., 2001] or decompositions where joint beliefs are approximated by products of smaller local beliefs [Boyen and Koller, 1998]. Particle ﬁltering methods however may require too many particles even in the Rao-Blackwellized (RB) version [Murphy, 1999; Doucet et al., 2000], while decomposition approaches may result in poor approximations.

In this work we take a different approach and show that probabilistic beliefs can be maintained in factored form exactly and efﬁciently across a number causally closed beams when the state variables that appear in more than one beam obey a form of backward determinism by which the value of

a variable at time t is determined by the value of the variable at time t + 1, the history, and the prior beliefs. Since computing marginals from a factorized representation is computationally hard in general, and variables appearing in several beams are not always backward-deterministic, the basic formulation is extended through two approximations: forms of belief propagation for computing marginals from the factorized representation [Pearl, 1988], and sampling of nonbackward-deterministic variables for making them backwarddeterministic given their sampled history. This last part is similar to Rao-Blackwellized particle-ﬁltering with one crucial difference: sampling is not introduced for making inference tractable but for making the factorization sound. Like the method of Boyen and Koller [1998], the algorithm maintains global beliefs in terms of smaller local factors but the scope of these local factors is determined by the structure of the model and variables usually appear in many factors.

We call the general algorithm probabilistic beam tracking (PBT) as it is the probabilistic version of the beam tracking scheme of Bonet and Geffner [2014] that deals with beliefs represented as sets of states rather than probability distributions. When the size of the beams and the number of particles are bounded, PBT runs in polynomial time. In addition, when the structure of the beams results in factored representations that are acyclic and where factors share variables that are backward-deterministic only, the algorithm is exact.

As an illustration, the 1-line SLAM problem of Murphy [1999] that involves an agent that moves along a line sensing the color of each cell i, results in factors Bi of size two, where one variable represents the agent location, and the other, the color of cell i. PBT reduces then to RB particle-ﬁltering, as the variable that appears in more than one beam (the agent location) is not backward deterministic and must be sampled, leaving each beam with a single, unique variable. On the other hand, if the problem is modiﬁed so that the color sensed in a cell depends on the color of the two surrounding cells, PBT would still sample the agent location variable, but in addition would keep track of factors of size three representing the color of each cell and the color of the two surrounding cells. Inference over these factors can be done exactly by the jointree algorithm, as the treewidth of the factor graph is bounded and small [Darwiche, 2014], or approximately but more efﬁciently in general, by belief propagation. In all cases, the formulation determines the scope of the factors (the

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)

beams) and the set of variables that need to be sampled from the structure of the model.

We begin by discussing the background and related work, and then present the model, the structure, and the beams that follow from them. Next, we derive the equations for factorized belief tracking in belief decomposable systems and extend the formulation over non-decomposable systems. We conclude with an approximation algorithm for computing marginals, some experimental results, and discussion.

2 Background and Related Work

In the ﬂat model, a state s is a value assignment to a set of state variables, actions a affect the state through transition probabilities P(s0|s, a), and observation tokens o provide partial information about the resulting state s0 through sensing probabilities P(o|s0, a). Given the prior P(s), the target belief P(s|h), where h = ha0, o0, . . . , ai, oii is an interleaved sequence of actions and observations, is characterized inductively as P(s|h) = P(s) for empty h, P(s|h, a) = P

s0 P(s|s0, a) P(s0|h), and P(s|h, a, o) = P(o|s, a) P(s|h, a) where = 1/P(o|h, a) is a normalizing constant.

Keeping track of beliefs using this representation is exponential in the number of state variables. Approaches have thus been developed to exploit problem structure. This structure is often made explicit through the language of Bayesian networks [Pearl, 1988]. However, while the posterior P(s|h) can be obtained from a Dynamic Bayesian network (DBN) with as many slices as time steps [Dean and Kanazawa, 1989], exact inference over such networks is hard (yet see [Vlasselaer et al., 2016]), so approximate inference schemes have been pursued instead [Murphy, 2002]. In the method of Boyen and Koller [1998], global joint beliefs are approximated as products of smaller local beliefs, while in particle ﬁltering methods (PF), global beliefs are formed from a set of samples [Kanazawa et al., 1995; Thrun et al., 2001; Koller and Friedman, 2009]. The methods are related to well known approximation techniques for general Bayesian networks like the mini-bucket approximation [Dechter and Rish, 2003], (sampled) cut-set conditioning [Pearl, 1988], and restricted forms of belief propagation [Murphy and Weiss, 2001].

Our work is related to the ideas underlying these methods but it does not build explicitly on them. It is indeed a generalization of the beam tracking (BT) method for keeping track of beliefs given by sets of states as opposed to probability distributions [Bonet and Geffner, 2014]. Probabilistic beam tracking is aimed at combining the effectiveness of BT with the ability to handle noisy actions and sensing.

3 Model, Structure, and Beams

Model. The set of all state variables is denoted as X. A state x deﬁnes a value for each state variable. In general, subsets of variables are denoted with uppercase letters and their values with lowercase letters. Actions a affect the state stochastically with given transition probabilities tr(x0|x, a). The set of all observation variables is denoted with O and lowercase o denotes an observation; i.e. a value for each of the

observation variables. The sensor model is also Markovian with probabilities q(o|x, a). The joint prior is P(x).

Histories and Beliefs. A history or execution h is an interleaved sequence of action and observations that begins with an action. A history is complete if it is empty or ends with an observation. The state and observation variables at time t are denoted as Xt and Ot respectively. The observation ot in the history h encodes the value of the observation variables Ot. For each history h, either complete or incomplete, there is a probability measure Ph over the events deﬁned by the random variables associated with h. We abuse notation by writing P(A|h) instead of Ph(A), and P(A|a, h) instead of Pha(A). For a complete execution h for t time steps, the joint over the state variables Xt is denoted by Bh(x) .= P(x|h). The prior belief for the empty history h is Bh(x) = P(x).

Causal Structure and 2-DBN. The transition and sensing probabilities are given by a 2-slice DBN whose nodes V , V 0, and O stand for the state variables before and after the action, and for the observation variables. The transition and sensing probabilities (parameters) are P(V 0|pa(V 0), a) and P(O|pa(O), a) where the parents pa(V 0) of V 0 are among the V -variables, and the parents pa(O) of O are among the V 0-variables. A variable W is a cause of V if W or W 0 is a parent of V in the 2-DBN, and W is causally relevant to V if it is a cause of V or is causally relevant to a cause of V .

Beams. A beam is a non-empty subset of state variables that is causally closed; i.e., if a variable belongs to the beam, its causes must belong to the beam as well. A collection of beams is complete if each state variable is included in some beam and for each observation variable, its set of causes is included in some beam. The size of a beam is the number of variables that it contains, and a complete collection of beams is minimal if no beam can be replaced by a collection of beams of smaller size. There is actually a unique minimal complete collection of beams B that can be constructed as follows: if Bi is the set of state variables that are causally relevant to a state variable Vi or to an observation variable Oi, then B is the collection of beams Bi with the duplicate beams removed, and the beams Bk properly contained in other beams removed as well. By default, we assume this beam structure B. If we enumerate the beams in this structure as B1, . . . , Bm, we will refer to beam Bi also by its index i. This beam structure is the same as the one in the non-probabilistic formulation [Bonet and Geffner, 2014] where the causal width of the problem is deﬁned as the size of the largest beam. This structural measure is important because beam tracking runs in time that is exponential in the causal width.1 The same will be true for probabilistic beam tracking. Many problems of interest can be formulated so that their causal width is bounded and small.

Example: In Minesweeper, whether in the normal or noisy version, there are hidden state variables Vi encoding whether there is a mine at cell i, and observation variables Oi that

1Actually, variables that are determined, meaning that their initial value is known and can be affected by deterministic actions only, do not add to the causal width [Bonet and Geffner, 2014].

L M1 M2 . . . Mn

2 . . . M 0

Beam B Time

Figure 1: 2-slice DBN and beam structure for the direct formulation of the 1-line SLAM problem. The observation probabilities satisfy P(o| , m1, . . . , mn, a) = q(o| , m ) that corresponds to a form of context-speciﬁc independence. This independence is not exploited and results in a beam structure that contains a single beam B with all the n+1 state variables.

after probing cell i reveal the total number of mines at the 8 surrounding cells and Vi itself. The causes of Oi are the 9 cell variables Vk that have no further causes (observation variables for border cells have fewer causes). The minimal complete beam structure is given by beams Bi, one for each cell i, each of size no greater than 9. Thus, while tracking beliefs in Minesweeper is NP-hard [Kaye, 2000], the causal width of the problem is 9.

Example: In the direct formulation of Murphy s 1-line SLAM problem mentioned above, there are color variables Mi for each cell i, an agent location variable L, and one observation variable O. The observation o that results after applying an action a is determined by the probabilities q(o| , m1, . . . , mn, a) = q(o| , m ) that encode a form of context-speciﬁc independence [Boutilier et al., 1996] in which the observation o depends only on the color m of the cell for L = . In this representation, all the state variables L and {Mi}i are causally relevant to the observation variable O, determining a beam structure with a single beam B of size n + 1 (Fig. 1). It is possible, however, to take advantage of context-speciﬁc independence to reformulate the model so that its causal width becomes bounded and small. For this, it sufﬁces to split the single observation variable O into n observation variables O1, . . . , On, one for each cell i, such that the parents of variable Oi are the variables L and Mi only, and to set the probability qi(oi| , mi) for variable Oi equal to q(o | , m ) when = i and to 1/2 (a normalizing constant) when 6= i. The value of all the artiﬁcial observation variables Oi at time t is set to the value of the real observation variable O at time t; that is, if the observation O has value o at time t then all variables Oi are set to o at time t. The sensor model of the reformulated task is deﬁned as q(ho1, . . . , oni| , m1, . . . , mn, a) = Qn

i=1 qi(oi| , mi). While the two models are equivalent, the ﬁrst has one beam of size n + 1, while the latter has n beams Bi of size 2, each containing the agent location variable L and the cell variable Mi, i = 1, . . . , n, for a causal width of 2 (Fig. 2).

Internal and External Variables. A variable that appears in more than one beam is called external, while one that appears

L M1 M2 M3 . . . Mn

3 . . . M 0

O1 O2 O3 . . . On

Figure 2: 2-slice DBN and beam structure for the 1-line SLAM problem formulated with multiple dummy observation variables Oi that are set to the value of O. In this model, there is one beam Bi for each cell i that contains two variables only, L and Mi, for a causal width of 2.

in one beam only is called internal. The internal and external variables for beam Xj are denoted as Yj and Zj respectively. The set X of all (state) variables is partitioned as X = Y Z where Y are all the internal variables and Z are all the external variables. In the second formulation of 1-line SLAM, the color variables are all internal, and the agent location variable is external. In Minesweeper, all variables are external.

Factored Model. Given the beam structure determined by the 2-DBN structure, the transition and sensing probabilities tr(x0|x, a) and q(o0|x0, a) can be factorized as tr(x0|x, a) = tr(y0z0|x, a) = tr(y0|z0, x, a)tr(z0|x, a) with tr(y0|z0, x, a) = Q

j, xj, a), tr(z0|x, a) = Q

j|xj, a), and q(o0|x0, a) = Q

j, a), where j ranges over the beam indices. All the trj and qj probabilities are determined by the conditional probabilities P(V 0|pa(V 0), a) in the 2-DBN.2 It is also assumed without loss of generality that the prior belief Bh(x) for the empty history factorizes across the beams as Q

j (xj). When this is not the case, the model can be extended with an extra action-observation pair that must start any non-empty history.

4 Tracking in Belief Decomposable Systems

Given the model structure and beams, our task is to show that the posterior joint beliefs Bh(x) = Bh(Xt = x) for histories h = ha0, o0, . . . , at, oti can be expressed as the normalized product of belief factors Q

j (xj), one for each beam j, that are tracked independently. For this, we assume a form of determinism over the state variables that appear in more than

2If Yj is {V1, . . . , Vk} where the variables Vi are ordered topologically, trj(y0

j, xj, a) is Qk

i ), a). Associating each external variable V with the smallest j such that Bj contains V , and each observation variable W with the smallest j such that Bj contains all of its causes, trj(z0

j|xj, a) factorizes as Qk

i ), a), where V1, . . . , Vk are the external variables in Zj associated with the beam j in topological order, and qj(w0|x0

j, a) is P(w0|pa(w0), a) when the observation variable W is associated with the beam j, and else qj(w0|x0

one beam; namely, the external variables. The following key deﬁnition is from Bonet and Geffner [2014]:

Deﬁnition 1 (Backward Determinism) A state variable V is backward deterministic if the value of V at any time t is determined by the value of V at time t + 1, the action at time t, the history h up to time t, and the priors.

Notice that static variables and variables that are initially fully known and are affected by deterministic actions only are backward deterministic, as are the variables that are fully observable and the variables V affected by deterministic actions that map different values of V into different values of V 0 [Amir and Russell, 2003]. When V is backward deterministic, we write Ra(v|h) to denote the value of V at time t that is determined by the value v of V at time t + 1, the complete history h up to time t, the next action a, and the priors. We refer to Ra(v|h) as the regression of V = v given h, a, and the priors (left implicit in the notation).

Deﬁnition 2 (Belief Decomposable Model) A model is belief decomposable when all the external variables are backward deterministic.

It will be convenient to use the abbreviation BD to refer to both belief decomposable models and to backward deterministic variables. The notation Ra(z|h) for the set Z of external BD variables denotes the regression of the value vector z through h and a. If the value v for a variable or set of BD variables V is impossible given h and a, we write Ra(z|h) = ?. Clearly, Ra(z|h) = ? iff Ra(zj|h) = ? for some beam j, and Ra(z|h)j = Ra(zj|h).

4.1 Equations for the Belief Factors We want to show that the distribution Bh(x) = P(x|h) after history h for a BD model is the normalized product of factors Bh

Bh(x) = β Q

j (xj) denotes the belief factor over the variables in beam j, and β = β(h) is a normalization factor that only depends on h. We show this inductively by using the assumption of factorized priors and by expressing the factors that deﬁne the joint belief Bh0(x) for h0 = hh, a, oi in terms of the factors that deﬁne the joint belief Bh(x), where a is an action and o an observation such that P(o|a, h) > 0. The factors Bh

j (xj) for the empty history h are given.

Let x0 = y0z0 be a valuation for Xt+1. Using Bayes rule, the posterior can be expressed as

P(x0|o, a, h) = P(o|x0, a, h) P(x0|a, h) (2)

where = 1/P(o|a, h) is a normalizing constant, and the second term is

P(x0|a, h) = P

y P(y0|z0, y, a, h) P(z0, y|a, h) . (3)

Assume now that Ra(z0|h) 6= ?. Using backward determinism and factored transitions, the ﬁrst term in (3) becomes:

P(y0|z0, y, a, h) = tr(y0|z0, y, Ra(z0|h), a) (4) = Q

j, yj, Ra(z0|h)j, a) . (5)

For the second term, we use the inductive hypothesis:

P(z0, y|a, h) = P(z0, y, Ra(z0|h)|a, h) (6)

= P(z0|y, Ra(z0|h), a, h) P(y, Ra(z0|h)|a, h) (7)

= tr(z0|y, Ra(z0|h), a) P(y, Ra(z0|h)|h) (8)

= tr(z0|y, Ra(z0|h), a) β Q

j (yj, Ra(z0|h)j) (9)

j|yj, Ra(z0

j|h), a) Bh

j (yj, Ra(z0

j|h)) . (10)

Substituting these expressions back into (3), abbreviating Ra(z0

j|h) as R(z0

j), and using Yi \ Yj = ; for i 6= j:

P(x0|a, h) β 1 =

j (yj, R(z0

j (yj, R(z0

j (yj, R(z0

Finally, from the factorization of the observations

P(o|x0, a, h) = q(o|x0, a) = Q

j, a) , (14)

and the inductive hypothesis, the factors Bh0(x0

j) = 0 qj(oj|y0

j | yj, Ra(z0

j|h), a) Bh

j (yj, Ra(z0

when Ra(z0|h) 6= ?, and Bh0

j) = 0 otherwise. That is, the belief factors Bh

j are progressed and ﬁltered independently for each beam j. The complexity of updating each belief factor, i.e., mapping Bh

j for h0 = hh, a, oi, is exponential in the beam j size, and more precisely, in the number of internal variables in beam j.

While the factors Bh

j determine the joint distribution Bh, the computation of marginals from such a joint is hard in general. We will come back to this issue but focus now on extending the formulation to models where not all external variables are backward deterministic. For this, like in Rao Blackwellized PF methods, we sample such variables to make them backward deterministic given their sampled history.3

5 Tracking in Non-Decomposable Systems We assume now that the set X of state variables is partitioned into three sets YZU where Y stands for the internal variables (variables appearing in one beam only), Z for the external variables that are backward deterministic, and the new set U for the external variables that are not BD and need to be sampled. If the context above was provided by the history h of actions and observations, the new context is provided by h

3The assumption of backward determinism appears in (15) through the use of the regressions Ra(z0

j|h). A different type of approximation can be deﬁned by replacing such regressions by values zj that are summed over with weights given by P(zj|z0

j, h, a). When the BD assumption holds, these weights are either 0 or 1, and the new formula reduces to the old formula. When BD does not hold, the new formula encodes an approximation. We tested this approximation empirically, but in the examples considered, it does not run faster than the particle-based approximation and its quality does not appear to be better either.

and the sampled history u of the U variables. The expression Ra(z0, u0| u0, h) denotes the pair of values Ra(z0|h) and Ra(u0| u0) where the latter denotes u, the value preceding the last value u0 in the sampled history u0. The joint over the Y and Z variables can be expressed as

P(y, z|h) = P

u P(y, z| u, h) P( u|h) (16)

which can be approximated as:

P(y, z|h) PN

i=1 P(y, z| ui, h) (17)

where the histories ui are sampled with probability P( ui|h). It is often convenient however to sample ui with a different distribution ( u|h), called the sampling or proposal distribution, as long as no possible U history is made impossible. In this method, called importance-based sampling [Bishop, 2007] the approximation becomes:

P(y, z|h) PN

i=1 wi P(y, z| ui, h) (18)

where is a normalizing constant, and the weights wi are

wi = P( ui|h)/ ( ui|h) . (19)

Provided with the sampled histories u, it can be shown that P(y, z| u, h) in (16) becomes:

P(y, z| u, h) = Bh(y, z| u) = β Q

j (yj, zj| u) (20)

where the factors Bh

j ( | u) can be progressed to h0 = hh, a, oi and u0 = h u, u0i as:

j| u0) = 00 qj(oj|x0

j|yj, Ra(z0

j|h), uj, a) Bh

j (yj, Ra(z0

j|h)| u) (21)

j|h) 6= ?, else Bh0

j| u0) = 0. This equation is indeed exactly like (15) except for the sampled history u included in the context, and the new component u0

j. In summary, the ﬁlter for approximating the target joint distribution P(y, z|h) is given by the sequence of triplets Fh = {( ui, wi, {Bh

j }j)}i where h is the action-observation history, ui is the sampled history of the U variables, i = 1, . . . , N, wi is the weight associated with ui, and Bh

j (yj, zj| ui) represents the belief factor given h and ui for each beam j. These belief factors determine the probability P(y, z| u, h) via (20), that provides the approximation for P(y, z|h) via (18). The ﬁlter Fh is extended to Fh0 for h0 = hh, a, oi by 1) extending the sampled history, 2) computing its associated weight, and 3) updating the belief factors with the new action-observation pair and the new sample. This last operation is deﬁned by (21). We focus next on the other two operations. Initially, ui is empty and wi = 1 for i = 1, . . . , N.

5.1 Marginals, Samples, and Weights The factors Bh

j (xj| u), where xj represents the valuations yjzj over the non-sampled variables in beam bj, do not represent themselves the probabilities P(xj|h, u), that stand actually for the marginals:

P(xj|h, u) = β P

i (xi| u) (22)

where β is a normalizing constant and the sum ranges over all the variables W that are not in the beam j. Such marginals can be computed from the factors Bh

j (xj| u) by standard methods such as the jointree algorithm or belief propagation. We show next how to compute the proposal distribution and weights from such marginals.

We consider two proposal distributions : the so-called motion and optimal distributions [Doucet, 1998; Grisetti et al., 2005]. The motion distribution uses only the transition probabilities to generate new histories u0 = h u, u0i from previous histories u (not using the information provided by o):

motion(u0|h, a, o, u) = P(u0|h, a, u) , (23) while the optimal proposal makes use of both a and o:

opt(u0|h, a, o, u) = P(u0|h, a, o, u) . (24) For computing these probabilities effectively, we will assume that there is at least one beam j that contains the whole set of variables U, i.e. for which U = Uj. This assumption holds automatically when U is a singleton as in many SLAM problems. In addition and without loss of generality, we assume that the observation o = ot at time t, for each time step, falls into a single beam only; i.e., q(o|x, a) = qj(oj|xj, a) for some beam j, and qk(ok|xk, a) = 1 for k 6= j. This is true when one observation variable is observed at each time point, and when this is not true, it can be made true by serializing the observations, using dummy actions in between. Under these assumptions, the proposal distributions and their weights can be computed from the marginals P(xj|h, u) associated with one beam. Indeed, if U = Uj and xj = yj, zj, uj:

P(u0|h, a, u) = P

yj,zj P(u0|xj, a) P(yj, zj|h, u) (25)

and P(u0|h, a, o, u) / P(o|h, a, u0)P(u0|h, a, u) where

P(o|h, a, u0) = P

k|h, a, u0) (26)

if o falls into the beam k. The marginal P(x0

k|h, a, u0) = P(y0

k|h, a, u0) excludes the observation o, and hence it is obtained with (21) by setting all qj(oj|x0

j, a) to 1. The weights w0 = P( u0|h0)/ ( u0|h0) can be computed incrementally from the weights w = P( u|h)/ ( u|h) when h0 = hh, a, oi and u0 = h u, u0i. For the motion proposal, the weight is the ratio between:

P( u0|h0) = 0 P(o| u0, h, a) P(u0|h, a, u) P( u|h) and motion( u0|h0) = P(u0|h, a, u) ( u|h), which results in:

motion = 0 wmotion P(o| u0, h, a) . (27) For the optimal proposal, the weight is the ratio between

P( u0|h0) = 00 P(u0|h, a, o, u) P(o|h, a, u) P( u|h) and opt( u0|h0) = P(u0|h, a, o, u) ( u|h) which results in:

opt = 00 wopt P(o|h, a, u) . (28) If the observation o falls into beam k, the marginals required can be computed as:

P(o| u0, h, a) = P

k|h, a, u0) , and

P(o| u, h, a) = P

k|h, a, u0)P(u0|h, a, u)

j|h, a, u0) = P(y0

j|h, a, u0) is obtained as indicated above, and P(u0|h, a, u) is given by (25).

6 Faster Approximation of Marginals

The main bottleneck of the algorithm is the computation of the marginals P(xj|h, u) from the factors Bh

j (xj| u) following (22). Such marginals are needed for computing the samples and weights, and for answering queries. The marginals can be computed using the jointree algorithm or belief propagation (BP). However, since scalability is crucial, we introduce a third method that will be evaluated in comparison with the other two. It is motivated by the results reported by beam tracking that uses arc consistency (AC) [Mackworth, 1977; Dechter, 2003] for progressing logical (non-probabilistic) beliefs. The relation between BP and AC is well-known: both methods are exact for trees and BP propagates the zeroprobability entries in agreement with AC [Dechter and Mateescu, 2002]. We call the new method Iterated AC which is aimed at combining the speed and monotonic convergence of AC with the ability to approximate probabilistic beliefs, even if roughly.

Iterated AC approximates the marginals from the belief factors by using arc consistency along with order-ofmagnitude probabilities also called -rankings [Spohn, 1988; Goldszmidt and Pearl, 1996; Darwiche and Goldszmidt, 1994]. For this, it follows three steps that we sketch brieﬂy with no much justiﬁcation. The algorithm uses two parameters and m. First, real values p 2 (0, 1] are mapped into the smallest non-negative integer = (p) for which +1 < p, while p = 0 is mapped into (p) = 1. This mapping is used to transform the belief factor Bh

j ( | u) into tables Di

j that contain all the valuations xj (for the variables Xj in beam j) that satisfy (Bh

j (xj| u)) i + j for a given nonnegative integer i where j = minxj (Bh

j (xj| u)) is a normalization constant. Second, the tables Di

j associated with the different beams j and the same i are made arc consistent (the tables share variables). The -marginal (xj|h, u) is deﬁned then as i iff i is the minimum non-negative integer for which the tuple xj belongs to Di

j after running AC, while (xj|h, u) = 1 when there is no such i. Finally, the (xj|h, u) marginals are used to approximate the marginals as P(xj|h, u) = (xj|h, u) where is a normalizing constant.

A further simpliﬁcation is that measures that are greater than the m parameter but less than 1, are treated as if they were equal to m. This means that the approximation of the marginals P(xj|h, u) from the belief factors Bh

j ( | u) are computed by running arc consistency m + 1 times. Iterated AC will be denoted as ACm where m is the parameter used. The parameter is ﬁxed to 0.1.

7 Experimental Results

The general PBT algorithm can use different algorithms for computing marginals from the factors. We experiment with the jointree (JT), belief propagation (BP), and ACm algorithms for m 2 {0, 1}. For JT and BP we use libdai [Mooij, 2010] while ACm is ours. The experiments were performed on Intel Xeon E5-2666 CPUs running at 2.9GHz with a memory cap of 10Gb (exhausted/approached only by JT).

JT BP AC0 board mines %succ time %succ time %succ time

6 6 6 84.1 .002 68.5 .046 84.2 .000 8 8 10 83.3 .070 66.3 .069 84.6 .002 16 16 40 41.4 .232 79.9 .005 30 16 99 1.8 .991 33.4 .003

Table 1: PBT in Minesweeper using three methods for computing marginals. Time is average time per decision in seconds. JT runs out of memory in larger maps. Figures are averages over 500 runs.

Minesweeper. Minesweeper cannot be fully solved by pure inference and requires guessing in certain situations, even if variables are static and sensing is noiseless. Since all variables are external but static, and hence backward deterministic, no sampling is required. The beam decomposition has one beam Bi for each cell i that contains up to 9 variables. Table 1 shows the results for PBT using JT, BP, and AC0, with a policy that chooses to tag (resp. open) the cell that is most certain to contain a mine (resp. to be clear). The success ratio indicates the percentage of maps solved, i.e., without doing a wrong action. In this noise-free example, it can be shown that the marginals computed by PBT with AC0 are equivalent to the ones computed by beam tracking where the belief factors represent sets of states [Bonet and Geffner, 2014]. Actually, PBT with AC0 scales up best with a quality that matches the quality of JT in the small instances. JT does not scale up to larger instances and BP does but achieves a much lower score.

1-line-3-SLAM. This is a variant of the 1-line SLAM task considered before in which the observation received by the agent when in a certain cell depends also on the colors of the two adjacent cells. In such version of the problem, sampling the agent location is no longer sufﬁcient for decoupling the color cell variables, but in our formulation makes the belief factors acyclic, so that exact inference over such factors is not exponential in the total number of variables but in the size of the largest factor. The observation is a 0/1 token equal to the color of the current cell with probability p = 0.9noise where noise is one plus the number of adjacent cells with a different color. For a line of n cells, there are n beams of size up to 4, one for each cell, that include the agent location and the color of the three cells that inﬂuence the observation at the location. The agent location is the only external variable and is not backward deterministic, so it must be sampled. Table 2 shows results for instances of size n = 64 and n = 512, using 16 and 256 particles sampled with the optimal proposal distribution. The executions choose actions randomly until each cell of the grid is visited 10 times. The table shows averages with 97.5% conﬁdence intervals over 100 executions for percentages of cells labeled correctly (one of two colors), not labeled at all (no sufﬁciently certainty), and times per step and per execution. For this problem, a cell is assumed labeled when for one of the possible colors its marginal probability is 0.55 of higher. Since the resulting factorized model has bounded and small treewidth, JT scales up well to provide a good baseline with few errors, although 40% of the cells

1 n #p inf. %good %unknowns time / step time / exec

1 64 16 JT 97.7 0.2 41.7 0.7 0.0 0.0 2.9 0.0 1 64 16 BP 98.5 0.2 43.0 0.7 0.0 0.0 8.9 0.0 1 64 16 AC0 82.9 0.6 36.5 0.7 0.0 0.0 1.1 0.0 1 64 16 AC1 84.0 0.5 36.5 0.7 0.0 0.0 1.1 0.0

1 64 256 JT 97.9 0.2 41.0 0.8 0.0 0.0 42.8 0.4 1 64 256 BP 98.5 0.2 40.8 0.7 0.1 0.0 110.9 0.9 1 64 256 AC0 82.5 0.6 34.5 0.7 0.0 0.0 19.1 0.1 1 64 256 AC1 83.9 0.6 33.3 0.8 0.0 0.0 19.6 0.1

1 512 16 JT 97.8 0.1 47.8 0.2 0.0 0.0 319.7 1.0 1 512 16 BP 98.3 0.1 47.8 0.2 0.1 0.0 681.7 2.8 1 512 16 AC0 81.8 0.2 42.3 0.2 0.0 0.0 88.7 0.3 1 512 16 AC1 81.7 0.2 42.1 0.2 0.0 0.0 91.0 0.3

1 512 256 JT 98.2 0.1 47.5 0.2 0.7 0.0 4, 012.9 11.5 1 512 256 BP 98.4 0.1 47.7 0.2 1.4 0.0 8, 193.1 22.6 1 512 256 AC0 82.7 0.2 41.7 0.2 0.2 0.0 1, 117.9 3.1 1 512 256 AC1 82.6 0.2 42.1 0.2 0.2 0.0 1, 121.0 3.0

Table 2: PBT in 1-line SLAM using three methods for computing marginals. In this task, observations depend on the color of the current and adjacent cells. Figures are averages over 100 random executions, each of length roughly 10 times the number of cells.

are not labeled. The quality for BP in this case is similar but runs slower. AC0 is an order-of-magnitude faster but of lower quality. The results for AC1 are similar.

Minemapping. This problem is a version of Minesweeper that involves an agent that moves stochastically and receives noisy information about the presence/absence of mines in the cell and surrounding cells. The task for the agent is to map the mineﬁeld instead of clearing it as in Minesweeper, and the observations are similar but noisy. More precisely, movement actions have .9 probability of success and .1 probability of doing nothing, and when the current location is i, the observation token o is generated by summing stochastic indicator variables I(j) for each adjacent cell j and for j = i, where the variable I(j) is equal to 1 (resp. 0) with probability .9 if the cell j contains a mine (resp. no mine). The resulting beam structure is similar to the one for Minesweeper except that the agent location belongs to all the beams and must be sampled. Results for this problem are shown Table 3 in a format similar to Table 2. This time, JT is feasible for the small instances only, and BP appears to be the best choice for approximating the marginals: it doesn t make many mistakes when labeling cells (with mines or not) and runs faster than AC methods that make many more mistakes.

8 Discussion We have introduced a formulation and algorithm PBT for tracking probabilistic beliefs in the presence of stochastic actions and sensors in the form of local belief factors that can be progressed independently in time when variables appearing in more than one beam are backward deterministic. In such a case, the local belief factors provide an exact decomposition of the joint distribution at any time point, and progressing the factors in time is exponential in the size of the beams. The beams are fully determined by the 2-DBN model structure, and are usually bounded and small. For computing marginal probabilities, however, the local belief factors need to be merged. This computation can be performed exactly

n n #p inf . %good %unknowns time / step time /exec

6 6 32 JT 97.3 0.6 12.1 1.6 0.2 0.0 73.6 1.0 6 6 32 BP 98.4 0.6 16.6 1.8 0.2 0.0 81.0 2.2 6 6 32 AC0 68.0 1.5 18.1 0.8 0.1 0.0 57.4 3.2 6 6 32 AC1 69.0 1.4 17.6 0.8 0.1 0.0 56.8 3.3

6 6 256 JT 97.9 0.6 8.5 1.4 1.7 0.0 515.9 5.9 6 6 256 BP 97.9 0.6 10.3 1.5 1.6 0.0 506.7 9.8 6 6 256 AC0 67.0 1.6 17.2 0.8 1.3 0.0 395.9 20.6 6 6 256 AC1 66.8 1.6 17.4 0.7 1.3 0.0 397.8 20.9

10 10 32 BP 98.9 0.5 33.9 1.3 0.8 0.0 814.6 8.3 10 10 32 AC0 55.6 0.9 23.7 0.6 0.9 0.0 852.3 53.3 10 10 32 AC1 55.0 0.9 22.5 0.6 0.9 0.0 883.9 48.7

10 10 256 BP 97.4 0.6 27.9 1.3 5.7 0.0 5, 326.6 48.9 10 10 256 AC0 54.7 1.1 23.8 0.6 7.0 0.3 6, 531.9 333.7 10 10 256 AC1 54.3 1.2 23.5 0.7 7.1 0.3 6, 643.6 307.2

Table 3: PBT in Minemapping using three methods for computing marginals. Figures are averages over 100 random executions. Each execution consists of roughly 5 times the number of cells in the grid.

using the jointree algorithm or approximately by belief propagation or other local consistency methods. In addition, when the beams share variables that are not backward deterministic, such variables must be sampled to make them backward deterministic given their sampled histories.

As far as we know, there are no other general, principled approaches for dealing effectively with problems such as Minemapping or even 1-line-3-SLAM. RB particle-ﬁltering methods would need to consider too many particles for making inference tractable in the ﬁrst problem, and would need to be programmed to exploit the resulting tractable factorization in the second. In our setting, this all follows from the problem structure and the general formulation. At the same time, decomposition methods that operate over disjoint factors result in poor approximations, and standard grid SLAM algorithms [Grisetti et al., 2005] involve a number of domain-dependent tricks that would not apply to more general problems, like the idea of associating one particular map to each particle, drastically simplifying the uncertainty about the map.

In the future, we want to develop new ideas for scaling PBT further so that it can be applied to more realistic SLAM problems. The bottleneck is not the progression of factors but the computation of marginals from factors that is done from scratch at every time point. We want to explore ways for making such computation incremental. Likewise, the performance of PBT needs to be compared with other approaches, in particular, approaches that also manage to exploit forms of context-speciﬁc independence and determinism [Vlasselaer et al., 2016].

Acknowledgments We thank the reviewers for useful comments. This work is partially funded by grant TIN2015-67959, MEC, Spain.

References [Amir and Russell, 2003] E. Amir and S. Russell. Logical

ﬁltering. In Proc. IJCAI, pages 75 82, 2003. [Bishop, 2007] C. M. Bishop. Pattern Recognition and Ma-

chine Learning. Springer, 2007.

[Bonet and Geffner, 2014] B. Bonet and H. Geffner. Belief

tracking for planning with sensing: Width, complexity and approximations. J. of AI Research, pages 923 970, 2014. [Boutilier et al., 1996] C. Boutilier, N. Friedman, M. Gold-

szmidt, and Da. Koller. Context-speciﬁc independence in Bayesian networks. In Proc. Int. Conf. Uncertainty in AI (UAI), pages 115 123, 1996. [Boyen and Koller, 1998] X. Boyen and D. Koller. Tractable

inference for complex stochastic processes. In Proc. UAI, pages 33 42, 1998. [Darwiche and Goldszmidt, 1994] A. Darwiche and M. Goldszmidt. On the relation between kappa calculus and probabilistic reasoning. In Proc. UAI, pages 145 153, 1994. [Darwiche, 2014] A. Darwiche. Modeling and Reasoning with Bayesian Networks. Cambridge, 2014. [Dean and Kanazawa, 1989] T. Dean and K. Kanazawa. A

model for reasoning about persistence and causation. Computational intelligence, 5(2):142 150, 1989. [Dechter and Mateescu, 2002] R. Dechter and R. Mateescu.

A simple insight into iterative belief propagation s success. In Proc. UAI, pages 175 183, 2002. [Dechter and Rish, 2003] R. Dechter and I. Rish. Minibuckets: A general scheme for bounded inference. Journal of the ACM (JACM), 50(2):107 153, 2003. [Dechter, 2003] R. Dechter. Constraint Processing. Morgan

Kaufmann, 2003. [Doucet et al., 2000] A. Doucet, N. De Freitas, K. Murphy,

and S. Russell. Rao-Blackwellised particle ﬁltering for dynamic bayesian networks. In Proc. UAI, pages 176 183, 2000. [Doucet, 1998] A. Doucet. On sequential simulation-based

methods for Bayesian ﬁltering. Ph D thesis, U. of Cambridge, 1998. [Goldszmidt and Pearl, 1996] M. Goldszmidt and J. Pearl.

Qualitative probabilities for default reasoning, belief revision, and causal modeling. Artiﬁcial Intelligence, 84(1):57 112, 1996. [Grisetti et al., 2005] G. Grisetti, C. Stachniss, and W. Burgard. Improving grid-based SLAM with Rao Blackwellized particle ﬁlters by adaptive proposals and selective resampling. In Proc. Robotics and Automation (ICRA), pages 2432 2437, 2005. [Kanazawa et al., 1995] K. Kanazawa, D. Koller, and S. Russell. Stochastic simulation algorithms for dynamic probabilistic networks. In Proc. UAI, pages 346 351, 1995. [Kaye, 2000] R. Kaye. Minesweeper is NP-Complete. Math-

ematical Intelligencer, 22(2):9 15, 2000. [Koller and Friedman, 2009] D. Koller and N. Friedman.

Probabilistic graphical models: Principles and techniques. MIT press, 2009. [Mackworth, 1977] A. Mackworth. Consistency in networks

of relations. Artiﬁcial intelligence, 8(1):99 118, 1977.

[Mooij, 2010] Joris M. Mooij. lib DAI: A free and open source C++ library for discrete approximate inference in graphical models. Journal of Machine Learning Research, 11:2169 2173, August 2010. [Murphy and Weiss, 2001] K. Murphy and Y. Weiss. The factored frontier algorithm for approximate inference in DBNs. In Proc. Int. Conf. Uncertainty in AI (UAI), pages 378 385, 2001. [Murphy, 1999] K. Murphy. Bayesian map learning in dy-

namic environments. In NIPS, pages 1015 1021, 1999. [Murphy, 2002] K. Murphy. Dynamic bayesian networks: representation, inference and learning. Ph D thesis, University of California, Berkeley, 2002. [Pearl, 1988] J. Pearl. Probabilistic Reasoning in Intelligent

Systems. Morgan Kaufmann, 1988. [Spohn, 1988] W. Spohn. A general non-probabilistic theory

of inductive reasoning. In Proceedings UAI, pages 315 322, 1988. [Thrun et al., 2001] S. Thrun, D. Fox, W. Burgard, and

F. Dellaert. Robust monte carlo localization for mobile robots. Artiﬁcial intelligence, 128(1):99 141, 2001. [Vlasselaer et al., 2016] J. Vlasselaer, W. Meert, G. Van den

Broeck, and L. De Raedt. Exploiting local and repeated structure in dynamic bayesian networks. Artiﬁcial Intelligence, 232:43 53, 2016.