# Adaptive Stochastic Optimization: From Sets to Paths

Zhan Wei Lim, David Hsu, Wee Sun Lee
Department of Computer Science, National University of Singapore
{limzhanw,dyhsu,leews}@comp.nus.edu.sg

Adaptive stochastic optimization (ASO) optimizes an objective function adaptively under uncertainty. It plays a crucial role in planning and learning under uncertainty, but is, unfortunately, computationally intractable in general. This paper introduces two conditions on the objective function, the marginal likelihood rate bound and the marginal likelihood bound, which, together with pointwise submodularity, enable efficient approximate solution of ASO. Several interesting classes of functions satisfy these conditions naturally, e.g., the version space reduction function for hypothesis learning. We describe Recursive Adaptive Coverage, a new ASO algorithm that exploits these conditions, and apply the algorithm to two robot planning tasks under uncertainty. In contrast to the earlier submodular optimization approach, our algorithm applies to ASO over both sets and paths.

## 1 Introduction

A hallmark of an intelligent agent is to learn new information as the world unfolds and to improvise by fusing the new information with prior knowledge. Consider an autonomous unmanned aerial vehicle (UAV) searching for a victim lost in a jungle. The UAV acquires new information on the victim's location by scanning the environment with noisy onboard sensors. How can the UAV plan and adapt its search strategy in order to find the victim as fast as possible? This is an example of stochastic optimization, in which an agent chooses a sequence of actions under uncertainty in order to optimize an objective function. In adaptive stochastic optimization (ASO), the agent's action choices are conditioned on the outcomes of earlier choices. ASO plays a crucial role in planning and learning under uncertainty, but it is, unfortunately, computationally intractable in general [5].

Adaptive submodular optimization provides a powerful tool for approximate solution of ASO and has several important applications, such as sensor placement, active learning, etc. [5]. However, it has so far been restricted to optimization over a set domain: the agent chooses a subset out of a finite set of items. This is inadequate for the UAV search, as the agent's consecutive choices are constrained to form a path. Our work applies to ASO over both sets and paths.

Our work aims to identify subclasses of ASO and provide conditions that enable efficient near-optimal solution. We introduce two conditions on the objective function, the marginal likelihood rate bound (MLRB) and the marginal likelihood bound (MLB). They enable efficient approximation of ASO with pointwise submodular objective functions, i.e., functions that satisfy a diminishing-return property for every fixed scenario. MLRB is different from adaptive submodularity; we prove that adaptive submodularity does not imply MLRB and vice versa. While there exist functions that satisfy neither adaptive submodularity nor the MLRB condition, all pointwise submodular functions satisfy the MLB condition, albeit with different constants. We propose Recursive Adaptive Coverage (RAC), a polynomial-time approximation algorithm that guarantees a near-optimal solution of ASO over either a set or a path domain, if the objective function satisfies the MLRB or the MLB condition and is pointwise monotone submodular.
Since MLRB differs from adaptive submodularity, the new algorithm expands the set of problems that admit efficient approximate solutions, even for ASO over a set domain. We have evaluated RAC in simulation on two robot planning tasks under uncertainty and show that RAC performs well against several commonly used heuristic algorithms, including greedy algorithms that optimize information gain.

## 2 Related Work

Submodular set function optimization encompasses many hard combinatorial optimization problems in operations research and decision making. Submodularity implies a diminishing-return effect, whereby adding an item to a smaller set is more beneficial than adding the same item to a bigger set. For example, adding a new temperature sensor when there are few sensors helps more in mapping the temperature in a building than when there are already many sensors. Submodular functions can be efficiently approximated using a greedy heuristic [11]. Recent works have incorporated stochasticity into submodular optimization [1, 5] and generalized the problem from set optimization to path optimization [2]. Our work builds on progress in submodular optimization on paths to solve the adaptive stochastic optimization problem on paths.

Our RAC algorithm shares a similar structure and analysis with the RAId algorithm in [10], which solves adaptive informative path planning (IPP) problems without noise. In fact, noiseless adaptive IPP is a special case of adaptive stochastic optimization on paths that satisfies the marginal likelihood rate bound condition, and we can derive the same approximation bound using the results in Section 6. Both works are inspired by the algorithm in [8] for the Adaptive Traveling Salesperson (ATSP) problem. In the ATSP problem, a salesperson has to service a subset of locations whose demand is not known in advance. However, the salesperson knows the prior probabilities of the demand at each location (possibly correlated), and the goal is to find an adaptive policy that services all locations with demand.

Adaptive submodularity [5] generalizes submodularity to stochastic settings and gives logarithmic approximation bounds using a greedy heuristic. It was also shown that no polynomial-time algorithm can compute an approximate solution of adaptive stochastic optimization within a factor of O(|X|^(1-ε)) unless PH = Σ₂^p, that is, unless the polynomial-time hierarchy collapses to its second level [5]. Many Bayesian active learning problems can be modeled by suitable adaptive submodular objective functions [6, 4, 3]. However, [3] recently proposed a new stochastic set function for active learning with a general loss function that is not adaptive monotone submodular. This new objective function satisfies the marginal likelihood bound with a nontrivial constant G.

Adaptive stochastic optimization is a special case of the Partially Observable Markov Decision Process (POMDP), a mathematically principled framework for reasoning under uncertainty [9]. Despite recent tremendous progress in offline [12] and online solvers [14, 13], most partially observable planning problems remain hard to solve.

## 3 Preliminaries

We now describe the adaptive stochastic optimization problem and use the UAV search and rescue task to illustrate our definitions. Let X be the set of actions and let O be the set of observations. The agent operates in a world whose events are determined by a static state called the scenario, denoted as φ : X → O.
When the agent takes an action x ∈ X, it receives an observation o = φ(x) ∈ O that is determined by an initially unknown scenario φ. We denote a random scenario by Φ and use a prior distribution p(φ) := P[Φ = φ] over the scenarios to represent our prior knowledge of the world. For example, in the UAV task, the actions are flying to various locations, the observations are the possible sensor readings, and a scenario is a victim's position. When the UAV flies to a particular location x, it observes a sensor reading o that depends on the actual victim's position φ. Prior knowledge about the victim's position can be encoded as a probability distribution over the possible victim's positions.

After taking actions x1, x2, . . . and receiving observations o1, o2, . . . after each action, the agent has a history ψ = {(x1, o1), (x2, o2), . . . }. We say that a scenario φ is consistent with a history ψ when the actions and corresponding observations in the history never contradict φ, i.e., φ(x) = o for all (x, o) ∈ ψ. We denote this by φ ∼ ψ. We also say that a history ψ′ is consistent with another history ψ if dom(ψ) ⊆ dom(ψ′) and ψ′(x) = ψ(x) for all x ∈ dom(ψ), where dom(ψ) is the set of actions taken in ψ. For example, a victim's position φ has not been ruled out by the sensor readings at the visited locations when φ ∼ ψ.

An agent's goal can be characterized by a stochastic set function f : 2^X × O^X → ℝ, which measures progress toward the goal given the actions taken and the true scenario. In this paper, we assume that f is pointwise monotone on a finite domain, i.e., f(A, φ) ≤ f(B, φ) for any φ and for all A ⊆ B ⊆ X. An agent achieves its goal and covers f when f attains its maximum value after the agent takes the actions S ⊆ X in scenario φ, i.e., f(S, φ) = f(X, φ). For example, the objective function can be the sum of prior probabilities of impossible victim's positions given a history; the UAV finds the victim when all positions except the true one have been ruled out.

An agent's strategy for adaptively taking actions is a policy π that maps a history to its next action. A policy terminates when there is no next action to take for a given history. We say that a policy π covers the function f when the agent executing π always achieves its goal upon termination. That is, f(dom(ψ), φ) = f(X, φ) for all scenarios φ ∼ ψ, where ψ is the history when the agent executes π. For example, a policy tells the UAV where to fly next, given the locations visited and whether the sensor readings at those locations were positive; it covers the objective function when the UAV executing it always finds the victim.
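To make these definitions concrete, here is a minimal sketch in Python of scenarios, histories, the consistency relation φ ∼ ψ, and the marginal likelihood of a history. The toy action set, observation set, and uniform prior are illustrative placeholders, not data from the paper.

```python
from itertools import product

X = ["A", "B", "C"]                      # actions / locations (toy example)
O = [0, 1]                               # possible observations

# Enumerate all scenarios phi : X -> O and put a uniform prior over them.
scenarios = [dict(zip(X, obs)) for obs in product(O, repeat=len(X))]
prior = {i: 1.0 / len(scenarios) for i in range(len(scenarios))}

def consistent(phi, history):
    """phi ~ psi: the scenario never contradicts the history."""
    return all(phi[x] == o for (x, o) in history)

def marginal_likelihood(history):
    """p(psi) = P[Phi ~ psi], the prior mass of scenarios consistent with psi."""
    return sum(p for i, p in prior.items() if consistent(scenarios[i], history))

history = {("A", 1)}                     # took action A, observed 1
print(marginal_likelihood(history))      # 0.5 under the uniform prior
```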
Formally, an adaptive stochastic optimization problem on paths consists of a tuple (X, d, p, O, r, f), where the set of actions X is the set of locations the agent can visit, r is the starting location of the agent, and d is a metric that gives the distance between any pair of locations x, x′ ∈ X. The cost of a policy π, C(π, φ), is the length of the path starting from location r traversed by the agent until the policy terminates, when presented with scenario φ, e.g., the distance traveled by the UAV executing policy π for a particular true victim position. We want to find a policy that minimizes the expected cost of traveling to cover the function. We formally state the problem:

Problem 1. Given an adaptive stochastic optimization problem on paths I = (X, d, p, O, r, f), compute an adaptive policy π that minimizes the expected cost

C(π) = E[C(π, Φ)] = Σ_φ C(π, φ) p(φ),   (1)

subject to f(dom(ψ), φ) = f(X, φ) for every scenario φ with p(φ) > 0, where ψ is the history encountered when executing π on φ.

Adaptive stochastic optimization problems on sets are formally defined by a tuple (X, c, p, O, f). The set of actions X is a set of items that the agent may select. Instead of a distance metric, the cost of selecting an item is given by a cost function c : X → ℝ, and the cost of a policy is C(π, φ) = Σ_{x∈S} c(x), where S is the subset of items selected by π when presented with scenario φ.

## 4 Classes of Functions

This section introduces the classes of objective functions for adaptive stochastic optimization problems and gives the relationships between them. Given a finite set X and a function on subsets of X, f : 2^X → ℝ, the function f is submodular if f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for all A, B ⊆ X. Let f(S, φ) be a stochastic set function. If f(S, φ) is submodular for each fixed scenario φ ∈ O^X, then f is pointwise submodular.

Adaptive submodularity and adaptive monotonicity generalize submodularity and monotonicity to stochastic settings where we receive random observations after selecting each item [6]. We define the expected marginal value of an item x given a history ψ, Δ(x | ψ), as

Δ(x | ψ) = E[ f(dom(ψ) ∪ {x}, Φ) − f(dom(ψ), Φ) | Φ ∼ ψ ].

A function f : 2^X × O^X → ℝ is adaptive monotone with respect to a prior distribution p(φ) if, for all ψ such that P[Φ ∼ ψ] > 0 and all x ∈ X, it holds that Δ(x | ψ) ≥ 0, i.e., the expected marginal value of any fixed item is nonnegative. The function f is adaptive submodular with respect to a prior distribution p(φ) if, for all ψ and ψ′ such that ψ′ is consistent with ψ and for all x ∈ X \ dom(ψ′), it holds that Δ(x | ψ) ≥ Δ(x | ψ′), i.e., the expected marginal value of any fixed item does not increase as more items are selected. A function can be adaptive submodular with respect to a certain distribution p but not be pointwise submodular. However, it must be pointwise submodular if it is adaptive submodular with respect to all distributions.

We denote by f̂(S, ψ) = min_{φ∼ψ} f(S, φ) the worst-case value of f given a history ψ, and by p(ψ) := P[Φ ∼ ψ] the marginal likelihood of a history. The marginal likelihood rate bound (MLRB) condition requires that f satisfies the following: for all histories ψ and ψ′ such that ψ′ is consistent with ψ, if p(ψ′) ≤ 0.5 p(ψ), then

Q − f̂(dom(ψ′), ψ′) ≤ (1/K) (Q − f̂(dom(ψ), ψ)),   (2)

except for scenarios already covered, where K > 1 and Q ≥ max_φ f(X, φ) is a constant upper bound on the maximum value of f over all scenarios. Intuitively, this condition means that the worst-case remaining objective value decreases by a constant fraction whenever the marginal likelihood of the history decreases by at least half.

Example: The version space reduction function V with an arbitrary prior is adaptive submodular and monotone [5]. Furthermore, it satisfies the MLRB. The version space reduction function is defined as V(S, φ) = 1 − P[Φ ∼ φ(S)] for every scenario φ and S ⊆ X, where φ(S) gives the history of visiting the locations x ∈ S when the scenario is φ; in other words, V(S, φ) is the prior probability mass of the scenarios ruled out by the observations. The version space reduction function is often used for active learning, where the true hypothesis is identified once all other scenarios have been ruled out. We present the proof that the version space reduction function satisfies the MLRB condition (and all other proofs) in the supplementary material.

Proposition 1. The version space function V satisfies the MLRB with constants Q = 1 and K = 2.

The following proposition teases apart the relationship between the MLRB condition and adaptive submodularity.

Proposition 2. Adaptive submodularity does not imply the MLRB condition, and vice versa.
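The short Python sketch below evaluates the version space reduction function as written above, V(S, φ) = 1 − P[Φ ∼ φ(S)], on a toy uniform prior; the scenario space and prior are illustrative only.

```python
from itertools import product

X = ["A", "B", "C"]
scenarios = [dict(zip(X, obs)) for obs in product([0, 1], repeat=len(X))]
prior = [1.0 / len(scenarios)] * len(scenarios)

def V(S, phi):
    """Prior mass of scenarios ruled out by the observations phi(x), x in S."""
    ruled_in = sum(p for phi2, p in zip(scenarios, prior)
                   if all(phi2[x] == phi[x] for x in S))
    return 1.0 - ruled_in

phi_true = scenarios[3]
print(V(set(), phi_true))        # 0.0: nothing ruled out yet
print(V({"A"}, phi_true))        # 0.5: half the scenarios eliminated
print(V(set(X), phi_true))       # 1 - p(phi_true): everything but the truth eliminated
```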
The marginal likelihood bound (MLB) condition requires that there exists some constant G such that, for all histories ψ and all scenarios φ ∼ ψ,

f(X, φ) − f̂(dom(ψ), ψ) ≤ G · p(ψ).   (4)

In other words, the worst-case remaining objective value must be at most the marginal likelihood of the history multiplied by some constant G. The quality of our solution depends on the constant G: the smaller the constant G, the better the approximation bound. We can make any adaptive stochastic optimization problem satisfy the MLB with a large enough constant G. To trivially ensure the MLB, we can set G = Q/δ, where δ = min_φ p(φ); then Q ≤ G · p(ψ) unless we have visited all locations and, by definition, covered the function.

Example: The version space reduction function V can be interpreted as the expected 0-1 loss of a random scenario φ′ differing from the true scenario φ: the loss is counted as one whenever φ′ ≠ φ. For example, a pair of scenarios that differ in the observation at a single location has the same loss of 1 as another pair that differs in the observations at all locations. Thus, it can be useful to assign different losses to different pairs of scenarios with a general loss function. The generalized version space reduction function is defined as

f_L(S, φ) = E_{φ′}[ L(φ, φ′) · 1(φ(S) ≠ φ′(S)) ],

where 1(·) is an indicator function and L : O^X × O^X → ℝ≥0 is a general loss function that satisfies L(φ′, φ) = L(φ, φ′) and L(φ, φ′) = 0 if φ = φ′. The generalized version space reduction function is not adaptive submodular [3] and does not satisfy the MLRB condition. However, it satisfies the MLB condition with a non-trivial constant G.

Proposition 3. The generalized version space reduction function f_L satisfies MLB with G = max_{φ,φ′} L(φ, φ′).
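As an illustration, the sketch below evaluates the generalized version space reduction function f_L on a toy prior, with a Hamming-style loss standing in for a general loss L; all names and numbers here are hypothetical.

```python
from itertools import product

X = ["A", "B"]
scenarios = [dict(zip(X, obs)) for obs in product([0, 1], repeat=2)]
prior = [0.4, 0.3, 0.2, 0.1]

def loss(i, j):
    """A symmetric loss with L(phi, phi) = 0; here, the number of differing observations."""
    return sum(scenarios[i][x] != scenarios[j][x] for x in X)

def f_L(S, i):
    """Expected loss over scenarios already distinguished from scenario i by observations on S."""
    return sum(p * loss(i, j) for j, p in enumerate(prior)
               if any(scenarios[j][x] != scenarios[i][x] for x in S))

print(f_L(set(), 0))     # 0.0: no pair distinguished yet
print(f_L({"A"}, 0))     # 0.4: pairs differing at A contribute their loss
print(f_L(set(X), 0))    # 0.7: maximum value f_L(X, phi_0)
```

For this toy instance, Proposition 3 gives the MLB constant G = max_{φ,φ′} L(φ, φ′) = 2.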
## 5 Algorithm

Adaptive planning is computationally hard due to the need to consider every possible observation after each action. To simplify adaptive planning, RAC assumes that it always receives the most likely observation. RAC is a recursive algorithm that partially covers the function in each step and repeats on the residual function until the entire function is covered. In each recursive step, RAC uses the most likely observation assumption to transform the adaptive stochastic optimization problem into a submodular orienteering problem, generates a tour, and traverses it. If the assumption holds throughout the tour, then RAC achieves the required partial coverage. Otherwise, RAC receives some observation that has probability less than half (since only the most likely observation has probability at least half), the marginal likelihood of the history decreases by at least half, and the MLRB and MLB conditions ensure that substantial progress is made toward covering the function.

Submodular orienteering takes a submodular function g : 2^X → ℝ and a metric on X and gives the minimum-cost path whose set of visited locations W covers the function, i.e., g(W) = g(X). We now describe the submodular orienteering problem used in each recursive step. Given the current history ψ, we construct a restricted set of location-observation pairs, Z = {(x, o) : (x, o) ∉ ψ, o is the most likely observation at x given ψ}. Using ideas from [7], we construct a submodular function g : 2^Z → ℝ to be used in the submodular orienteering problem. Upon completion of the recursive step, we would like the function to be either covered or to have value at least a target η for all scenarios consistent with ψ ∪ Z′, where Z′ is the selected subset of Z. We first restrict attention to the subset Φ_ψ of scenarios that are consistent with ψ.

To simplify, we transform the function so that its maximum value for every φ is at least η, by defining f_η(S, φ) = f(S, φ) + (η − f(X, φ)) whenever f(X, φ) < η, and f_η(S, φ) = f(S, φ) otherwise. For Z′ ⊆ Z, we now define g_η(Z′, φ) = f_η(dom(ψ ∪ Z′), φ) if Z′ is consistent with φ, and g_η(Z′, φ) = f_η(X, φ) otherwise. Finally, we construct the submodular set function g_η(Z′) = (1/|Φ_ψ|) Σ_{φ ∈ Φ_ψ} min(η, g_η(Z′, φ)). These constructions have the following properties, which guarantee the effectiveness of the recursive steps of RAC.

Proposition 4. Let f be a pointwise monotone submodular function. Then g_η(·, φ) is pointwise monotone submodular and g_η(·) is monotone submodular. In addition, g_η(Z′) = η if and only if f is either covered or has value at least η for all scenarios consistent with ψ ∪ Z′.

We can replace g_η by a simpler function if f satisfies a minimal dependency property, under which the value of the function f depends only on the history, i.e., f(dom(ψ), φ′) = f(dom(ψ), φ) for all φ, φ′ ∼ ψ. We define a new submodular set function g_η^m(Z′) = g_η(Z′, ψ ∪ Z).

Proposition 5. When f satisfies minimal dependency, g_η^m(Z′) = η implies g_η(Z′) = η.

RAC needs to guard against committing to a costly plan made under the most likely observation assumption, which is bound to be wrong eventually. RAC uses two different hedging mechanisms. For MLRB, instead of requiring complete coverage, we solve for partial coverage using the submodular path optimization problem with target η = (1 − 1/K)Q, so that f(S, φ) ≥ (1 − 1/K)Q for all consistent scenarios under the most likely observation assumption in each recursive step. For MLB, we solve submodular orienteering for complete coverage with g_Q, but also solve for the version space reduction function with 0.5 as the target, V_{0.5}, as a hedge against over-commitment by the first tour when the function is not well aligned with the probability of observations. The cheaper of the two tours is then traversed by RAC in each recursive step.

We define the informative observation set Ω_x for every location x ∈ X: Ω_x = {o | p(o|x) < 0.5}. RAC traverses the tour and adaptively terminates the traversal when it encounters an informative observation. Subsequent recursive calls work on the residual function f′ and the normalized prior p′. Let ψ be the history encountered just before the recursive call; for any set S ⊇ dom(ψ), f′(S, φ) = f(S, φ) − f(dom(ψ), φ). We assume that the function f is integer-valued. The recursive step is repeated until the residual value Q′ = 0. We give the pseudocode of RAC in Algorithm 1, and give the details of the SUBMODULARPATH procedure and the proof of its approximation bound in the supplementary material.

Algorithm 1 RAC

    procedure RECURSERAC(p, f, Q)
        if max_{φ : p(φ) > 0} f(X, φ) = 0 then return
        τ ← GENTOUR(p, f, Q)
        ψ ← EXECUTEPLAN(τ)
        p′(φ) ← p(ψ|φ) p(φ) / p(ψ)
        f′(Y, φ) ← f(Y, φ) − f(dom(ψ), φ)
        Q′ ← Q − min_{φ∼ψ} f(dom(ψ), φ)
        RECURSERAC(p′, f′, Q′)

    procedure EXECUTEPLAN(τ)
        repeat
            Visit the next location x in τ and observe o.
        until o ∈ Ω_x or end of tour
        Move to location x_t = r.
        return the history ψ encountered

    procedure GENTOUR(p, f, Q)
        if f satisfies MLB then
            τ_f ← SUBMODULARPATH(g_Q)
            if max_φ p(φ) ≤ 0.5 then
                τ_vs ← SUBMODULARPATH(V_{0.5})
                τ ← arg min_{τ′ ∈ {τ_f, τ_vs}} W(τ′)
            else
                τ ← τ_f
        else
            τ ← SUBMODULARPATH(g_{(1−1/K)Q})
        return τ

Here each tour has the form τ = (x_0, x_1, . . . , x_t) with x_0 = x_t = r, and W(τ′) denotes the length of tour τ′.
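The construction of g_η used by GENTOUR can be sketched as follows. This is a minimal reading of the construction described above, under the reconstruction in this section; the helper name make_g_eta, the dictionary representation of scenarios, and the toy usage are our own and purely illustrative, not the authors' reference implementation.

```python
from itertools import product

def make_g_eta(f, X, scenarios, psi, eta):
    """Build the averaged coverage function g_eta for one recursive step.

    f(S, phi) is pointwise monotone submodular, X is the list of locations,
    scenarios is a list of dicts phi: location -> observation, psi is the current
    history as a set of (location, observation) pairs, and eta is the target value.
    """
    consistent = [phi for phi in scenarios
                  if all(phi[x] == o for (x, o) in psi)]

    def f_eta(S, phi):
        # Shift f up so that its maximum value f(X, phi) is at least eta.
        shift = max(0.0, eta - f(set(X), phi))
        return f(S, phi) + shift

    def g_eta(Zp):
        # Zp: chosen subset of (location, most-likely-observation) pairs.
        total = 0.0
        for phi in consistent:
            if any(phi[x] != o for (x, o) in Zp):
                val = f_eta(set(X), phi)            # phi contradicted: treated as covered
            else:
                val = f_eta({x for (x, _) in psi | Zp}, phi)
            total += min(eta, val)
        return total / len(consistent)

    return g_eta

# Toy usage with the version space reduction function on two locations.
X = ["A", "B"]
scenarios = [dict(zip(X, obs)) for obs in product([0, 1], repeat=2)]
def V(S, phi):   # uniform prior over the four scenarios
    return sum(any(phi2[x] != phi[x] for x in S) for phi2 in scenarios) / len(scenarios)
g = make_g_eta(V, X, scenarios, psi=set(), eta=0.5)
print(g(set()), g({("A", 0)}))   # 0.0, then 0.5 = eta once ("A", 0) is assumed
```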
We give the performance guarantees for applying RAC to adaptive stochastic optimization problems on paths that satisfy MLRB and MLB.

Theorem 1. Assume that f is an integer-valued, pointwise submodular and monotone function. If f satisfies the MLRB condition, then for any constant ε > 0 and any instance of the adaptive stochastic optimization problem on paths optimizing f, RAC computes a policy π in polynomial time such that

C(π) = O( (log|X|)^{2+ε} · log Q · log_K Q ) · C(π*),

where π* is an optimal policy and Q and K > 1 are constants that satisfy Equation (2).

Theorem 2. Assume that the prior probability distribution p is represented as non-negative integers with Σ_φ p(φ) = P, and that f is an integer-valued, pointwise submodular and monotone function. If f satisfies MLB, then for any constant ε > 0 and any instance of the adaptive stochastic optimization problem on paths optimizing f, RAC computes a policy π in polynomial time such that

C(π) = O( (log|X|)^{2+ε} · (log P + log Q) · log G ) · C(π*),

where Q = max_φ f(X, φ).

For adaptive stochastic optimization problems on sets, we achieve tighter approximation bounds by replacing the bound for submodular orienteering with that for greedy submodular set cover.

Theorem 3. Assume f is an integer-valued, pointwise submodular and monotone function. If f satisfies the MLRB condition, then for any instance of the adaptive stochastic optimization problem on sets optimizing f, RAC computes a policy π in polynomial time such that

C(π) ≤ 4 (ln Q + 1)(log_K Q + 1) · C(π*),

where Q and K > 1 are constants that satisfy Equation (2).

Theorem 4. Assume f is an integer-valued, pointwise submodular and monotone function and let δ = min_φ p(φ). If f satisfies the MLB condition, then for any instance of the adaptive stochastic optimization problem on sets optimizing f, RAC computes a policy π in polynomial time such that

C(π) ≤ 4 (ln(1/δ) + ln Q + 2)(log G + 1) · C(π*),

where Q = max_φ f(X, φ).

## 7 Application: Noisy Informative Path Planning

In this section, we apply RAC to solve adaptive informative path planning (IPP) problems with noisy observations. We reduce an adaptive noisy IPP problem to an Equivalence Class Determination (ECD) problem [6] and apply RAC to solve it near-optimally using an objective function that satisfies the MLRB condition. We evaluate this approach on two IPP tasks with noisy observations.

In an informative path planning (IPP) problem, an agent seeks a path to sense and gather information from its environment. An IPP problem is specified by a tuple I = (X, d, H, p_H, O, Z_H, r), where the definitions of X, d, O, and r are the same as for the adaptive stochastic optimization problem on paths. In addition, there is a finite set of hypotheses H, a prior probability distribution over them, p(h), and a set of probabilistic observation functions Z_H = {Z_x | x ∈ X}, with one observation function Z_x(h, o) = p(o|x, h) for each location x. The goal of an IPP problem is to identify the true hypothesis.

### 7.1 Equivalence Class Determination Problem

An Equivalence Class Determination (ECD) problem consists of a set of hypotheses H and a set of equivalence classes {H_1, H_2, . . . , H_m} that partition H. Its goal is to identify which equivalence class the true hypothesis lies in by moving to locations and making observations, with minimum expected movement cost. The ECD problem has been applied to noisy Bayesian active learning to achieve near-optimal performance. A noisy adaptive IPP problem can also be reduced to an ECD instance when it is always possible to identify the true hypothesis in the IPP problem.

To differentiate between the equivalence classes, we use the Gibbs error objective function (called the edge-cutting function in [6]). The idea is to consider the ambiguities between pairs of hypotheses in different equivalence classes, and to visit locations and make observations that disambiguate between them.
The set of pairs of hypotheses in different classes is E = ∪_{1≤i<j≤m} {{h, h′} : h ∈ H_i, h′ ∈ H_j}. The edge-cutting objective of [6] assigns each pair in E a weight based on the prior and measures the total weight of the pairs that have been cut, i.e., ruled out, by the observations so far; the equivalence class of the true hypothesis is determined once all pairs in E are cut. To reduce a noisy IPP problem to an ECD instance, we work with an extended hypothesis space H′ of observation vectors: for each equivalence class H_i, we form the class H′_i that consists of all observation vectors h′ = (o_1, o_2, . . . , o_|X|) ∈ H′ that are possible with hypotheses in H_i. When we can always identify the true underlying hypothesis h ∈ H, these equivalence classes form a partition of the set H′.

### 7.2 Experiments

We evaluate RAC in simulation on two noisy IPP tasks modified from [10]. We highlight the modifications here and give the full descriptions in the supplementary material. In a variant of the UAV search and rescue task (see Figure 1), there is a safe zone (marked grey in Figure 1): the victim is deemed to be safe if we know that he is in it; otherwise we need to know the victim's exact location. The equivalence classes for this task are the safe zone and every individual location outside it. Furthermore, the long-range sensor may report a wrong reading with probability 0.03. In a noisy variant of the grasping task, the laser range finder has a 0.85 chance of detecting the correct discretized value, a 0.05 chance each of errors of ±1, and a 0.025 chance each of errors of ±2. The robot gripper is fairly robust to estimation error of the cup handle's orientation. For each cup, we partition the cup handle orientation into regions of 20 degrees each, and we only need to know the region that contains the cup handle. The equivalence classes here are the regions. However, it is not always possible to identify the true region due to observation noise. We can still reduce to an ECD problem by associating each observation vector with its most likely equivalence class.

We now describe our baseline algorithms. Define information gain to be the reduction in the Shannon entropy of the distribution over equivalence classes. The information gain (IG) algorithm greedily picks the location that maximizes the expected information gain, where the expectation is taken over all possible observations at the location. To account for movement cost, the IG-Cost algorithm greedily picks the location that maximizes the expected information gain per unit movement cost. Both IG and IG-Cost do not reason over the long term but achieve limited adaptivity by replanning at each step. The Sampled-RAId algorithm is as described in [10]. We evaluate IG, IG-Cost, Sampled-RAId, and RAC with the version space reduction objective (RAC-V) and the Gibbs error objective (RAC-GE). RAC-GE has theoretical performance guarantees for the noisy adaptive IPP problem. Under the MLRB condition, RAC-V can also be shown to have a similar performance bound. However, RAC-GE optimizes the target function directly, and we expect that optimizing the target function directly would usually give better performance in practice. Even though the version space reduction function and the Gibbs error function are adaptive submodular, the greedy policy in [5] is not applicable, as the movement cost per step depends on the path and is not fixed. If we ignore movement cost, a greedy policy on the version space reduction function is equivalent to generalized binary search, which is equivalent to IG [15] for the UAV task, where the prior is uniform and there are two observations.

We set all algorithms to terminate when the Gibbs error of the equivalence classes falls below 10^−5. The Gibbs error corresponds to the exponentiated Rényi entropy (of order 2) and also to the prediction error of a Gibbs classifier that predicts by sampling a hypothesis from the prior. We run 1000 trials with the true hypothesis sampled randomly from the prior for the UAV search task, and 3000 trials for the grasping task, as its variance is higher.
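Since the termination rule and the evaluation below are phrased in terms of the Gibbs error of the equivalence classes, the following sketch shows one concrete instantiation of the Gibbs error (edge-cutting) objective on a toy ECD instance, weighting each cross-class pair by the product of its prior probabilities as in [6]; the hypotheses, classes, and prior are invented for illustration.

```python
from itertools import combinations

# Toy ECD instance (illustrative): hypothesis -> noiseless observation at each location.
hypotheses = {"h1": {"A": 0, "B": 0},
              "h2": {"A": 0, "B": 1},
              "h3": {"A": 1, "B": 0}}
prior = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
classes = [{"h1"}, {"h2", "h3"}]          # equivalence classes partitioning the hypotheses

# E: pairs of hypotheses lying in different equivalence classes.
E = [(h, hp) for h, hp in combinations(hypotheses, 2)
     if not any(h in c and hp in c for c in classes)]

def gibbs_error_objective(S, h_true):
    """Total weight of cross-class pairs cut by the observations at locations S under h_true."""
    obs = {x: hypotheses[h_true][x] for x in S}
    ruled_out = lambda h: any(hypotheses[h][x] != o for x, o in obs.items())
    # A pair is cut once at least one of its hypotheses is inconsistent with the history.
    return sum(prior[h] * prior[hp] for h, hp in E if ruled_out(h) or ruled_out(hp))

print(gibbs_error_objective(set(), "h1"))    # 0.0: nothing cut yet
print(gibbs_error_objective({"A"}, "h1"))    # 0.1: the pair {h1, h3} is cut
```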
For Sampled-RAId, we set the number of samples to be three times the number of hypotheses. For performance comparison, we pick 15 different thresholds γ (starting from 1 × 10^−5 and doubling γ at each step) for the Gibbs error of the equivalence classes, and compute the average cost incurred by each algorithm to reduce the Gibbs error below each threshold level γ. We plot the average cost with 95% confidence intervals for the two IPP tasks in Figures 3 and 4. For the grasping task, we omit trials where the minimum achievable Gibbs error is greater than γ when we compute the average cost for that specific γ value. For readability, we omit results for IG from a plot when it is worse than the other algorithms by a large margin; this is the case for IG throughout the grasping task. In our experiments, RAC-GE has the lowest average cost for both tasks at almost every γ. RAC-V gives very similar results, while the other algorithms, Sampled-RAId, IG-Cost, and IG, do not perform as well on either the UAV search task or the grasping task.

[Figure 3: UAV search and rescue (average cost vs. Gibbs error threshold γ). Figure 4: Grasping (average cost vs. Gibbs error threshold γ).]

## 8 Conclusion

We study approximation algorithms for adaptive stochastic optimization over both sets and paths. We give two conditions on pointwise monotone submodular functions that are useful for understanding the performance of approximation algorithms on these problems: the MLB and the MLRB. Our algorithm, RAC, runs in polynomial time with an approximation ratio that depends on the constants characterizing these two conditions. The results extend known results for adaptive stochastic optimization on sets to paths, and enlarge the class of functions known to be efficiently approximable for both problems. We apply the algorithm to two adaptive informative path planning applications with promising results.

Acknowledgements. This work is supported in part by NUS AcRF grant R-252-000-587-112, the National Research Foundation Singapore through the SMART Phase 2 Pilot Program (Subaward Agreement No. 09), and the US Air Force Research Laboratory under agreement number FA2386-15-1-4010.

## References

[1] Arash Asadpour, Hamid Nazerzadeh, and Amin Saberi. Stochastic submodular maximization. In Internet and Network Economics, pages 477–489, 2008.

[2] Gruia Calinescu and Alexander Zelikovsky. The polymatroid Steiner problems. Journal of Combinatorial Optimization, 9(3):281–294, 2005.

[3] Nguyen Viet Cuong, Wee Sun Lee, and Nan Ye. Near-optimal adaptive pool-based active learning with general loss. In Proc. Uncertainty in Artificial Intelligence, 2014.

[4] Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, and Hai Leong Chieu. Active learning for probabilistic hypotheses using the maximum Gibbs error criterion. In Advances in Neural Information Processing Systems (NIPS), 2013.

[5] Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. J. Artificial Intelligence Research, 42(1):427–486, 2011.

[6] Daniel Golovin, Andreas Krause, and Debajyoti Ray. Near-optimal Bayesian active learning with noisy observations. In Advances in Neural Information Processing Systems (NIPS), pages 766–774, 2010.

[7] Andrew Guillory and Jeff Bilmes. Interactive submodular set cover. In International Conference on Machine Learning (ICML), Haifa, Israel, 2010.

[8] Anupam Gupta, Viswanath Nagarajan, and R. Ravi.
Approximation algorithms for optimal decision trees and adaptive TSP problems. In Samson Abramsky, Cyril Gavoille, Claude Kirchner, Friedhelm Meyer auf der Heide, and Paul G. Spirakis, editors, Automata, Languages and Programming, number 6198 in Lecture Notes in Computer Science, pages 690–701. Springer Berlin Heidelberg, January 2010.

[9] Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99–134, January 1998.

[10] Zhan Wei Lim, David Hsu, and Wee Sun Lee. Adaptive informative path planning in metric spaces. In Workshop on the Algorithmic Foundations of Robotics, 2014.

[11] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions I. Mathematical Programming, 14(1):265–294, 1978.

[12] Sylvie C. W. Ong, Shao Wei Png, David Hsu, and Wee Sun Lee. Planning under uncertainty for robotic tasks with mixed observability. Int. J. Robotics Research, 29(8):1053–1068, 2010.

[13] David Silver and Joel Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems (NIPS), 2010.

[14] Adhiraj Somani, Nan Ye, David Hsu, and Wee Sun Lee. DESPOT: Online POMDP planning with regularization. In Advances in Neural Information Processing Systems (NIPS), pages 1772–1780, 2013.

[15] Alice X. Zheng, Irina Rish, and Alina Beygelzimer. Efficient test selection in active diagnosis via entropy approximation. In Proc. Uncertainty in Artificial Intelligence, 2005.