# on_expected_value_strong_controllability__e2ea146f.pdf

Journal of Artificial Intelligence Research 78 (2023) 849 - 900 Submitted 01/2023; published 11/2023

On Expected Value Strong Controllability

Niklas T. Lauffer NLAUFFER@BERKELEY.EDU University of California, Berkeley Berkeley, CA 94720

William B. Lassiter WLASSITER@GATECH.EDU Georgia Institute of Technology Atlanta, GA 30332

Jeremy D. Frank JEREMY.D.FRANK@NASA.GOV NASA Ames Research Center Moffett Field, CA 94035

The Probabilistic Simple Temporal Network with Uncertainty (PSTNU) is a variant of the Simple Temporal Network with Uncertainty (STNU) in which known probability distributions govern the timing of uncontrollable timepoints. Previous approaches to solving PSTNUs focus mininizing risk, that is, the probability of violating constraints. These approaches are not applicable in over-constrained controllability problems, when it is certain that all constraints can t be satisfied. We introduce the Weighted Probabilistic Simple Temporal Network with Uncertainty (WPSTNU), which extends the PSTNU by attaching a fixed value to the satisfaction of temporal constraints, and allows the schedule to violate some constraints in order to maximize the expected value of satisfying others. We study the problem of Expected Value Strong Controllability (Ev SC) of WPSTNUs, which seeks a fixed-time schedule maximizing the expected value of satisfied constraints. We solve the Ev SC problem using a mixed integer linear program (MILP) that bounds below the probability of satisfying constraints involving uncontrollable timepoints. While solving MILPs generally takes exponential time, we demonstrate our formulation s effective performance using scheduling problems derived from the HEATlab and MIT ROVERS data sets. We then show how to use this MILP to reschedule during execution, after time has passed and uncertainty is reduced. We describe different fixed-period rescheduling approaches, including time-based and event-based, and report on the most successful strategies compared to the expected value of the fixed schedule produced by the MILP. All of our methods are evaluated on problems with both symmetric and asymmetric (skewed) probability distributions. We show that periodically rescheduling improves the expected value when compared to the fixed schedule, and describe how the benchmark and skewness impact the schedule value improvement. The resulting analysis shows that solving Ev SC problems on WPSTNUs is a viable alternative to solving over-constrained controllability problems.

1. Introduction

Since its introduction by (Vidal & Ghallab, 1996) and (Vidal & Fargier, 1999), there has been considerable research in the area of controllability of temporal networks in the presence of uncertainty. Controllability asks: can events be executed while satisfying temporal constraints in the presence of uncertain outcomes? Many previously studied solutions to this problem use the notion of controllability of Simple Temporal Networks under Uncertainty (STNUs) at their core. An STNU consists of controllable timepoints, representing events under the control of an agent; uncontrollable timepoints, representing events that occur at a bounded but unknown time after execution of controllable

2023 The Authors. Published by AI Access Foundation under Creative Commons Attribution License CC BY 4.0.

LASSITER, LAUFFER, & FRANK

timepoints; and simple temporal constraints between events that bound above or below their separation. The solutions to such problems are strategies to execute all events under the control of the agent that ensure no constraints are violated, regardless of the outcomes of uncertain events. The simplest strategy is a fixed execution time for each controllable timepoint. Not all STNUs can be controlled this way; in some cases, the strategy requires conditioning the time a controllable timepoints is executed in response to the occurrence of an uncontrollable timepoint. More complex problems combine temporal constraints, uncertainty, and preferences. STNUs require all constraints to be satisfied regardless of the exact times when uncontrollable timepoints occur, and do not explicitly represent the probability distribution of uncontrollable timepoint occurrence. Uncertainty can be generalized so that the problem specifies probability distributions of event occurrence; these problems are referred to as Probabilistic Simple Temporal Networks (PSTNs). This leads to new risk-minimization and chance-constrained controllability problems. Minimizing risk means finding strategies that have minimum risk of any constraint violation. Chance constraints, on the other hand, allow some risk, as long as it does not exceed one (or possibly several) thresholds. Finally, some problems incorporate both uncontrollable and probabilistic constraints; these are referred to as PSTNUs. The implicit assumption behind problems studied previously is that it is possible to schedule in such a way that the constraints are satisfied, and the risk is acceptable. What happens when it is almost certain that a constraint will be violated? A review of the recent literature shows that many PSTNU instances in current benchmarks are either not controllable, or are controllable but with high risk. For instance, consider the CAR-SHARING benchmark of (Fang, Yu, & Williams, 2014): only 184 of 1800 instances are strongly controllable, about 10%. For the ROVERS benchmark of (Santana, Vaquero, Toledo, Wang, & Williams, 2016), optimistically, 2840 of 4380 PSTNUs are strongly controllable, but in 911 of these cases, at least one probabilistic duration is squeezed to a single value. A more accurate assessment is that 1929 instances are strongly controllable, or about 44%. Finally, if we consider the HEATlab analyzed in (Lund, Dietrich, Chow, & Boerkoel, 2017), at least 80% of the 540 instances are strongly controllable 1. Current approaches, particularly risk-bounding, do not adequately address the problem of uncontrollable PSTNUs. If the risk bound is too low, no strategy can be produced. If a risky strategy is produced, then an undesirable outcome at execution time is almost certain, and will violate a constraint, causing unexpected execution-time issues (e.g. freezing execution, damaging the system, etc.). Some problem formulations allow constraints to be relaxed using a cost function on the relaxation, in addition to risk bounding. When the existing set of constraints cannot be satisfied to produce control strategies, relaxing some constraints up-front may ensure controllability, but as we will describe further below, these approaches have a variety of drawbacks. An alternative solution to such over-constrained problems is to let the execution strategy try to satisfy as many constraints as possible. If some constraints are more important than others, then a natural optimization criterion for the strategy is to maximize the expected value of satisfied constraints. This new, unexplored problem blends several notions explored in the controllability literature to date. Accepting risk implies accepting outcomes that violate some constraints. Applying preferences to satisfied constraints suggests control of expected schedule quality based on past information and the probability and cost of future constraint violations.

1. Of the 540 HEATlab instances, 436 were solved by at least one of the algorithms tested by (Abrahams, Chu, Diehl, Knittel, Lin, Lloyd, Boerkoel, & Frank, 2019) which is roughly 80%. However, the algorithms in (Abrahams et al., 2019) do not prove problems are not SC.

ON EXPECTED VALUE STRONG CONTROLLABILITY

In this paper we extend PSTNUs to the Weighted Probabilistic Simple Temporal Network with Uncertainty (WPSTNU), and study the Expected Value Strong Controllability (Ev SC) problem on WPSTNUs, that of finding a fixed schedule maximizing the expected value, and compare this problem to previous approaches. The outline of the paper is as follows. We begin with preliminary notation and definitions in 2. We then provide motivation for the expected value formulation, illustrated by running examples in 3. We formally define the WPSTNU and the Ev SC problem in 4. In 5 we describe our solution to the Ev SC problem. We compute a strong lower bound for the probability that a given schedule satisfies a particular constraint on an uncontrollable timepoint, which we call at-risk (AR) constraints. Computing these lower bounds for all AR constraints, in turn, yields a lower bound on the schedule s expected value that we can use as the objective value for a Mixed Integer Linear Program (MILP). Because the satisfaction of certain constraints may be essential to a plan s success, only a preordained subset of the constraints are considered for removal; the rest of the constraints must necessarily be satisfied. The value of these rejectable constraints must be traded against the expected value of satisfying AR requirement constraints. In 6, we evaluate the performance of this formulation on benchmark instances derived from the HEATlab benchmark of (Lund et al., 2017) and the MIT ROVERS benchmark of (Santana et al., 2016). We construct new benchmarks from these instances by adding preferences to constraints, reducing the makespan, and adding more rejectable constraints to force tradeoffs among the constraints that are satisfied. We show that the MILP removes low-value constraints to improve the value of high-value AR constraints consistent with their value, and achieves close to optimal performance, despite the bounding approximation. While solving the MILP is exponential in the number of timepoints in the WPSTNU, we evaluate the empirical performance of Gurobi on the benchmark problems, and show most instances are solvable in under half a second; the hardest problems take under two minutes, which is comparable to previous results. We empirically quantify the error of the expected value produced by this MILP and show it is very small. Our benchmarks include both symmetric and skew probability distributions; to our knowledge, this is the first work to characterize the impact of skewness on any PSTNU variant. In 7, we demonstrate the value of rescheduling periodically during execution, after time has passed and uncertainty is reduced, thereby improving the expected value. We define different fixedperiod rescheduling approaches, including time-based and event-based, and formally prove that these rescheduling policies improve expected value compared to the fixed schedule produced by the MILP. For this study, we use both instances derived from the ROVERS benchmark as well as the HEATlab benchmark, and describe the difference in performance of rescheduling on these two benchmarks. We also compare the difference in rescheduling policy performance on symmetric and skew distributions for the ROVER benchmark, and describe the impact skewness has on improving schedule quality. We show that rescheduling is capable of improving the expected value by restoring as much as a third of the missing value, that is, the difference between the total achievable value and the expected value of the fixed schedule for HEATlab instances, and 10% for ROVERS instances. We also show that problems with skewed probability distributions decrease the restored missing value compared to the restoration of value achieved for problems with normal distributions. Finally, in 8, we review of our findings on this new class of problems, and describe future work.

LASSITER, LAUFFER, & FRANK

2. Notation and Definitions

In this section we will define controllability problems previously considered in the literature, as well as the notation we will use in this paper.

Definition 1 (STN). (Dechter, Meiri, & Pearl, 1991) Simple Temporal Networks (STNs) consist of real-valued timepoints T with domain ti T= R. and constraints c(ti, tj) of the form (tj ti) [lti,tj, uti,tj].

Definition 2 (STNU). (Vidal & Ghallab, 1996) (Vidal & Fargier, 1999) (Muscettola, Morris, & Vidal, 2001) Simple Temporal Networks with Uncertainty (STNUs) consist of:

A set of controllable timepoints A = {a1, a2, . . . ak}, i.e. those assigned by the agent;

A set of uncontrollable timepoints R = {r1, r2, . . . rℓ} i.e. those assigned by the external world whose values v(ri) are unknown prior to execution;

A set C of requirement constraints c(ti, tj), where each c(ti, tj) has the form (tj ti) [lti,tj, uti,tj];

A set G of contingent constraints, where each g(ai, rj) has the form (rj ai) [lai,rj, uai,rj] with ai A, rj R. We refer to ai as the activation timepoint of the contingent constraint. Denote the execution time of ai by x(ai). Uncontrollable timepoint rj occurs at v(rj) = x(ai) + ωj, with ωj lai,rj, uai,rj but ωj, and thus v(rj), are only observed during execution.

An STNU is a 4-tuple <A,R,C,G >. We denote the set of all timepoints as T= A R; the domain of each ti T is R.

STNUs include only contingent constraints on timepoints, without any probability distribution information available; hence the problem requires building a conformant plan that works regardless of the outcomes of all contingent links.

Definition 3 (Strong Controllability). (Vidal & Fargier, 1999) Let U be an STNU. Let Ω= g(ai,rj) [lai,rj, uai,rj] (the cross product of all possible outcomes of all contingent constraints). A schedule s is an assignment of times to ai A. Denote the value of ai in a schedule s by s(ai). U is Strongly Controllable (SC) if there is a schedule s such that for all realizations of uncertainty ω Ω, s satisfies all requirement constraints c(ti, tj).

To illustrate the ingredients of STNUs, consider a problem consisting of constraints over three timepoints, two controllable timepoints a1 and a2, and one uncontrollable timepoint r1. We have one contingent constraint g(a1, r1) with bounds [1, 4], meaning r1 occurs between 1 and 4 time units after a1 executes. We have two requirement constraints, c(a1, a2) with bounds [0, 5], and c(r1, a2) with bounds [0, 3]. Let schedule s assign s(a1)= 0 and s(a2)= 4. This assignment satisfies c(a1, a2) since 4 1 [0, 5]. We observe that v(r1) [1, 4]. Finally, we observe that 4 1 = 3 and 4 4 = 0, proving all possible values v(r1) satisfy c(r1, a2), i.e. s(a2) v(r1) [0, 3]. Thus, our example STNU is strongly controllable. We hereafter refer to requirement constraints between two controllable timepoints as the Simple Temporal Network (STN) constraints, denoted Cs, since they are the type of constraint found in

ON EXPECTED VALUE STRONG CONTROLLABILITY

an STN, and those involving one or more uncontrollable timepoints as At-Risk (AR) constraints, denoted Cu. AR constraints are canonically denoted c(ri, aj), that is, c(ri, aj) bounds s(aj) v(ri) to the range [lri,aj, uri,aj]. If uri,aj > 0, then aj can only be executed after ri; If lri,aj < 0, then aj can be preemptively executed before ri occurs. Modeling uncertainty using STNUs is rather restrictive, for STNUs do not incorporate information about the likelihood of the different outcomes. Suppose in the simple example above we know r1 occurs on average 2.5 time units after a1 with standard deviation of 0.5, distributed normally. STNUs can t express this knowledge, and thus it can t be used in generating schedules. Introducing continuous probabilities into controllability problems comes with some extra complexity; schedules may not account for all outcomes, and thus scheduling problems need to be extended with a notion of scheduling risk. These concepts are introduced in the following definitions.

Definition 4 (PSTN(U)). (Tsamardinos, 2002), (Santana et al., 2016) Let V be a PSTNU. Let <A,R,C,G,> be an STNU. A probabilistic duration constraint d(ai, rj) has the form rj ai = ω Ωai,rj where ai A, rj R, and Ωai,rj is a random variable with probability density function Pai,rj. Let D be a set of probabilistic duration constraints d(ai, rj). A Probabilistic Simple Temporal Network (PSTN) is a 4-tuple <A,R,C,D>. As with STNUs, we refer to ai as the activation timepoint of the duration constraint. A Probabilistic Simple Temporal Network with Uncertainty (PSTNU) is a 5-tuple <A,R,C,G,D>.

In the sequel, we will assume w.l.o.g. that there is a bijection between probabilistic duration constraints and activation timepoints, so that for notational simplicity we can use a single index for the controllable activation timepoint, uncontrollable timepoint, and random variable involved in a probabilistic duration constraint, i.e. d(ai, ri) ri ai Ωi, where Ωi has density function Pi. Similarly, we will assume w.l.o.g. that there is a bijection between contingent links and uncontrollable timepoints, allowing the bounds of g(ai, ri) to be denoted by [lri, uri]. With the introduction of PSTNUs, we note that AR constraints are now split into probabilistic AR constraints, involving uncontrollables in probabilistic duration constraints, and uncertain AR constraints, those involving uncontrollables in contingent links with no probability distribution information. Risk, as introduced by (Fang et al., 2014), describes the probability that, given a schedule or strategy, an outcome ω Ωviolates one or more constraints. Typical approaches transform a PSTNU into an STNU by truncating the probability distributions to create bounded intervals, and then evaluate STNU controllability. To compute the risk of a solution to a PSTNU, we measure how much probability mass on each probabilistic duration is not covered after truncating the probability distributions tails and throwing away the probability distribution, in order to transform it into a contingent link, i.e. transforming d(ai, ri) to g(ai, ri), in a manner similar to (Santana et al., 2016). A smaller interval for the contingent link is more likely to lead to an SC STNU, but incurs more risk. The definitions below bound above the risk, because our definition of PSTNU does not assume that the probabilities Pi(ωi) are mutually independent.

Definition 5. Let ρd:D G transform a probabilistic duration constraint d(ai, ri) into a contingent link g(ai, ri) with bounds [lri, uri] Ωi. Let ρD = d D(ρd), that is, ρD contains one squeezing operation ρd for each probabilistic duration. Let V be a PSTNU. Then ρD(V ) = U where U is an STNU.

Definition 6. Let V be a PSTNU. Let U = ρD(V ) be an STNU derived from V . Let d D and let ρd(d(ai, ri)) = g(ai, ri). Let [lri, uri] be the contingent constraint interval defined by g(ai, ri). The risk of d(ai, ri) relative to ρd, denoted δ(ρd(d(ai, ri))), is

LASSITER, LAUFFER, & FRANK

δ(ρd(d(ai, ri))) = 1 Z uri

The risk of V relative to ρD, denoted δ(V, ρD), is bounded above by

d D (1 δ(ρd(d(ai, ri))))

Definition 7. A PSTNU V is SC with risk if there exists a mapping ρD such that δ(P, ρD) and U = ρD(V ) is SC.

Suppose we have a PSTNU V over three timepoints with a duration constraint d(a1, r1), and two requirement constraints, c(a1, a2) with bounds [0, 5], and c(r1, a2) with bounds [0, 3]. Suppose the probability P1 associated with d(a1, r1) is normally distributed with µ = 2.5, σ = 0.5. Now suppose U = ρD(V ) truncates both tails of P1, with contingent constraint g(a1, r1) with bounds [1, 4], recovering the strongly controllable STNU example above. The risk that the outcome ω = r1 a1 lies outside the bounds 1 ω 4 is 0.0027. The risk varies with both parameters of the normal. For example, increasing variance σ to 1 increases the risk to 0.13362. Similarly, moving the mean or modifying the distribution to be a skewed distribution will change the risk.

3. The Case For Expected Value

Prior to formally introducing the notion of a WPSTNU, we present a pair of examples in this section to motivate the viewing of PSTNs, and controllability problems, through the lens of expected value.

3.1 A Running Example: The Dust Devil

First, consider a planetary exploration rover with two imaging goals: one is a dynamic phenomenon of uncertain duration (a dust devil), and the second is a static target at the same location with a constraint relating to ideal lighting conditions. The dust devil image collection task spans 15 minutes, the subsequent image collection 10 minutes, and the drive to a position from which the dust devil and static target can be imaged takes 45 minutes. We assume the drive starts at time 0. The dust devil is expected to last between 40 and 70 minutes after the drive starts, ideally ending during the first imaging task. The ideal lighting window for the second image collection occurs between 60 and 70 minutes after starting the drive. The PSTN (there are no contingent links in this example) in Figure 1 depicts this scenario. We first observe that the STNU obtained by transforming d(a0, r0) into a contingent constraint g(a0, r0) = [45, 60] is strongly controllable: equating a0 with time 0, the assignment s with s(a1) = s(a2) = 45, s(a3) = s(a4) = 60, and s(a5) = 70 satisfies all STN constraints, and satisfies AR constraint c(r0, a3) for all outcomes 45 v(r0) 60. According to Definition 6, the risk associated with this transformation is simply the probability that v(r0) = s(a0) + ω0 falls outside of this prescribed range. Since there is only one AR constraint in the problem, for the purposes of this discussion, we denote the risk by δ(45) (parameterizing the risk δ with the start time of the projected contingent link) instead of δ(ρd(d(ai, ri))). Thus, the risk is given by

δ(45) = 1 Z 60

ON EXPECTED VALUE STRONG CONTROLLABILITY

g(a0,r0)=[45,60]: Controllable STNU, c(a0,a5) satisﬁed

Ideal Lighting

Dust Devil Duration 45,45

40 50 60 70

Introducing constraint values qc(a0,a5), qc(r0,a3) permits evaluating tradeoﬀ:

reduce risk, reject c(a0,a5)=[60,70]

40 50 60 70

g(a0,r0)=[50,65]: Uncontrollable STNU, c(a0,a5) unsatisﬁed

g(a0,r0)=[50,65], Controllable STNU if c(a0,a5) rejected

qc(a0,a5)<qc(r0,a3) (δ(50)-δ(45))

40 50 60 70

risk: δ(50) < δ(45)

Figure 1: A controllability problem demonstrating the tradeoff of risk for sacrificing constraints. The controllability problem is shown at the top; arcs ai = ri are probabilistic durations, arcs ti tj are requirement constraints. The probability of satisfying c(r0, a3) given two different assignments, s(a2) = 45 and s (a2) = 50, is shown on the left. Sacrificing STN constraint c(a0, a5) can increase expected value if AR constraint c(r0, a3) has a high value, and the risk δ decreases sufficiently, as shown in the bottom box.

Suppose we instead convert d(a0, r0) into g(a0, r0) = [50, 65]. The schedule s with s (a1) = 45, s (a2) = 50, s (a3) = 65, s (a4) = 65, and s (a5) = 75 will assuredly fail to satisfy c(a0, a5), but the risk of failing to satisfy c(r0, a3) is now given by

δ(50) = (1 Z 65

50 P0(ω0)) < δ(45).

The resulting STNU is not strongly controllable, but (as informally illustrated by the shaded region of the probability distribution P0(ω0) in Figure 1), we stand a better chance of our outcome ω0 falling within the prescribed range. Removing constraint c(a0, a5) from the PSTN would result in a strongly controllable STNU with less risk. If maximizing the likelihood of imaging the dissipation of the dust devil is significantly more important to us than capturing the second image under ideal lighting conditions, it may be worth sacrificing the ideal lighting constraint in favor of an increased probability that we successfully image the end of the dust devil. PSTNUs focus on minimizing risk; all requirement constraints must be satisfied. By definition, there is no way to quantify the benefit of satisfying or rejecting requirement constraints in order to change the risk. Assigning a numerical value to each constraint c(ti, tj) would allow us to evaluate the trade-offs between schedules that satisfy, or may satisfy, different sets of constraints. Constraints guaranteed to be satisfied by schedules contribute known value, while constraints that may be satisfied with some probability contribute expected value to a schedule. Only by doing this will we be able to evaluate whether or not it is better to reject the ideal lighting constraint and try to image the dust devil.

LASSITER, LAUFFER, & FRANK

3.2 A Second Example: Trading Risk

0 1 2 3 4 5 6

0 1 2 3 4 5 6

s(a3)=4 minimizes δ(휌d(d(a2,r2)))

0 1 2 3 4 5 6

s (a3)=4.5: δ(휌d(d(a1r1)))=δ(휌d(d(a2,r2)))

Introducing constraint values

qc(r1,a3), qc(r2,a3)

permits evaluating tradeoﬀ:

minimize δ(휌d,d(a2,r2))

qc(r1,a3)<qc(r2,a3)

Schedule s:

Schedule s :

0 1 2 3 4 5 6

Figure 2: A controllability problem demonstrating the tradeoff of AR constraint risks. Schedule s chooses s(a3) = 4 in order to maximize the probability of satisfying AR constraint c(r2, a3). Schedule s chooses s (a3) = 4.5 in order to minimize the risk. The concept of risk fails to properly encapsulate the trade-offs between s and s . However, by introducing constraint valuations, they can be compared using expected value.

The previous example illustrates how sacrificing an STN constraint might increase the expected value of an AR constraint, improving the expected value for the problem as a whole. In order to more closely examine the sorts of problems in which a risk-bounding approach may not produce the most practical or effective schedule, we analyze a second scenario involving a tradeoff between risk for a pair of AR constraints. Consider a situation in which two probabilistic durations d(a1, r1), d(a2, r2) are constrained to start at the same time, and whose uncontrollable endpoints are constrained by a common end timepoint a3 via constraints c(r1, a3) and c(r2, a3). The two uncertain durations are known to be normally distributed with µ = 2 and 3 respectively, and unit variance. We examine two potential schedules for this scenario: a schedule s that attempts to maximize the probability of satisfying AR constraint c(r2, a3) by scheduling a3 at time s(a3) = 4, and a schedule s that minimizes risk by evenly distributing the probability of satisfying either at-risk constraint by scheduling a3 at time s (a3) = 4.5. We now examine these two schedules in terms of risk. The risk for schedule s, given by

1 (1 δ(ρd(d(a1, r1))))(1 δ(ρd(d(a2, r2)))) = 1 (0.4773)(0.6827) = 0.6742,

is greater than that of s , given by

1 (1 δ (ρd(d(a1, r1))))(1 δ (ρd(d(a2, r2)))) = 1 (0.6247)(0.6247) = 0.6098.

However, if c(r2, a3) is more valuable than c(r1, a3), then schedule s may still be preferable. Risk-based approaches would only view schedule s as preferable to s if constraint c(r1, a3) was

ON EXPECTED VALUE STRONG CONTROLLABILITY

relaxed or removed from the problem altogether. In the latter case, the risks of s and s would simply be their probabilities of not satisfying c(r2, a3), given by 1 0.6827 = 0.3173 and 1 0.6247 = 0.3753, respectively. Examples like this show that even for controllable problems, i.e., those for which it is possible to satisfy all constraints, different valuations of constraints may lead to different schedules.

4. The WPSTNU

We now formalize the Weighted Probabilistic Simple Temporal Network with Uncertainty (WPSTNU) by adding constraint valuations qc(ti, tj), to a PSTNU, allowing us to define the Expected Value Strong Controllability (Ev SC) problem on WPSTNUs. While a similar Dynamic Controllability problem can also be formalized, for the remainder of the paper, we will focus on Expected Value Strong Controllability. We revisit our examples and describe how they can be posed and solved as WPSTNUs. We then describe previous work on variants of PSTNUs and discuss why these variants differ from the WPSTNU.

4.1 WPSTNUs and Ev SC: The Definitions

Definition 8 (WPSTNU). Let <A,R,C,G,D> be a PSTNU. We denote the set of AR constraints c(ri, aj) by Cu C and the set of STN constraints c(ai, aj) by Cs C. We further denote the rejectable STN constraints by Cr Cs. Let qc(ti, tj): Cr R+ and let Q be the set of all qc(ti, tj). A Weighted Probabilistic Simple Temporal Network (WPSTNU) is a 6-tuple <A,R,C,G,D,Q>.

Intuitively, qc(ti, tj) is the value of satisfying constraint c(ti, tj) Cr in the WPSTNU.

Definition 9. Let W be a WPSTNU. Let S be the set of all schedules, and let s S be a schedule satisfying all contingent constraints g G and all non-rejectable STN constraints in Cs \ Cr. Let Ω= iΩi (the cross product of all possible outcomes of all probabilistic duration constraints) with joint density function P and let ω Ωbe a joint outcome of all probabilistic durations. Let

σ(c(ti, tj), s, ω) =

( 1 if c(ti, tj) is satisfied by (s, ω), 0 otherwise,

Then the value of a schedule s given outcome ω Ωis

f(s, ω) = X

c C qc(ti, tj) σ(c(ti, tj), s, ω),

and the expected value of s is

ω Ω f(s, ω)P(ω).

The Expected Value Strong Controllability (Ev SC) problem defined over WPSTNU U is to find s S satisfying all contingent constraints and maximizing g(s).

In the definition above, σ determines whether a constraint c(ti, tj) is satisfied by schedule s and outcome ω. Satisfied constraints c(ti, tj) accrue value qc(ti, tj) in f(s, ω). The expected value of a schedule incorporates the probability of outcomes that influence the satisfaction of constraints on uncontrollable timepoints, e.g., qc(ri, aj). We will occasionally abuse notation and write

LASSITER, LAUFFER, & FRANK

σ(c(ai, aj), s) when σ is used to determine satisfaction of an STN constraint, and we don t need to refer to ω. The risk minimization setting assumes that violation of any single constraint would lead to failure in the schedule execution. By contrast, the Ev SC problem inherently assumes that violation of certain constraints, while not desirable, would not disturb the execution of the remainder of the schedule. For a fixed-time schedule s, an STN constraint in Cs either is satisfied and provides full value, or rejected and provides no value. By contrast, an AR constraint c(ri, aj) only has expected value prior to execution time. A disproportionately large assigned value qc(ti, tj) incentivizes satisfaction of STN constraints, or ensures a high probability of satisfying AR constraints. STN constraints that are critical to the integrity of the whole schedule (such as a drive for a rover) can be made non-rejectable.

4.2 Revisiting the Examples

We now look deeper at the fundamental tradeoff in Ev SC: sacrificing a constraint to improve the overall expected value of a schedule. In Figure 1, there is only AR constraint, namely c(r0, a3). As above, assume s(a0) = s (a0) = 0, and let s be a schedule in which s(a2) = 45 and s be a schedule in which s (a2) = 50. As noted previously, s satisfies all the STN constraints, but the range of outcomes satisfying c(r0, a3) is 45 < v(r0) 60, which has low probability. Schedule s violates a single constraint, namely, c(a0, a5), but changes the range of outcomes satisfying c(r0, a3) to 50 < v(r1) 65, increasing the probability. Committing to a schedule up-front that violates c(a0, a5) lets us increase the probability c(r0, a3) is satisfied, potentially increasing the expected value of the schedule. In order to make violating c(a0, a5) maximize the expected value, we would need the relative values of qc(r0, a3) and qc(a0, a5) to satisfy the inequality:

qc(r0, a3) Z 60

45 P0(ω0) + qc(a0, a5) < qc(r0, a3) Z 65

qc(a0, a5) < qc(r0, a3) Z 65

50 P0(ω0) Z 60

45 P0(ω0) .

Denote the difference in risk by

50 P0(ω0) Z 60

45 P0(ω0) ,

Then we have

qc(a0, a5) < qc(r0, a3).

If is small , the inequality above is only satisfied if qc(r0, a3) is large . Put another way, qc(r0, a3) needs to be at least a factor of 1 larger than qc(a0, a5) in order to justify violating c(a0, a5). The example in Figure 2 shows that PSTNUs cannot distinguish between multiple sources of risk. Approaching this problem from the perspective of expected value allows us to consider this trade-off in a more nuanced way. If we prefer satisfying constraint c(r2, a3) without altogether

ON EXPECTED VALUE STRONG CONTROLLABILITY

(implicitly) throwing out c(r1, a3), we could assign qc(r2, a3) = 3 and qc(r1, a3) = 1, respectively, so that the expected value of schedule s, given by

g(s) = 3(0.6827) + 1(0.4773) = 2.5253,

would make it preferable to schedule s with expected value

g(s ) = 3(0.6247) + 1(0.6247) = 2.4986.

4.3 Previous Work

In our first example, we would like to evaluate the trading of satisfaction of the lighting constraint c(a0, a5) with satisfaction of the risky dust devil observation constraint, c(r0, a3). The right strategy depends on the relative importance of satisfying c(r0, a3) and c(a0, a5), and the probability of satisfying c(r0, a3), which requires formulating the expected value of a schedule or strategy. In our second example, the addition of value on constraints transforms the tradeoff of risk, which might otherwise not favor one AR constraint over another, into an expected value problem. As we now explain, these types of tradeoffs are not easily captured by previous work on overconstrained PSTNUs, due to either the lack of constraint value or the way constraint values are used; nor is it easily captured by previously developed formalisms employing both constraints and preferences, because they lack explicit representation of probability. As noted in the definitions in 2, PSTNs were originally introduced by (Tsamardinos, 2002), with empirical studies of different algorithms performed in (Santana et al., 2016), and (Brooks, Reed, Gruver, & Boerkoel, 2015). The chance-constrained PSTN (called cc-p STPs in (Fang et al., 2014)) constrains risk instead of minimizing it. A single constraint on risk is introduced in (Fang et al., 2014); (Wang & Williams, 2015) introduces a cc-p STP variant in which there can be many risk constraints. When the risk bounds can t be satisfied, the resulting problem instance is deemed unsolvable. One previously explored approach for over-constrained PSTNs is to search over relaxations for a problem that can be transformed into a controllable STNU with some bounded risk; (Yu, Fang, & Williams, 2015; Yu, Williams, Fang, Cui, & Haslum, 2017) use this approach for over-constrained cc-p STPs. The search is guided by the costs of relaxations of either the requirement constraints or the chance constraint. Each relaxation has a cost; the approach involves finding a minimum-cost relaxation that leads to controllability of the underlying STNU. The WPSTNU is more limited than than relaxable cc-p STP, in that only rejection of requirement constraints is permitted, and constraints have scalar value instead of more general relaxation costs. In Figure 1, a relaxed constraint c(a0, a5) = [60, 75] (not shown) would lead to a controllable STNU with s(a2)= 55. Applying the notion of relaxable cc-PSTPs to Figure 2, one could either permit higher risk or widen the bounds of the constraints, and impose a larger cost for relaxation of (say) c(r1, a3) than c(r1, a3). However, the cc-PTSP requires both (relaxed) constraints, including the risk bound, to be satisfied at scheduling time. As we saw in Figure 1, the WPSTNU permits the expression of the expected value proposition by combining the scalar benefit of satisfying constraints with the calculated probability of success. We also observe that there is no easy way to capture the tradeoff in expected value using relaxation costs. To see why, observe that all risk is treated equally by the relaxable cc-PSTP approach; further, the relaxation costs to fix violations and risk allocation are treated separately by the solver. Thus, the WPSTNU complements the relaxable cc-p STP approach to handling over-constrained PSTNs.

LASSITER, LAUFFER, & FRANK

STN STNU PSTNU cc-p STP

STPPU WPSTNU

Probabilistic

Preferences

Preferences

Projections

Chance Constraints

costs Preferences

Probabilistic durations

Projections

Disjunction

Figure 3: The relationship between previous approaches to over-constrained controllability problems and WPSTNUs.

A related approach is the Controllable Conditional Temporal Problem with Uncertainty (CCTPU) of (Yu et al., 2015), which combines STNUs (without probabilities) with costs of relaxations, rewards for assignments, and conditional generation of events and constraints based on non-temporal finite-domain variables. WPSTNUs are similar to the CCTPU, in that we can choose which constraints to satisfy. WPSTNUs are more limited than CCTPUs in that every timepoint of a WPSTNU must be scheduled, relaxations are limited to the rejection of constraints, and WPSTNUs only represent expected value, not a combined cost-benefits analysis. WPSTNUs are more general than CCTPUs in that they include probabilities, not just qualitative uncertainty of uncontrollable events. The approaches above use relaxation costs as a means to enable controllability and satisfaction of risk bounds. Alternatively, as we do with the WPSTNU, the problem can be posed by using preferences on degree of satisfaction of STN constraints, which could then be traded against the expected value of satisfying AR constraints. This could be done using simple semi-convex preference functions, combined with min , to achieve tractability, as is done with the Simple Temporal Problem with Preferences and Uncerainty (STPPU) (Rossi, Venable, & Yorke-Smith, 2006). However, the assumptions that preserve tractability are too limiting; in particular, the min function will report the worst preference achieved for any constraint, which could be 0 (representing a violated constraint). WPSTNUs are a variant of the Disjunctive Temporal Problem with Preferences (DTPP) (Peinter, Moffitt, & Pollack, 2005), in which not all constraints can be satisfied, and the best set of constraints to satisfy must be found by search. The WPSTNU is a strict generalization of the DTPP; while the value of satisfying c(ai, aj) is captured by qc(ai, aj), the expected value of satisfying c(ri, aj) is a nontrivial function of timepoint assignments, rather than a constant associated with the disjunctive decisions. Each requirement constraint can be expressed as a disjunction where satisfying the trivial constraint has zero value and satisfying the original constraint has value qc(ti, tj). The expected value formulation is common in Markov Decision Processes (MDPs) and their numerous variants. While temporal MDPs have been considered, the continuous time nature of the state space precludes using formulations such as time-dependent MDPs (Boyan & Littman, 2000); the desire to express state spaces representing violated constraints makes other time-based MDP approaches, e.g. (Weld & Mausam, 2006), inappropriate. Figure 3 describes the relationship between previous approaches to over-constrained problems on temporal networks with uncertainty, and the newly defined WPSTNU.

ON EXPECTED VALUE STRONG CONTROLLABILITY

5. Solving the Ev SC Problem

We now describe how we will solve the Ev SC problem on WPSTNUs. Central to Ev SC is the ability to trade (possibly expected) value between constraints. As with previous approaches to PSTNs, the WPSTNU can be formulated as a nonlinear optimization problem:

max s g(s) = max s

c Cs σ(c(ai, aj), s, ω)qc(ri, aj) + X

c Cu σ(c(ri, aj)s, ω)qc(ri, aj)

(1) Since the value of σ(c(ai, aj)s, ω) only depends on s and not ω, this simplifies to:

c Cs σ(c(ai, aj)s)qc(ai, aj) + Z

c Cu σ(c(ri, aj)s, ω)qc(ri, aj)

(Fang et al., 2014) and (Wang & Williams, 2015) use nonlinear solvers to handle the allocation of risk. Such solvers are often slow and prone to numerical instability. Inspired by previous MILP formulations ((Santana et al., 2016) and (Lund et al., 2017)), we will solve the Ev SC problem by building a MILP whose solution is a schedule that is guaranteed to bound below the expected value of the optimal schedule. We first show how to process the probability distributions to build a piecewise linear bound on the probabilities of satisfying AR constraints. We then describe the rest of the MILP, and provide a bound on the MILP quality.

5.1 Bounding The Probability of Satisfaction

Clearly if there is no schedule satisfying all non-rejectable constraints in Cs \ Cr then maximizing the expected value is pointless, so we assume there is such a schedule (which can be checked easily). Solving an Ev SC problem must strike a balance between satisfying the STN constraints in Cr (thus surely obtaining their value) and maximizing the chances of satisfying high-value (but uncertain) AR constraints. An optimal schedule assigns timepoints to maximize the value we expect to obtain from the constraints in Cu, and the guaranteed value of the satisfied subset C r Cr of rejectable constraints. We need to explicitly represent the probability that an uncontrollable event s actual time of occurrence, which is v(ri) = s(ai) + ωi, ωi Ωi, leads to a violation of AR constraint c(ri, aj) with bounds [lri,aj, uri,aj]. This means we can t directly use δ(ρd(d(ai, ri))) from Definition 6, because this definition only computes the risk by measuring how much of the probability mass of d(ai, ri) is covered by g(ai, ri) = ρd(d(ai, ri)). When computing the expected values, it is helpful to imagine a triangle consisting of a (possibly implied) STN constraint c(ai, aj), a probabilistic duration constraint d(ai, ri), and an AR constraint c(ri, aj) as shown in Figure 4. Given a schedule s, regardless of whether or not s satisfies c(ai, aj), outcome v(ri) = s(ai) + ωi satisfies the constraint c(ri, aj) if

lri,aj s(aj) v(ri) uri,aj.

After substitution, lri,aj s(aj) s(ai) ωi uri,aj.

Because v(ri) = s(ai) + ωi, the later v(ri) occurs, the smaller s(aj) s(ai) ωi is, and vice versa, the earlier v(ri) occurs, the larger s(aj) s(ai) ωi becomes. Isolating (positive) ωi inside the

LASSITER, LAUFFER, & FRANK

Figure 4: Computing expected value for an AR constraint. The triangle involving ai, ri and aj is formed by a duration constraint d(ai, ri), an STN constraint c(ai, aj), and an AR constraint c(ri, aj). Assignments to controllable timepoints s(ai) and s(aj) may or may not satisfy STN constraint c(ai, aj) (top). The range of probability covered is defined by the bounds on the AR constraint,[lri,aj, uri,aj], and the temporal distance between assignments s(aj) s(ai). The first assignment s (middle) positions the AR constraint range over the mode of the probability density function, while the second assignment s (bottom) pushes the range away from the mode due to a smaller distance s (aj) s (ai).

inequalities yields:

s(aj) s(ai) uri,aj ωi s(aj) s(ai) lri,aj.

Given schedule s and thus s(ai) and s(aj), the probability of satisfying the constraint c(ri, aj), and obtaining value qc(ri, aj), is

pij(s) = Z s(aj) s(ai) lri,aj

s(aj) s(ai) uri,aj Pi(ωi).

The probability of satisfying AR constraint c(ri, aj) depends on where the range of ωi satisfying the constraint, namely [s(aj) s(ai) uri,aj, s(aj) s(ai) lri,aj], falls on the probability distribution Pi(ωi). An outcome may be unlucky, either violating the lower bound lri,aj because ωi is too large, or violating the upper bound uri,aj because ωi is too small. The size of this range of outcomes is defined by the AR constraint bounds, [lri,aj, uri,aj], which is independent of the actual schedule. However, the difference s(aj) s(ai), which is a function of the schedule, determines what portion of Pi(ωi) is covered, as shown in Figure 4. If the mode of Pi(ωi) is in the center of this interval, then probability pij(s) will be high or maximal; if the mode is near one side or the other, or outside the interval, the probability will be low. In general, we must search over many possible schedules, meaning s(aj) s(ai) will vary, and so will the value of the integral defining the probability of success. For a probabilistic duration d(ai, ri) and AR constraint c(ri, aj) the function

Fij(x) = Z x lri,aj

x uri,aj Pi(ωi),

ON EXPECTED VALUE STRONG CONTROLLABILITY

represents all possible probabilities of not satisfying c(ri, aj) that might arise as a result of the choices of s(ai), s(aj) during search. Fij(x) has as its only free variable the distance x = aj ai; the bounds from c(ri, aj) are constant parameters. If Pi(ωi) is unimodal, then Fij(x) is unimodal but not necessarily concave. We note that the term Z

c Cu σ(c(ri, aj), s, ω)qc(ri, aj)

from Equation 2 can be written

c Cu σ(c(ri, aj), s, ω)qc(ri, aj)P(ωi)

because the probability of obtaining qc(ri, aj) depends only on ωi. But as we just showed, the bounds of integration cover all of the outcomes of Pi(ωi) for which σ(c(ri, aj), s, ω) = 1. That means this term can now be written

Z s(aj) s(ai) lri,aj

s(aj) s(ai) uri,aj qc(ri, aj)Pi(ωi)

The above term, in turn, can be expressed using our newly defined function Fij(x):

c Cu Fij(s(aj) s(ai))qc(ri, aj)

which means we can rewrite the expected value formula in Equation 2 as:

c Cs σ(c(ai, aj), s)qc(ai, aj) + X

c Cu Fij(s(aj) s(ai))qc(ri, aj). (3)

In the sequel, we assume that Pi follows a normal, uniform, or beta (with α, β 1) distribution, although the formulation may hold for other unimodal distributions with concavity about their mode. We would like to solve the maximum expected value problem using a MILP. Given the nonlinearity of Fij(x), we must use a linear approximation of Fij(x) in order to use a MILP solver. We choose to approximate from below to obtain a conservative estimate on the probability of constraint satisfaction, and by extension, expected value. While Fij(x) is not concave over its entire range, we can bound its concave region (including its mode) from below with a series of yij linear inequalities defined by functions mk ijx + ck ij, k = 1, . . . , yij that underestimate the probability of obtaining value qc(ri, aj), and from the left and right with appropriately chosen values αij and αij. The construction of this piecewise linear approximation, shown in Figure 5, is accomplished as follows. We first evaluate Fij(x) at a given number of equally spaced x-values surrounding its mode (the more evaluations, the better the approximation) and draw line segments between consecutive points. Then, starting from the left and moving right, we choose the left endpoint of the first line segment whose slope is not larger than that of its predecessor, and, starting from the right and moving left, we choose the right endpoint of the first line segment whose slope is not smaller than that of its predecessor. These points will serve as our approximate left and right inflection points,

LASSITER, LAUFFER, & FRANK

and we use the x-intercepts of the lines tangent to Fij at these points as our αij and αij values. Furthermore, the equations of the lines defining the segments found between these points together with those two tangent lines will serve as our set of bounding functions mk ijx + ck ij. With a fine enough discretization, these functions collectively define a good lower approximation of the concave part of Fij. We must approximate |Cu| such functions Fij, one per AR constraint. We must perform integrals to construct each of their approximations. Once we have decided how many points each piecewise linear approximation will have, however, we can limit the number of integrations. We also have the benefit of doing definite integrals, which is often easy for nice distribution families (e.g., normal distributions). Finally, our approach handles all the integrations offline as pre-processing steps prior to solving the MILP, instead of online, which is the approach taken by (Fang et al., 2014).

5.2 A MILP for Ev SC

Our MILP for Ev SC is shown in Figure 6. There are |A| continuous variables ai representing scheduled times s(ai) for controllable timepoints, |Cu| continuous variables λij representing a lower bound on the probability of satisfying AR constraint c(ri, aj), |Cu| binary variables ξij used to track whether or not aj ai falls within the concave portion of Fij, and |Cr| binary variables βij used to track whether or not the schedule satisfies each rejectable constraint. The constants (indicated in bold) are constraint values qc(ti, tj), requirement constraint bounds [lti,tj, uti,tj], M, αij and αij, and the slopes mk ij and intercepts ck ij of the piecewise linear approximations of functions Fij. Constraints (1) ground the schedule s origin point a0 at time 0, and constraints (2) and (3) enforce non-rejectable STN constraints. For each STN constraint in Cr, the binary variable βij is used in the big-M linear constraints (4) and (5); the constant M > 0 is sufficiently large to render (4) and (5) non-binding when βij = 0 (i.e. the STN constraint is turned off ). Linear constraints (6) and (7) function in much the same way for each AR constraint in Cu; the binary variable ξij may only be set to 1 if aj ai falls within the good (i.e. concave) region of Fij. When aj ai falls outside of the good region, λij is forced to 0 by its upper bound ξij with linear constraint (8). Linear Constraints (9) bound λij above by our approximation for Fij; the big-M term must also be included to make these constraints non-binding when aj ai falls outside of the good region, since

Figure 5: Bounding the probability to construct a MILP.

ON EXPECTED VALUE STRONG CONTROLLABILITY

Cu λijqc(ri, aj) + X

Cr βijqc(ai, aj)

s.t. a0 = 0 (1)

aj ai uai,aj c(ai, aj) Cs \ Cr (2)

aj ai lai,aj c(ai, aj) Cs \ Cr (3)

aj ai uai,aj + M(1 βij) c(ai, aj) Cr (4)

aj ai lai,aj M(1 βij) c(ai, aj) Cr (5)

aj ai αij + M(1 ξij) c(ri, aj) Cu (6)

aj ai αij M(1 ξij) c(ri, aj) Cu (7)

λij ξij c(ri, aj) Cu (8)

λij mk ij(aj ai) + ck ij + M(1 ξij) c(ri, aj) Cu k = 1, . . . , yij (9)

βij {0, 1} c(ai, aj) Cr (10)

ξij {0, 1} c(ri, aj) Cu (11)

Figure 6: MILP maximizing the expected value of a consistent WPSTNU.

otherwise λij would potentially be forced below 0, degrading the accuracy of our expected value estimate. Finally, we explain how the objective function bounds below the schedule expected value. Given a rejectable STN constraint c(ai, aj), we obtain value qc(ai, aj) whenever aj ai falls within its bounds (and thus βij = 1); given an AR constraint c(ri, aj), we expect to receive value qc(ri, aj)Fij(aj ai). Since, by construction of the MILP, λij is a lower bound on Fij(aj ai), the MILP objective value will be a lower bound on the true expected value. Posing the objective function in this way highlights the trade-off between satisfying rejectable STN constraints and ensuring high probabilities of satisfaction for AR constraints.

5.3 Quality of the MILP Solution

As described in the previous section, the piecewise linear functions in constraints (9) and the binary variables ξij together serve to bound the true probability of the schedule satisfying a given AR constraint from below. Thus, the optimal solution to the MILP will generally be lower than the true expected value. Note that error in the approximation will generally occur below αij or above αij, but it could also occur near the mode of Fij(x) if it is approximated by only a small number of pieces. To see that nontrivial error in solutions can occur, we observe that if the piecewise linear approximation is flat (slope 0) near the maximum of Fij(x), leading to the same probability λij in many MILP solutions, the true curve Fij(x) will yield better solutions only found by nonlinear optimization starting at the arbitrary solution found using the piecewise linear approximation. Let

ˆGij = max x {Fij(x) (mk ijx + ck ij)},

LASSITER, LAUFFER, & FRANK

be the largest difference between the true value of Fij(x) and its piecewise linear lower bound. The worst-case error of any solution to a consistent WPSTNU is no more than

Cu qc(ri, aj) ˆGij.

Given a schedule s, let λs ij be the approximated probabilities of success for the schedule from the MILP solution. The error ˆϵ(s) of this schedule is

Cu qc(ri, aj) Fij (s(aj) s(ai)) λs ij ,

and trivially ˆϵ(s) ˆϵ. Suppose that the true optimal schedule is m with value M, and let ˆ M = M ˆϵ(m), that is, the objective value of schedule m according to the MILP. Suppose b is an optimal schedule according to the MILP with objective value ˆB. We thus have the following lemma:

Lemma 1. Given a WPSTNU whose true optimal schedule m has expected value M and whose maximum MILP formulation error is ˆϵ, let b be an optimal solution to the MILP formulation with objective value ˆB. Then ˆB M ˆB + ˆϵ.

Proof. Since each λij value is bounded above by Fij, we have ˆB M. Since b is the optimal schedule according to the MILP, ˆ M ˆB. We also know M = ˆ M +ˆϵ(m) and thus, M ˆB +ˆϵ(m). Finally, ˆϵ(m) ˆϵ, so M ˆB + ˆϵ.

Computing the worst-case error ˆϵ for a schedule requires computing ˆGij for all AR constraints c(ri, aj). The piecewise linear approximation inside the concave region can be made good enough that the error Fij(x) (mk ijx+ck ij) is maximized at the tails, and therefore we only need to compute the difference at αij and αij to find ˆGij.

5.4 Complexity, Soundness, and Incompleteness

The number of variables and constraints employed by a MILP formulation is often used as indicators of how hard it is to solve. Our MILP involves |A| + |Cu| continuous and |Cu| + |Cr| binary variables. In terms of constraints, it has 2|Cs| of them enforcing the lower and upper bounds of STN constraints, 2|Cu| constraints checking whether or not aj ai values fall within their good regions, |Cu| constraints bounding λ values from above within their bad regions, and no more than y |Cu| constraints bounding λij values from above within their good regions, where y = maxc(ri,aj) yij, for a total of no more than 2|Cs| + (3 + y )|Cu| constraints. Solving MILPs is known to be NPcomplete, but it is difficult to formally characterize the hardness of solving this MILP due to the plethora of heuristics and pre-processing techniques utilized by commercial solvers like Gurobi (Gurobi Optimization, LLC, 2022), our solver of choice. Solve time depends not only on the numbers of (binary) variables and constraints in the formulation, but also on the tightness of the lower bound provided by solving its LP relaxation and the geometry of its feasible region, among other factors. Nevertheless, we have the following result:

Theorem 1. Given a WPSTNU W, with (unknown) maximum expected value M, a solution (b, ˆB) to the MILP is found in O(2|Cu|+|Cr|LP(|T|)) (LP(n) is the complexity of solving a linear program over n variables) time such that ˆB M ˆB + ˆϵ.

ON EXPECTED VALUE STRONG CONTROLLABILITY

Thus, the exponential search described above is sound in that it will return a feasible schedule bounding below the maximum expected value, and incomplete insofar as it will return an optimal solution to the MILP given our piecewise linear approximation of the distributions of probabilistic durations of the WPSTNU, which nevertheless is a lower bound on the optimal schedule of the WPSTNU.

5.5 Limitations and Advantages of the MILP Formulation

We formulate the MILP in 5.1 without explicitly assuming that any pair of probabilities Pi(ωi) and Pj(ωj) in a WPSTNU are mutually independent. If the probabilistic durations are not independent, then the rewriting of the objective, and the approximation, depend on marginalizing out Pi(ωi) and Pj(ωj), and do not explicitly use the joint probability Pi,j(ωi, ωj). The allocation of risk will generally increase the optimality gap. To see this, note that a schedule captures risk in an n-dimensional box, while correlated risk distributions could be covered by other shapes with less restrictive constraints. In our evaluations, however, all Pi(ωi) and Pj(ωj) in a WPSTNU are mutually independent. Allowing requirement constraints on two uncontrollables, c(ri, rj), requires generalizing the functions Fij(x) to Z x lri,rj

x uri,rj P(ωj ωi).

Even if Pi(ωi) and Pj(ωj) are unimodal, the composition, in general, is not (Ibragimov, 1956), and thus requirement constraints c(ri, rj) are not permitted in our WPSTNUs with the exception of uncontrollables with normal distributions. The sum, and thus difference, of two independent normally distributed variables is a normal, and thus unimodal; under these and similar conditions the formulation provided works and constraints c(ri, rj) are permitted. The same proviso applies when reformulating linked probabilistic duration constraints d(ai, ri) and d(ri, rj) 2. We must replace d(ai, ri) and d(ri, rj) with d(ai, rk) such that Pk(ωk) = Pi+j(ωk). This is only possible when random durations are independent, and it s only easy to find Pi+j for certain kinds of distributions, e.g. normal but not beta distributions. As noted above, this means the MILP solution will underestimate the true expected value by a larger margin, increasing the error. We also note that while our function bounding the probability of satisfaction from below does not define a concave region over its entire domain (and thus requires that a binary in-out variable be added), it gives us the power to consider situations in which all possible outcomes v(ri) for an uncontrollable must fall entirely on one side of its mode. This was one of the difficulties in dealing with unimodal distributions that the authors in (Santana et al., 2016) did not address, as their choice of functions used to approximate risk required that the bounds for acceptable outcomes for an uncontrollable fall on either side of its mode. Our increased coverage comes at a price: the PARIS algorithm in (Santana et al., 2016) is polynomial-time, while solving the MILP is worst-case exponential, even when Cr = 3.

2. Allowed by HEATlab instances (Lund et al., 2017) but not (Tsamardinos, 2002) and thus not in ROVERS instances (Santana et al., 2016). 3. In practice, one probably would not want to satisfy a high-valued AR constraint with a probability of < 50%, but the WPSTNU provides the opportunity to discover this feature of a schedule, and evaluate value changes to the problem instance that would further increase the probability of satisfaction.

LASSITER, LAUFFER, & FRANK

6. Empirical Evaluation

We evaluate the correctness and computational effectiveness of our approach on the PSTNU instances of the ROVERS dataset generated by (Santana et al., 2016). Each instance of ROVERS features between 2 and 10 rovers performing between 1 and 10 planetary exploration tasks in parallel across a scheduling horizon before transmitting data within a shared communication window. Tasks to be completed by a single rover are generally modeled using contingent or probabilistic duration constraints; controllable timepoint ai represents the start of task i and uncontrollable timepoint ri represents its end. If task j follows task i, an intra-agent AR precedence constraint exists between ri and aj. Coordination between rovers is modeled by AR constraints c(ri, aj), generally starting from uncontrollable timepoints, but also occasionally ending with them instead i.e. the noncanonical form c(ai, rj). These inter-agent constraints are only found within the communication window at the end of the schedule. We also evaluate our approach using HEATlab instances generated by (Lund et al., 2017). These instances feature 20 timepoint variables divided among 2 to 4 agents and 20 35 constraints depending on the selected interagent constraint density, of which up to 15 are contingent. Requirement constraints were set with no upper bound, and the total time (makespan) given to complete each schedule was the midpoint between the minimum and maximum possible times of the critical path through the PSTN. The interagent constraint density, i.e. the proportion of constraints between agents sequences of actions, is variable. The degree of synchronization represents the tightness of the bounds on interagent constraints, which were set to [0, nσ] with n = 1, 2, 4 and σ [0, 5]. We modify the original benchmarks by adding different values qc(ti, tj) to the constraints, and stipulating some STN constraints as mandatory, as explained below. We also add additional constraints to make the instances more difficult, forcing more tradeoffs. We open this section with some analysis of the benchmarks, then describe our results. Our piecewise linear approximation of distribution functions and MILP were implemented in Python, and the MILP was solved using Gurobi 8.1.1. The piecewise linear approx of Fij used 50 pieces.

6.1 Checking for Strong Controllability

The best expected value for an instance of Ev SC will often depend on whether all of the STN constraints (rejectable and otherwise) can be satisfied. If this is not the case, then some constraint in Cr must be discarded; otherwise, whether to discard a constraint or not depends on the relative constraint values, as described in the introduction. As noted in (Fang et al., 2014), even such strongly controllable instances may incur 100% risk. If an Ev SC instance is controllable , an AR constraint may be implicitly rejected to maximize the expected value of other AR constraints, but we generally expect different performance of the MILP on these two classes of instances. Note that the original ROVERS benchmark instances are PSTNUs for which all STN constraints must be satisfied. We characterize the MIT ROVERS benchmarks to determine if they are strongly controllable , by which we mean the STN constraints can all be satisfied with some minimum risk. Doing so allows us to compare the performance of our MILP based on the characteristics of the problem instances. Our check for strong controllability relies on the following lemma, adapted from the SC reduction for STNUs appearing in (Vidal & Fargier, 1999):

ON EXPECTED VALUE STRONG CONTROLLABILITY

Lemma 2. Given a PSTNU, a contingent link g(ai, ri) with bounds [lri, uri] and an AR constraint c(ri, aj) with bounds [lri,aj, uri,aj] , a schedule s is guaranteed to satisfy c(ri, aj) if and only if s(aj) s(ai) [uri + lri,aj, lri + uri,aj].

Proof. We re-express v(ri) as s(ai)+ωi, ωi [lri, uri] so that in order to satisfy c(ri, aj), we must have

s(aj) (s(ai) + ωi) lri,aj

s(aj) s(ai) ωi + lri,aj

s(aj) s(ai) uri + lri,aj.

Similarly, we must have

s(aj) (s(ai) + ωi) uri,aj

s(aj) s(ai) ωi + uri,aj

s(aj) s(ai) lri + uri,aj.

With this lemma in mind, we check for strong controllability of a PSTNU using the PSTN induced by removing all uncertain contingent links g(ai, ri) and replacing their associated AR constraints c(ri, aj) (or c(aj, ri)) with non-rejectable requirement constraints c(ai, aj) whose bounds correspond to those of the lemma. If the resulting MILP admits one or more feasible solutions, the original PSTNU is strongly controllable; otherwise it is not, meaning at least one STN constraint must be discarded. Moreover, the schedule obtained by solving this MILP to optimality is the best strong schedule in terms of the value it expects to gain from satisfying probabilistic AR constraints, while ensuring all other constraints are satisfied. This may not be the optimal schedule when value is added and STN constraints can be rejected. Using this check, we found a schedule for 3141 of the 4380 ROVERS instances. In 1067 of these optimal strong schedules, there was at least one AR constraint with λij = 0, which implies no chance of satisfaction (i.e. the schedule had 100% risk). These numbers are higher than the number of controllable instances reported in (Santana et al., 2016) because our approach to truncating bounds on the probabilistic durations does not require them to straddle the mode of the probability distributions of the probabilistic durations (as discussed in 5.1).

6.2 Computing Minimal Makespan

Given a WPSTNU instance <A,R,C,G,D,Q>, we determined the minimum makespan required to achieve maximum expected value by first computing the maximum expected value ˆB using the MILP presented in 5.1 and then solving the modified problem

min ah s.t. (1) (11) X

Cu λijqc(ri, aj) + X

Cr βijqc(ai, aj) ˆB,

LASSITER, LAUFFER, & FRANK

where ah is the variable representing the last scheduled timepoint of the WPSTNU (the horizon), which may be artificially added if one does not naturally exist. Let the optimal makespan value we obtain from the MILP above be m . We will subsequently use fractions of m to squeeze schedule makespan by adding the constraint ah αm for some α < 1 to the original MILP formulation. Adding these constraints will force additional tradeoffs in expected value in our modified benchmark problems. We can do this both for strongly controllable problems as described above, and those where we already know some constraints must be rejected; we will not distinguish performance on the two types of instances when reducing the makespan.

6.3 Creating WPSTNU Benchmarks

In this section we present a series of empirical studies of our Ev SC algorithm on WPSTNUs generated from the MIT ROVERS benchmark. We first replaced pairs of contingent links and AR constraints with a single STN constraint according to Lemma 2 (same as in the controllability check). We then classified all inter-agent STN constraints as rejectable and all others as non-rejectable, allowing us to find expected value-optimal schedules for the non-SC instances described in 6.1. No lower bounds were imposed on the λij values for the probabilistic AR constraints (meaning 100% risk, and zero expected value contribution, is a valid solution to these problems). Because individual rovers (agents) within the ROVERS instances have sequences of alternating controllable and uncontrollable timepoints in the form g(ai, ri),c(ri, aj), the Ev SC MILP can implicitly reject contingent constraint g(ai, ri) by sacrificing the value on AR constraint c(ri, aj) in order to schedule timepoint aj before timepoint ai, which violates the natural temporal ordering of a single rover s tasks. In order to prevent such illogical scheduling, we added an unrejectable, zero valued requirement constraint c(ai, aj) with bounds [0, ) between such sequences of controllable timepoints. This enforces the precedence between the intra-agent controllable timepoints, even when the AR constraints associated with contingent edges are implicitly sacrificed by the scheduler. We created new derived benchmarks from the ROVERS instances in stages. First, we describe changes to makespan and the number and type of constraints added to the original ROVERS instances. Next, we describe different value schemes, corresponding to how we add values qc to the constraints in the ROVERS benchmarks. The combination of structural modifications and value schemes is used to create many different derived benchmarks. Our first modification is to reduce the makespan by up to 50% compared to the minimum, as described above. In our second modification, we prioritized coordination, so inter-agent constraints are valued more highly than intra-agent constraints. In order to drive increased search costs by introducing more choices, we modified the original benchmark by adding low-value inter-agent constraints that must be traded off against the high-value intra-agent constraints. Allowing interagent constraints to be removed lets us explore trade-offs between individual rovers completing their assigned tasks, and forcing pairs or groups of rovers to cooperate with one another. The new constraints were realized as rejectable STN constraints of the form c(ai, aj) = [ 5, 5]. Our decision to use this comparatively small temporal range was due to the arbitrariness of some of these random constraints with respect to the originally constructed instances. For our third modification, in order to explore the consequences of tight makespan constraints, we reversed the prioritization, making the intra-agent AR constraints more valuable than inter-agent constraints. To explore the impact of skew probability distributions, we modified the original benchmark to make probabilistic AR constraints more valuable than uncertain AR constraints; STN constraints remain low value. To implement the

ON EXPECTED VALUE STRONG CONTROLLABILITY

Value Value of constraints (qc) scheme IRAo IRAn ITA 1 5 5 1 2 1 - 3 3 5 1 1 Value Value of constraints (qc) scheme p AR u AR STN 4 3 1 1

Figure 7: Value schemes used for all experiments. For the first three value schemes, IRAo refers to inter-agent constraints in the original instance, IRAn refers to new, randomly added inter-agent constraints, ITA refers to intra-agent AR constraints. For value scheme 4, p AR refers to probabilistic AR constraints, and u AR refers to uncertain AR constraints.

Derived Benchmark Distribution Added Constraints Makespan Constraint Value Scheme 1 normal 0 none 1 2 normal 0 50% 100% 2 3 normal 0 50 100% 3 4 beta 0 100% 4

Figure 8: Parameters of the ROVERS derived benchmarks used to evaluate the Strong Controllability MILP.

removability of AR constraints, we ensured the STN constraints created using Lemma 2 had the correct value and were labeled rejectable. This scheme differs from the previous benchmarks in that the agent structure of the problem is not used to drive constraint value. Each derived benchmark is accompanied by a value scheme to augment the ROVERS instances with constraint qualities qc(ti, tj). The four different constraint value schemes are described below:

1. All inter-agent constraints, both STN and AR, are more valuable (qc = 5) than their intraagent counterparts (qc = 1).

2. Inter-agent constraints, both STN and AR, have a mix of values (qc {1, 5}); constraints in the original problem (IRAo) have value 5, constraints not in the original problem (IRAn) have value 1. Intra-agent constraints are valued at 1.

3. Intra-agent AR constraints (ITA) are more valuable (qc = 3) than inter-agent constraints (qc = 1).

4. Probabilistic AR constraints (p AR) are more valuable (qc = 3) than uncertain AR (u AR) constraints (qc = 1). Since we replace uncertain links with STN constraints, those replaced STN constraints are rejectable and have qc = 1. All other inter-agent STN constraints were given qc = 1.

LASSITER, LAUFFER, & FRANK

Number of Uncontrollable Durations

Gurobi Runtime (s)

(a) Solve times and proportions of total value expected to be obtained for strongly controllable instances on derived benchmark 1.

Number of Uncontrollable Durations

Gurobi Runtime (s)

Proportion of Total Value

(b) Solve times and proportions of total value expected to be obtained for non-strongly controllable instances on derived benchmark 1.

Figure 9: Solve times and proportion of total expected value for both controllable and uncontrollable instances from ROVERS derived benchmark 1. The color gradient applies to both figures.

The value schemes are summarized in Figure 7. Our derived benchmarks for the ROVERS instance each use one of the four value schemes, in conjunction with either reduction in makespan, addition of new constraints, or exchanging normal distributions for beta distributions. The derived benchmarks are shown in Figure 8.

6.4 Evaluating the MILP: ROVERS

We first report results on derived benchmark 1. All benchmarks are run on a Linux laptop with an Intel 4-core i7-8550U CPU with 16 Gb of RAM. Figures 9a and 9b plot runtime versus the number of uncontrollable durations; the color map indicates the proportion of total value achieved. Figures 9a shows results for the strongly controllable instances of the ROVERS benchmark, and Figure 9b shows results for the uncontrollable instances, where controllable and non-controllable instances are checked as described in 6.1. The Ev SC MILP produced a schedule expected to obtain nearly all (> 95%) of the possible value from AR constraints for most of the ROVERS instances, even those that aren t strongly controllable, with solve times under 0.5 seconds. Note the solve times y-axis ranges in Figures 9a and 9b differ due to a few outliers in Figures 9a. Solve times are comparable to those reported by (Santana et al., 2016) when solving the easier risk minimization version of the original ROVERS benchmark. In order to assess the quality of our MILP solution, we calculate the error as described in 5.3. We report the quality of solution ˆB B, where B is the true expected value (B = ˆB + ϵ(b)) of the

optimal schedule returned by the MILP, as a proxy for the error relative to the true quality ˆB M . For

derived benchmark 1, the mean value of ˆB B was 0.999999668 and the minimum (maximum error) was 0.999992589. Unsurprisingly, optimal schedules for strongly controllable instances obtained a slightly higher proportion of their total value than non-strongly controllable ones.

ON EXPECTED VALUE STRONG CONTROLLABILITY

0.5 0.6 0.7 0.8 0.9 1 0.8

Makespan Scale

Proportion of Total Value

(a) Proportions of total value when limiting makespan over 876 (1 in every 5) instances on derived benchmark 2 as makespan decreased from 100% to 50% of minimum.

0 20 40 0.6

Number of Added Constraints

Proportion of Total Value

Proportion of Constraints Removed

(b) Proportions of total value and proportions of constraints removed as a function of the number of randomly added concurrency constraints on derived benchmark 3.

20 40 60 80 10 2

Number of Removable Constraints

Gurobi Runtime (s)

Proportion of Constraints Removed

(c) Solve times and proportions of constraints removed as a function of the total number of rejectable constraints (original plus added) on derived benchmark 3.

-0.83 0.0 0.83 1.04 1.14 1.19

(8,2) (3,3) (2,8) (2,14) (2,20) (2,26)

Proportion of Total Value

Distribution parameters (α, β)

(d) Proportions of total value as a function of contingent edge distribution skewness over a randomly selected subset of derived benchmark 4.

Figure 10: Performance of the MILP solver on ROVERS derived benchmarks 2,3 and 4.

In order to further explore the aforementioned trade-offs, we now discuss results on the remaining derived benchmarks. Derived benchmark 2 forces more tradeoffs between rejectable inter-agent constraints and probabilities of AR constraint satisfaction by artificially limiting schedule time to completion (makespan) while encouraging the preservation of intra-agent constraints. Figure 10a plots the impact of incrementally cutting allowed makespan from 100% down to 50% of its minimal value, as computed using the MILP described in 6.2, over a representative subset (one in every five) of the instances. It is interesting to note that even severely limiting schedule makespan to half of its minimal value did not have a major effect on optimal expected value for many of the tested instances; often, 90% of the expected value can still be achieved. Runtime on these problems remained low, mostly under one second.

LASSITER, LAUFFER, & FRANK

-0.83 -0.77 -0.69 -0.6 -0.47 0.0 0.47 0.6 0.69 0.77 0.83 Skewness

Solution Quality

Beta distribution parameters ( , )

Figure 11: Solution quality as a function of skewness averaged over derived benchmark 4.

Derived benchmark 3 randomly introduces additional inter-agent concurrency constraints to simulate higher levels of coordination between rovers. The largest ROVERS problem might have 10 rovers and on average 5 activities each with 10 coordination constraints at the end for about 60 constraints; we add between 0 and 50 extra rejectable constraints. Figure 10b shows how the number of added concurrency constraints impacted expected value for the same subset of instances examined in Figure 10a. The color gradient shows the proportion of rejectable requirement constraints Cr (both added and original) rejected in the optimal schedule, presumably in the interest of increasing the probabilities of satisfying AR constraints (due to their low value). Adding these constraints reduces the percentage of total value achieved, which is not surprising; at 50 added constraints, the proportion of total value achieved drops below 80%. Adding these constraints also significantly increased solve times; Figure 10c plots solve time as a function of the number of rejectable constraints for the same subset of instances, showing that the Ev SC MILP becomes much more difficult to solve as more rejectable constraints encouraging coordination are included. Still, all problems are solved in under two minutes. In derived benchmark 4, we experimented with converting probabilistic edges with normal (µ, σ) distribution into beta distribution with lower and upper bounds of x 2σ and x + 2σ, respectively, where the center x is chosen so that the beta distribution has mean µ. The parameters α and β are chosen to skew all distributions to the left or right. Because the beta distribution functions we chose are unimodal with a large convex region, our linear lower bounding strategy was still applicable. Figure 10d plots change in expected value when varying α and β parameters on derived benchmark 4. Recall this benchmark values probabilistic AR constraints over all other constraints. As evidenced by the figure, the proportion of total value is larger for problems in which the beta distributions skewed right (low durations are more likely). Because the ROVERS instances generally have more lower-bound constraints than upper-bound constraints, it is unsurprising that as probabilistic durations tend toward shorter outcomes, the optimal schedule is able to account for more probability mass and therefore obtain greater expected value. The typical loss of percentage of total value for the randomly selected problems due to skew ranges from 2% 3% (skew right) to 6% 8% (skew left).

ON EXPECTED VALUE STRONG CONTROLLABILITY

Derived Benchmark Distribution Added Constraints Makespan Constraint Value Scheme 5 normal 0 none 1 6 normal 0 50% 100% 2 7 normal 0 30 100% 1 8 normal 0 30 100% 3

Figure 12: Parameters of the HEATLab derived benchmarks used to evaluate the Strong Controllability MILP.

We again used derived benchmark 4 to evaluate the impact of skewness on solution quality. As we did with normal distributions, we evaluate ˆB

B with B being the true expected value of b over problem instances with skew distributions. Over the skew distribution instances, the error incurred in the objective value because of λ underestimates gives a solution quality with a mean ˆB

B of 0.9999999 and a minimum of 0.99999984. Figure 11 shows the average error on derived benchmark 4 as a function of skewness. The addition of skewness does not impact the error of our solutions. While the piecewise linear bounds will certainly truncate the tails of Fij, these results suggest that the probabilities found for the ROVERS benchmarks are well inside the concave region of Fij where we expect to see low error due to our approximation.

6.5 Evaluating the MILP: HEATlab

We perform an evaluation on the HEATlab dataset (Lund et al., 2017) analogous to the evaluation discussed in 6.1. Derived benchmarks for the HEATLab are shown in Figure 12. Derived benchmark 5 is the original HEATLab dataset with value scheme 1. Derived benchmark 6 crushes the makespan, but uses value scheme 2. Derived benchmarks 7 and 8 use different value schemes with added intra-agent constraints, for reasons explained below. The HEATlab instances have chained uncontrollable durations, making it difficult to evaluate the implications of beta distributions on the results, since beta distributions are not closed under summation; therefore we do not experiment with beta distributions for the HEATlab dataset. Figures 13a and 13b plot runtime versus the number of uncontrollable durations for strongly controllable and uncontrollable instances of derived benchmark 5, respectively. The controllability of these instances was checked as described in 3. Since HEATlab instances tend to be much smaller than ROVERS instances, the runtime for computing optimal solutions for HEATlab instances tends to much lower. The runtime for computing optimal solutions using our algorithm is about an order of magnitude faster than the SREA algorithm (Lund et al., 2017). As demonstrated by the difference in color gradient scale between Figure 9 and 13, optimal solutions for the HEATlab instances achieve significantly lower values than ROVERS instances. Over derived benchmark 5, error incurred in the objective value because of λ underestimates gives a solution quality with a mean ˆB

B of 0.9997 and a minimum (maximum error) of 0.9925. Figure 14a plots the impact of incrementally cutting the allowed makespan from 100% down to 50% of its minimal value, as computed using the MILP described in 6.2. Crushing makespan over the HEATlab dataset seems to have a similar effect on both HEATLab and ROVERS datasets. One major difference between the two datasets seems to come from the fact that HEATlab instances

LASSITER, LAUFFER, & FRANK

2 4 6 8 10 12

Number of Uncontrollable Durations

Gurobi Runtime (s)

(a) Solve times and proportions of total value expected to be obtained for strongly controllable instances on derived benchmark 5.

4 6 8 10 12

Number of Uncontrollable Durations

Gurobi Runtime (s)

Proportion of Total Value

(b) Solve times and proportions of total value expected to be obtained for non-strongly controllable instances on derived benchmark 5.

Figure 13: Solve times and proportion of total expected value for both controllable and uncontrollable instances from HEATlab derived benchmark 5. The color gradient applies to both figures.

achieve a significantly lower proportion of total value than ROVERS instances. Even accounting for this, it appears crushing the makespan significantly reduces the achievable value when compared to the ROVERS benchmark. Crushing the makespan reduces the % of total value to 40 90% of the max value when the makespan is reduced by 50% at worst, when compared to ROVERS, where the worst case is 80%. Similar to derived benchmark 3, derived benchmark 7 introduces additional inter-agent concurrency constraints using value scheme 1. Figure 14b shows the results. We remark that added constraint bounds [ 5, 5] are actually loose constraints for HEATlab, as opposed to tight constraints for ROVERS, since the makespan of HEATlab instances tends to be quite small (between 10 30). We see that the proportion of total value is quite variable, with the worst case again lower than we found for ROVERS instances. We note there is only a weak dependence of the worst-case proportion of total value achieved on the number of constraints added. Curiously, it appears that more total value is achieved in the worst case as more constraints are added; this may be because the constraints are comparatively loose, and thus are more likely to be satisfied. Finally, we note there is little variation in the number of rejected constraints regardless of the number of constraints added to HEATlab problems, which differs from the behavior of ROVERS problems with added constraints. Figure 14c shows runtime variation in the presence of added constraints on derived benchmark 7. Here, we see that most runtimes are below 0.2 seconds, but that the worst-case runtime grows as more constraints are added. Again, the runtime impact of adding more constraints is weaker than we saw for ROVERS benchmarks with added constraints. We also generated derived benchmark 8, a slightly modified version of derived benchmark 7 that uses value scheme 3. This value scheme reduces the value of added interagent constraints from 5 to 1 and the resulting plot in Figure 14d shows minimal correlation between the number of added constraints and the proportion of total value. This result corroborates the hypothesis that the extra proportion of total value achieved in Figure 14b likely comes from the relative looseness of the

ON EXPECTED VALUE STRONG CONTROLLABILITY

added constraints on the HEATLab benchmark, making them easy to satisfy. Runtime for derived benchmark 8 was comparable to that for derived benchmark 7 and thus is not shown.

0.5 0.6 0.7 0.8 0.9 1

Makespan Scale

Proportion of Total Value

(a) Proportions of total value when limiting makespan over a randomly chosen half (for legibility) of the instances on derived benchmark 6 as makespan decreased from 100% to 50% of minimum.

Number of Added Constraints

Proportion of Total Value

(b) Proportions of total value as a function of the number of randomly added concurrency constraints on derived benchmark 7.

Number of Removable Constraints

Gurobi Runtime (s)

(c) Solve times as a function of the total number of rejectable constraints (original plus added) on derived benchmark 7.

0 10 20 30 0.2

Number of Added Constraints

Proportion of Total Value

(d) Proportions of total value as a function of the number of randomly added concurrency constraints on derived benchmark 8.

Figure 14: Performance of the MILP solver on HEATlab derived benchmarks 6,7, and 8. The heatmaps are hidden from plots 14b and 14c since the proportion of constraints removed from these benchmarks tended to be very low with a few outliers that did not show any meaningful relationships. An analogous plot for 10d, proportion of total value as a function of skewness, is not provided because the HEATlab instances contain sequences of probabilistic edges, which are difficult to evaluate since beta distributions are not closed under summation.

7. Rescheduling Approaches

We have so far focused our attention on generating fixed schedules, the analog of strong controllability (SC) in STNUs. Simply executing such a schedule does not take advantage of information gained during execution, which could improve the expected value. In order to motivate the benefits

LASSITER, LAUFFER, & FRANK

40 50 60 70

Ideal Lighting

Dust Devil Duration 45,45

Take-Image a1 0,

40 50 60 70

40 50 60 70

risk: δ(50) < δ(45)

qc(a0,a5)<qc(r0,a3) (δ(50) - δ(45))

Before execution, optimal schedule:

s(a2)=50, s(a3)=65 reject c(a0,a5)=[60,70]

Rescheduling with new information v(r0)=45:

s(a2)=45, s(a3)=60 satisfy c(a0,a5)=[60,70]

v(r0)=45: can now satisfy c(r0,a3)

Figure 15: Rescheduling in presence of new information during execution can improve schedule expected value. Before execution, the ideal lighting constraint for the second image was rejected on the expectation that the higher (expected) value dust devil would be observed. When new information becomes available, changing the schedule can satisfy both the ideal lighting constraint and observe the dust devil.

of rescheduling during execution in the setting of WPSTNUs, we revisit the dust-devil example from Figure 1. Figure 15 illustrates rescheduling in the presence of new information. As before, let schedule s be the optimal fixed schedule with s (a2) = 50 that violates a single constraint, namely, c(a0, a5), in order to maximize the probability, and thus the expected value, of c(r0, a3). Unlikely though it may be, if the dust devil ends at just the right time, i.e. v(r0) = 45, a schedule can satisfy both c(r0, a3) and c(a4, a5). Even though the mandatory drive time is 45, followed by the mandatory take-image time of 15, and since c(r0, a3) [0, 15], it is possible to take the dust-devil image by scheduling s(a2)= 45, and s(a3)= 60, barely satisfying c(r0, a3). Further, it is possible to schedule s(a5)= 70, and therefore satisfy the ideal lighting constraint c(a0, a5) on the second image acquisition task. By contrast, suppose v(r0) = 44. Because the dust devil now ends one time unit before the drive ends, and because c(r0, a3) [0, 15], it is no longer possible to satisfy c(r0, a3); a3 60 for all schedules since the drive is mandatory, and 60 44 = 16. However, it is still possible to schedule s(a5)= 70, and therefore satisfy c(a0, a5), which still requires scheduling s(a2)= 45. Sticking to the original schedule s would satisfy neither constraint c(r0, a3) nor c(a0, a5). More intriguing rescheduling opportunities include rescheduling after the passage of time but without observing events, and rescaling the probability distributions of unobserved uncontrollable timepoints, to recompute expected value tradeoffs. While there is certainly an analog of dynamic controllability (DC) for WPSTNUs i.e. the Expected Value Dynamic Controllability (Ev DC) problem, it is not clear how best to pose and solve such a problem. In lieu of achieving this goal, we take inspiration from (Lund et al., 2017) and (Abrahams et al., 2019), who employ rescheduling to improve solvability rates of PSTNs. Reschedul-

ON EXPECTED VALUE STRONG CONTROLLABILITY

ing involves execution of a schedule s, observing uncontrollable event outcomes, and re-posing and re-solving PSTNs to satisfy constraints, or reduce or redistribute risk. During execution of an STNU, a controllable timepoint ai is called enabled at time t if executing it at t does not violate any constraints; that is, t is later than the earliest start time of ai and all aj that must precede ai have executed. An uncontrollable timepoint ri from a duration constraint d(ai, ri) or contingent constraint g(ai, ri) is called enabled after controllable ai is executed. When PSTNs are transformed into STNUs, enablement is used as a trigger for rescheduling. DREA (Lund et al., 2017) reschedules after any uncontrollable timepoints become enabled, or are observed, and accepts the new schedule unconditionally. By contrast, the approaches described in (Abrahams et al., 2019) reschedule and accept the newly generated schedule under conditions designed to reduce DREA s schedule invocations and communications, and hence resource usage, at the expense of reduced success rate for PSTNs. Rescheduling from scratch can be time and resource intensive. An alternative approach to responding to new information is to use a low-order computation time algorithm to select the next timepoint to execute, rather than reschedule. (Saint-Guillain, Vaquero, Agrawal, & Chien, 2020) propose generalizing such approaches to selecting the next timepoint to execute, referring to them as Dispatch Protocols (DPs), and describe methods to measure the risk for PSTNs of low computation time DPs that occupy a mid-point between SC and DC. In this paper, we explore a class of rescheduling approaches during execution to improve expected value for WPSTNUs. The interleaving of execution and rescheduling generally proceeds as follows:

1. Execution begins with a schedule s.

2. Over time, controllable timepoints are executed according to their scheduled times s(ai), and uncontrollable event outcomes v(ri) may be observed.

3. The information gleaned during the execution of the schedule is used to determine whether or not to reschedule. If rescheduling occurs, a new schedule s is generated.

4. The current schedule s and new schedule s are evaluated to determine whether or not to continue executing s or switch to the new schedule s .

In this paper, we limit our investigation to fixed-period rescheduling approaches. Fixed-period approaches are simple to implement, but we show that they nevertheless can led to significant improvement in expected value when compared to executing the fixed SC schedule. We distinguish time-based approaches, which reschedule after a fixed amount of time has passed, regardless of the number or type of timepoints that are executed or observed, and event-based approaches, which reschedule after a fixed number of timepoint events are observed. Our interest is in understanding the improvement in expected value resulting from rescheduling methods and frequency. Each method has pros and cons, as we now show. Event-based rescheduling permits less frequent re-normaliziation of the probability of uncertain durations than time-based rescheduling. Notably, enabled uncontrollables (those whose controllable timepoint has been executed) will only be re-normalized when executing a scheduled controllable, or when another enabled uncontrollable ends. Time-based approaches allow for re-normalizing probabilities of enabled uncontrollable events at regular intervals, increasing responsiveness to changes in the expected value of AR constraints. However, fixed-period time-based rescheduling cannot instantaneously react when

LASSITER, LAUFFER, & FRANK

controllables are enabled or uncontrollables occur. If the period between rescheduling is too long, then opportunities to reschedule will be lost, resulting in lower expected value. Short rescheduling periods mitigate this, but are inefficient. Event-based approaches have a rough bound on the number of times new schedules can be generated, namely, |T| = |A| + |R|. This bound is imprecise because we allow a controllable that is enabled, and about to execute, to be postponed by rescheduling. This means each controllable could be enabled multiple times prior to execution. Time-based rescheduling approaches, by contrast, permit an amount of rescheduling proportional to the scheduling horizon, which will generally exceed the number of timepoints. This mix of advantages and liabilities of the approaches makes it unclear which class of approach is superior. Previous approaches to rescheduling either unconditionally accept the newly generated schedule (Lund et al., 2017), or accept a new schedule for PSTNUs only if its risk is sufficiently reduced compared to the risk of the current schedule (Abrahams et al., 2019). As we will see, the MILP approximation introduced in 6 also leads to the need to evaluate a new schedule for WPSTNUs prior to accepting it, because it might not be provably better than the current schedule.

7.1 Definitions

We formalize the elements of rescheduling approaches for WPSTNUs as follows. We first define fixed-period rescheduling policies, whose role is to decide whether to reschedule or not. Timebased fixed-period rescheduling policies are simple to characterize: we reschedule every k time units, regardless of what events occur. Fixed-period event-based policies require more careful definition of the events defining the rescheduling period. We define the events over which fixed-period rescheduling is characterized. Once rescheduling policies are described, we define selection criteria, the conditions under which we switch from the current schedule to a newly generated one. With this formal foundation, we can then define rescheduling approaches as a combination of a scheduler, rescheduling policy, and a selection criterion. We begin with fixed-period time-based rescheduling policies:

Definition 10. Given a WPSTNU W and a schedule s. We refer to the set of possible unit clock ticks by Γ (essentially, Γ Z+). A fixed-period time-based rescheduling policy πΓ,k reschedules every k unit clock ticks.

Note that time-based policies reschedule regardless of whether an event has occurred or is slated to occur at t. We can rescale time to admit arbitrarily small rescheduling periods k and avoid the difficulties of real-valued time. We now define event-based rescheduling policies. To do so, we generalize enablement of controllable timepoints to permit violation of constraints, which means we can use a simpler definition than that of (Lund et al., 2017); enablement of uncontrollables remains unchanged. Enablement of an uncontrollable timepoint ri only happens once, when ai is executed. As we will see below, enablement of the same controllable timepoint ai may happen multiple times during execution if ai is postponed by rescheduling.

Definition 11. Given a WPSTNU W and a schedule s. Let t be the current time during the execution of the schedule. Controllable timepoint ai is enabled at time t if t = s(ai). Uncontrollable timepoint ri in contingent constraint g(ai, ri) or duration constraint d(ai, ri) is enabled if ai was executed at time u t and has not been observed.

ON EXPECTED VALUE STRONG CONTROLLABILITY

Definition 12. Given a WPSTNU W. Define the set of all timepoint events E that can occur during execution and rescheduling of W as E = {n(ai) : ai A} {v(rj) : rj R}, where n(ai) denotes enablement of controllable timepoint ai and as before, v(rj) denotes observation of uncontrollable timepoint rj. Let X denote an event sequence of events in E, and denote the set of all event sequences X over E by E. An event-based rescheduling policy is a Boolean function π : E { , }.

Referring again to the Dust Devil example in Figure 15, the set of timepoint events is E = {n(ai), i 1 . . . 5 v(r1)}. Recall optimal schedule s for the initial Dust Devil problem before execution assigns s(a1)= 0 and s(a1)= 50. If r1 is observed at t = 51, the sequence ending when r1 is observed is X = {n(a1), n(a2), v(r1)}. An event-based rescheduling policy indicates whether rescheduling should occur after having observed an event sequence during the execution of the schedule. The time t of the last event in the sequence isn t used in defining the rescheduling policies. Note that a sequence is not just a perturbation of a (subset of) events in E, because controllable timepoints can be enabled multiple times if they are postponed by rescheduling. We don t define event-based rescheduling policies based on enablement of uncontrollable timepoints, because Definition 11 ensures that the execution of controllables enables uncontrollables. Finally, we don t include the execution of timepoints in the set of events, because controllable timepoints are executed immediately when they are enabled, unless the rescheduler moves them. We now define two fixed-period event-based rescheduling policies πT,k and πR,k:

Definition 13. Given a WPSTNU W and a schedule s. Let E be the set of events for W. Let X be a sequence of events as defined above, and let N(X) and V (X) denote the numbers of controllable timepoint enablements and uncontrollable timepoint observations, respectively, included in X.

Define the fixed-period event-based rescheduling policy πT,k(X) as follows: πT,k(X) = if (N(X) + V (X)) mod k = 0, πT,k(X) = otherwise.

Rescheduling occurs if πT,k(X) = . The subscript T emphasizes that rescheduling can occur at an event involving any timepoint in T = A R.

Let e be the last event in X. Define the fixed-period event-based rescheduling policy πR,k(X) as follows: πR,k(X) = if (V (X) mod k = 0) ( ri|e = v(ri)), πR,k(X) = otherwise.

Under πR,k, we reschedule if πR,k(X) = . The subscript R emphasizes that rescheduling occurs only at events involving timepoints in R.

The intuition behind fixed-period event-based policies is that rescheduling occurs every k events. Unlike fixed-period time-based policies, rescheduling only occurs in response to events. Policy πR,k includes the condition that the last event e in the sequence is an observation of an uncontrollable to ensure rescheduling doesn t occur every time a controllable is executed after (V (X) mod k = 0). Referring again to the Dust Devil example in Figure 15, consider sequence X = {n(a1), n(a2), v(r1)}.

LASSITER, LAUFFER, & FRANK

Under policy πR,1, rescheduling would occur after this sequence is observed because V (X) = 1, 1 mod 1 = 0, and the last event is an observation of an uncontrollable timepoint. Under policy πT,3 rescheduling would also occur after this sequence is observed because N(X) + V (X) = 3, and 3 mod 3 = 0. Under policy πT,1, rescheduling would occur after this sequence is observed because N(X) + V (X) = 3 and 3 mod 1 = 0. Finally, under policy πT,2, rescheduling would not occur after this sequence is observed, because N(X) + V (X) = 3, and 3 mod 2 = 1; however, rescheduling would occur under πT,2 after observing sequence X = {n(a1), n(a2)}. As mentioned previously, rescheduling at time t is performed before any events for which s(ai) = t are executed for all rescheduling policies we consider, and could postpone events for which s(ai) = t. This means event-based rescheduling could occur more than |T|

k times. Also note rescheduling is assumed to be instantaneous , meaning that even if s(ai) = u > t, the current event can be rescheduled to the current time, i.e. s(ai) = t, and immediately executed. Finally, suppose the event count is j < k, but more than k j events occur simultaneously. Our approach is to reschedule when the event count reaches k; events occurring at exactly the same time are selected arbitrarily to reach k events prior to rescheduling. In large problems and dense time real systems, especially with many uncontrollable events, this corner case is unlikely to occur. Due to the non-determinism of uncontrollable event occurrences, we believe a more elaborate deterministic strategy is unlikely to lead to substantively different results than those we report below. For this paper, the scheduling algorithm is the MILP previously presented in 6, which will be used both to generate the initial schedule, and to reschedule during execution; we denote this MILP approach by M. In the next section we describe the design of the selection criteria, but provide our notation here:

Definition 14. Given a WPSTNU W, a current schedule s, and a proposed schedule s . A selection criterion χ : S S S, where S is the set of all possible schedules, chooses one of the two schedules s, s to continue executing.

Fixed-period rescheduling approaches are thus parameterized by a scheduling algorithm M, a fixed-period rescheduling policy πY,k with Y {Γ, T, R}, and a selection criterion χ, and are denoted R = (M, πY,k, χ). After an initial schedule s is generated by M, execution of the schedule commences. While executing s, the policy πΓ,k(X) tracks time; the policies πT,k(X) and πR,k(X) count subsets of events. If enough time has passed, or enough events have occurred, then rescheduling using M is invoked, and a new schedule s is generated based on the current time, executed controllables, and uncontrollable occurrences. Once the new schedule is generated, the selection criterion χ(s, s ) determines whether to continue executing s, or replace s with s , after which execution resumes. Now that we have defined rescheduling approaches, we turn to the expected value proposition. From Definition 9, we know how to evaluate the value of a fixed schedule s and a particular outcome ω:

f(s, ω) = X

c C qc(ti, tj) σ(c(ti, tj), s, ω).

This definition can be extended to capture the expected value of a rescheduling approach R for a particular outcome, f(R, ω). Rescheduling approach R will, ultimately, produce some as-executed schedule se, which depends on ω, but otherwise, the value function is well-defined, i.e.,

ON EXPECTED VALUE STRONG CONTROLLABILITY

f(R, ω) = X

c C qc(ti, tj) σ(c(ti, tj), se, ω).

This, in turn, means that the expected value g(R) is also well-defined, leading to the following:

Definition 15. Let W be a WPSTNU. A rescheduling approach for W is a 3-tuple R = (M, π, χ). Denote the value obtained by R given a particular vector ω Ωof outcomes of uncontrollable timepoints by f(R, ω). Then the value obtained by R has expectation g(R) = R

ω Ωf(R, ω)P(ω).

We now compare the Dynamic Robust Execution Algorithm (DREA) of (Lund et al., 2017) to our rescheduling approaches. We observe that DREA s policy is reminiscent of rescheduling policy πT,1 applied to PSTNUs. DREA reschedules when uncontrollable events are first enabled or are observed, or when a controllable executes while another uncontrollable is enabled but hasn t yet been observed. DREA does not reschedule if a controllable is executed but no uncontrollables are enabled. By contrast, πT,1 reschedules when any controllable is enabled or uncontrollable timepoint is observed. DREA reschedules only after executing controllable timepoints, without the option to postpone them. DREA s scheduler is an LP, and the selection criterion is simply to accept the new schedule, while our rescheduling approaches use a selection criterion to choose whether to keep a schedule or not.

7.2 Designing a Selection Criterion

In this section we explain how to design a selection criterion χ that is guaranteed to improve the expected value. To set the stage for this selection criterion, we start with a simple example. Let s be an arbitrary schedule generated for WPSTNU W prior to execution. Consider a rescheduling approach R = (M, πt, χ) where scheduling policy πt employs a scheduler M to suggest an updated schedule st exactly once at an arbitrary time t, and a selection criterion χ that decides whether to switch the schedule from s to st. We observe that the sequence X incorporates all timepoint executions and observations that occur before t; implicit in X are constraints on executed controllables, observed uncontrollables, and most importantly, enabled uncontrollables that have not yet been observed. We can transform the original WPSTNU W into a new WPSTNU W by enforcing all the constraints on controllable and uncontrollable timepoints arising from X, and renormalizing the probability distributions P(Ωi) on unobserved uncontrollables. Once this is done, we can build the new schedule st for W . We will denote by Ωt the cross product of uncontrollable outcomes remaining unobserved after t with probability distribution P(Ωt). We can now evaluate the true expected value g(s) = R

ω Ωt f(s, ω)P(ωt) of the suffix of the original schedule s, and similarly the true expected value of the new schedule g(st), and choose the better of the two schedules. The selection criterion χ that implements the process described above is:

χEV (s, st) =

( st if g(st) > g(s), s otherwise.

Computing tails of normal and beta distributions is all that is required to compute the expected value; this is not computationally expensive, and computing the true expected value ensures the selection criterion always chooses the best plan. Since our rescheduling policy πt adheres to schedule s up to time t, and then either sticks with s or switches to a schedule with a higher expected value conditioned on X for an arbitrary set of uncontrollable outcomes ω Ωt, it must be that

LASSITER, LAUFFER, & FRANK

g(R) g(s). The same argument can be extended to say that a policy π that reschedules arbitrarily many times, always checking to ensure the true expected value of the new schedule improves on the recomputed true expected value of the current schedule on the new problem W before switching, cannot lead to a reduction in expected value relative to the initial schedule s. This proves the following result:

Proposition 1. Let W be a WPSTNU and let s be a schedule for W. Let π be a rescheduling policy. Let χEV be the selection criterion defined above. Let R = (M, π, χEV ) be a rescheduling approach for W. Then g(R) g(s).

Notice Proposition 1 is very general. Any rescheduling approach using χEV is guaranteed to ensure improvement in expected value, not just the fixed-period rescheduling approaches we focus on in this paper.

Algorithm 1: Simulated Rescheduling Approaches

Input : WPSTNU W Input : Rescheduling approach R = (M, πY,k, χEV ) MILP rescheduler M, event type Y {T, R, Γ}, rescheduling period k, selection criterion χEV Var : Schedule s Var : MILP representation P W of W R R ; Copy R to track observed uncontrollables during execution A A ; Copy A to track executed controllables during execution T A R ; Tracking executed and occurred timepoints v Realize(R) ; Randomly generate uncontrollable durations t 0 ; Placeholder for current time τk NULL ; Placeholder for current timepoint P W P W ; Placeholder for progressed MILP Ωt iΩi ; Placeholder for renormalized joint probability dist. s Solve(P W ) ; Generate initial optimal SC schedule while T = do

if (X = T) (X = R) then Rescheduling approaches πT,k and πR,k τk Get Next Timepoint(t, s, v, πY,k) ; t Get Time(τk, s, v) ;

t t + k ; Rescheduling approaches πΓ,k

(P W , A ) Update Controllables(P W , A , s, t) ; (P W , R ) Update Uncontrollables(P W , R , s, v, t) ; (P W , Ωt) Update Dists(P W , R R , s, v, t) ; st Solve(P W ) ; if g(st) > g(s) then Follow selection criterion χEV s st ;

(P W , A ) Update Current Controllables(A , s, t) ; T A R ;

return s ; The as-executed schedule

ON EXPECTED VALUE STRONG CONTROLLABILITY

7.3 Rescheduling to Improve Expected Value

To implement our rescheduling approaches, we will rebuild the MILP for W conditioned on X and re-solve it, to produce our proposed new schedule st. The process for updating the MILP by incorporating X is described in detail in the next section. Typically, the schedule can be improved after incorporation of constraints derived from X, so we expect the new MILP solution to reflect this improvement. Since the objective value of our MILP is a strong proxy for expected value, as shown in 6.4, we expect st will nearly always be chosen over s, except in rare cases where the under-approximation error for st so far exceeds that of s that the relationship between their MILP objective values is opposite that of their expected values. Nevertheless, we still explicitly evaluate the true expected value of the schedule resulting from the MILP solution in accordance with χEV . Algorithm 1 shows our process for simulating rescheduling approaches for WPSTNUs. We compute the realization of all probabilistic durations up front but only use their values at the time of their realization. Contingent constraints cannot be accurately simulated since they lack probabilistic information of how nature chooses durations during execution. In order to simulate instances with contingent constraints, they are first converted into requirement constraints using Lemma 2. Therefore, a contingent constraint is only satisfied during simulation if the timepoints satisfy every outcome for the duration of the contingent constraint. After initialization, the algorithm determines how much time passes before rescheduling occurs. To simulate event-based approaches, function Get Next Timepoint exploits the up-front generation of uncontrollable event times, and returns the kth timepoint event into the future, and function Get Time returns the time this timepoint is either scheduled or occurs. Simulating time-based approaches simpler, requiring only extraction of k from the rescheduling policy description πY,k, and subsequent arithmetic. The two functions Update Uncontrollables and Update Controllables each make various modifications to the MILP to incorporate the passage of time prior to rescheduling by solving the updated MILP. They also update the sets of timepoints (A , R ) that have been executed or occurred, to drive the outer loop of the rescheduling simulation. Update Controllables updates the MILP by fixing the time of executed controllable timepoints and preventing unexecuted timepoints from being scheduled in the past. It does this by constraining the value of the MILP variable corresponding to a controllable timepoint ai to its execution time s(ai) in schedule s. This operation is performed on each controllable timepoint that was executed since the previous reschedule, except for those scheduled to execute at t, allowing these timepoints to be rescheduled to a time in the future, as described in 7. Update Controllables also sets the lower bounds of each MILP variable corresponding to the remaining unexecuted controllable timepoints to the current time t in order to prevent subsequent MILP solutions from rescheduling these timepoints to a time earlier than t. Update Uncontrollables updates components of the MILP corresponding to AR constraints c(ri, aj) whose uncontrollable timepoint ri has been observed (i.e., v(ri) = τ t). We split Cu into two sets. Cres u denotes the set of resolved AR constraints (i.e., AR constraints c(ri, aj) whose uncontrollable timepoint ri was observed at or prior to time t, essentially transforming these constraints into rejectable STN constraints). Cpend u denotes those that are enabled (i.e., ri has not yet realized). For c Cres u , the variable λij is dropped from the MILP, and constraints 6, 7, 8 and 9 for c(ri, aj) are removed and replaced with constraints

aj τ lri,aj M(1 ξij),

LASSITER, LAUFFER, & FRANK

aj τ uri,aj + M(1 ξij).

Note that the binary variable ξij has now been re-purposed as a rejection decision variable (similar to variables βij) for constraints in Cres u ; it can only take value 1 when lri,aj s(aj) τ uri,aj. Note also that the left-hand sides of the added constraints reduce to a constant when both aj has executed and ri occurs prior to time t. We must also augment the objective function so that we obtain value qc(ri, aj) when constraints in Cres u are known to be satisfied (i.e., ξij = 1) by a proposed schedule:

Cr βijqc(ai, aj) + X

Cres u ξijqc(ri, aj) + X

λijqc(ri, aj).

Update Dists modifies the MILP by recomputing probability distributions on contingent edges and updating coefficients in the MILP corresponding to uncontrollable timepoints that have not been observed by time t. If a probabilistic duration constraint d(ai, ri) has been activated (i.e. ai has been executed) prior to time t but ri has not yet occurred, then its probability distribution can be bounded below by t. This requires truncating and scaling the PDF of Ωi to

Pi 1 Fij(t s(ai)) if Ωi t s(ai),

0 otherwise;

where F is the CDF of Ωi. With the PDF modified in this way, we rebuild and re-approximate the functions Fij (or Fji), now denoting them as F t ij, for any AR constraints involving ri, and update the α, α, mk ij, and ck ij values used in MILP constraints 6, 7, and 9 as described in 5.1. Constraints 8 and 9 are updated to be bound above by this new piecewise linear approximation of F t ij. Recall from Equation 3 in 5.1 that we can write the expected value of a fixed schedule s as

c Cs σ(c(ai, aj), s)qc(ai, aj) + X

c Cu Fij(s(aj) s(ai))qc(ri, aj).

When rescheduling at time t, we can represent the expected value of the original solution s conditioned on the updated distribution Ωt relating to uncontrollable timepoints that we have observed up to time t, denoted E(P(Ωt)f(s, Ωt)), as

c Cs σ(c(ai, aj), s)qc(ai, aj)+ X

c Cres u σ(c(ri, aj), s, ω)qc(ri, aj)+ X

F t ij(s(aj) s(ai))qc(ri, aj),

where Cres u and Cpend u are defined as above, using the renormalized PDFs. This is now our new objective function. Once all of these updates have been applied, we re-solve the MILP, obtaining a schedule st that may reschedule controllable timepoints scheduled to execute at or after time t, and compare its expected value to that of the old schedule s. As described above, Update Controllables does not fix the value of variables representing controllables that could execute at t, to allow the possibility of rescheduling. Once the next solution is chosen, Update Current Controllables fixes all variables executing at t in the MILP. This routine constrains each controllable timepoint ai for which s(ai) = t to ensure they are fixed in the MILP. While this eventuality is very unlikely for time-based approaches, it is not zero, and increases with rescheduling period. It is more important to protect against this scenario in eventbased approaches, since rescheduling is triggered by execution or occurrence of timepoints, and we can t discount the possibility that multiple timepoints are scheduled to execute simultaneously.

ON EXPECTED VALUE STRONG CONTROLLABILITY

7.4 Evaluating Rescheduling Approaches

Derived Benchmark Origin benchmark Distribution Added constraints 5 HEATlab Normal 0 9 ROVERS Normal (two values of σ) 5 10 ROVERS Beta(6, 1.5) 5 11 ROVERS Beta(1.5, 6) 5

Figure 16: The parameters of the derived benchmarks used for evaluating rescheduling policies. All of these derived benchmarks use value scheme 1 (intra-agent constraints have value 1, all inter-agent constraints, original and new, have value 5). All added constraints are inter-agent concurrency constraints. Each instance has a fixed makespan constraint of 50%. As discussed in 7.4, two different standard deviations σ are used in variants of benchmark 9.

In order to empirically evaluate the benefits of rescheduling, we use both the MIT ROVERS benchmark introduced earlier, and a benchmark from the derived from the HMC HEATlab. Both benchmarks are similar inasmuch as they consist of multiple agents (rovers or robots) with both interand intra-agent constraints, which means we can use the same value schemes we used for the ROVERS instances to add different constraint values. The benchmarks are different enough that using both provides a good demonstration of the value of rescheduling across different types of problems. The HEATlab instances time horizon is smaller than ROVERS instances. HEATlab instances have no contingent constraints, meaning they are PSTNs, which we turn into WPSTNUs. As we saw in the analysis in 6.4, the optimal fixed schedule for ROVERS instances without additional constraints or reduced makespan is quite close to the maximum possible value, thus providing little leverage for rescheduling. To make the ROVERS benchmarks more challenging and emphasize the value of rescheduling, we reduced the makespan by 50% and added a handful of rejectable inter-agent constraints. The HEATlab instances, by contrast, have more room for improvement due to rescheduling, but to maintain consistency of the benchmarks, we also reduce the makespan by 50% for the HEATlab instances. Finally, the HEATlab instances have chained uncontrollable durations, making it difficult to evaluate the implications of beta distributions on the results, since beta distributions are not closed under summation. Thus, we only evaluate HEATlab instances using normal distributions. Beta distributions in the rescheduling derived benchmarks were constructed in the same way as the earlier derived benchmarks. We summarize our rescheduling approach derived benchmarks in Figure 16. We previously described some differences between the ROVERS and HEATlab benchmarks in 6. Before describing the empirical evaluation of rescheduling approaches, we point out another difference. In the HEATlab benchmarks, we note some of the original AR constraints may take the form c(ri, aj) = [ y, 0], that is, aj must preemptively be executed before ri or immediately on observing the uncontrollable. The AR constraints in the ROVERS instances all take the form c(ri, aj) = [y, x] with 0 y < x This difference between the two benchmarks will become important when we discuss our results on rescheduling approaches below. To quantify the benefit of rescheduling, we need to know how the initial MILP solution compares to the total value of all constraints in the initial problem, and how the rescheduling policy solution compares to the total value and the initial MILP solution. For WPSTNU Pe, denote Qt = P c Cqc(ti, tj). If ˆB is the value of the optimal MILP solution, then Q ˆB is the miss-

LASSITER, LAUFFER, & FRANK

5 10 15 20 25 30 35 0

Average number of reschedules

Proportional value gained (%)

πΓ,k πT,k πR,k

Missing value gained (%)

(a) The relationship between the average number of reschedules and the proportional value gained by each rescheduling policy.

37.6 19.9 11.1 6.5 4.1 2.9

86.86 85.66 83.65 81.21 79.58 78.88

22.8 15.4 11.2 9.0 7.3 6.1 5.3 4.6

89.50 86.73 84.95 83.38 82.87 82.06 81.95 81.46

7.8 4.4 3.3 2.7 2.3 1.9

86.03 83.14 81.81 81.13 80.15 79.96

Average number of reschedules

(b) The IPC scores comparing the performance of different rescheduling policies (in colored box). The value below the IPC scores show the average number of reschedules. See the beginning of 7.4 for how the IPC scores are computed.

Figure 17: Experimental results on derived benchmark 5 (HEATlab). Each problem instance is sampled twenty times for every rescheduling policy and the resulting value after execution is averaged. Rescheduling at all timepoints achieves a greater average value than rescheduling at clock ticks with fewer average reschedules. Rescheduling at only uncontrollable timepoints achieves similar average value to the most frequent clock tick rescheduling policy with far fewer average reschedules.

ON EXPECTED VALUE STRONG CONTROLLABILITY

ing value that could be regained by rescheduling. Let ˆBπ denote the value achieved by a given rescheduling policy π. Then we can evaluate performance of π by examining two metrics. First, ˆBπ ˆB Qt ˆB is the missing value gained by π. This value only measures improvement achieved by π in relative terms. Specifically, if the missing value is very small, a large or small buy-back is almost irrelevant. Second, ˆBπ ˆB

Qt is the proportional value gained; this is the proportion of total value Qt that is achieved by π and not the original MILP. This value measures the improvement in absolute terms. By evaluating both metrics together, we can assess how useful rescheduling is versus the MILP solution alone; if there is a large buy-back potential and rescheduling is able to restore a significant percentage of this value, that is a good result for rescheduling. Notice the two metrics differ only in the denominator. Figure 17a and Figure 17b compare the performance of rescheduling approaches on derived benchmark 5, for the HEATlab instances using normal distributions. Each instance is solved and rescheduled 20 times, and the values in the plot represent averages over all instances in the benchmark. The x axis of both figures is the average number of reschedules. Because the number of event-based reschedules may vary, we use this axis instead of the rescheduling period k. For Figure 17a, the left y axis is the proportional value gained, or ˆBπ ˆB Q ˆB , and the right y axis is missing value

gained, or ˆBπ ˆB

Q . Recall πΓ,k are time-based approaches, πT,k reschedules after any event occurs, and πR,k only reschedule when an uncontrollable event occurs. We first observe that the time-based approaches exhibit a point of diminishing returns with more reschedules. We also note event-based approaches generally reschedule less than the time-based approaches; this is understandable when we realize that there are only |T| events, so the event-based approaches only reschedule approximately |T| times, while the time-based approaches can reschedule much more often. Somewhat surprisingly, we observe that the event-based approach using policy πT,k achieves the best performance with fewer reschedules than second-place, and worse performing, πΓ,k. We see that, compared to the SC schedule, up to 38.4% of the missing value can be restored by πΓ,k, and this translates to roughly 11% of the total achievable value. The other approaches restore less than 30% of the missing value, or 7.5% of the total value. Figure 17b shows the International Planning Competition (IPC) scores of the different rescheduling approaches. The IPC score is used to determine the best overall approach, as follows: for instance i, let Mi represent the proportion of total value obtained by the best approach. Then, for a given approach Y that achieves a proportion mi Y of total value on instance i, we calculate Score(i, Y ) = mi Y Mi 100%. The IPC scores show the average number of reschedules for each rescheduling period k for each rescheduling policy class we compared. Note that the x axis differs for each rescheduling approach family. The heat chart shows a darker color for the better score. The IPC score is sensitive to the relative improvement of expected values achieved by different algorithms on the same instance, allowing for a direct comparison of the algorithms that isn t as easily achieved by examining the results of Figure 17a. Performance is aggregated over all algorithms performance on the same problem instance, and performance over all instances is also aggregated. When compared this way, we see that the policy πT,k with the smallest rescheduling period, and therefore the maximum number of reschedules, is best. Furthermore, it achieves this performance by performing about 60% of the reschedules performed by the second best policy πΓ,k. Rescheduling approaches πR,k performed worse than the other two, but surprisingly well, given that rescheduling only was permitted after uncontrollable events were observed; nevertheless, πT,k is preferable. We note that an IPC score of 100% indicates an algorithm that performs best on all instances. None of the reschedul-

LASSITER, LAUFFER, & FRANK

0 50 100 150 200 250

Average number of reschedules

Proportional value gained (%)

Missing value gained (%)

πΓ,k πT,k πR,k

(a) The relationship between the average number of reschedules and the proportional value gained by each rescheduling policy. Notice the significantly lower average value gained in comparison to the HEATlab data set.

283.7 143.4 73.5 38.2 20.5 11.1

99.31 99.14 98.08 98.01 97.48 97.52

112.4 64.0 44.1 32.1 25.6 22.0 17.4 15.5

99.40 98.92 98.56 98.30 98.12 98.13 97.83 97.80

51.1 26.1 17.5 13.5 10.6 9.6

98.20 97.95 97.93 97.74 97.78 97.74

Average number of reschedules

(b) The IPC scores comparing the performance of different rescheduling policies (in colored box). The value below the IPC scores shows the average number of reschedules. See the beginning of 7.4 for how the IPC scores are computed.

Figure 18: Experimental results on derived benchmark 9 (ROVERS, Normal distributions). Each problem instance is sampled once for every rescheduling policy, and the resulting value after execution is averaged. Rescheduling at all timepoints achieves similar average value to rescheduling at clock ticks, and does so with fewer average reschedules.

ON EXPECTED VALUE STRONG CONTROLLABILITY

ing approaches exceeds an IPC score of 90% on the HEATlab instances, meaning no algorithm outperforms the others on all instances. Figure 18a and Figure 18b compare the performance of rescheduling approaches on derived benchmark 9, for the ROVERS instances using normal distributions. Due to the large number of instances, only a subset was used to generate these plots, and each instance was solved once. As before, the x axis of Figure 18a is the average number of reschedules, the left y-axis shows proportional value gained, and the right y-axis shows the missing value gained. The results differ from those we observe for HEATlab instances. As we see from Figure 18a, only roughly 1.5% of the total value can be restored, and only roughly 11% of the missing value can be restored. We see the same diminishing returns from πΓ,k as rescheduling period decreases, but πΓ,k and πT,k appear close in performance at the maximum number of reschedules. Figure 18b shows the International Planning Competition (IPC) scores of the different rescheduling approaches for the ROVERS instances. We see a difference in algorithm performance compared to performance on the HEATlab instances; πT,k and πΓ,k with the smallest rescheduling period, and therefore the maximum number of reschedules, are almost indistinguishable, but πT,k achieves this performance with only 40% of the reschedules performed by πΓ,k. We note all three algorithms achieve IPC scores of almost 100% when rescheduling as frequently as possible, indicating very little performance variation between approaches, but πR,k is a little worse than the other two approaches. We now more closely examine the performance of rescheduling approaches on these benchmarks and answer the following questions: why is more frequent rescheduling needed for timebased approaches to match the performance of event-based approaches? Why is πR,k worse than πT,k? And what accounts for the difference in performance of time-based approaches on ROVERS and HEATlab benchmarks? We observe that performance of time-based approaches πΓ,k climbs, then tails off, with smaller k, and thus more rescheduling. We note instantaneous rescheduling upon event occurrences is almost impossible when using πΓ,k, since times when rescheduling occurs aren t necessarily aligned with the scheduled execution of timepoints. Both time-based and event-based approaches can take advantage of observed uncontrollable timepoints, which resolve AR constraints into rejectable STN constraints. However, event-based approaches solve a less-constrained problem, because rescheduling occurs immediately after observing the uncontrollable. The difference is that the bounds of c(ri, aj) are [0, x] when rescheduling occurs instantly, while the bounds are [y, x] for some y > 0, where y could be as high as the rescheduling period k. The longer the rescheduling period, the more constrained the problem is for a time-based approaches due to the lower bound on all unexecuted controllable timepoints. Frequent rescheduling is needed to respond to uncontrollable events in time to satisfy AR constraints and reduce this constraint penalty. Given the large amount of information provided by observing an uncontrollable timepoint, why is πR,k worse than πT,k? Waiting for the next uncontrollable to be observed may take too long and miss opportunities to update the schedule due to premature commitment to satisfying a low-valued constraint. Consider a rejectable STN constraint c(a1, a2) with bounds [la1,a2, ua1,a2] satisfied in schedule s such that s(a2) s(a1)= ua1,a2. Now consider an AR constraint c(r1, a2) and suppose v(r2) >s(a2). If r1 is pending at s(a2), then it is possible that the renormalization of P1 will make the expected value λ12qc(r1, a2) > qc(a1, a2), and thus postponing a1 to increase the probability of satisfying c(r1, a2), at the expense of rejecting c(a1, a2), is worthwhile. Policy πT,1 will reschedule at time s(a2), and policy πR,1 will not reschedule at s(a2). This means πR,1 will forego opportunities to improve the expected value.

LASSITER, LAUFFER, & FRANK

Average number of reschedules

Proportion of value gained (%)

πΓ,k πT,k πR,k

(a) The relationship between the average number of reschedules and the proportional value gained by each rescheduling policy on derived benchmark 5. Same data as Figure 17a.

0 20 40 60 80

Average number of reschedules

Proportion of value gained (%)

πΓ,k πT,k πR,k

(b) The relationship between the average number of reschedules and the proportional value gained by each rescheduling policy on the modified version of derived benchmark 5.

Figure 19: A comparison between results on the standard version of benchmark 5 and a modified version where AR constraints c(ri, aj) = [ y, 0]) are flipped to be positive (c(ri, aj) = [0, y]). Shows that the performance of time-based rescheduling does not necessarily asymptotically approach the performance of event-based rescheduling when there are negative at-risk constraints.

Our results show that event-based approaches always beat the best time-based approaches on the HEATlab benchmark, but that both approaches yield comparable performance for ROVERS if time-based approaches reschedule very frequently. We previously noted that HEATlab instances contain AR constraints of the form c(ri, aj) = [ y, 0]. These are inter-agent constraints of high value qc(ri, aj) = 5 per value scheme 1, and are often very tight (e.g. in some problems c(ri, aj) = [ 1, 0]). Time-based rescheduling generally will only satisfy such constraints with low probability, because aj must preempt ri. Furthermore, the expected value is small due to the tightness of the constraint bounds, making the expected value tradeoff unfavorable. By contrast, both eventbased approaches can instantaneously reschedule aj after observing ri, and benefit from treating the AR constraint as a rejectable STN constraint of full (as opposed to expected) value, even though satisfying these constraints imposes a tight [0, 0] constraint. We hypothesize the presence of these negative AR constraints (of the form c(ri, aj) = [ y, 0]) explains the difference in performance between the time-based and event-based approaches. We performed a test on a modified version of derived benchmark 5, where the negative AR constraints bounds were flipped to be positive (c(ri, aj) = [0, y]. Figure 19 compares the original performance of benchmark 5 and this modified version. Figure 19b shows that time-based approaches with small k, and thus more frequent rescheduling, and event-based approaches achieve similar expected values on this modified benchmark, suggesting that this feature of the HEATlab benchmark explains the difference in the performance of the approaches. Figure 20 (spanning 2 pages) demonstrates the different performance on the ROVERS benchmark across derived benchmarks 9 through 11 as the skewness of distributions is varied. The first version of derived benchmark 9, described above, uses normal distributions for all probabilistic

ON EXPECTED VALUE STRONG CONTROLLABILITY

edges; in derived benchmarks 10 and 11, probabilistic edges with normal (µ, σ) distribution were converted into beta distributions with mean µ and lower and upper bounds x 2σ and x + 2σ, respectively, where the center x is chosen so that the beta distribution has mean µ. Since the support of our beta distributions is 4σ (rather than 1 like in a standardized beta distribution), the variance of the beta distribution is (4σ)2αβ (α+β)2(α+β+1) where α = 1.5 and β = 6. We also created a second version of benchmark 9, shown in Figures 20a and 20c, where the standard deviation of the new normal

distribution is q

(4σ)2αβ (α+β)2(α+β+1). This was done to enable a head-to-head comparison of the performance of rescheduling on beta and normal ROVER instances with similar variances, but for which skewness drives short versus long uncertain durations as compared to the normal case. Notably, this means performance of rescheduling for ROVERS with normal distributions in Figure 20c differs slightly from that of Figures 18a and 20a. Figure 20a aggregates the performance of normal and skew distributions on ROVER problems, showing the proportion of total value Qt obtained by each scheduling policy on the different classes of distributions. As with the previous figures, we see the same pattern for each rescheduling policy class for each distribution. We see that skewed right distributions lead to better quality schedules than skewed left distributions, with the scaled normal distributions falling in between the two. The pattern of improvement of solution quality with increased rescheduling is more or less the same for each of the three rescheduling approaches; the proportion of total schedule quality achieved is almost linearly shifted up or down as a function of skewness. This suggests problem structure dictates buy-back potential more strongly than skewness, and that skewness biases the initial schedule value up or down. Figure 20b shows the performance of derived benchmark 11, which uses skew right beta distributions with parameters β(1.5, 6). Figure 20c shows the performance of the second version derived benchmark 9 with the variance of normal distributions scaled such that they matched the variance of the associated skew left and right beta distributions, as described above. Figure 20d shows the performance of derived benchmark 9 which uses skew right beta distributions with parameters β(6, 1.5). Each of these figures shows shows the proportional value gained on the left y-axis and the missing value gained on the right y-axis. The improved performance across different rescheduling approaches for beta distributions follows a similar relationship to that of the normal distributions. However, the figures indicate that the overall buy-back potential of beta distributions is slightly lower than that of normal distributions, even when their variances are identical. The proportional value gained (left y-axis) for event-based scheduling in Figure 20c tops out at above 1.4% for normally distributed probabilistic durations, and is slightly lower for skew left (near 1.3%, Figure 20d) and even lower yet for skew right (near 1.2%, Figure20b.) The story is the same for missing value (right y-axis); Figure 20c tops out at around 9.6% for normally distributed durations, and is slightly lower for skew left (around 8.8%, Figure 20d) and even lower yet for skew right (around 8.1%, Figure20b.) This is consistent with the low potential of rescheduling to improve on the already high quality solutions for skew right case, and the ability of rescheduling to do better with normal distributions than with either of the beta distribution instances.

8. Conclusions and Future Work

When presented with a control problem on probabilistic simple temporal networks, the usual strategy of establishing strong controllability fails when constraints are guaranteed to be violated. To address this, we formally define a new type of temporal network, the Weighted Probabilistic Sim-

LASSITER, LAUFFER, & FRANK

0 50 100 150 200 250 300

Average number of reschedules

Proportion of total value (%)

πΓ,k πT,k πR,k

Missing value gained (%)

(a) The average proportion of value obtained for different rescheduling policies as a function of how many reschedules are performed on the ROVERS instances using derived problems 9 through 11. The solid lines represent derived benchmark 9 using scaled normal distributions, the dashed lines represent derived benchmark 10 with skew left beta distributions, and the dotted lines represent derived benchmark 11 with skew right beta distributions.

0 50 100 150 200 250

Average number of reschedules

Proportional value gained (%)

Skew right β(1.5, 6)

Missing value gained (%)

(b) The proportional value gained and the missing value gained for different rescheduling policies as a function of how many reschedules are performed on the ROVERS instances using derived benchmark 11. In this benchmark, probabilistic edges were altered to have skew right beta distributions with parameters β(1.5, 6).

ON EXPECTED VALUE STRONG CONTROLLABILITY

0 50 100 150 200 250

Average Number of reschedules

Proportional value gained (%)

Missing value gained (%)

(c) The proportional value gained and missing value gained for different rescheduling policies as a function of how many reschedules are performed on the ROVERS instances using derived benchmark 9 modified such that normal distributions on probabilistic edges have standard deviation normalized to match the standard deviation of the associated beta distributions.

0 50 100 150 200 250 300

Average number of reschedules

Proportional value gained (%)

Skew left β(6, 1.5)

Missing value gained (%)

(d) The proportional value gained and missing value gained for different rescheduling policies as a function of how many reschedules are performed on the ROVERS instances using derived benchmark 10. In this benchmark, probabilistic edges were altered to have skew left beta distributions with parameters β(6, 1.5).

LASSITER, LAUFFER, & FRANK

ple Temporal Network with uncertainty (WPSTNU), and develop algorithms to solve the Expected Value Strong Controllability (Ev SC) problem of finding a schedule maximizing the expected value of satisfied constraints on a WPSTNU. This formulation allows schedules to intentionally violate certain constraints in order to achieve maximum expected value during execution, providing a new way of handling over-constrained scenarios for which risk-based formulations are not appropriate. Solving the Ev SC problem for WPSTNUs is a nonlinear optimization problem. We first formulate a MILP whose optimal solution bounds below the expected value of a WPSTNU. This MILP relies on a piecewise linear lower bound constructed for each AR constraint. Coupled with binary variables that control whether or not each rejectable constraint is satisfied, the MILP can balance the trade-off between satisfying STN constraints and covering probability mass (and therefore expected value) associated with AR constraints. We first evaluate our MILP for Ev SC using a series of modifications of the HEATlab and MIT ROVERS data sets. The modifications use different value schemes, added constraints, and reduced makespan, all of which are designed to exercise tradeoffs of expected value. Gurobi demonstrates good performance on this MILP. Problems without additional constraints are typically solved in under half a second; adding more constraints produces problems that are exponentially more difficult, as expected, but they are typically solved in under two minutes, which is comparable to previously demonstrated approaches. Reducing the makespan and adding more constraints generally forces tradeoffs between constraints for both benchmarks, as anticipated. However, HEATlab instances appear less impacted by the addition of more constraints, because they are proportionally less constraining, and therefore more easily satisfied. For ROVERS problems, the expected value remains quite high even when the makespan is reduced by 50%. For HEATlab problems, the expected value is initially more variable, and reducing the makespan has more impact on the expected value. Even though the MILP is guaranteed to bound below the expected value, empirical analysis of the error shows the MILP achieves almost optimal solutions, achieving an error of 10 6 for symmetric normal distributions for ROVERS instances and 10 4 for HEATlab instances. We also compare instances with normal and beta distributions, and show that while skewness impacts the quality of MILP solutions, as expected, it does not have a significant impact on the error of the solution. We next introduce rescheduling approaches, which revise the schedule periodically after new information is gained. These approaches are characterized by an up-front fixed rescheduling period, and distinguished by the set of events or time ticks over which the rescheduling period is evaluated. Rescheduling approaches also require a selection criterion, which chooses to keep the current schedule or discard it in favor of the new schedule, by evaluating the expected value of both schedules. We prove that these rescheduling approaches always improve on the initial SC solution. We then empirically evaluate rescheduling approaches on the MIT ROVERS and HEATlab datasets, also varying the value schemes and distributions. We empirically quantify how well rescheduling improves the expected value, showing that the two benchmarks lead to different buyback in expected value, but that more rescheduling is better, with diminishing returns . Somewhat surprisingly, we find that rescheduling every k timepoint event occurrences (both controllable and uncontrollable) performs as well, or better, than rescheduling every k time units, with significantly less overall rescheduling effort. There are some differences in the relative merits of time-based rescheduling and event-based rescheduling on the HEATlaband ROVERS-derived benchmarks. Event-based rescheduling dominates on the HEATlab instances, and rescheduling after uncontrollables only is competitive. However, on the ROVERS instances, time-based and event-based approaches have comparable performance, while uncontrollable-only performs poorly. We identify a

ON EXPECTED VALUE STRONG CONTROLLABILITY

Benchmark Dist. Benchmark Findings 1 Normal ROVERS SC achieves > 95% max value 2 Normal ROVERS SC achieves 85 93% max value when makespan cut by 50% 3 Normal ROVERS SC achieves > 80% max value when adding constraints 2,3 Normal ROVERS More constraints, reduced makespan higher runtime 4 Beta ROVERS Left skew value < right skew value 4 Beta ROVERS Skewness does not influence error 5 Normal HEATlab SC achieves lower, more variable % max value than ROVERS 6 Normal HEATlab SC achieves 40 90% max value when makespan cut by 50% 7 Normal HEATlab SC achieves widely variable % max value when adding constraints 7 Normal HEATlab SC weakly increasing runtime when adding constraints 8 Normal HEATlab SC negligible impact of adding constraints 8 Normal HEATlab SC weakly increasing runtime of adding constraints 1-4 Both ROVERS Error is small (about 10 6) 5-8 Normal HEATlab Error is small (about 10 4)

5,9-11 Both Both Rescheduling improves value compared to SC 5,9-11 Both Both Event-based better than Time-based with fewer reschedules 5 Normal HEATLab HEATLab Time-based performance dependency (per Fig 13) 5 Normal HEATlab Regained 11% max value , 38.4% missing value 9 Normal ROVERS Regained 1.5% max value, 11% missing value (per Fig 18a) 10 Beta (left) ROVERS Regained 1.3% max value, 8.8% missing value 11 Beta (right) ROVERS Regained 1.2% max value, 8.1% missing value 10,11 Beta ROVERS Left skew value < right skew value

Figure 21: A summary of the main empirical results.

feature of the HEATLAB instances, namely AR constraints requiring pre-emptive scheduling, that is present in HEATlab instances, but not in ROVERS instances, responsible for the difference in performance. Finally, we find that instances with skew right distributions lead to better quality than skew left distributions, that the buy-back of skew distributions is somewhat smaller than that of normal distributions, and that there is a small difference in buy-back between skew left and skew right distributions, with skew left distributions providing slightly better buy-back. A summary of our main results is shown in Figure 21. We conclude by describing several areas of future work: formally characterizing dynamic controllability, exploring different rescheduling approaches, further characterization of the properties of WPSTNUs, and generalizations of WPSTNUs. While recent work (Akmal, Ammons, Li, Gao, Popowski, & Boerkoel, 2020) has resulted in algorithms for the dynamic control of PSTNs, the Expected Value Dynamic Controllability (Ev DC) problem on WPSTNUs remains open. Solving this problem will provide executives with compact approaches similar to optimal policies for MDPs, and should lead to better expected value than the rescheduling approaches described in this paper. Algorithms to solve the Ev DC problem are likely to be quite different than the techniques described in this work, and it is not clear how exactly to formulate, let alone solve, this problem. One intriguing line of research is to consider the Ev DC problem as a finite-horizon MDP problem. The key challenge lies in defining the state space and state transitions. Doing so requires an up-front discretization of time; it is not obvious how to choose the discretization, and discretization may lead to a large (exponential) state space. Other options include extending (Gao, Popowski, & Boerkoel, 2020) or other approaches to the expected value case.

LASSITER, LAUFFER, & FRANK

In lieu of solving the difficult Ev DC problem, the fixed-period rescheduling approaches introduced in this paper can be improved upon in a variety of ways. Our event-based approaches can still miss opportunities to respond to uncertainty, and time-based approaches generated without regard to the initial schedule produced by the MILP miss opportunities to react to event occurrences. A hybrid algorithm that reschedules at a fixed time-based cadence after executing controllable timepoints may outperform the approaches described in this paper. Other approaches may compute the ideal rescheduling time directly. We chose to explore a limited set of rescheduling approaches, using the true expected value as the foundation for our selection criterion. In particular, we note Proposition 1 does not, in fact, depend on the use of the MILP as the means to produce a new schedule, because the selection criterion uses the true expected value of a proposed new schedule s . This proposition also doesn t require the optimal MILP solution as a starting point either, for the same reason, nor does it depend on a periodic rescheduling policy. In fact, the proposition doesn t even depend on the production of a single schedule when rescheduling is invoked; an arbitrary number of schedules could be generated, and any schedule whose expected value exceeds the current schedule could be selected. This suggests that many rescheduling approaches, using different methods to generate the initial schedule and proposed changes, still satisfy Proposition 1; the sole requirement is to check the true expected value of the old and new schedules. We can also explore approaches that fall outside the confines of Proposition 1. For instance, can we simply use the MILP solution, and its quality, in the selection criterion? Suppose st is an optimal solution to the MILP we construct from P WE. The old schedule s can be evaluated as a candidate solution to this new MILP; doing so requires checking the satisfaction of resolved AR constraints, discarding the old values of λij, recomputing them using the new piecewise bounds of F t ij, and recomputing the MILP solution value. Denote this value by ˆBt. Denote the value of the optimal solution st to the new MILP by ˆB . As we describe in 5.3, the MILP bounds below the expected value, and so st has a (very small) likelihood of being a worse solution than s. We can protect against this possibility by comparing the quality of st to the new quality of s plus the error bound on the new solution, ˆϵ(st). The selection criterion χEV (s, st), using the MILP solution but accounting for the error, chooses st if ˆB > ˆBt + ˆϵ(st). Doing so ensures there is no question that rescheduling always uses the best of the two candidate schedules. We then see that

g(s ) ˆB > ˆBt + ˆϵ(st) g(s) ˆBt.

While this selection criterion preserves a formal proof that expected value is improved upon compared to the fixed schedule, it requires some additional work to compute the error ˆϵ(st), but eliminates the work performed by χEV to compute the true expected value. Alternately, we can simply not worry about computing the error and use criterion χm(s, st), which, chooses st if ˆB > ˆBt. This final criterion avoids the extra integration needed to compute the error, or the true expected value of either schedule, but is no longer provably able to improve on the fixed schedule expected value. Nevertheless, the error in the MILP is low enough that it should perform well in practice. Comparing the runtimes and schedule quality of these different selection criteria is an interesting area of future work. The HEATlab and ROVERS problems exhibit some differences in the potential for rescheduling to improve the expected value. The structure of the problems plays a role in the potential for rescheduling to improve expected value. The low potential improvement for the ROVERS instances

ON EXPECTED VALUE STRONG CONTROLLABILITY

could be increased by investigating the buy-back potential for derived benchmark 3 (with up to 50 more constraints) or 4 (reduced makespan) or by varying the value schemes. An investigation of the structural properties of WPSTNUs and other variants is a worthwhile line of research. The simple case of scalar-value preferences qc(ti, tj) can be extended to preferences over intervals, as is done for Simple Temporal Problems with Preferences (STPPs) (Rossi et al., 2006). Extending our work to address preferences creates a problem similar to the relaxable cc-p STP of (Fang et al., 2014), but unifies the treatment of the lower value of achieving relaxed constraints and expected value. While we believe much of the theory described in this paper can be reused, leading to a similar MILP formulation and complexity, in order to address this problem, the generalization remains to be explored. Finally, we can generalize WPSTNUs to express the quality of satisfying a group of constraints, rather than just a single constraint. Generalizing our MILP formulation is simple for weights of STN constraints, by dint of introducing a 0 1 variable that can only take on the value 1 if all constraints in a group are satisfied, via a simple linear inequality. However, if a single quality applies to multiple AR constraints, the probability of satisfying all such constraints is the product of their individual probabilities, which leads to a nonlinear optimization problem. It is also not clear how a single weight or value for satisfying a mix of STN and AR constraints should be modeled.

9. Acknowledgements

We thank the anonymous reviewers of previous versions of this work for their valuable comments. This work was funded by the NASA Advanced Exploration Systems Program.

Abrahams, J. R., Chu, D. A., Diehl, G., Knittel, M., Lin, J., Lloyd, L., Boerkoel, J., & Frank, J. (2019). Reducing the computational and communication overhead of robust agent rescheduling. In Proceedings of the 29th National Conference on Automated Planning and Scheduling, pp. 3 12. Akmal, S., Ammons, S., Li, H., Gao, M., Popowski, L., & Boerkoel, J. (2020). Quantifying controllability in temporal networks with uncertainty. Artificial Intelligence, 289, 33 84. Boyan, J., & Littman, M. (2000). Exact solutions to time-dependent MDPs. In Advances in Neural Information Processing Systems, pp. 1026 1032. Brooks, J., Reed, E., Gruver, A., & Boerkoel, J. (2015). Robustness in probabilistic temporal planning. In Proceedings of the 29th National Conference on Artificial Intelligence, pp. 3239 3246. Dechter, R., Meiri, I., & Pearl, J. (1991). Temporal constraint networks. Artificial Intelligence, 49, 61 94. Fang, C., Yu, P., & Williams, B. (2014). Chance-constrained probabilistic simple temporal problems. In Proceedings of the 28th National Conference on Artificial Intelligence, pp. 2264 2270. Gao, M., Popowski, L., & Boerkoel, J. (2020). Dynamic control of probabilistic simple temporal networks. In Proceedings of the 34th National Conference on Artificial Intelligence, pp. 9851 9858. Gurobi Optimization, LLC (2022). Gurobi Optimizer Reference Manual. Ibragimov, I. A. (1956). On the composition of unimodal distributions. Teor. Veroyatnost. i Primenen., 1(2), 283 288. Lund, K., Dietrich, S., Chow, S., & Boerkoel, J. (2017). Robust execution of temporal plans. In Proceedings of the 31st National Conference on Artificial Intelligence, pp. 3597 3604. Muscettola, N., Morris, P., & Vidal, T. (2001). Dynamic control of plans with temporal uncertainty. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp. 494 499.

LASSITER, LAUFFER, & FRANK

Peinter, B., Moffitt, M. D., & Pollack, M. E. (2005). Solving overconstrained disjunctive temporal problems with preferences. In Proceedings of the 15th International Conference on Automated Planning and Scheduling, pp. 202 211.

Rossi, F., Venable, K. B., & Yorke-Smith, N. (2006). Uncertainty in soft temporal constraint problems: A general framework and controllability algorithms for the fuzzy case. Journal of Artificial Intelligence Research, 27, 617 674.

Saint-Guillain, M., Vaquero, T., Agrawal, J., & Chien, S. (2020). Robustness computation of dynamic controllability in probabilistic temporal networks with ordinary distributions. In Proceedings of the 29th

International Joint Conference on Artificial Intelligence, pp. 4168 4176.

Santana, P., Vaquero, T., Toledo, C., Wang, A., & Williams, B. (2016). PARIS: A polynomial-time, risksensitive scheduling algorithm for probabilistic simple temporal networks with uncertainty. In Proceedings of the 30th National Conference on Artificial Intelligence, pp. 267 275.

Tsamardinos, I. (2002). A probabilistic approach to robust execution of temporal plans with uncertainty. In Methods and Applications of Artificial Intelligence, pp. 97 108.

Vidal, T., & Fargier, H. (1999). Handling contingency in temporal constraint networks: from consistency to controllabilities. Journal of Experimental and Theoretical Artificial Intelligence, 11(1), 23 45.

Vidal, T., & Ghallab, M. (1996). Dealing with uncertain durations in temporal constraint networks dedicated to planning. In Proceedings of the 12th European Conference on Artificial Intelligence, pp. 48 54.

Wang, A. J., & Williams, B. (2015). Chance-constrained scheduling via conflict-directed risk allocation. In Proceedings of the 29th National Conference on Artificial Intelligence, pp. 3620 3627.

Weld, D., & Mausam (2006). Probabilistic temporal planning with uncertain durations. In Proceedings of the 21st National Conference on Artificial Intelligence, pp. 880 887.

Yu, P., Fang, C., & Williams, B. (2015). Resolving over-constrained probabilistic temporal problems through chance constraint relaxation. In Proceedings of the 29th National Conference on Artificial Intelligence, pp. 3425 3431.

Yu, P., Williams, B., Fang, C., Cui, J., & Haslum, P. (2017). Resolving over-constrained temporal problems with uncertainty through conflict-directed relaxation. Journal of Artificial Intelligence Research, 60, 425 490.