# prophet_inequalities_for_bayesian_persuasion__7138a7bf.pdf Prophet Inequalities for Bayesian Persuasion Niklas Hahn1 , Martin Hoefer1 and Rann Smorodinsky2 1 Goethe University Frankfurt, Germany 2 Technion, Israel {nhahn, mhoefer}@em.uni-frankfurt.de, rann@ie.technion.ac.il We study an information-structure design problem (i.e., a Bayesian persuasion problem) in an online scenario. Inspired by the classic gambler s problem, consider a set of candidates who arrive sequentially and are evaluated by one agent (the sender). This agent learns the value from hiring the candidate to herself as well as the value to another agent, the receiver. The sender provides a signal to the receiver who, in turn, makes an irrevocable decision on whether or not to hire the candidate. A-priori, for each agent the distribution of valuation is independent across candidates but may not be identical. We design good online signaling schemes for the sender. To assess the performance, we compare the expected utility to that of an optimal offline scheme by a prophet sender who knows all candidate realizations in advance. We show an optimal prophet inequality for online Bayesian persuasion, with a 1/2-approximation when the instance satisfies a satisfactory-status-quo assumption. Without this assumption, there are instances without any finite approximation factor. We extend the results to combinatorial domains and obtain prophet inequalities for matching with multiple hires and multiple receivers. 1 Introduction In many settings an informed agent wants to use private information in order to persuade other agents to take some action that he would benefit from. Consider a salesperson informed about the quality of the product who would like to maximize sales. Is full disclosure of the product quality to customers an optimal strategy? Perhaps revealing no information or revealing it partly would result in a higher sales volume. The study of optimal information disclosure, known as Bayesian persuasion, has gained enormous attention in the recent decade. The canonical model is one where the informed agent (the sender) commits to some information disclosure (or signaling) scheme before learning the true state of nature. Once the state is realized, the appropriate signal is sent to other agents (the receivers) who, in turn, take an action which results in payoffs for the sender and the receivers. Applications abound and can be found in diverse areas such as online advertisement [Badanidiyuru et al., 2018; Emek et al., 2012; Arieli and Babichenko, 2019], security problems [Rabinovich et al., 2015; Xu et al., 2015; Xu et al., 2016], medical research [Kolotilin, 2015], financialsector stress testing [Goldstein and Leitner, 2018], and voter coalition formation [Alonso and Cˆamara, 2016]. There are scenarios where this canonical model does not work well. For example, consider a routing problem where autonomous agents try to minimize their travel time over a network. Information about network congestion is available to a central planner who may share it with the agents in order to maximize the network s throughput. The planner receives the information gradually (road closures, broken traffic lights, traffic surges, etc.) and shares it gradually with the agents, who sequentially need to decide on the next edge to take in the network. In this case, we need to study online variants of the persuasion problem. In this paper, we study the online version of the persuasion problem in a simple setting. Inspired by the classical gambler s problem, we consider a setting where n tasks arrive sequentially. A sender learns the completion value of each incoming task, for herself as well as for the receiver. Before the tasks arrive the sender commits to some signaling scheme and at each stage, depending on this policy and the actual information received, the receiver decides whether or not to undertake the task. This decision is irrevocable, and the receiver has capacity for a single task. An alternative motivation could involve incoming threats with a limited defense capacity, job candidates for a single position, priority routing for incoming jobs, and more. Our goal is to provide good online signaling schemes that maximize the expected utility of the sender. We design simple schemes and characterize the impact of dynamic information revelation with prophet inequalities we compare the performance of our online schemes to the optimal expected utility for the sender that could be obtained from knowing all the information in advance. 1.1 Our Contribution We study online Bayesian persuasion inspired by prophet inequalities for the classic gambler s problem. At each round, i = 1, . . . , n, a pair of values (one for the sender and one for Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) the receiver) is drawn from a commonly known prior distribution, Di. The pair of values is revealed to the sender who then partly shares this information with the receiver according to a signaling scheme committed to in advance. Based on this information, the receiver takes an irrevocable binary decision. We begin by considering the iid case (Di = Dj for all i, j) in Section 2. We show that a simple signaling scheme designed by Dughmi and Xu [2016] for the offline case can be applied online and that it provides a (1 1/e)-approximation. We then turn to the more general case under an additional assumption, referred to as the satisfactory-status-quo (or SSQ for short) assumption. Here, there exists an outside option (e.g. the current employee) for the receiver which has a deterministic value which is at least as profitable as her expected value of any of the candidates arriving online. An equivalent way of expressing this assumption is to assume that the last candidate has the maximum expected value for the receiver among all candidates. For this scenario, we provide an online signaling scheme that provides a 1/2-approximation and show that this is the best possible guarantee. Unfortunately, if we drop the SSQ assumption, then in general no scheme can guarantee any finite approximation. On the positive side, one can compute an optimal online scheme in polynomial time. In Section 3, we discuss extensions of our general design template. More concretely, we design online schemes for matching variants with multiple receivers and multiple candidate hires per receiver, when each receiver satisfies SSQ. Our schemes rely on an optimal solution to an LPrelaxation, combined with carefully designed probabilistic damping techniques from the area of Bayesian mechanism design. 1.2 Related Work Our work extends the study of prophet inequalities which was introduced by Krengel and Sucheston [1977]. More recently, the problem was the focus of a lot of research which showed improvements for special cases and introduced combinatorial variants of the original problem [D utting et al., 2017; Kleinberg and Weinberg, 2019; Alaei, 2014; Chawla et al., 2010; Correa et al., 2017; Correa et al., 2019; Esfandiari et al., 2015]. The study of Bayesian persuasion was initiated by Aumann and Maschler [1966] and came back into focus more recently following the work of Kamenica and Gentzkow [2011]. A plethora of authors worked on variants of persuasion [Celli et al., 2019; Arieli and Babichenko, 2019; Ely, 2017; Ely et al., 2015; Au, 2015; Dughmi and Xu, 2017], including combinations of persuasion with online learning and multi-armed bandit problems (see, e.g., [Kremer et al., 2014; Frazier et al., 2014; Mansour et al., 2015] and subsequent work). For a recent survey on related algorithmic work see Dughmi [2017]. An online model of Bayesian persuasion closely related to our work was studied in our recent work [Hahn et al., 2019]. We study a similar round-wise persuasion game with very different assumptions regarding the a-priori knowledge of sender and receiver. The scenario is inspired by the secretary problem candidate values are unknown and adversarially chosen but candidates arrive in uniform random or- der. This uncertainty also has consequences for the definition of persuasiveness. In contrast, in this paper we assume candidate values are drawn independently from known distributions, which allows to compute expected utilities and to apply standard notions of persuasiveness. Moreover, we use significantly different techniques for analysis. 1.3 Model In the basic version of our model, there are two players a sender S and a receiver R. There are n rounds, and in each round a candidate arrives. Candidate i has a type θi, and each type is associated with a pair of non-negative utility values, one for S and one for R. The type of candidate i is drawn independently from distribution Di. The n distributions D1, . . . , Dn are known to both players in advance and have finite support of size m. The probability that candidate i has type θi = j is denoted by qij for all1 i [n] and j [m]. The utility values of candidate i with type j for R and S are denoted by ρij and ξij, respectively. We use ρi = Ej Di[ρij] to denote the expected utility for R in round i. Upon arrival of a candidate in round i, S observes the type θi and the associated values and sends a signal σi to R. The receiver knows D1, . . . , Dn and signals σ1, . . . , σi 1. Upon reception of σi, she has to immediately decide whether she wants to hire or dismiss candidate i. The process ends once a candidate is hired or all n candidates have been dismissed. The decision made in each round is irrevocable. Each player strives to maximize its own utility of the hired candidate. As usual in Bayesian persuasion, we assume that the sender has commitment power [Kamenica and Gentzkow, 2011], i.e., S can commit in advance to a signaling scheme ϕ. In each round i, the scheme ϕ takes as input the vector of observed types (θ1, . . . , θi) and outputs a signal σi to R. We restrict the set of schemes to schemes ϕ that are direct and persuasive: In a direct scheme, all signals are σi {HIRE, NOT} for all i [n]. A persuasive scheme is one that is incentivecompatible, i.e., R maximizes her expected utility by following the recommendations. By a revelation-principle style argument, the assumption of a direct and persuasive scheme is without loss of generality [Kamenica and Gentzkow, 2011; Arieli and Babichenko, 2019]. We design online signaling schemes that are good for the sender, i.e., our goal is to find direct and persuasive schemes that maximize the sender s expected utility of the hired candidate. To avoid technicalities, we assume that R breaks ties in favor of S. We compare the expected utility to that of an optimal (direct and persuasive) offline scheme, in which (the prophet-version of) S can see the full vector of realizations (θ1, . . . , θn) in advance and signal to R accordingly. In Section 2, we briefly consider the iid scenario (with D1 = D2 = . . . = Dn) and make the following satisfactorystatus-quo assumption. Under such an assumption, there exists an external option E / [n] for R which is always available and gives the best expected utility for R (ρE ρi for all i [n]). An alternative formulation of this assumption is that the last candidate provides the highest expected value for R, i.e. ρn ρi for all i [n]. Thus, R can always wait for 1We use the short notation [x] = {1, 2, . . . , x} for x N. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) the final candidate. Since we are interested in the worst case for S, we assume that the utility of S for the outside option is zero. 2 Single Candidate 2.1 A Simple Scheme for the IID-Case We start by briefly discussing the case of iid distributions. Optimal persuasive signaling in the offline case, in which S sees all realizations θ1, . . . , θn in advance, was studied by Dughmi and Xu [2016]. Consider an optimal offline scheme. We denote the ex-post distribution of the hired candidate by xo = (xo ij)i [n],j [m]. There is an optimal offline scheme that is symmetric, i.e., that yields xo ij = xo i j =: xo j, for all rounds i, i . Hence, the probability that a candidate in a round is recommended for hire is P j xo j = 1/n. Dughmi and Xu show that the additional constraints to ensure persuasiveness can be expressed by a linear polytope, and an optimal scheme can be computed by solving a polynomial-sized LP. In contrast to the optimal scheme, they also propose a simple approximate scheme, which relies on solving the following simpler LP with fewer constraints: Max. n P j xjξj s.t. n P j xj = 1 xj + (n 1)yj = qj j [m] P j xjρj P j yjρj xj, yj 0 j [m] The intuition is that xj is an ex-post probability of receiving a HIRE signal on a candidate of type j in some round i, while yj is an ex-post probability of getting a NOT signal. For this LP, the complex persuasiveness constraints are relaxed to simple ones the first two are symmetry and consistency constraints, the third one requires that hiring a candidate upon HIRE signal is better for R than hiring one upon NOT signal. The objective function yields the expected utility of S from the hired candidate. It is easy to see that the ex-post distribution for the optimal symmetric offline scheme xo is a feasible solution for this LP. Hence, an optimal LP solution x yields an upper bound on the expected utility for S in xo. For every feasible solution yj = (qj xj)/(n 1), where yj 0 whenever xj qj. We plug this into the third constraint and rearrange it to P j xjρj P j qjρj 1 n = ρ P j xj. Thus, LP (1) is equivalent to Max. n P j xjξj s.t. n P j xj = 1 xj qj j [m] P j xj xj 0 j [m] An LP-optimum x might not represent an ex-post distribution of some persuasive signaling scheme ϕ, since it adheres only to the simple, relaxed set of constraints. Dughmi and Xu propose a way to turn x into an persuasive signaling scheme, which we formulate as our simple scheme: In each round i [n 1], observe the type θi. If there has been no HIRE, signal HIRE independently with probability x θi/qθi, and NOT otherwise. In round n, signal HIRE if not done so before, and NOT otherwise. The simple scheme can be applied by S in the online scenario, because in round i it does not use information about future realizations. The proof of the following proposition follows from [Dughmi and Xu, 2016, Theorem 3.8]. Proposition 1. The simple scheme is persuasive in the online setting and yields a (1 1/e)-approximation. 2.2 Beyond IID In case of general distributions, our first result is that an optimal online scheme can be computed in polynomial time. The approach is via backwards induction and solving n 1 linear programs. Interestingly, this result contrasts the conditions for an optimal offline scheme, which is known to be hard (for details see Dughmi and Xu [2016]). Theorem 1. An optimal persuasive signaling scheme in the online setting can be computed in polynomial time. Proof. The mechanism signals at most one HIRE signal. If the receiver does not hire, the mechanism never sends a HIRE signal again. As such, we can assume that the online process ends upon sending the first HIRE signal. Now consider the case that we reach round i without having sent any HIRE signal so far. Let xij be the probability to signal HIRE upon arrival of a candidate of type j in round i. We determine the values of xij using an LP and the optimal solution for rounds i + 1, . . . , n. Let ξi and ρi be the expected utility for S and R from the optimal mechanism applied in rounds i, i + 1, . . . , n, respectively. Let ρi be the best expected utility for receiver in any single round i, i + 1, . . . , n, i.e., ρi = maxi {i,...,n} P j qi jρi j. In the last round it is optimal to set xnj = 1 for all j, since it is in the interest of both S and R to hire the last candidate. Hence ξn = P j qnjξnj and ρn = P j qnjρnj. Now suppose we have computed the optimal mechanism to be applied in rounds i + 1, . . . , n. Consider round i. The expected value of the sender is given by qijξij if a candidate of type j arrived and she signals HIRE, or qijξi+1 if a candidate of type j arrived and she signals NOT. Thus, S strives to maximize P j qij(xijξij + (1 xij)ξi+1) or, equivalently, ξi+1 + P j qijxij(ξij ξi+1) . Clearly, this is maximized for xij {0, 1} with xij = 1 if and only if ξij ξi+1. However, S also needs to incentivize R to follow the signal. Suppose R gets a HIRE signal in round i. If she accepts, her conditional expectation is P j qjxijρij P j qjxij. Upon rejection, she can only resort to the subsequent round with best expectation, i.e., ρi+1. Hence, R accepts upon a HIRE signal if and only if P j qijxij(ρij ρi+1) 0 . Suppose R gets a NOT signal in round i. If she accepts, her conditional expectation is P j qj(1 xij)ρij P j qj(1 xij). If she follows the mechanism, then by the inductive Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) assumption that the mechanism is persuasive from round i+1 on, the best expected utility for R is ρi+1. Hence, R rejects upon a NOT signal if and only if P j qij(1 xij) ρi+1 P j qij(1 xij)ρij, which is equivalent to P j qijxij(ρij ρi+1) P j qijρij ρi+1 . Given the optimal mechanism for rounds i + 1, . . . , n we obtain the optimal mechanism for rounds i, . . . , n by solving the LP Max. ξi+1 + P j qijxij(ξij ξi+1) s.t. P j qijxij(ρij ρi+1) 0 P j qijxij(ρij ρi+1) P j qijρij ρi+1 xij [0, 1] j [m] . (3) By the inductive assumption, the mechanism is persuasive for rounds i + 1, . . . , n. This implies ρi+1 ρi+1, since following the mechanism must be at least as profitable for R as deviating to pick the candidate in the remaining round with highest expectation. Given this property, we observe that LP (3) always has a feasible solution. If P j qijρij ρi+1, then xij = 1 for all j [m] satisfies both constraints. Otherwise, if P j qijρij < ρi+1, then 0 > P j qijρij ρi+1 P j qijρij ρi+1, and setting xij = 0 for all j [m] satisfies both constraints. Although we can compute the optimal online mechanism in polynomial time, there are instances in which no finite approximation to the optimal offline signaling scheme can be obtained. This is a consequence of commitment power of the sender S publishes and commits to a signaling scheme ϕ in advance. By inspecting this scheme, R can determine if S in round i uses access to (θ1, . . . , θn) available in the offline case or (θ1, . . . , θi) available in the online case. Hence, R can determine whether the signal of S contains information about realizations in future rounds or not. This property is key for the following lower bound. Theorem 2. There are instances in which the optimal online signaling scheme yields an unbounded approximation ratio. Proof. Consider the following instance with n = 2 candidates. D1 is deterministic, a single realization with value pair (ρ11, ξ11) = (1, 0). D2 has two possible realizations with probability 1/2 each. The value pairs are (ρ21, ξ21) = (2 ε, 1) for some ε (0, 1), and (ρ22, ξ22) = (0, 0). Consider the optimal offline scheme, in which the sender knows both realizations. S signals HIRE for the second candidate if and only if the realization θ2 = (2 ε, 1). Otherwise, S signals HIRE for the first candidate. In this way, R always gets her optimal candidate the scheme is persuasive. The expected utility for S is 1/2. Now consider the online case. By inspecting the online scheme, the receiver realizes that the signal in the first round contains no information θ1 is perfectly known to both S and R, and S has no information about θ2. The signal in the second round, however, is irrelevant for R upon reaching round 2, it is a dominant strategy for R to hire the second candidate. As a consequence, R will take an action independent of the signal of S and accept the candidate in round 1, since it yields the higher expected value. The unique persuasive scheme in the online scenario for S is to signal HIRE in the first round, which has utility 0 for S. There is a broad set of conditions under which the online case leads to a drastic deterioration in expected sender utility. For example, if a later candidate has a golden-nugget - type distribution (small expected value, with tiny probability a super-valuable realization for both players), then an offline scheme can convince R to wait, since the signal contains the information that the golden nugget will indeed arrive. In contrast, an online scheme cannot transport this information, and hence R has an incentive to accept an earlier candidate with better expected value for her (and possibly much less value for S). 2.3 A Simple Scheme for SSQ In this section, we extend the idea of the simple scheme in the case that SSQ holds. The main condition to ensure a small constant approximation ratio is that the receiver has a canonical option for deviation. In the iid case a valid deviation from a HIRE signal in rounds i [n 1] is to simply take the last candidate in round n. This candidate has the best (in fact, the same) expected value for R as every other candidate i + 1, . . . , n 1. In this section, we assume that there is an external option E / [n] that has the best expectation for R, i.e., ρE ρi for all i = 1, . . . , n. This external option (i.e. the current employee to be replaced) can be chosen by R at any time. Equivalently, we assume that the last candidate has the highest expectation for R, i.e. ρn ρi, for all rounds i [n 1]. An optimal offline scheme can be obtained by solving an exponential-sized LP using the ideas in [Dughmi and Xu, 2016]. Instead, we set up a polynomial-sized LP as a natural extension of LP (2). The intuition for xij is the ex-post probability for a HIRE signal for candidate type j from Di: Max. P i,j xi,jξi,j s.t. P i,j xi,j 1 xij qij i [n], j [m] P j xijρij ρE P j xij i [n] xij 0 i [n], j [m] (4) Lemma 1. The optimal value of LP (4) is an upper bound on the expected utility for S in any offline persuasive signaling scheme. Proof. Consider an optimal offline scheme for S. Suppose xij be the ex-post probability that a HIRE signal is issued for candidate j from Di. We show that the vector x defined in this way is a feasible solution for LP (4). The utility function of the LP corresponds to the expected utility for S upon a successful recommendation. The constraint xij qij is fulfilled, since a candidate cannot be recommended more often than it arrives. The constraint P j xijρij ρE P j xij is fulfilled for all i [n], since the scheme is persuasive and therefore does not allow a profitable deviation upon a HIRE signal in round i to the outside option. The remaining two constraints Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) Algorithm 1. Simple Scheme for SSQ Input: Distributions (Di)i [n], factors d = (di)i [n], online sequence of θi drawn from Di for rounds i = 1 to n do Upon seeing the draw θi from Di: W. prob. 1 di: Signal NOT, go to next round. Otherwise, w. prob. x iθi/qiθi: Signal HIRE now and NOT in all remaining rounds Otherwise: Signal NOT, go to next round simply state that x is a vector of probabilities that sum to at most 1 both conditions are fulfilled in the optimal offline scheme. Thus, x is a feasible solution for the LP. The optimal solution to LP (4) gives an upper bound on the expected utility for S in any persuasive scheme. Based on the optimal solution x to LP (4), we define a simple scheme for SSQ (Algorithm 1) and prove our main result of this section. Theorem 3. For a suitable choice of parameters d, the simple scheme is persuasive in the online setting satisfying SSQ and yields a 1/2-approximation. Proof. First, we show that for every set of damping parameters d [0, 1]n the scheme is persuasive. We subdivide the proof into two cases. In case 1 we assume S signals HIRE in round i and NOT in all previous rounds. By following the signal, Bayes rule shows that R gets an expected utility of P j qij xij di ρij/qij P j qij xij di/qij = P j xijρij P j xij ρE , where the last inequality follows from the LP-constraint P j xijρij ρE P j xij. In case the receiver decides to deviate and reject candidate θi, the scheme will not provide any more signals. Thus, ρE is the maximum utility that R can expect for such a deviation. Hence, R has an incentive to follow the signal. In case 2, we assume that there are only signals NOT in all rounds 1, . . . , i. If the receiver deviates and hires, she gets an expected utility of P j qij 1 xijdi ρij P j qij 1 xijdi P j(qij xijdi) ρij P j qij xijdi j qijρij di P j xijρij 1 di P j xij ρE diρE P j xij 1 di P j xij = ρE . The inequality arises from the SSQ-constraint P j qijρij = ρi ρE and the LP-constraint P j xijρij ρE P j xij. If, on the other hand, R obeys the signal and does not hire, the expected utility from the remaining scheme in rounds i + 1, . . . , n is at least ρE: If R gets a HIRE signal in one of the subsequent rounds i > i, we showed in case 1 that the conditional expectation upon hiring in round i is at least ρE. Otherwise, if R gets no HIRE signal in the later rounds, she can revert to the outside option and secure a value of ρE. This shows that the scheme is persuasive. Let us now show that it achieves a 1/2-approximation w.r.t. the optimal offline signaling scheme. Using Lemma 1, it is sufficient to show that the scheme achieves an expected utility for S that is at least 1/2 of the optimal value for LP (4). Following Chawla et al. [2010] as well as Alaei [2014], we use damping factors di for all i [n] defined as follows. We define ri = Pr[reaching round i]. It follows a recursion: r1 = 1, ri+1 = ri 1 di P j xij . We choose di = 1/(2ri), which yields ri di = 1/2 for every i [n]. di is welldefined since, inductively, ri 1/2. This is due to ri+1 = r1 1 2 P k i,j xkj and P i,j xij 1. As a consequence, S obtains an expected utility of P i ri di P j ξij qij x ij/qij = P i ri di P j x ijξij = 1 2 P i,j x ijξij , i.e., 1/2 of the optimal value of LP (4). The bound of 1/2 is best possible: Suppose n = 2, D1 is deterministic with (ρ11, ξ11) = (1, 1), D2 has two realizations ((ρ21, ξ21) = (n, n) with probability 1/n and (ρ22, ξ22) = (0, 0) with probability 1 1/n), and ρE = 1. The optimal offline scheme recommends the best candidate. It yields expected utility of 2 1/n for both S and R. Online schemes can only guarantee a utility of at most 1. 3 Extensions The approach in the previous section can be generalized to a variety of combinatorial problems when a good external option provides a canonical deviation opportunity for R. In this case, to obtain persuasiveness it suffices to provide R with an expected value of ρE conditioned on a HIRE signal. We discuss natural extensions with multiple hirings and multiple receivers. Our approach is again to solve an appropriate LPrelaxation for the ex-post distribution of hired candidates and compute a signal using carefully chosen probabilistic damping factors. For the latter we incorporate results from the area of Bayesian mechanism design with sequential posted prices [Alaei, 2014]. 3.1 Hiring Multiple Candidates Suppose R strives to hire 1 k n candidates and has k good outside options, i.e., k options with a value ρE each, where ρE ρi for all i [n]. We call this k-SSQ for short. The utilities of both S and R are additive over the hired candidates. Theorem 4. There is a persuasive signaling scheme in the online setting with k hires satisfying k-SSQ that yields a 1 1 k+3 -approximation. Proof. Consider LP (5) as the natural extension of LP (4). The optimal value constitutes an upper bound for the expected utility for S in the optimal offline scheme if we set xij to the ex-post probability of hiring candidate i of type j in the Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) offline scheme, the vector x is feasible for the LP and the objective function value is the expected utility of S in the offline scheme. Max. P i,j xijξij s.t. P i,j xij k xij qij i [n], j [m] P j xijρij ρE P j xij i [n] xij 0 i [n], j [m] (5) To devise an online scheme, we use a natural extension of the simple scheme under SSQ. First solve LP (5) optimally, let x be the optimum solution. The decision in round i is again split into two steps. In step 1, send NOT with probability 1 di, otherwise advance to step 2. In step 2, send HIRE with probability x iθi/qiθi, and NOT otherwise. Overall, we send HIRE with probability di x iθi/qiθi. The exact same calculations as in Theorem 3 show that conditioned on a HIRE signal in round i, the expected value of the candidate for R is at least ρE. The same calculations show that conditioned on a NOT signal in round i, the expected value of the candidate for R is at most ρE. Hence, a deviation of R to an outside option or to hiring upon a NOT signal is not profitable and, thus, the scheme is persuasive. To show the approximation factor, we need to carefully design the damping scheme di for step 1. di should be high to hire good candidates in round i, but also low to ensure that better candidates in later rounds i > i can be hired. This is exactly the trade-off faced by the γ-Conservative Magician in [Alaei, 2014]. Here, a magician has k wands to open n boxes. If he applies a wand to a box, the box opens, but the wand breaks with some probability. Each box comes with a distribution for the value of the content and a probability that a wand breaks when opening it. In our context, the k wands are k positions for hire, the n boxes are n rounds, opening a box means surviving step 1, and breaking the wand means actually sending a HIRE signal in step 2. In [Alaei, 2014, Definition 3] Alaei devises an adaptive strategy to set the di such that every round i the joint probability of having at most k 1 HIRE signals and arriving at step 2 of round i is at least γk = 1 1 k+3. Hence, using his strategy to design the di, the expected utility for the sender in the resulting scheme is at least P j ξij qij x ij/qij = γk P i,j x ijξij . It recovers at least a γk-fraction of the optimum for LP (5). 3.2 Hiring with Multiple Receivers Private Signals Suppose there are ℓdifferent receivers R1, . . . , Rℓ. In every round i and for every t [ℓ], S sends Rt a private signal σ(t) i {HIRE, NOT} (c.f. [Dughmi and Xu, 2017] for results on private and public signaling channels in a related offline scenario). A problem arising with multiple receivers is feasibility of the assignment of candidates. There might be conflicting situations when several receivers simultaneously decide to hire the same candidate. We circumvent this problem due to the following: (1) Our scheme will ensure at most a single receiver gets a HIRE-signal in every round. (2) When deviating, the (weakly) most-preferred option for any receiver is the outside option, which does not interfere with the other receivers. Note that our scheme is persuasive even when every receiver assumes that she can hire every candidate, irrespective of the decisions made by others. Formally, a candidate type is a vector (ρijt, ξijt)t [ℓ] of value-pairs, for all i [n], j [m]. Receiver Rt strives to hire kt candidates and fulfills kt-SSQ, i.e. has kt unique good outside options with value ρ(t) E ρ(t) i , where ρ(t) i = Ej Di[ρijt]. We define k = mint kt. The utility of the sender is additive over all hired candidates. The utility of receiver Rt is additive over candidates hired by Rt. The proof of the following result is deferred to the full version of this paper. Theorem 5. There is a persuasive signaling scheme in the online setting with ℓreceivers R1, . . . , Rℓwith Rt having kt hires and satisfying kt-SSQ for all t [ℓ] that yields a 1 1 k+3 -approximation. Public Signals Let us briefly look at the case when S can only use public signals (instead of private signals studied above). With public signals S faces the feasibility problem discussed above, i.e., multiple receivers might get an incentive to hire the same candidate. Ensuring feasibility with persuasive signals can lead to a drastic performance loss for S, even in case of SSQ for all receivers. The proof of the following result is deferred to the full version of this paper. Proposition 2. There are instances with SSQ in which any persuasive signaling scheme with public signals yields an unbounded approximation ratio. Acknowledgements Niklas Hahn and Martin Hoefer are funded by the German Israel Foundation grant I-1419-118.4/2017. Rann Smorodinsky is funded by the joint United States-Israel Binational Science Foundation and National Science Foundation grant 2016734, German-Israel Foundation grant I-1419118.4/2017, Israel Ministry of Science and Technology grant 19400214, Technion VPR grants, and the Bernard M. Gordon Center for Systems Engineering at the Technion. References [Alaei, 2014] Saeed Alaei. Bayesian combinatorial auctions: Expanding single buyer mechanisms to many buyers. SIAM J. Comput., 43(2):930 972, 2014. [Alonso and Cˆamara, 2016] Ricardo Alonso and Odilon Cˆamara. Persuading voters. Amer. Econ. Rev., 106(11):3590 3605, 2016. [Arieli and Babichenko, 2019] Itai Arieli and Yakov Babichenko. Private bayesian persuasion. J. Econ. Theory, 182:185 217, 2019. [Au, 2015] Pak Hung Au. Dynamic information disclosure. RAND J. Econ., 46(4):791 823, 2015. [Aumann and Maschler, 1966] Robert Aumann and Michael Maschler. Game theoretic aspects of gradual disarmament. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) Report of the US Arms Control and Disarmament Agency, 80:1 55, 1966. [Badanidiyuru et al., 2018] Ashwinkumar Badanidiyuru, Kshipra Bhawalkar, and Haifeng Xu. Targeting and signaling in ad auctions. In Proc. 29th Symp. Disc. Algorithms (SODA), pages 2545 2563, 2018. [Celli et al., 2019] Andrea Celli, Stefano Coniglio, and Nicola Gatti. Bayesian persuasion with sequential games. Co RR, abs/1908.00877, 2019. [Chawla et al., 2010] Shuchi Chawla, Jason Hartline, David Malec, and Balasubramanian Sivan. Multi-parameter mechanism design and sequential posted pricing. In Proc. 42nd Symp. Theory Comput. (STOC), pages 311 320, 2010. [Correa et al., 2017] Jos e Correa, Patricio Foncea, Ruben Hoeksma, Tim Oosterwijk, and Tjark Vredeveld. Posted price mechanisms for a random stream of customers. In Proc. 18th Conf. Econ. Comput. (EC), pages 169 186, 2017. [Correa et al., 2019] Jos e Correa, Paul D utting, Felix Fischer, and Kevin Schewior. Prophet inequalities for I.I.D. random variables from an unknown distribution. In Proc. 20th Conf. Econ. Comput. (EC), pages 3 17, 2019. [Dughmi and Xu, 2016] Shaddin Dughmi and Haifeng Xu. Algorithmic Bayesian persuasion. In Proc. 48th Symp. Theory Comput. (STOC), pages 412 425, 2016. [Dughmi and Xu, 2017] Shaddin Dughmi and Haifeng Xu. Algorithmic persuasion with no externalities. In Proc. 18th Conf. Econ. Comput. (EC), pages 351 368, 2017. [Dughmi, 2017] Shaddin Dughmi. Algorithmic information structure design: a survey. SIGecom Exchanges, 15(2):2 24, 2017. [D utting et al., 2017] Paul D utting, Michal Feldman, Thomas Kesselheim, and Brendan Lucier. Prophet inequalities made easy: Stochastic optimization by pricing non-stochastic inputs. In Proc. 27th Symp. Found. Comput. Sci. (FOCS), pages 540 551, 2017. [Ely et al., 2015] Jeffrey Ely, Alexander Frankel, and Emir Kamenica. Suspense and surprise. J. Political Econ., 123(1):215 260, 2015. [Ely, 2017] Jeffrey Ely. Beeps. Amer. Econ. Rev., 107(1):31 53, 2017. [Emek et al., 2012] Yuval Emek, Michal Feldman, Iftah Gamzu, Renato Paes Leme, and Moshe Tennenholtz. Signaling schemes for revenue maximization. In Proc. 13th Conf. Econ. Comput. (EC), pages 514 531, 2012. [Esfandiari et al., 2015] Hossein Esfandiari, Mohammad Taghi Hajiaghayi, Vahid Liaghat, and Morteza Monemizadeh. Prophet secretary. In Proc. 23rd European Symp. Algorithms (ESA), pages 496 508, 2015. [Frazier et al., 2014] Peter Frazier, David Kempe, Jon Kleinberg, and Robert Kleinberg. Incentivizing exploration. In Proc. 15th Conf. Econ. Comput. (EC), pages 5 22, 2014. [Goldstein and Leitner, 2018] Itay Goldstein and Yaron Leitner. Stress tests and information disclosure. J. Econ. Theory, 177:34 69, 2018. [Hahn et al., 2019] Niklas Hahn, Martin Hoefer, and Rann Smorodinsky. The secretary recommendation problem. Co RR, abs/1907.04252, 2019. [Kamenica and Gentzkow, 2011] Emir Kamenica and Matthew Gentzkow. Bayesian persuasion. Amer. Econ. Rev., 101(6):2590 2615, 2011. [Kleinberg and Weinberg, 2019] Robert Kleinberg and Matthew Weinberg. Matroid prophet inequalities and applications to multi-dimensional mechanism design. Games Econom. Behav., 113:97 115, 2019. [Kolotilin, 2015] Anton Kolotilin. Experimental design to persuade. Games Econom. Behav., 90:215 226, 2015. [Kremer et al., 2014] Ilan Kremer, Yishay Mansour, and Motty Perry. Implementing the wisdom of the crowd. J. Political Econ., 122:988 1012, 2014. [Krengel and Sucheston, 1977] Ulrich Krengel and Louis Sucheston. Semiamarts and finite values. Bull. Amer. Math. Soc, 83:745 747, 1977. [Mansour et al., 2015] Yishay Mansour, Aleksandrs Slivkins, and Vasilis Syrgkanis. Bayesian incentivecompatible bandit exploration. In Proc. 16th Conf. Econ. Comput. (EC), pages 565 582, 2015. [Rabinovich et al., 2015] Zinovi Rabinovich, Albert Xin Jiang, Manish Jain, and Haifeng Xu. Information disclosure as a means to security. In Proc. 14th Conf. Auton. Agents and Multi-Agent Syst. (AAMAS), pages 645 653, 2015. [Xu et al., 2015] Haifeng Xu, Zinovi Rabinovich, Shaddin Dughmi, and Milind Tambe. Exploring information asymmetry in two-stage security games. In Proc. 29th Conf. Artif. Intell. (AAAI), pages 1057 1063, 2015. [Xu et al., 2016] Haifeng Xu, Rupert Freeman, Vincent Conitzer, Shaddin Dughmi, and Milind Tambe. Signaling in Bayesian Stackelberg games. In Proc. 15th Conf. Auton. Agents and Multi-Agent Syst. (AAMAS), pages 150 158, 2016. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)