# Multi-Receiver Online Bayesian Persuasion

Matteo Castiglioni, Alberto Marchesi, Andrea Celli, Nicola Gatti (Politecnico di Milano, Milan, Italy). Correspondence to: Matteo Castiglioni.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

Abstract

Bayesian persuasion studies how an informed sender should partially disclose information to influence the behavior of a self-interested receiver. Classical models make the stringent assumption that the sender knows the receiver's utility. This can be relaxed by considering an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type. We study, for the first time, an online Bayesian persuasion setting with multiple receivers. We focus on the case with no externalities and binary actions, as customary in offline models. Our goal is to design no-regret algorithms for the sender with polynomial per-iteration running time. First, we prove a negative result: for any 0 < α ≤ 1, there is no polynomial-time no-α-regret algorithm when the sender's utility function is supermodular or anonymous. Then, we focus on the case of submodular sender's utility functions and we show that, in this case, it is possible to design a polynomial-time no-(1 − 1/e)-regret algorithm. To do so, we introduce a general online gradient descent scheme to handle online learning problems with a finite number of possible loss functions. This requires the existence of an approximate projection oracle. We show that, in our setting, there exists one such projection oracle which can be implemented in polynomial time.

1. Introduction

Bayesian persuasion was originally introduced by Kamenica & Gentzkow (2011) to model multi-agent settings where an informed sender tries to influence the behavior of a self-interested receiver through the strategic provision of payoff-relevant information. Agents' payoffs are determined by the receiver's action and some exogenous parameters collectively termed the state of nature, whose value is drawn from a common prior distribution and observed by the sender only. Then, the sender decides how much of her/his private information has to be revealed to the receiver, according to a public randomized policy known as a signaling scheme. From the sender's perspective, this begets a decision-making problem that is essentially about controlling "who gets to know what". Problems of this kind are ubiquitous in application domains such as auctions and online advertising (Bro Miltersen & Sheffet, 2012; Emek et al., 2014; Badanidiyuru et al., 2018), voting (Alonso & Câmara, 2016; Cheng et al., 2015; Castiglioni et al., 2020a; Castiglioni & Gatti, 2021), traffic routing (Vasserman et al., 2015; Bhaskar et al., 2016; Castiglioni et al., 2021), recommendation systems (Mansour et al., 2016), security (Rabinovich et al., 2015; Xu et al., 2016), and product marketing (Babichenko & Barman, 2017; Candogan, 2019).[1]

The classical Bayesian persuasion model by Kamenica & Gentzkow (2011) makes the stringent assumption that the sender knows the receiver's utility exactly. This is unreasonable in practice. Recently, Castiglioni et al.
(2020b) proposed to relax this assumption by framing Bayesian persuasion within an online learning framework, focusing on the basic single-receiver problem.[2] In their model, the sender repeatedly faces a receiver whose type during each iteration (which determines her/his utility function) is unknown and adversarially selected beforehand. In this work, we extend the model by Castiglioni et al. (2020b) to multi-receiver settings, where the (unknown) type of each receiver is adversarially selected before each iteration of the repeated interaction. We consider the case in which the sender has a private communication channel towards each receiver, which is commonly studied in multi-receiver models (see, e.g., Babichenko & Barman (2016)). Dealing with multiple receivers introduces the additional challenge of correlating information disclosure across them and requires different techniques from those used in the single-receiver setting.

[1] Persuasion was famously attributed to a quarter of the GDP in the United States by McCloskey & Klamer (1995), with a more recent estimate placing this figure at 30% (Antioch et al., 2013).

[2] A recent work by Babichenko et al. (2021) relaxes the assumption in the offline setting. In that work, the goal is minimizing the sender's regret over a single iteration, and the authors provide positive results for the case in which the sender knows the ordinal preferences of the receiver over states of nature. The authors study the case of a single receiver with a binary action space and an arbitrary (unknown) utility function.

As customary when studying multi-receiver Bayesian persuasion problems (Dughmi & Xu, 2017; Xu, 2020), we address the case in which there are no inter-agent externalities: each receiver's utility does not depend on the actions of the other receivers, but only on her/his own action and the state of nature. Moreover, we focus on the commonly-studied setting with binary actions (Babichenko & Barman, 2016; Arieli & Babichenko, 2019), and we analyze different scenarios depending on whether the sender's utility function is supermodular, submodular, or anonymous. Despite its simplicity, this basic model encompasses several real-world scenarios. For instance, think of a marketing problem in which a firm (sender) wants to persuade some potential buyers (receivers) to buy one of its products. Each buyer has to take a binary decision as to whether to buy a unit of the product or not, while the firm's goal is to strategically disclose information about the product to the buyers, so as to maximize the number of units sold. In this example, the sender's utility is anonymous, since it only depends on the number of buyers who decide to purchase (and not on their identities). Moreover, submodular sender's utilities represent diminishing returns in the number of items sold, while supermodular ones encode decreasing production costs.

1.1. Original Contributions

Our goal is to design online algorithms for the sender that recommend a signaling scheme at each iteration of the repeated interaction, guaranteeing a sender's expected utility close to that of the best-in-hindsight signaling scheme. In particular, we look for no-α-regret algorithms, which collect an overall utility that is close to a fraction α of what can be obtained by the best-in-hindsight signaling scheme. In this work, we assume full-information feedback, which means that, after each iteration, the sender observes each receiver's type during that iteration.
Moreover, we are interested in no-α-regret algorithms having a per-iteration running time polynomial in the size of the problem instance. To this end, we assume that the number of possible types of each receiver is fixed; otherwise, polynomial-time no-α-regret algorithms cannot be obtained even in the degenerate case of only one receiver (Castiglioni et al., 2020b). In Section 4, we prove a negative result: for any 0 < α ≤ 1, there is no polynomial-time no-α-regret algorithm when the sender's utility function is supermodular or anonymous. Thus, in the rest of the work, we focus on the case in which the sender's utility function is submodular, where we provide a polynomial-time no-(1 − 1/e)-regret algorithm.[3]

[3] Our result is tight, as there is no poly-time no-α-regret algorithm with α > 1 − 1/e. Indeed, it is NP-hard to approximate the sender's optimal utility within a factor greater than 1 − 1/e, even in the basic (offline) multi-receiver model of Babichenko & Barman (2016).

As a first step in building our algorithm, in Section 5 we introduce a general online gradient descent (OGD) scheme to handle online learning problems with a finite number of possible loss functions. This can be applied to our setting, as we have a sender's utility function (or, equivalently, negative loss function) for every combination of receivers' types obtained as feedback. The OGD scheme works in a modified decision space whose dimensionality is the number of observed loss functions, and it is not affected by the dimensionality of the original space. This is crucial in our setting, as it avoids dealing with the set of sender's signaling schemes, whose dimensionality grows exponentially in the number of receivers. Any OGD algorithm requires a projection oracle. Since in our setting an exact oracle cannot be implemented in polynomial time, we build our OGD scheme so that it works having access to a suitably-defined approximate projection oracle, which, as we show later, can be implemented in polynomial time in our model.

In Section 6, we build a polynomial-time approximate projection oracle. First, we formulate the projection problem as a convex linearly-constrained quadratic program, which has exponentially-many variables and polynomially-many constraints. Next, we show how to compute in polynomial time an approximate solution to this program by applying the ellipsoid algorithm to its dual. Since the dual has polynomially-many variables and exponentially-many constraints, the algorithm needs access to a particular (problem-dependent) polynomial-time separation oracle. Unfortunately, we do not have this in our setting, and, thus, our algorithm must rely on an approximate separation oracle. In general, running the ellipsoid method with an approximate separation oracle does not give any guarantee on the approximation quality of the returned solution. In order to make the ellipsoid algorithm return the desired approximate solution by only using an approximate separation oracle, we employ some ad-hoc technical tools suggested by a non-trivial primal-dual analysis. As a preparatory step towards our main result, at the beginning of Section 6, we use a derivation similar to that described so far to design a polynomial-time approximation algorithm for the offline version of our multi-receiver Bayesian persuasion problem, which may be of independent interest.
In Section 7, we conclude the construction of the no-(1 − 1/e)-regret algorithm by showing how to implement in polynomial time a (1 − 1/e)-approximate separation oracle for settings in which the sender's utility is submodular. All the proofs omitted from the paper are in the Appendix.

1.2. Related Works

Most of the computational works on Bayesian persuasion study (offline) models in which the sender knows the receiver's utility function exactly. Dughmi & Xu (2016) initiate these studies with the single-receiver case, while Arieli & Babichenko (2019) extend their work to multiple receivers without inter-agent externalities, with a focus on private signaling. In particular, they focus on settings with binary actions for the receivers and a binary space of states of nature. They provide a characterization of the optimal signaling scheme in the case of supermodular, anonymous submodular, and super-majority sender's utility functions. Babichenko & Barman (2016) extend this latter work by providing tight (1 − 1/e)-approximate signaling schemes for monotone submodular sender's utilities and showing that an optimal private signaling scheme for anonymous utility functions can be found efficiently. Dughmi & Xu (2017) generalize the previous model to settings with an arbitrary number of states of nature. There are also some works focusing on public signaling with no inter-agent externalities; see, among others, Dughmi & Xu (2017) and Xu (2020).

The only computational work on Bayesian persuasion in an online learning framework is that of Castiglioni et al. (2020b), which, however, is restricted to the single-receiver case. The results and techniques in Castiglioni et al. (2020b) are different from those in our paper. In particular, they show that there are no polynomial-time no-α-regret algorithms even in settings with a single receiver, when the number of receiver's types is arbitrary. In contrast, we focus on settings in which the number of receivers' types is fixed. Moreover, the main goal of Castiglioni et al. (2020b) is to design a (necessarily exponential-time) no-regret algorithm in the partial-information feedback setting, in which the sender only observes the actions played by the receiver (and not her/his types). This is accomplished by providing slightly-biased estimators of the sender's utilities for different signaling schemes. In our work, we assume full-information feedback, and, thus, our main focus is dealing with multiple receivers. This also introduces the additional challenge of correlating information disclosure across the receivers and working with an exponential number of possible feedbacks (tuples specifying a type for each receiver).

Our work is also related to the research line on online linear optimization with approximation oracles. In such a setting, Kakade et al. (2009) show how to design a no-α-regret algorithm relying on an α-approximate linear optimization oracle, while Garber (2017) and Hazan et al. (2018) obtain analogous results with a better query complexity. The approach of these works to designing approximate projection oracles is fundamentally different from ours, since they have access to a linear optimization oracle working in the learner's decision space. On the other hand, our OGD scheme works on a modified decision space, and the approximate projection oracle can only rely on an approximate separation oracle dealing with the original sender's decision space.
2. Preliminaries

There is a finite set R := {r_i}_{i=1}^n of n receivers, and each receiver r ∈ R has a type chosen from a finite set K_r := {k_{r,i}}_{i=1}^{m_r} of m_r different types (let m := max_{r∈R} m_r). We introduce 𝒦 := ×_{r∈R} K_r as the set of type profiles, which are tuples k ∈ 𝒦 defining a type k_r ∈ K_r for each receiver r ∈ R.[4] Each receiver r ∈ R has two actions available, defined by A_r := {a_0, a_1}. We let A := ×_{r∈R} A_r be the set of action profiles, specifying an action for each receiver. Sender's and receivers' payoffs depend on a random state of nature, which is selected from a finite set Θ := {θ_i}_{i=1}^d of d states. The payoff of a receiver also depends on the action played by her/him, while it does not depend on the actions played by the other receivers, since there are no inter-agent externalities. Formally, a receiver r ∈ R of type k ∈ K_r has a utility u^{r,k} : A_r × Θ → [0,1]. For ease of notation, we let ∆u^{r,k}_θ := u^{r,k}(a_1, θ) − u^{r,k}(a_0, θ) be the payoff difference for a receiver r of type k when the state of nature is θ ∈ Θ. The sender's utility depends on the actions played by all the receivers, and it is defined by u^s : A × Θ → [0,1]. For ease of presentation, for every state θ ∈ Θ, we introduce the function f_θ : 2^R → [0,1] such that f_θ(R') represents the sender's utility when the state of nature is θ and all the receivers in R' ⊆ R play action a_1, while the others play a_0.

As is customary in Bayesian persuasion, we assume that the state of nature is drawn from a common prior distribution µ ∈ int(∆_Θ), which is explicitly known to both the sender and the receivers.[5] The sender can commit to a signaling scheme φ, which is a randomized mapping from states of nature to signals for the receivers. In this work, we focus on private signaling, where each receiver has her/his own signal that is privately communicated to her/him. Formally, there is a finite set S_r of possible signals for each receiver r ∈ R. Then, φ : Θ → ∆_S, where S := ×_{r∈R} S_r is the set of signal profiles, which are tuples s ∈ S defining a signal s_r ∈ S_r for each receiver r ∈ R. We denote with φ_θ the probability distribution employed by φ when the state of nature is θ ∈ Θ, with φ_θ(s) being the probability of sending a signal profile s ∈ S.

The one-shot interaction between the sender and the receivers goes on as follows: (i) the sender commits to a publicly known signaling scheme φ; (ii) she/he observes the realized state of nature θ ∼ µ; (iii) she/he draws a signal profile s ∼ φ_θ and communicates to each receiver r ∈ R the signal s_r; and (iv) each receiver r ∈ R rationally updates her/his prior belief over Θ according to Bayes' rule and selects an action maximizing her/his expected utility. We remark that, given a signaling scheme φ, a receiver r ∈ R of type k ∈ K_r observing a private signal s ∈ S_r experiences an expected utility (up to a normalization constant)

$$\sum_{\theta \in \Theta} \mu_\theta \sum_{s' \in S : s'_r = s} \varphi_\theta(s')\, u^{r,k}(a, \theta)$$

when playing action a ∈ A_r. Assuming the receivers' type profile is k ∈ 𝒦, the goal of the sender is to commit to an optimal signaling scheme φ, i.e., one maximizing her/his expected utility

$$f(\varphi, k) := \sum_{\theta \in \Theta} \mu_\theta \sum_{s \in S} \varphi_\theta(s)\, f_\theta(R^k_s),$$

where we let R^k_s ⊆ R be the set of receivers who play a_1 after observing their private signal s_r in s, under signaling scheme φ.

[4] All vectors and tuples are denoted by bold symbols. For any vector (tuple) x, the value of its i-th component is denoted by x_i.

[5] int(X) is the interior of a set X, while ∆_X is the set of all probability distributions over a set X.
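To make these quantities concrete, the following minimal sketch (not from the paper; all instance values are illustrative) computes, for a toy instance with two receivers (one type each), two states, and two generic signals per receiver, each receiver's best response to her/his private signal and the resulting sender's expected utility f(φ, k).

```python
import itertools
import numpy as np

# Toy instance (illustrative values): 2 receivers, 2 states, 2 signals per receiver.
receivers = [0, 1]
states = [0, 1]
mu = np.array([0.5, 0.5])                       # common prior over states
signals = {r: ["s0", "s1"] for r in receivers}  # S_r

# Payoff differences Delta u^{r,k}_theta for a single type k per receiver (k omitted).
delta_u = {0: {0: +0.4, 1: -0.6}, 1: {0: -0.5, 1: +0.5}}

# Sender utility f_theta(R'): here, the fraction of receivers playing a1 (anonymous).
def f_theta(theta, playing_a1):
    return len(playing_a1) / len(receivers)

# A signaling scheme: phi[theta] is a distribution over signal profiles (s_0, s_1).
profiles = list(itertools.product(signals[0], signals[1]))
phi = {0: {("s1", "s0"): 1.0}, 1: {("s0", "s1"): 1.0}}
phi = {th: {p: phi[th].get(p, 0.0) for p in profiles} for th in states}

def best_response(r, sig):
    """Play a1 iff the posterior-weighted payoff difference, given signal sig, is >= 0."""
    score = sum(mu[th] * phi[th][p] * delta_u[r][th]
                for th in states for p in profiles if p[r] == sig)
    return 1 if score >= 0 else 0

def sender_expected_utility():
    """f(phi, k): expectation over theta and signal profiles of f_theta(R^k_s)."""
    total = 0.0
    for th in states:
        for p in profiles:
            plays_a1 = {r for r in receivers if best_response(r, p[r]) == 1}
            total += mu[th] * phi[th][p] * f_theta(th, plays_a1)
    return total

print("sender's expected utility f(phi, k):", sender_expected_utility())
```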
Assumptions. In the rest of this work, we assume that the sender's utility is monotone non-decreasing in the set of receivers playing a_1. Formally, for each state θ ∈ Θ, we let f_θ(R') ≤ f_θ(R'') for every R' ⊆ R'' ⊆ R, while f_θ(∅) = 0 for ease of presentation. Moreover, we assume that the number of types m_r of each receiver r ∈ R is fixed; in other words, the value of m cannot grow arbitrarily large.[6]

[6] The monotonicity assumption is w.l.o.g. for this work, since our main positive result (Theorem 7) relies on it. Instead, assuming a fixed number of types is necessary, since, even in single-receiver settings, designing no-regret algorithms with running time polynomial in m is intractable (Castiglioni et al., 2020b).

Direct Signaling Schemes. By well-known revelation-principle-style arguments (Kamenica & Gentzkow, 2011; Arieli & Babichenko, 2019), we can restrict our attention to signaling schemes that are direct and persuasive. In words, a signaling scheme is direct if signals correspond to recommendations of playing actions, while it is persuasive if the receivers do not have any incentive to deviate from the recommendations prescribed by the signals they receive. In our setting, a direct signal sent to a receiver specifies an action recommendation for each receiver's type; thus, we let S_r := 2^{K_r} for every r ∈ R. A signal s ∈ S_r for a receiver r ∈ R is encoded by a subset of her/his types, namely s ⊆ K_r. Intuitively, s can be interpreted as the recommendation to play action a_1 when the receiver has a type k ∈ K_r such that k ∈ s, and a_0 otherwise. Given a direct and persuasive signaling scheme φ, for a signal profile s ∈ S and a type profile k ∈ 𝒦, the set R^k_s appearing in the definition of the sender's expected utility f(φ, k) can be formally expressed as R^k_s := {r ∈ R | k_r ∈ s_r}.

Set Functions and Matroids. In Section 7, we show how to implement our approximate separation oracle by optimizing functions f_θ over suitably defined matroids (representing signals). Next, we introduce the necessary definitions on set functions and matroids. For ease of presentation, we consider a generic function f : 2^G → [0,1] for a finite set G. We say that f is submodular, respectively supermodular, if for all I, I' ⊆ G it holds that f(I ∪ I') + f(I ∩ I') ≤ f(I) + f(I'), respectively f(I ∪ I') + f(I ∩ I') ≥ f(I) + f(I'). The function f is anonymous if f(I) = f(I') for all I, I' ⊆ G with |I| = |I'|. A matroid M := (G, I) is defined by a finite ground set G and a collection I of independent sets, i.e., subsets of G satisfying some characterizing properties (see Schrijver (2003) for a detailed formal definition). We denote by B(M) the set of the bases of M, which are the maximal sets in I.

3. Multi-Receiver Online Bayesian Persuasion

We consider a multi-receiver generalization of the online setting introduced by Castiglioni et al. (2020b). The sender plays a repeated game in which, at each iteration t ∈ [T], she/he commits to a signaling scheme φ^t, observes the realized state of nature θ^t ∼ µ, and privately sends the signals determined by s^t ∼ φ^t_{θ^t} to the receivers.[7] Then, each receiver (whose type is unknown to the sender) selects an action maximizing her/his expected utility given the observed signal (in the one-shot interaction at iteration t). We focus on the problem of computing a sequence {φ^t}_{t∈[T]} of signaling schemes maximizing the sender's expected utility when the sequence of receivers' types {k^t}_{t∈[T]}, with k^t ∈ 𝒦, is adversarially selected beforehand.

[7] Throughout the paper, the set {1, . . . , x} is denoted by [x].
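For concreteness, here is a minimal sketch (not from the paper) of one iteration of this repeated interaction; the helpers commit_signaling_scheme, adversary_types, and best_response are hypothetical stand-ins for the objects defined above, with purely illustrative behavior.

```python
import random

# One iteration t of the repeated interaction in Section 3 (illustrative skeleton).
states = [0, 1]
mu = [0.5, 0.5]
receivers = [0, 1]
signals = {r: ["s0", "s1"] for r in receivers}

def commit_signaling_scheme(t):
    """Stand-in for the algorithm's choice phi^t: here, uniform over signal profiles."""
    profiles = [(a, b) for a in signals[0] for b in signals[1]]
    return {th: {p: 1.0 / len(profiles) for p in profiles} for th in states}

def adversary_types(t):
    """Types k^t are selected adversarially beforehand; here a fixed stand-in sequence."""
    return tuple((t + r) % 2 for r in receivers)

def best_response(r, k_r, sig, phi):
    """Stand-in best response of receiver r of type k_r to her/his private signal."""
    return 1 if sig == "s1" else 0

def play_iteration(t, rng):
    phi_t = commit_signaling_scheme(t)                         # (i) commit to phi^t
    theta_t = rng.choices(states, weights=mu)[0]               # (ii) theta^t ~ mu
    dist = phi_t[theta_t]
    s_t = rng.choices(list(dist), weights=list(dist.values()))[0]  # (iii) s^t ~ phi^t_theta
    k_t = adversary_types(t)                                   # unknown when committing
    actions = {r: best_response(r, k_t[r], s_t[r], phi_t) for r in receivers}  # (iv)
    return k_t, actions                                        # k^t is the feedback

rng = random.Random(0)
for t in range(3):
    print(play_iteration(t, rng))
```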
After each iteration t ∈ [T], the sender gets payoff f(φ^t, k^t) and receives a full-information feedback on her/his choice at t, which is represented by the type profile k^t. Therefore, after each iteration, the sender can compute the expected utility f(φ, k^t) guaranteed by any signaling scheme φ she/he could have chosen during that iteration. We are interested in an algorithm computing φ^t at each iteration t ∈ [T]. We measure the performance of one such algorithm using the α-regret R^α_T. Formally, for 0 < α ≤ 1,

$$R^\alpha_T := \alpha \, \max_{\varphi} \sum_{t \in [T]} f(\varphi, k^t) \;-\; \mathbb{E}\left[ \sum_{t \in [T]} f(\varphi^t, k^t) \right],$$

where the expectation is on the randomness of the algorithm. The classical notion of regret is obtained for α = 1. Ideally, we would like an algorithm that returns a sequence {φ^t}_{t∈[T]} with the following properties:

- the α-regret is sublinear in T for some 0 < α ≤ 1;
- the number of computational steps it takes to compute φ^t at each iteration t ∈ [T] is poly(T, n, d), that is, it is a polynomial function of the parameters T, n, and d.

An algorithm satisfying the first property is called a no-α-regret algorithm (it is no-regret if it does so for α = 1). In this work, we focus on the weaker notion of α-regret since, as we discuss next, requiring no-regret is oftentimes too limiting in our setting (from a computational perspective).

4. Hardness of Being No-α-Regret

We start with a negative result. We show that designing no-α-regret algorithms with polynomial per-iteration running time is an intractable problem (formally, it is impossible unless NP ⊆ RP) when the sender's utility is such that the functions f_θ are supermodular or anonymous. This hardness result is deeply connected with the intractability of the offline version of our multi-receiver Bayesian persuasion problem, which we formally define in Section 4.1. Then, Section 4.2 collects all the hardness results.

4.1. Offline Multi-Receiver Bayesian Persuasion

We consider an offline setting where the receivers' type profile k ∈ 𝒦 is drawn from a known probability distribution (rather than being selected adversarially at each iteration). Given a subset of possible type profiles K ⊆ 𝒦 and a distribution λ ∈ int(∆_K), we call BAYESIAN-OPT-SIGNAL the problem of computing a signaling scheme that maximizes the sender's expected utility. This can be achieved by solving the following LP of exponential size:[8]

$$\begin{aligned}
\max_{\varphi} \quad & \sum_{\theta \in \Theta} \mu_\theta \sum_{k \in K} \lambda_k \sum_{s \in S} \varphi_\theta(s)\, f_\theta(R^k_s) & & \text{(1a)}\\
\text{s.t.} \quad & \sum_{\theta \in \Theta} \mu_\theta \sum_{s' \in S : s'_r = s} \varphi_\theta(s')\, \Delta u^{r,k}_\theta \ge 0 & & \forall r \in R,\ \forall s \in S_r,\ \forall k \in K_r : k \in s & \text{(1b)}\\
& \sum_{s \in S} \varphi_\theta(s) = 1 & & \forall \theta \in \Theta & \text{(1c)}\\
& \varphi_\theta(s) \ge 0 & & \forall \theta \in \Theta,\ \forall s \in S. & \text{(1d)}
\end{aligned}$$

[8] Constraints (1b) encode persuasiveness for the signals recommending to play a_1. The analogous constraints for a_0 can be omitted. Indeed, by assuming that each f_θ is non-decreasing in the set of receivers who play a_1, any signaling scheme in which the sender recommends a_0 when the state is θ and the receiver prefers a_1 over a_0 can be improved by recommending a_1 instead.
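As an illustration, the sketch below (not from the paper) builds and solves LP (1) exactly for a tiny instance with two receivers, a single type per receiver, and two states, enumerating all direct signal profiles; it assumes scipy is available, and all instance values are arbitrary.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Tiny BAYESIAN-OPT-SIGNAL instance (illustrative): 2 receivers, 1 type each, 2 states.
# With a single type profile, K = {k} and lambda_k = 1, so lambda is omitted below.
receivers, states = [0, 1], [0, 1]
mu = [0.6, 0.4]
du = {0: [0.2, -0.4], 1: [-0.3, 0.5]}          # Delta u^{r,k}_theta
f = lambda th, R: len(R) / len(receivers)       # f_theta: anonymous, monotone, submodular

# Direct signals for one type: 0 = recommend a0, 1 = recommend a1.
profiles = list(itertools.product([0, 1], repeat=len(receivers)))
var = {(th, s): i for i, (th, s) in enumerate(itertools.product(states, profiles))}
nvar = len(var)

# Objective (1a): maximize sum_theta mu_theta sum_s phi_theta(s) f_theta(R^k_s).
c = np.zeros(nvar)
for (th, s), i in var.items():
    c[i] = -mu[th] * f(th, {r for r in receivers if s[r] == 1})  # negate: linprog minimizes

# Persuasiveness (1b) for the a1-recommending signal of each receiver, as A_ub x <= b_ub.
A_ub, b_ub = [], []
for r in receivers:
    row = np.zeros(nvar)
    for (th, s), i in var.items():
        if s[r] == 1:
            row[i] = -mu[th] * du[r][th]        # -(sum ...) <= 0 encodes sum ... >= 0
    A_ub.append(row); b_ub.append(0.0)

# Consistency (1c): each phi_theta is a probability distribution; (1d) via bounds.
A_eq, b_eq = [], []
for th in states:
    row = np.zeros(nvar)
    for s in profiles:
        row[var[(th, s)]] = 1.0
    A_eq.append(row); b_eq.append(1.0)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * nvar, method="highs")
print("optimal sender expected utility:", -res.fun)
```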
4.2. Hardness Results

First, we study the computational complexity of finding an approximate solution to BAYESIAN-OPT-SIGNAL. In particular, given 0 < α ≤ 1, we look for an α-approximate solution in the multiplicative sense, i.e., a signaling scheme providing at least a fraction α of the sender's optimal expected utility (the optimal value of LP (1)). Theorem 1 provides our main hardness result, which is based on a reduction from the promise version of LABEL-COVER (see Appendix A for its definition and the proof of the theorem).

Theorem 1. For every 0 < α ≤ 1, it is NP-hard to compute an α-approximate solution to BAYESIAN-OPT-SIGNAL, even when the sender's utility is such that, for every θ ∈ Θ, f_θ(R') = 1 if |R'| ≥ 2, while f_θ(R') = 0 otherwise.

Notice that Theorem 1 holds for problem instances in which the functions f_θ are anonymous. Moreover, the reduction can be easily modified so that the functions f_θ are supermodular and satisfy f_θ(R') = max{0, |R'| − 1} for every R' ⊆ R. Thus:

Corollary 1.1. For every 0 < α ≤ 1, it is NP-hard to compute an α-approximate solution to BAYESIAN-OPT-SIGNAL, even when the sender's utility is such that the functions f_θ are supermodular or anonymous for every θ ∈ Θ.

By using arguments similar to those employed in the proof of Theorem 6.2 by Roughgarden & Wang (2019), the hardness of computing an α-approximate solution to the offline problem can be extended to designing no-α-regret algorithms in the online setting. Then:

Theorem 2. For every 0 < α ≤ 1, there is no polynomial-time no-α-regret algorithm for the multi-receiver online Bayesian persuasion problem, unless NP ⊆ RP, even when the functions f_θ are supermodular or anonymous for all θ ∈ Θ.

In the rest of the work, we show how to design a polynomial-time no-(1 − 1/e)-regret algorithm for the case in which the sender's utility is such that the functions f_θ are submodular.

5. An Online Gradient Descent Scheme with Approximate Projection Oracles

As a first step in building our polynomial-time algorithm, we introduce our OGD scheme with an approximate projection oracle. Intuitively, it works by transforming the multi-receiver online Bayesian persuasion setting into an equivalent online learning problem whose decision space does not need to explicitly deal with signaling schemes (thus avoiding the burden of having an exponential number of possible signal profiles). The OGD algorithm is then applied on this new domain. In our setting, we do not have access to a polynomial-time (exact) projection oracle, and, thus, we design and analyze the algorithm assuming access to an approximate one only. As we show later in Sections 6 and 7, such an approximate projection oracle can be implemented in polynomial time when the functions f_θ are submodular. Let us recall that the OGD scheme we describe in this section is general and applies to any online learning problem with a finite number of possible loss functions.

5.1. A General Approach

Consider an online learning problem in which the learner takes a decision y^t ∈ 𝒴 at each iteration t ∈ [T]. Then, the learner observes a feedback e^t ∈ ℰ, where ℰ is a finite set of p possible feedbacks. The reward (or negative loss) of a decision y ∈ 𝒴 given feedback e ∈ ℰ is defined by u(y, e) for a given function u : 𝒴 × ℰ → [0,1]. Thus, the learner is awarded u(y^t, e^t) for decision y^t at iteration t, while she/he would have achieved u(y, e^t) for any other choice y ∈ 𝒴.

We transform this general online learning problem into a new one in which the learner's decision set is 𝒳 ⊆ [0,1]^p with:

$$\mathcal{X} := \Big\{ x \in [0,1]^p \;\Big|\; \exists\, y \in \mathcal{Y} : x_e \le u(y, e) \;\; \forall e \in \mathcal{E} \Big\}. \tag{2}$$

Intuitively, the set 𝒳 contains all the vectors whose components x_e (one for each feedback e ∈ ℰ) are the learner's rewards u(y, e) for some decision y ∈ 𝒴 in the original problem. Moreover, the inequality in the definition of 𝒳 also includes all the reward vectors that are dominated by those corresponding to some decision in 𝒴. At each iteration t ∈ [T], the learner takes a decision x^t ∈ 𝒳 and observes a feedback e^t ∈ ℰ. The reward of decision x ∈ 𝒳 at iteration t is the e^t-th component of x, namely x_{e^t}. It is sometimes useful to write it as 1_{e^t}^⊤ x, where 1_{e^t} ∈ {0,1}^p is the vector whose e^t-th component is 1, while all the others are 0. Thus, the learner's reward at iteration t is x^t_{e^t}.
Notice that the dimensionality of the decision set 𝒳 of the new online learning problem does not depend on that of the original decision set 𝒴 (which, in our setting, would be exponential), but only on the number of feedbacks p. If 𝒴 and u are such that 𝒳 is compact and convex, then we can minimize the α-regret R^α_T in the original problem by doing so in the new setting. Let us introduce the set α𝒳 := {αx | x ∈ 𝒳} for any 0 < α ≤ 1. Given a sequence of feedbacks {e^t}_{t∈[T]} and a sequence of decisions {x^t}_{t∈[T]}, with e^t ∈ ℰ and x^t ∈ 𝒳, we have that

$$R^\alpha_T := \max_{x \in \alpha\mathcal{X}} \sum_{t \in [T]} \mathbf{1}_{e^t}^\top\!\left(x - x^t\right) \;\ge\; \alpha \max_{y \in \mathcal{Y}} \sum_{t \in [T]} u(y, e^t) \,-\, \sum_{t \in [T]} u(y^t, e^t),$$

where {y^t}_{t∈[T]} is a sequence of decisions y^t ∈ 𝒴 for the original problem such that x^t_e ≤ u(y^t, e) for every e ∈ ℰ. We assume to have access to an approximate projection oracle for α𝒳, which we define in the following. Letting E ⊆ ℰ be a subset of feedbacks, we define τ_E : 𝒳 → [0,1]^p as the function mapping any vector x ∈ 𝒳 to another one that is equal to x in all the components corresponding to feedbacks e ∈ E, while it is 0 everywhere else. Moreover, we let 𝒳_E := {τ_E(x) | x ∈ 𝒳} be the image of 𝒳 through τ_E, while α𝒳_E := {αx | x ∈ 𝒳_E} for 0 < α ≤ 1.

Definition 1 (Approximate projection oracle). Consider a subset of feedbacks E ⊆ ℰ, a vector y ∈ [0,2]^p such that y_e = 0 for all e ∉ E, and an approximation error ϵ ∈ ℝ_+. Then, for any 0 < α ≤ 1, an approximate projection oracle ϕ_α(E, y, ϵ) is an algorithm returning a vector x̂ ∈ 𝒳_E and a decision ŷ ∈ 𝒴 with x̂_e ≤ u(ŷ, e) for all e ∈ E, such that

$$\| x - \hat{x} \|_2 \;\le\; \| x - y \|_2 + \epsilon \qquad \forall x \in \alpha\mathcal{X}_E.$$

Intuitively, ϕ_α returns a vector x̂ ∈ 𝒳_E that is an approximate projection of y onto the subspace α𝒳_E. The vector x̂ may lie outside of α𝒳_E. However, it is better than a projection onto α𝒳_E, since, ignoring the ϵ error, x̂ is closer than y to any vector in α𝒳_E. Moreover, ϕ_α also gives a decision ŷ ∈ 𝒴 that corresponds to the returned vector x̂. Notice that, if α = 1 and ϵ = 0, this is equivalent to finding an exact projection onto the subspace 𝒳_E.

5.2. A Particular Setting: Multi-Receiver Online Bayesian Persuasion

Our setting can be easily cast into the general learning framework described so far. The possible feedbacks are type profiles, namely ℰ := 𝒦, while the receivers' type profile k^t ∈ 𝒦 is the feedback observed at iteration t ∈ [T], namely e^t := k^t. Notice that the number of possible feedbacks p is m^n, which is exponential in the number of receivers. The decision set 𝒴 of the learner (sender) is the set of all possible signaling schemes φ, with y^t := φ^t being the one chosen at iteration t. The rewards observed by the sender are the utilities f(φ, k); formally, for every signaling scheme φ and type profile k ∈ 𝒦, which define a pair y ∈ 𝒴 and e ∈ ℰ in the generic notation, we let u(y, e) := f(φ, k). Then, the new decision set 𝒳 ⊆ [0,1]^{|𝒦|} is defined as in Equation (2). Notice that 𝒳 is a compact and convex set, since it can be defined by a set of linear inequalities. In the following, we overload the notation and, for any subset K ⊆ 𝒦 of type profiles, we let 𝒳_K := 𝒳_E for the subset of feedbacks E ⊆ ℰ with E = K.
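For intuition about Definition 1 in this setting, the sketch below (not from the paper) computes an exact projection onto α𝒳_E for a toy instance in which 𝒴 is the convex hull of a few explicitly enumerated decisions and u is linear, so that 𝒳 is a polytope; the matrix U, the set E, the point y, and the use of cvxpy are all illustrative choices. The paper's contribution is doing this approximately, in polynomial time, without enumerating 𝒴.

```python
import numpy as np
import cvxpy as cp

# Toy instance (illustrative): p = 3 feedbacks, 4 "pure" decisions with known rewards.
# Row i of U is the reward vector (u(y_i, e))_e of pure decision y_i.
U = np.array([[0.9, 0.1, 0.3],
              [0.2, 0.8, 0.4],
              [0.5, 0.5, 0.6],
              [0.1, 0.2, 0.9]])
m, p = U.shape
alpha = 1 - 1 / np.e
E = [0, 1]                     # observed feedbacks; component 2 is forced to zero
y = np.array([0.7, 0.9, 0.0])  # point to project (y_e = 0 for e not in E)

# With Y = conv{y_1,...,y_m} and u linear, X = {x in [0,1]^p : x <= U^T lam, lam in simplex},
# so projecting onto alpha * X_E is a small convex QP.
x = cp.Variable(p)
lam = cp.Variable(m, nonneg=True)
constraints = [cp.sum(lam) == 1,
               x <= alpha * (U.T @ lam),
               x >= 0]
constraints += [x[e] == 0 for e in range(p) if e not in E]
prob = cp.Problem(cp.Minimize(cp.sum_squares(x - y)), constraints)
prob.solve()
x_hat = x.value
print("projection of y onto alpha * X_E:", np.round(x_hat, 4))

# Sanity check of Definition 1 with epsilon = 0: x_hat is at least as close as y
# to any point of alpha * X_E, tested here on the (zeroed-out, scaled) pure rewards.
outside_E = [e for e in range(p) if e not in E]
for i in range(m):
    z = alpha * U[i].copy()
    z[outside_E] = 0.0
    assert np.linalg.norm(z - x_hat) <= np.linalg.norm(z - y) + 1e-6
```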
5.3. OGD with Approximate Projection Oracle

Algorithm 1 is an OGD scheme that operates in the 𝒳 domain by having access to an approximate projection oracle ϕ_α (we call the algorithm OGD-APO). The procedure in Algorithm 1 keeps track of the set E_t ⊆ ℰ of different feedbacks observed up to each iteration t ∈ [T]. Moreover, it works on the subspace 𝒳_{E_t}, whose vectors are zero in all the components corresponding to feedbacks e ∉ E_t. Since it is the case that |E_t| ≤ t, the procedure in Algorithm 1 attains a per-iteration running time that is independent of the number of possible feedbacks p.

Algorithm 1 OGD-APO
    Input: approximate projection oracle ϕ_α; learning rate η ∈ (0,1]; approximation error ϵ ∈ [0,1]
    Initialize y^1 ∈ 𝒴, E_0 ← ∅, and x^1 ← 0 ∈ 𝒳_{E_1}
    for t = 1, . . . , T do
        Take decision y^t
        Observe feedback e^t ∈ ℰ and reward u(y^t, e^t) ≥ x^t_{e^t}
        E_t ← E_{t−1} ∪ {e^t}
        ỹ^{t+1} ← x^t + η 1_{e^t}
        x^{t+1}, y^{t+1} ← ϕ_α(E_t, ỹ^{t+1}, ϵ)

Next, we bound the α-regret incurred by Algorithm 1.

Theorem 3. Given an oracle ϕ_α (as in Definition 1) for some 0 < α ≤ 1, a learning rate η ∈ (0,1], and an approximation error ϵ ∈ [0,1], Algorithm 1 has α-regret bounded in terms of η, ϵ, and |E_T|, with a per-iteration running time poly(t). By setting η = 1/√T, we get

$$R^\alpha_T \le \sqrt{T}\left(1 + |E_T|\right).$$

Notice that the bound only depends on the number of observed feedbacks |E_T|, while it is independent of the overall number of possible feedbacks p. This is crucial for the multi-receiver online Bayesian persuasion case, where p is exponential in the number of receivers n. On the other hand, as T goes to infinity, we have |E_T| ≤ p, so that the regret bound is sublinear in T.
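A compact rendering of Algorithm 1 as code may help; this sketch (not from the paper) treats the approximate projection oracle and the environment as injected callables, whose names and signatures (project, observe_feedback, decode_decision) are assumptions made purely for illustration.

```python
import numpy as np

def ogd_apo(T, p, project, observe_feedback, decode_decision, eta=None, eps=0.0):
    """Sketch of Algorithm 1 (OGD-APO).

    project(E, y, eps) -> (x, y_dec): assumed approximate projection oracle phi_alpha
        (Definition 1), returning a vector in X_E and a matching original decision.
    observe_feedback(t, y_dec) -> e_t: index of the feedback observed at iteration t.
    decode_decision(x) -> y_dec: hypothetical helper giving an initial decision y^1.
    """
    eta = eta if eta is not None else 1.0 / np.sqrt(T)
    x = np.zeros(p)                       # x^1 <- 0
    y_dec = decode_decision(x)            # y^1 in Y
    E = set()                             # E_0 <- empty set
    observed = []
    for t in range(1, T + 1):
        e_t = observe_feedback(t, y_dec)  # take decision y^t, observe feedback e^t
        observed.append(e_t)
        E.add(e_t)                        # E_t <- E_{t-1} U {e^t}
        y_tilde = x.copy()
        y_tilde[e_t] += eta               # gradient step: x^t + eta * 1_{e^t}
        x, y_dec = project(E, y_tilde, eps)   # (x^{t+1}, y^{t+1}) <- phi_alpha(E_t, ., eps)
    return observed

# Minimal usage with a trivial domain X = [0,1]^p, so that projection is just clipping.
def project(E, y, eps):
    x = np.clip(y, 0.0, 1.0)
    x[[i for i in range(len(x)) if i not in E]] = 0.0
    return x, None

print(ogd_apo(T=5, p=3, project=project,
              observe_feedback=lambda t, y: t % 3,
              decode_decision=lambda x: None))
```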
6. Constructing a Poly-Time Approximate Projection Oracle

The crux of the OGD-APO algorithm (Algorithm 1) is being able to perform the approximate projection step. In this section, we show that, in the multi-receiver Bayesian persuasion setting, the approximate projection oracle ϕ_α required by OGD-APO can be implemented in polynomial time by an appropriately-engineered ellipsoid algorithm. This calls for an approximate separation oracle O_α (see Definition 2). We proceed as follows. In Section 6.1, we define an appropriate notion of approximate separation oracle, and show how to find, in polynomial time, an α-approximate solution to the offline problem BAYESIAN-OPT-SIGNAL. This is a preparatory step towards the understanding of our main result in this section, and it may be of independent interest. Then, in Section 6.2, we exploit some of the techniques introduced for the offline setting in order to build ϕ_α starting from an approximate separation oracle O_α.

6.1. Warming Up: The Offline Setting

An approximate separation oracle O_α finds a signal profile s ∈ S that approximately maximizes a weighted sum of the f_θ functions, plus a weight for each receiver which depends on the signal s_r sent to that receiver. Formally:

Definition 2 (Approximate separation oracle). Consider a state θ ∈ Θ, a subset K ⊆ 𝒦, a vector λ ∈ ℝ^{|K|}_+, weights w = (w_{r,s})_{r∈R, s∈S_r} with w_{r,s} ∈ ℝ and w_{r,∅} = 0 for all r ∈ R, and an approximation error ϵ ∈ ℝ_+. Then, for any 0 < α ≤ 1, an approximate separation oracle O_α(θ, K, λ, w, ϵ) is an algorithm returning an s ∈ S such that

$$\sum_{k \in K} \lambda_k f_\theta(R^k_s) + \sum_{r \in R} w_{r, s_r} \;\ge\; \alpha \sum_{k \in K} \lambda_k f_\theta(R^k_{s'}) + \sum_{r \in R} w_{r, s'_r} - \epsilon \qquad \forall s' \in S, \tag{3}$$

in time poly(n, |K|, max_{r,s} |w_{r,s}|, max_k λ_k, 1/ϵ).

As a preliminary result, we show how to use an oracle O_α to find in polynomial time an α-approximate solution to BAYESIAN-OPT-SIGNAL (see Section 4). This problem is interesting in its own right, and it allows us to develop a line of reasoning that will be essential to prove Theorem 5.

Theorem 4. Given ϵ ∈ ℝ_+ and an approximate separation oracle O_α, with 0 < α ≤ 1, there exists a polynomial-time approximation algorithm for BAYESIAN-OPT-SIGNAL returning a signaling scheme with sender's utility at least α·OPT − ϵ, where OPT is the value of an optimal signaling scheme. Moreover, the algorithm runs in time poly(1/ϵ).

Proof Overview. The dual of LP (1) has a polynomial number of variables and an exponential number of constraints, and a natural way to prove polynomial-time solvability would be via the ellipsoid method (see, e.g., Khachiyan (1980); Grötschel et al. (1981)). However, in our setting, we can only rely on an approximate separation oracle, which renders the traditional ellipsoid method unsuitable for our problem. We show that it is possible to exploit a binary search scheme on the dual problem to find a value γ ∈ [0,1] such that the dual problem with objective γ is feasible, while the dual with objective γ − β, for β > 0, is infeasible. This algorithm runs in log(1/β) steps. At each iteration of the algorithm, we solve a feasibility problem through the ellipsoid method, equipped with an appropriate approximate separation oracle which we design. In order to build a poly-time separation oracle, we have to carefully manage all the settings in which O_α would not run in polynomial time, according to Definition 2. Specifically, we need to properly manage large values of the weights w, since O_α is polynomial in max_{r,s} |w_{r,s}|. Once we do that, the approximate separation oracle is guaranteed to find a violated constraint, or to certify that all constraints are approximately satisfied. Finally, we show that the approximately feasible solution computed via bisection allows one to recover an approximate solution to the original problem.

6.2. From an Approximate Separation Oracle to an Approximate Projection Oracle

Now, we show how to design a polynomial-time approximate projection oracle ϕ_α using an approximate separation oracle O_α. The proof employs a convex linearly-constrained quadratic program that computes the optimal projection on 𝒳, the ellipsoid method, and a careful primal-dual analysis.

Theorem 5. Given a subset K ⊆ 𝒦, a vector y ∈ [0,2]^{|𝒦|} such that y_k = 0 for all k ∉ K, and an approximation error ϵ ∈ ℝ_+, for any 0 < α ≤ 1, the approximate projection oracle ϕ_α(K, y, ϵ) can be computed in polynomial time by querying the approximate separation oracle O_α.

Proof Overview. We start by defining a convex minimization problem, which we denote by P, for computing the projection of y on 𝒳_K. Then, we work on the dual of P, which we suitably simplify by reasoning over the KKT conditions of the problem. As in the proof of Theorem 4, we proceed by repeatedly applying the ellipsoid method to a feasibility problem obtained from the dual, decreasing the required objective γ by a small additive factor β. The ellipsoid method is equipped with the approximate separation oracle that employs the oracle in Definition 2 and carefully manages the cases in which O_α would not run in polynomial time. In this case, the problem is complicated by the fact that we have to determine an approximate projection onto α𝒳_K, rather than an approximate solution to P. We find two dual problems such that the first, with objective γ, is feasible, while the second, with objective γ + β, is infeasible. From these problems, we define a new convex optimization problem that is a modified version of P and has value at least γ. Then, we show that a solution to this problem is close to a projection onto a set which includes α𝒳_K. Finally, we restrict P to the primal variables corresponding to the set of (polynomially-many) violated dual constraints determined during the last application of the ellipsoid method that returns infeasible, i.e., where the ellipsoid method for the feasibility problem is run with objective γ + β. We conclude the proof by showing that a solution to this restricted problem is precisely an approximate projection onto a superset of α𝒳_K.
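The binary-search pattern used in the proofs of Theorems 4 and 5 can be sketched generically (this is not the paper's actual procedure, and the names bisect_dual_objective and feasible are hypothetical): given a monotone feasibility check, which in the paper is an ellipsoid run equipped with the approximate separation oracle, bisection locates in log(1/β) steps a value γ that is feasible while a value β below it is not.

```python
def bisect_dual_objective(feasible, beta, lo=0.0, hi=1.0):
    """Generic sketch of the bisection step used in the proofs of Theorems 4 and 5.

    feasible(gamma) -> bool is a monotone predicate (True for large enough gamma);
    in the paper it is implemented by running the ellipsoid method on the dual
    feasibility problem with required objective gamma. Returns a value gamma such
    that feasible(gamma) holds while feasible(gamma - beta) fails (up to endpoints).
    """
    if feasible(lo):
        return lo
    while hi - lo > beta:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid        # keep the smallest known feasible objective
        else:
            lo = mid        # values below mid are infeasible
    return hi               # feasible, and hi - beta <= lo, which is infeasible

# Toy usage: a threshold predicate stands in for an ellipsoid feasibility run.
gamma = bisect_dual_objective(lambda g: g >= 0.37, beta=1e-3)
print(round(gamma, 4))
```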
7. A Poly-Time No-α-Regret Algorithm for Submodular Sender's Utilities

In this section, we conclude the construction of our polynomial-time no-(1 − 1/e)-regret algorithm for settings in which the sender's utilities are submodular. The last component that we need to design is an approximate separation oracle O_α (see Definition 2) running in polynomial time. Next, we show how to obtain this by exploiting the fact that the functions f_θ are submodular in the set of receivers playing action a_1.

First, we establish a relation between direct signals S and matroids. We define a matroid M_S := (G_S, I_S) such that:

- the ground set is G_S := {(r, s) | r ∈ R, s ∈ S_r};
- a subset I ⊆ G_S belongs to I_S if and only if I contains at most one pair for each receiver r ∈ R.

The elements of the ground set G_S represent (receiver, signal) pairs. However, sets I ∈ I_S do not characterize signal profiles, as they may not define a signal for each receiver. Indeed, direct signal profiles are captured by the basis set B(M_S) of the matroid M_S. Let us recall that B(M_S) contains all the maximal sets in I_S, and, thus, a subset I ∈ I_S belongs to B(M_S) if and only if I contains exactly one pair for each receiver r ∈ R. Intuitively, a basis I ∈ B(M_S) defines a direct signal profile s ∈ S in which, for each receiver r ∈ R, all the receiver's types in the signal s ∈ S_r such that (r, s) ∈ I are recommended to play action a_1, while the others are told to play a_0.

The following Theorem 6 provides a polynomial-time approximation oracle O_{1−1/e} for instances in which f_θ is submodular for each state of nature θ ∈ Θ. The core idea of its proof is that Σ_{k∈K} λ_k f_θ(R^k_s) (see Equation (3)) can be seen as a submodular function defined over the ground set G_S, and optimizing over direct signal profiles s ∈ S is equivalent to doing so over the bases B(M_S) of the matroid M_S. Then, the result is readily proved by exploiting some results concerning optimization over matroids.[9]

Theorem 6. If the sender's utility is such that the function f_θ is submodular for each θ ∈ Θ, then there exists a polynomial-time separation oracle O_{1−1/e}.

In conclusion, by letting K_T ⊆ 𝒦 be the set of receivers' type profiles observed by the sender up to iteration T, the following Theorem 7 provides our polynomial-time no-(1 − 1/e)-regret algorithm working with submodular sender's utilities.

Theorem 7. If the sender's utility is such that the function f_θ is submodular for each θ ∈ Θ, then there exists a no-(1 − 1/e)-regret algorithm having (1 − 1/e)-regret at most √T (1 + |K_T|), with a per-iteration running time poly(T, n, d).

Proof. We can run Algorithm 1 on an instance of our multi-receiver online Bayesian persuasion problem. By Theorem 3, if we set η = 1/√T and α = 1 − 1/e, we get the desired regret bound (notice that the set of observed feedbacks is E_t = K_t in our setting). Algorithm 1 employs an approximate projection oracle ϕ_{1−1/e} that we can implement in polynomial time by using the algorithm provided in Theorem 5. This requires access to a polynomial-time approximate separation oracle O_{1−1/e}, which can be implemented by using Theorem 6, under the assumption that the sender's utility is such that the functions f_θ are submodular.

[9] The separation oracle provided in Theorem 6 guarantees the desired approximation factor with arbitrarily high probability. It is easy to see that, since the algorithm fails with arbitrarily small probability, this does not modify our regret bound except for an (arbitrarily small) negligible term.
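To make the matroid construction behind Theorem 6 concrete, the sketch below (not from the paper) enumerates the bases of M_S for a tiny instance and brute-forces the objective of Equation (3); the weights λ, w and the function f_θ are illustrative. The paper's oracle O_{1−1/e} instead runs in polynomial time by exploiting results on submodular maximization over matroids, without any enumeration.

```python
import itertools

# Tiny instance (illustrative): 2 receivers, each with types {0, 1}, so S_r = 2^{K_r}.
receivers = [0, 1]
types = {r: (0, 1) for r in receivers}
signals = {r: [frozenset(c) for n in range(3) for c in itertools.combinations(types[r], n)]
           for r in receivers}

# Ground set of M_S: (receiver, signal) pairs; a basis picks exactly one signal per
# receiver, i.e., it corresponds to a direct signal profile s in S.
bases = list(itertools.product(*[[(r, s) for s in signals[r]] for r in receivers]))

# Objective of Equation (3) for a fixed theta: sum_k lambda_k f_theta(R^k_s) + sum_r w_{r,s_r}.
lam = {(0, 0): 0.5, (1, 1): 0.5}                                      # weights on type profiles
w = {(r, s): -0.05 * len(s) for r in receivers for s in signals[r]}   # illustrative weights
f_theta = lambda R: 1 - 0.5 ** len(R)                                 # monotone submodular

def objective(basis):
    value = sum(w[pair] for pair in basis)
    profile = dict(basis)                          # r -> recommended-type set s_r
    for k, lam_k in lam.items():
        R_k_s = {r for r in receivers if k[r] in profile[r]}
        value += lam_k * f_theta(R_k_s)
    return value

best = max(bases, key=objective)
print("best direct signal profile:", {r: sorted(s) for r, s in best})
print("objective value:", round(objective(best), 4))
```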
Notice that the regret bound only depends on the number |K_T| of receivers' type profiles observed up to iteration T, while it is independent of the overall number of possible type profiles |𝒦| = m^n, which is exponential in the number of receivers. Thus, the (1 − 1/e)-regret is polynomial in the size of the problem instance provided that the type profiles received as feedback by the sender are polynomially many (though the sender does not need to know in advance which type profiles these are). This is reasonable in many practical applications, where not all the type profiles can occur, since, e.g., receivers' types are highly correlated. On the other hand, let us remark that, as T goes to infinity, we have |K_T| ≤ m^n, so that the regret is sublinear in T.

Acknowledgments

This work has been partially supported by the Italian MIUR PRIN 2017 Project ALGADIMAR "Algorithms, Games, and Digital Markets".

References

Alonso, R. and Câmara, O. Persuading voters. American Economic Review, 106(11):3590–3605, 2016.

Antioch, G. et al. Persuasion is now 30 per cent of US GDP: Revisiting McCloskey and Klamer after a quarter of a century. Economic Round-up, 2013.

Arieli, I. and Babichenko, Y. Private Bayesian persuasion. Journal of Economic Theory, 182:185–217, 2019.

Arora, S., Lund, C., Motwani, R., Sudan, M., and Szegedy, M. Proof verification and the hardness of approximation problems. Journal of the ACM (JACM), 45(3):501–555, 1998.

Babichenko, Y. and Barman, S. Computational aspects of private Bayesian persuasion. arXiv preprint arXiv:1603.01444, 2016.

Babichenko, Y. and Barman, S. Algorithmic aspects of private Bayesian persuasion. In Innovations in Theoretical Computer Science Conference, 2017.

Babichenko, Y., Talgam-Cohen, I., Xu, H., and Zabarnyi, K. Regret-minimizing Bayesian persuasion. arXiv preprint arXiv:2105.13870, 2021.

Badanidiyuru, A., Bhawalkar, K., and Xu, H. Targeting and signaling in ad auctions. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 2545–2563, 2018.

Bhaskar, U., Cheng, Y., Ko, Y. K., and Swamy, C. Hardness results for signaling in Bayesian zero-sum and network routing games. In Proceedings of the 2016 ACM Conference on Economics and Computation, pp. 479–496, 2016.

Bro Miltersen, P. and Sheffet, O. Send mixed signals: earn more, work less. In Proceedings of the 13th ACM Conference on Electronic Commerce, pp. 234–247, 2012.

Candogan, O. Persuasion in networks: Public signals and k-cores. In Proceedings of the 2019 ACM Conference on Economics and Computation, pp. 133–134, 2019.

Castiglioni, M. and Gatti, N. Persuading voters in district-based elections. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021.

Castiglioni, M., Celli, A., and Gatti, N. Persuading voters: It's easy to whisper, it's hard to speak loud. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 1870–1877, 2020a.

Castiglioni, M., Celli, A., Marchesi, A., and Gatti, N. Online Bayesian persuasion. Advances in Neural Information Processing Systems, 33, 2020b.

Castiglioni, M., Celli, A., Marchesi, A., and Gatti, N. Signaling in Bayesian network congestion games: the subtle power of symmetry. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021.

Cheng, Y., Cheung, H. Y., Dughmi, S., Emamjomeh-Zadeh, E., Han, L., and Teng, S.-H. Mixture selection, mechanism design, and signaling. In 56th Annual Symposium on Foundations of Computer Science, pp. 1426–1445, 2015.

Dughmi, S. and Xu, H.
Algorithmic Bayesian persuasion. In ACM STOC, pp. 412–425, 2016.

Dughmi, S. and Xu, H. Algorithmic persuasion with no externalities. In ACM EC, pp. 351–368, 2017.

Emek, Y., Feldman, M., Gamzu, I., Paes Leme, R., and Tennenholtz, M. Signaling schemes for revenue maximization. ACM Transactions on Economics and Computation, 2(2):1–19, 2014.

Garber, D. Efficient online linear optimization with approximation algorithms. In Advances in Neural Information Processing Systems, volume 30, pp. 627–635, 2017.

Grötschel, M., Lovász, L., and Schrijver, A. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.

Hazan, E., Hu, W., Li, Y., and Li, Z. Online improper learning with an approximation oracle. In Advances in Neural Information Processing Systems, volume 31, pp. 5652–5660, 2018.

Kakade, S. M., Kalai, A. T., and Ligett, K. Playing games with approximation algorithms. SIAM Journal on Computing, 39(3):1088–1106, 2009.

Kamenica, E. and Gentzkow, M. Bayesian persuasion. American Economic Review, 101(6):2590–2615, 2011.

Khachiyan, L. G. Polynomial algorithms in linear programming. USSR Computational Mathematics and Mathematical Physics, 20(1):53–72, 1980.

Mansour, Y., Slivkins, A., Syrgkanis, V., and Wu, Z. S. Bayesian exploration: Incentivizing exploration in Bayesian games. In Proceedings of the 2016 ACM Conference on Economics and Computation, pp. 661–661, 2016.

McCloskey, D. and Klamer, A. One quarter of GDP is persuasion. The American Economic Review, 85(2):191–195, 1995.

Rabinovich, Z., Jiang, A. X., Jain, M., and Xu, H. Information disclosure as a means to security. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pp. 645–653, 2015.

Raz, R. A parallel repetition theorem. SIAM Journal on Computing, 27(3):763–803, 1998.

Roughgarden, T. and Wang, J. R. Minimizing regret with multiple reserves. ACM Transactions on Economics and Computation (TEAC), 7(3):1–18, 2019.

Schrijver, A. Combinatorial optimization: polyhedra and efficiency, volume 24. Springer Science & Business Media, 2003.

Sviridenko, M., Vondrák, J., and Ward, J. Optimal approximation for submodular and supermodular optimization with bounded curvature. Mathematics of Operations Research, 42(4):1197–1218, 2017.

Vasserman, S., Feldman, M., and Hassidim, A. Implementing the wisdom of Waze. In Twenty-Fourth International Joint Conference on Artificial Intelligence, pp. 660–666, 2015.

Xu, H. On the tractability of public persuasion with no externalities. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 2708–2727, 2020.

Xu, H., Freeman, R., Conitzer, V., Dughmi, S., and Tambe, M. Signaling in Bayesian Stackelberg games. In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems, pp. 150–158, 2016.