# dynamic_pricing_and_learning_with_bayesian_persuasion__3c1817d9.pdf Dynamic Pricing and Learning with Bayesian Persuasion Shipra Agrawal Columbia University sa3305@columbia.edu Yiding Feng University of Chicago yidingfeng@uchicago.edu Wei Tang Columbia University wt2359@columbia.edu We consider a novel dynamic pricing and learning setting where in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to advertising schemes . That is, in the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product s quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller s expected revenue. Without any apriori knowledge of the buyers demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an O(T 2/3(m log T) 1/3) regret bound when the valuation function is linear in the product quality. Here m is the cardinality of the discrete product quality domain and T is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers demand function. For constant m, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an O regret bound independent of m when m T 1/3. 1 1 Introduction Dynamic pricing is a key strategy in revenue management that allows sellers to anticipate and influence demand in order to maximize revenue and/or utility. When the customer valuation and demand response for a product is apriori unknown, price variation can also be used to observe and learn the demand function in order to adaptively optimize price and revenue over time. This learning and optimization problem has been a focus of much recent literature that uses exploration-exploitation and multi-armed bandit techniques with dynamic pricing algorithms (e.g., see [43, 16, 41, 8]). In practice, there is another important tool available to sellers in the form of advertising, using which the sellers can inform and shape customers valuations of a product. It has been theoretically [53, 54] and empirically [55, 49] shown that advertisements can serve as a credible signal of the quality or characteristics of the advertised product. Sellers can use advertising to provide partial information about a product in order to better position the product in the market and potentially increase customers chances of purchasing the product. For example, as a common strategy to drive 1A full version of this work is at https://arxiv.org/abs/2304.14385. 37th Conference on Neural Information Processing Systems (Neur IPS 2023). subscriptions, online newspapers may use a teaser that selectively includes previews of some news articles that are likely to entice readers to subscribe for access to the full story; in the online used-car market, the dealer can advertise the used car by emphasizing different aspects of the car, such as fuel efficiency/mileage/unique features, or selectively disclose history-report information from reputable third parties, catering to specific buyer interests; a film distributor may advertise the movie by selectively showing footage from the film. However, advertising must be carefully designed to achieve the desired gains. At one glance upselling or inflating the product quality by selectively disclosing only favorable information might appear as a profitable advertising strategy. But such strategies carry the disadvantage of not being very effective in modifying customer beliefs as customers may not trust that the provided information accurately reflects the product s true quality. Also, the design of the advertising strategy needs to interact with the design of the pricing strategy and account for the demand function. For example, to sell highly-priced products or under heavy competition/low demand, the customer may need to be convinced of a good match through more information and thorough insights about the product characteristics. On the other hand, in markets with high demand or for very low-priced goods, the seller may get away with revealing very little information. An extreme example of this phenomenon is the concept of mystery deal boxes sold by some retailers like Amazon/Woot, where the customers are not even made aware of the exact contents of the low-cost box that they are purchasing.2 In this paper, we use a Bayesian persuasion framework [39] to model the effect of an advertising strategy on customers beliefs about product quality and consequently their purchase decisions. Our novel formulation combines the Bayesian persuasion model with dynamic pricing and learning in order to quantify the tradeoffs between the design of the pricing and advertising strategies and their combined impact on the revenue outcomes. Without any apriori knowledge of the demand function, our goal is to design an online algorithm that can use past customer responses to adaptively learn a joint pricing and advertising strategy that maximizes the seller s revenue. Bayesian persuasion is a popular framework for information design with several different settings considered in the literature [39, 30, 38, 10]. We consider a Bayesian persuasion model where the sender (seller) ex-ante commits to an information policy (advertising strategy) that prescribes the distribution of signals the sender will provide to the receiver (buyer) on observing the true state of the world (product quality). The receiver, on observing the sender s signal, uses Bayes rule to form a posterior on the state of the world. The receiver s action (purchase decision) then optimizes their expected utility under this posterior. Problem formulation. Specifically, our dynamic pricing and advertising problem is formulated as follows. There are T sequential and discrete rounds. In each round, a fresh product is offered by the seller, with a public prior distribution λ on the product quality ! 2 [0, 1]. At the beginning of each round t, before observing the realized quality of the tth product, the seller commits to a price pt 2 [0, U] and an advertising strategy φt, where φt(σ|!) prescribes the distribution of signal σ 2 where is an arbitrary signaling space given the realized product quality ! 2 . A buyer arrives in each round t with private type t generated i.i.d. from a distribution with CDF F( ) and support3 = [0, 1]. The CDF F( ) (or equivalently the demand function D( ) , 1 F( )) is fixed but unknown to the seller. For a buyer of type 2 , the valuation of a product with product quality ! is given by function v( , !). The tth product quality !t λ is then realized and observed by the seller. The buyer cannot observe the realized product quality, but only a signal σt φt( |!t) provided by the seller. The buyer uses this signal along with the prior λ to formulate a Bayesian posterior distribution on the product quality µt(!|σt) / φt(σt|!) λ(!). The buyer then purchases the product if and only if the expected valuation under this posterior E! µt( |σt)[v( t, !)] is greater than or equal to the price pt. We denote the buyer decision at time t by at 2 {0, 1} with at = 1 denoting purchase. Remark 1.1. We consider the setting where buyers use only the prior distribution and signal in the current round, and not the signals in the past rounds, to make their decisions. This is motivated by 2For example, when selling the opaque products, the precise product features or characteristics are hidden from the customers [31]. 3Our results can be generalized to the setting where is any compact interval [ , ] R+, or unbounded (see Remark 2.3). the fact that at each round, the buyer is facing a fresh product, whose quality is drawn independently across time. Thus, previous signals do not carry information about the current product. Fresh products with independent qualities are common in many real-world applications such as secondhand markets, mystery boxes sold by Amazon/Woot, etc. This similar problem structure has also been studied in the online/sequential Bayesian persuasion literature with repeated interactions of sender and receivers, for example, see [61, 24, 23, 33, 58, 15]. A summary of the game timeline is as follows: at t 2 [T], (1) the seller commits to a price pt 2 [0, U] and an advertising strategy φt; (2) a buyer t with private type t F arrives; (3) a product with quality !t λ is realized; the seller sends signal σt φt( |!t) to the buyer; (4) the buyer formulates Bayesian posterior µt(!|σt), and the buyer purchases the product (denoted by at = 1) to generate revenue pt if and only if her expected value exceeds the price, i.e., E! µt( |σt)[v( t, !)] pt. Our goal is to design an online learning algorithm that sequentially chooses the price and advertising strategy pt, φt in each round t based on the buyers responses in the previous rounds, in order to optimize total expected revenue over a time horizon T without apriori knowledge of the distribution F. Let Rev(pt, φt) , E[pt at; pt, φt] denote the expected revenue at time t under the price pt and advertising strategy φt. Here the expectation is over realizations of customer type t F, product quality !t λ and advertising signal σt φt( |!t). Note that since product quality and types are i.i.d. across time, for any given pt = p, φt = φ, the expected per-round revenue Rev(p, φ) does not depend on time. Thus a static price and advertising strategy maximizes total expected revenue over the time horizon T. Therefore, we can measure the performance of an algorithm in terms of regret that compares the total expected revenue of the algorithm to that of the best static pricing and advertising strategy. We define Regret(T) , T max p,φ Rev(p, φ) Rev(pt, φt) . Our contributions. In this work, we present a computationally efficient online pricing and advertis- ing algorithm that achieves an O 2/3(m log T) bound on the regret in time T, where m = | | is the cardinality of the (discrete) product quality space. Importantly, we achieve this result without any assumptions like Lipschitz or smoothness on the demand function D( ) = 1 F( ). However, our results require certain assumptions on the valuation function. Following the literature, we make the common assumption that the function v( , !) is linear in the product quality !. Furthermore, we assume the following monotonicity and Lipschitz properties on the valuation function. Assumption 1. Buyer s valuation function v( , ) satisfies: 1a Fix any buyer type , function v( , !) is non-decreasing w.r.t. quality !. 1b Fix any quality !, function v( , !) is increasing and 1-Lipschitz4 w.r.t. type . Such assumptions are in fact common in literature and natural in many economic situations where the valuation of a product increases with the product quality and buyer s type (paying ability/need). Existing literature on Bayesian persuasion/dynamic pricing often makes even stronger assumptions about the receiver s utility function. For example, both the additive functions v( , !) = + ! [cf. 35, 45, 27] and the multiplicative functions v( , !) = ! + [cf. 22, 48], are linear in ! and satisfy Assumption 1. Our main result is then summarized as follows: Theorem 1.1. For any type CDF F, given a valuation function v( , !) that is linear in product quality ! and satisfies Assumption 1, Algorithm 1 with parameter " = ((m log T/T) 1/3) has an expected regret of O 2/3(m log T) . Here, m is the cardinality of the discrete quality space . Furthermore, we obtain several improved results for the widely considered special case of additive valuations, i.e., for v( , !) = + !. See Appendix B for the formal statements and analysis. 1. (Theorem B.1) Consider discrete sets that are equally-spaced , e.g., = {0, 1} or = [m]. Given such a product quality space and additive valuation function, we show that the regret of Algorithm 1 is bounded by O(T 1/3) when m (T/log T) 1/3, and by O(pm T log T) for larger m. 4Here 1 Lipschitz constant is for exposition simplicity, an arbitrary Lipschitz constant L can be treated similarly. 2. (Theorem B.2) For any arbitrary (discrete or continuous) product quality space , given additive valuation functions, we have a slightly modified algorithm (see Algorithm 3 in Appendix C.2) with an expected regret of O(T 1/4) independent of m. When the valuation function v( , !) is L-Lipschitz w.r.t. type , our regret bound in Theorem 1.1 would become O(T 2/3(Lm log T) 1/3), and the regret bound for arbitrary space would become O(T 3/4(L log T) We might compare our results to the best regret bounds available for the well-studied dynamic pricing and learning problem with unlimited supply [43, 8], which is a special case of our problem if the product quality is deterministic, i.e., m = 1, and the advertising scheme reveals no information and thus has no impact on the buyer s purchase decision. For the dynamic pricing problem a lower bound of (T 2/3) on regret is known [43]. Therefore the dependence on T in our results cannot be improved. In fact, our result matches this lower bound in the case of binary or constant size quality space, which are common settings in information design literature [39, 18, 5, 57, 33, 12]. High-level descriptions of the proposed algorithm and challenges. Our problem can be viewed as a very high dimensional combinatorial multi-armed bandit problem, where each arm is a pair of a price and a feasible advertising strategy: the set of feasible advertising strategies being the set of all possible conditional distributions {φ( |!), ! 2 } over signal space . As a first step towards obtaining a more tractable setting, we present an equivalent reformulation of the problem which uses the observation that advertising affects the buyer s decision only via the posterior distribution over quality. By the linearity of valuation function v( , ) over product quality !, seller s choice of advertising strategy in every round can be further simplified to selecting a distribution over posterior means that is subject to a feasibility constraint. From here, the seller s decision space now becomes two-dimensional (a price and a distribution of posterior means). Viewing seller s expected revenue as an unknown (nonlinear) function over this two-dimensional decision space, one may consider applying algorithms in contextual bandits or Lipschitz bandits to get sublinear regrets, e.g., O(T 3/4poly(m)) regret for two-dimensional decision space if there exists a Lipschitz property of reward function relative to seller s decision space. However, it is unclear whether one can establish such Lipschitz property given that we do not assume Lipschitzness or smoothness on demand function and we have complex constraints on the feasibility of advertising space. Instead, in our algorithm we use a model-based approach : we use buyers purchase responses to explore the demand function over the (discretized) type space and jointly learn the optimal advertising and pricing. To explore the demand function over the one-dimensional (discretized) type space, we propose a novel discretization scheme such that it enables near-optimal price and advertising strategy even without Lipschitzness or smoothness assumption and with the complex feasibility constraint on the advertising space. These treatments lead us to the optimal 1/3) regret. Related work. Our work is related to several streams of research. Below we briefly review the some of these connections. In our setting, the seller can utilize her information advantage to design an advertisement to signal the product quality to the buyer. There is a long line of research in the literature, from both empirical and theoretical perspective, dedicated to study how to use advertisement as a signal to steer buyers evaluations of advertised goods [53, 54, 42, 52, 37, 55, 40]. In our problem, we follow the literature in information design, a.k.a., Bayesian persuasion [39] [also see the recent surveys by 30, 38, 10], where the seller can commit to an information policy that can strategically disclose product information to the buyer so that to influence buyer s belief about the product quality. Similar formulation for advertising has also appeared in [19, 32, 3, 34, 14, 13, 28]. Our work differs from these works in several ways. First, the seller s offline problem in our setting is a joint pricing and advertising problem. Second, we focus on an online setting where the seller has no apriori knowledge of the demand function and has to use past buyers purchase responses to adaptively learn optimal pricing and advertising strategy. Our problem shares similarity to the problem on sale of information in economics and computer science literature [7, 9, 11, 26, 20, 48, 12, 59, 47, 25]. In particular, similar to the problem on sale of information, the seller, in our setting, also commits to design an information structure to reveal information about the realized state to the decision maker (i.e., buyer); and the buyer, in our setting, then makes the payment based on the declared information structure, not for specific realizations of the seller s informative signals [12, 20, 48]. However, different from these works, the seller in our setting is selling a product with some inherent value and not just information. The valuation of the product can be shaped by providing information. This gives new interesting tradeoffs between information and revenue in our problem that are absent in the settings where only information is being sold. Moreover, in most literature of selling information, the buyers type distribution is usually assumed to be known. We consider a more practical data-driven setting where the underlying buyers type distribution (demand function) is apriori unknown to the seller and needs to be learnt from observations. Facing unknown buyer s preference (i.e., buyer s private type), the seller s dynamic advertising problem also relates to the growing line of work in information design on relaxing one fundamental assumption in the canonical Bayesian persuasion model the sender perfectly knows receiver s preference. The present paper joins the recent increased interests on using online learning approach to study the regret minimization when the sender repeatedly interacts with receivers [23, 24, 61, 33] without knowing receivers preferences. Moreover, our work also conceptually relates to research on Bayesian exploration in multi-armed bandit [46, 51, 50, 36] which also studies an online setting where one player can utilize her information advantage to persuade another player to take the desired action. Our work departs from this line of work in terms of both the setting and the application domain. Particularly, the above works typically consider an online setting on how to learn optimal signaling scheme whereas in our setting the optimal policy is a joint pricing and signaling (advertising) scheme. When there is no uncertainty in the product quality, the seller s problem in our setting reduces to a standard dynamic pricing and learning problem with unknown non-parametric demand function [43, 16, 41, 8]. However, given any non-trivial product quality space and prior distributions, in our problem, in addition to a price, the seller needs to choose a non-trivial advertising strategy in order to maximize revenue. This makes the seller s decision space high dimensional and (as we discuss in the next section) introduces significant complexities and difficulties so that the typical techniques (like uniform or adaptive discretization) used in pricing and continuous/combinatorial multi-armed bandit literature cannot be directly applied. 2 Algorithm Design In this section, we present our main algorithm for the dynamic pricing and advertising problem. In subsection 2.1, we present an equivalent reformulation for tractable advertising strategies, then in subsection 2.2, we discuss many important challenges even after this simplification, and finally, in subsection 2.3 we present our algorithm. 2.1 An equivalent reformulation for tractable advertising strategies Recall that in every round, the buyer t sees the offered price pt and advertising strategy φt that specifies the distributions over signals φt( |!) 2 that the seller will send for each possible value ! of the realized product quality. After the product quality !t is realized, the buyer t sees a signal σt φt( |!t) from the seller s declared advertising strategy. The buyer uses this signal along with the prior λ to form a Bayesian posterior µt( |σt) 2 on the product quality. The Bayesian rational buyer then takes the action at 2 {0, 1}, based on expected utility maximization. In particular, we have at = 1 if E! µt( |σt)[v( t, !)] pt and at = 0 otherwise. From the decision formula above, it is clear that the choice of advertising strategy affects the buyer t s decision only through the realized posterior µt( |σt). Consequently, the seller s choice of advertising strategy in time t can be reduced to selecting a distribution over posteriors µt. Seller s choice can in fact be further simplified in the case where the buyer s valuation function is linear in the product quality ! since in that case we have E! µt( |σt)[v( t, !)] = v( t, E! µt( |σt)[!]) = v( t, qt) where qt is the realized posterior mean qt , E! µt( |σt)[!]. Here qt 2 [0, 1] since ! 2 [0, 1]. Therefore the buyer purchases (at = 1) if and only if v( t, qt) pt. As a result, we can reduce the seller s advertising in round t to the choice of distribution (pdf) t( ) 2 [0,1] over posterior means. However, the choice of t must be restricted to only feasible distributions of posterior means, that is, all possible distributions over posterior means that can be induced by any advertising scheme given the prior λ. It is well known that the distribution over posterior means is feasible if and only if it is the mean-preserving contraction of the prior [17, 6]. This condition can be equivalently written in terms of a Bayes-consistency condition [39] on the conditional means ( |!), ! 2 . For simplicity of exposition, we consider discrete quality space = { !1, . . . , !m} [0, 1], where 0 = !1 < !2 < . . . < !m = 1 and m = | | is the cardinality of the quality space. Then, a distribution over posterior means is feasible if one can construct a set of conditional distributions ( i)i2[m] satisfying the following Bayes-consistency condition, and vice versa [39]: P i2[m] λi i(q) !i P i2[m] λi i(q) = q, 8q 2 supp( ) . (BC) Throughout this paper, we use the collection of distributions ( i)i2[m] satisfying (BC) condition as a convenient way to construct feasible distributions over posterior means: (q) = P With the above observations, we can without loss of generality assume that seller s advertising strategy is to directly choose a distribution t over the posterior means that satisfies (BC), without considering the design of the underlying signaling scheme {φ(σ|!), }. We summarize the new equivalent game timeline as follows: at t 2 [T], (1) the seller commits to a price pt 2 [0, U] and an advertising strategy t = ( i,t)i2[m] satisfying (BC); (2) a buyer t with private type t F arrives; (3) a product with quality !t λ is realized; a posterior mean qt t is realized; (4) the buyer observes the posterior mean qt; the buyer purchases the product (at = 1) to generate revenue pt if only if v( t, qt) pt. Note that the seller knows the form of the buyer s valuation function v. Moreover, the seller observes the realized product quality !t, the realized posterior mean qt, and the buyer s purchase decision at, but does not know type CDF F (i.e., the demand function D) and the realized buyer type t. Revenue and regret. Given the new formulation, we can also rewrite the revenue and regret in terms of the choices of t, t = 1, . . . , T. We define the following function (p, q), which we refer to as the critical type for a given price p and posterior mean q. Definition 2.1 (Critical type). For any p 2 [0, U] and q 2 [0, 1], define function ( , ) as (p, q) , min{ 2 : v( , q) p}. Now under Assumption 1b, due to the monotonicity of the valuation function in buyer s type, we have that given any p, q, , 1[v( , q) p] = 1[ (p, q)]. Therefore, the buyer t will purchase the product if and only if t (pt, qt). Therefore, given the price, advertising pt = p, t = and prior distribution λ, the expected revenue in any round t is given by 5 Rev(p, ) = E! λ, F,q [p 1[ (p, q)]] = p i(q)D( (p, q))dq (1) Let the seller s online policy offer price pt and advertising t at time t, where pt, t can depend on the history of observations/events up to time t. Then expected regret defined in Section 1 can be equivalently written as Regret[T] = TRev(p , ) E[Rev(pt, t)] . Here, the expectation is taken with respect to any randomness in the algorithm s choice of pt, t; and p , = ( i )i2[m] are defined as the best price and advertising strategy for a given F (and ( , ) which is determined by F). Given the expression for Rev(p, ) derived above, these can be characterized by the following optimization program p , , arg max p, Rev(p, ) i2[m] λi i(q) !i P i2[m] λi i(q) = q, q 2 [0, 1]; i 2 [0,1], i 2 [m] . where the first constraint is due to (BC). 5Here, we slightly abuse the notation to redefine Rev as a function of price and , instead of price and φ defined earlier. 2.2 Algorithm design: challenges and ideas Challenge: high-dimensional continuous decision space. In the last section, we obtained a considerable simplification of the problem by reducing the seller s advertising strategy in every round t to a distribution t 2 [0,1] over posterior means satisfying the (BC) condition. However, the decision space (a.k.a space of arms) still remains high dimensional and therefore a naive application of (uniform or adaptive) discretization-based bandit techniques, e.g. from Lipschitz bandit literature [44, 56], would not achieve the desired results. Algorithm design idea: exploring over one-dimesnional type space. Our algorithm uses a model-based approach instead, where we use buyer purchase responses to develop (upper confidence bound) estimates of the demand model D( ) = 1 F( ) on the points of a discretized type space S . We then use these upper confidence bounds to construct a piecewise-constant demand function that is an upper confidence bound (UCB) for the demand function D. Then, in each round we solve for the optimal price and advertising strategy by solving an optimization problem similar to POPT, but with the UCB demand function. Challenge: efficient discretization of type space. The challenge then is to design a discretization scheme for the type space such that we have a) efficient learning, i.e., the discretized space can be efficiently explored to accurately estimate the demand function on those points, and b) Lipschitz property, i.e., the optimal revenue with the UCB estimate of demand function is close to the true optimal revenue as long as the estimation error on the discretized space is small. There are two main difficulties in achieving this: (1) Lack of any smoothness/Lipschitz assumption on CDF F. (2) Sensitivity of the Bayes-consistency condition (BC). To see these difficulties, recall that given a price pt and realized posterior mean qt, the tth buyer s purchase decision is given by at = 1[ t (pt, qt)]; thus the seller can obtain demand function estimate at point (pt, qt). Without any smoothness or Lipschitz property of demand function, estimates of demand function cannot be extrapolated accurately to other points. This means that in our revenue optimization problem (estimated version of POPT), we need to find a price and advertising strategy that we can only use estimates of demand function on a discretized type space, say S . However, if we restrict to a discretized type space S, then the support of posterior mean distributions (a.k.a advertising strategy) must be restricted to the points q such that the corresponding critical types (p, q) are in the set S. Unfortunately, the (BC) condition makes the set of feasible advertising strategies very sensitive to their support. In particular, if we use uniform-grid based discretization (which is commonly used in previous dynamic pricing literature such as [43, 8]), it is easy to construct examples of prior distribution and valuation function such that there are no or very few feasible advertising strategies with the corresponding restricted support. Example 2.2. Consider additive valuation function, i.e., v( , !) = +!, and thus (p, q) = p q. Consider quality space = { !i}i2[3]. Given a small ", consider a uniform-grid based discretization for the type space S, i.e., S = {0, ", 2", . . . , 1}. If we also use a price p that is from uniform-grid based discretized price space, i.e., p = k" for some k 2 N +, then to ensure (p, q) 2 S, the support of advertising strategy (i.e., the distribution of the posterior means) must also be restricted within the set S. However, if the prior distribution λ has negligible probabilities on qualities !1, !3, and quality !2 /2 S, then we cannot construct any posterior distribution with the mean in the set S. Therefore, there does not exist any feasible advertising strategy. Note that this difficulty cannot be fixed by simply modifying the discretized type space to S [ , because even then we would need to construct an advertising such that (p, q) = p q 2 S [ in the grid for all p. That would need that the support of the strategy (i.e., posterior means q) is restricted to be in {k !i}k2N+,i2[3]; such posterior means again may not be achieved here. Algorithm design idea: novel discretization scheme. Our algorithm employs a carefully-designed quality-and-price-dependent discretization scheme. The above example shows that we cannot uniformly discretize price and type using "-grids, as we may not have any feasible advertising strategy under such discretization. And furthermore, it also shows that this difficulty cannot be fixed by simply adding the m points in quality space to the discretized type space S. Instead, in our discretization scheme, we first uniformly discretize the price space to an "-grid P. Then to construct a discretized type space S, in addition to the points on an "-grid over [0, 1], we also include points { (p, !)} for every price p 2 P and quality ! 2 . This gives us a discretized type space S of size m U/". We claim that our construction ensures that there exist near-optimal price and advertising strategy with support in the discretized type space S. The proof of this claim requires a careful rounding argument, which forms one of the main technical ingredients for our regret analysis in Section 3. 2.3 Details of the proposed algorithm Our dynamic pricing and advertising algorithm jointly discretizes the price space and type space using the following quality-and-price-dependent discretiztion scheme: given parameter ", we define the following set: P , {", 2", . . . , U} S , {0, ", . . . , 1 ", 1} [ {( (p, !) 1) _ 0}p2P,!2 . At time t, we restrict the price and advertising strategies (pt, t) to the set of (p, = ( i)i2[m]) such that p 2 P, and given price p, each conditional distribution i has restricted support Qp defined as Qp , {q : (p, q) 2 S}. Given the price and advertising pt, t, let the realized posterior mean at time t be qt t, and let the corresponding critical type be xt , (pt, qt) 2 [0, 1]. Then, note that the above restrictions on price and advertising strategies guarantee that xt 2 S. Next, to compute the offered price and advertising strategy in round t, we optimize an upper confidence bound on the revenue function that we develop using upper confidence bounds DUCB(x), x 2 S of the demand function computed as follows. For every type x 2 S, let Nt(x) denote the set of time rounds before time t that the induced critical type is exactly x, and let Nt(x) be the number of such time rounds. That is, Nt(x) , { < t : (p , q ) = x} , Nt(x) , |Nt(x)| , x 2 S. Recall that buyer s purchase decision follows a = 1[ (p , q )]. We estimate the demand function at x as: Dt(x) , Nt(x) . We can now define the following UCB index: t (x) = min x02S:x0 x (1 + Nt(x0)) ln(1 + Nt(x0)) Nt(x0) 1, x 2 S (3) Then, for any pair of discretized price p 2 P and advertising strategy with discretized support for that price = ( i 2 Qp, i 2 [m]}, we define the following seller s revenue estimates: t (p, ) , p t ( (p, q))dq . Above is well-defined since by definition (p, q) 2 S for each such (p, q) 2 P Qp. Finally, we let pt, t be the optimal solution to the following optimization problem: (pt, t) = arg max s.t. p 2 P; i2[m] λi i(q) !i P i2[m] λi i(q) = q, q 2 Qp; i 2 Qp, i 2 [m] . We summarize our algorithm as Algorithm 1. The main computational bottleneck of Algorithm 1 is to solve the high-dimensional program PUCB t at each time t |S| + 1. As we illustrate in Proposition 2.1, there exists an efficient method to optimally solve this program. The proof of this result utilizes the monotoncity of the function DUCB t . Proposition 2.1 (Adopted from [4, 21]). Let " be the discretization parameter for the set P defined in (2). There exists a polynomial time (in |S|U/") algorithm that can solve the program PUCB Proof. Since function DUCB t is monotone with discontinuities at the points in the set S, when we fix a price p 2 P, the function DUCB t ( (p, )) is also monotone with discontinuities at the points in Qp = {q : (p, q) = x}x2S. Given a prior λ, optimizing a monotone function with discontinuities over all feasible advertising strategies induced from the prior λ subject to the constraint where the support of advertising strategies must be in the set Qp has been studied in [4, 21]. It has been shown that there Algorithm 1: Algorithm for Dynamic Pricing and Advertising with Demand Learning. 1 Input: Discretization parameter ". 2 For the first |S| rounds, for each x 2 S, offer a price p with no information advertising s.t. (p, E! λ[!]) = x. // No information advertising provides completely uninformative signal the distribution φ( |!) of signals does not depend on the realized quality ! 3 for each round t = |S| + 1, |S| + 2, . . . , T do 4 For all x 2 S, compute DUCB t (x) as defined in (3). 5 Offer the price pt and an advertising t computed as an optimal solution to program PUCB /* pt, t satisfies (pt, q) 2 S for every q 2 supp( t). */ 6 Observe realized posterior mean qt t and buyer s purchase decision at 2 {0, 1}. Nt+1(x), Nt+1(x), Dt+1(x) exists a polynomial (w.r.t. the number of discontinuities) algorithm based on convex programming that can find an optimal advertising strategy (see Proposition 2 in 4). Thus, an exhaustively search over the discretized price space P can lead to an optimal solution to the program PUCB We conclude this section with the following remark on the extension to unbounded type support. Remark 2.3. Our algorithm and analysis can be extended to the case with unbounded type support (e.g., = [0, 1)). In particular, since the price is bounded by [0, U], and the quality is bounded within [0, 1], by the monotoncity of the valuation function, we know the critical types (p, q) induced by any possible p 2 [0, U] and q 2 [0, 1] is bounded within [ (0, 1), (U, 0)]. Thus, an instance with unbounded type support is equivalent to an instance with bounded type support [ (0, 1), (U, 0)]. 3 Regret Analysis: Proof Overview of Theorem 1.1 In this section, we present our main regret bound for Algorithm 1, as stated in Theorem 1.1. Specifically, we show that for any type CDF F, given a valuation function v( , !) that is linear in product quality ! and satisfies Assumption 1, our algorithm (Algorithm 1 with parameter " = ((m log T/T) 1/3)), has an expected regret of O 2/3(m log T) . Importantly, for this result we do not assume any smoothness or Lipschitz properties of distribution F. For this result, we consider arbitrary but discrete quality space of cardinality m. Later in Appendix B, we show improved regret bounds for the case of additive valuations and equally-spaced quality space (see Theorem B.1), and also extend to arbitrary large and continuous quality spaces (see Theorem B.2). Proof outline. Recall that in every round t, Algorithm 1 sets the price pt and advertising strategy t as an optimal solution of program PUCB t that approximates the benchmark POPT in two ways. Firstly, it restricts the price and support of advertising strategy to be in a discretized space P {Qp, p 2 P}. Secondly, it approximates the true demand function with an upper bound DUCB t . Our proof consists of two main steps that bound the errors due to each of the above approximations. Due to the space limit, all missing proofs in this section are deferred to Appendix A. Step 1: bounding the discretization error using a rounding argument (see Appendix A.1). To separate the discretization error from the error due to demand function estimation, we consider an intermediate optimization problem e P (in Appendix A.1) obtained on replacing the UCB demand function DUCB t with the true demand function D (while keeping the discretized space for p, ). Let ep , e be an optimal solution of program e P. We show that the revenue Rev(ep , e ) is sufficiently close (within 2") to the optimal revenue Rev(p , ). This bound is obtained using a careful rounding argument: we show that the optimal price p and the optimal advertising can be rounded to a new price p and a new advertising that satisfy (i) feasibility (Lemma A.4): p 2 P, supp( ) supp(Qp ); and (ii) revenue guarantee (Lemma A.5): Rev(p , ) Rev(p , ) 2 . Step 2: bounding estimation error and establishing optimism (see Appendix A.3). Next, we show that the UCB estimates of the demand function DUCB t (x), x 2 S converge to the true demand function D with high probability, along with concentration bounds on the gap between the true and estimated function (Lemma A.7). This allows us to show that (i) Revenue optimism (Lemma A.8): we show that the algorithm s revenue estimates are (almost) optimistic, i.e., Rev UCB t (pt, t) Rev(ep , e ) Rev(p , ) 2". (ii) Revenue approximation (Lemma A.9): we show that the optimistic revenue estimates are close to the true revenue in round t, with the gap between the two being inversely proportional to the number of observations, in particular, Rev UCB t (pt, t) Rev(pt, t) log T/Nt( (pt, q)) Finally, in Appendix A.7 we put it all together to bound the regret as stated in Theorem 1.1. We first use the above observations to show that regret over each round can be roughly bounded as 2 T + 5pt Eq t log T/Nt( (pt, q)) . Then, using the constraint P x2S NT (x) T, we show that in the worst case, total regret over time T is bounded by O(T + |S|T log T). The theorem is then obtained by substituting |S| = O(m/ ) and optimizing ". 4 Conclusions and Future Direction In this work, we use a foundational information design framework, Bayesian persuasion, to model the effect of an advertising strategy on customers beliefs about product quality and consequently their purchase decisions. Our formulation combines the Bayesian persuasion model with dynamic pricing and learning to quantify the tradeoffs between the design of the pricing and advertising strategies and their combined impact on the revenue outcomes. Without any apriori knowledge of the demand function, we show that there exists an efficient online policy that has the regret O(T 2/3(m log T)1/3) for a finite state space with cardinality m. This result implies that when the number of the states m is a constant, there is almost no additional learning regret for the seller to additionally learn the optimal advertising compared to the dynamic pricing (without advertising) for non-parametric demand learning problem. There are interesting future directions from this work. First, in our current formulation, buyers valuation is linear w.r.t the product states. It would be interesting to explore whether our results could be generalized to more general valuation function. Secondly, in our setting, a buyer with i.i.d private type arrives at each time round, and leaves the market no matter what the purchase decision is. However, in practice, buyers may strategize their purchase time for a more favorable price [2, 29, 60]. How to model the advertising effect with strategic buyers and achieve efficient demand learning is another interesting future direction. Acknowledgements We thank the anonymous reviewers for the helpful comments. This work was supported in part by grants G-2020-13917 from Alfred P. Sloan Foundation and NSF CAREER award 1846792. [1] Yasin Abbasi-Yadkori, David Pal, and Csaba Szepesvari. Online-to-confidence-set conversions and application to sparse stochastic bandits. In Artificial Intelligence and Statistics, pages 1 9. PMLR, 2012. [2] Kareem Amin, Afshin Rostamizadeh, and Umar Syed. Learning prices for repeated auctions with strategic buyers. Advances in Neural Information Processing Systems, 26, 2013. [3] Itai Arieli and Yakov Babichenko. Private bayesian persuasion. Journal of Economic Theory, 182:185 217, 2019. [4] Itai Arieli, Yakov Babichenko, Rann Smorodinsky, and Takuro Yamashita. Optimal persuasion via bi-pooling. In Proceedings of the 21st ACM Conference on Economics and Computation, pages 641 641, 2020. [5] Pak Hung Au and Keiichi Kawai. Competitive information disclosure by multiple senders. Games and Economic Behavior, 119:56 78, 2020. [6] Robert J Aumann, Michael Maschler, and Richard E Stearns. Repeated games with incomplete information. MIT press, 1995. [7] Moshe Babaioff, Robert Kleinberg, and Renato Paes Leme. Optimal mechanisms for selling information. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 92 109, 2012. [8] Moshe Babaioff, Shaddin Dughmi, Robert Kleinberg, and Aleksandrs Slivkins. Dynamic pric- ing with limited supply. 3(1), 2015. [9] Dirk Bergemann and Alessandro Bonatti. Selling cookies. American Economic Journal: Mi- croeconomics, 7(3):259 94, 2015. [10] Dirk Bergemann and Stephen Morris. Information design: A unified perspective. Journal of Economic Literature, 57(1):44 95, 2019. [11] Dirk Bergemann, Alessandro Bonatti, and Alex Smolin. The design and price of information. American economic review, 108(1):1 48, 2018. [12] Dirk Bergemann, Yang Cai, Grigoris Velegkas, and Mingfei Zhao. Is selling complete infor- mation (approximately) optimal? ar Xiv preprint ar Xiv:2202.09013, 2022. [13] Dirk Bergemann, Tibor Heumann, and Stephen Morris. Screening with persuasion. ar Xiv preprint ar Xiv:2212.03360, 2022. [14] Dirk Bergemann, Tibor Heumann, Stephen Morris, Constantine Sorokin, and Eyal Winter. Optimal information disclosure in auctions. American Economic Review: Insights, 2022. [15] Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Francesco Trovò, and Nicola Gatti. Optimal rates and efficient algorithms for online bayesian persuasion. In International Conference on Machine Learning, pages 2164 2183. PMLR, 2023. [16] Omar Besbes and Assaf Zeevi. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407 1420, 2009. [17] David A Blackwell and Meyer A Girshick. Theory of games and statistical decisions. Courier Corporation, 1979. [18] Alessandro Bonatti, Munther Dahleh, Thibaut Horel, and Amir Nouripour. Selling information in competitive environments. ar Xiv preprint ar Xiv:2202.08780, 2022. [19] Peter Bro Miltersen and Or Sheffet. Send mixed signals: earn more, work less. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 234 247, 2012. [20] Yang Cai and Grigoris Velegkas. How to sell information optimally: An algorithmic study. In Proceedings of the12th Innovations in Theoretical Computer Science Conference, volume 185, 2021. [21] Ozan Candogan. Persuasion in networks: Public signals and k-cores. In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 133 134, 2019. [22] Ozan Candogan and Philipp Strack. Optimal disclosure of information to a privately informed receiver. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 263 263, 2021. [23] Matteo Castiglioni, Andrea Celli, Alberto Marchesi, and Nicola Gatti. Online bayesian per- suasion. Advances in Neural Information Processing Systems, 33, 2020. [24] Matteo Castiglioni, Alberto Marchesi, Andrea Celli, and Nicola Gatti. Multi-receiver online bayesian persuasion. In International Conference on Machine Learning, pages 1314 1323. PMLR, 2021. [25] Yanlin Chen and Jun Zhang. Signalling by bayesian persuasion and pricing strategy. The Economic Journal, 130(628):976 1007, 2020. [26] Yiling Chen, Haifeng Xu, and Shuran Zheng. Selling information through consulting. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2412 2431. SIAM, 2020. [27] Davide Crapis, Bar Ifrach, Costis Maglaras, and Marco Scarsini. Monopoly pricing in the presence of social learning. Management Science, 63(11):3586 3608, 2017. [28] Bolin Ding, Yiding Feng, Chien-Ju Ho, Wei Tang, and Haifeng Xu. Competitive information design for pandora s box. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 353 381. SIAM, 2023. [29] Alexey Drutsa. Optimal non-parametric learning in repeated contextual auctions with strategic buyer. In International Conference on Machine Learning, pages 2668 2677. PMLR, 2020. [30] Shaddin Dughmi. Algorithmic information structure design: a survey. ACM SIGecom Ex- changes, 15(2):2 24, 2017. [31] Adam N Elmachtoub and Michael L Hamilton. The power of opaque products in pricing. Management Science, 67(8):4686 4702, 2021. [32] Yuval Emek, Michal Feldman, Iftah Gamzu, Renato Paes Leme, and Moshe Tennenholtz. Sig- naling schemes for revenue maximization. ACM Transactions on Economics and Computation (TEAC), 2(2):1 19, 2014. [33] Yiding Feng, Wei Tang, and Haifeng Xu. Online bayesian recommendation with no regret. In Proceedings of the 23rd ACM Conference on Economics and Computation, pages 818 819, 2022. [34] Ilwoo Hwang, Kyungmin Kim, and Raphael Boleslavsky. Competitive advertising and pricing. [35] Bar Ifrach, Costis Maglaras, Marco Scarsini, and Anna Zseleva. Bayesian social learning from consumer reviews. Operations Research, 67(5):1209 1221, 2019. [36] Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, and Zhiwei Steven Wu. Incentivizing exploration with selective data disclosure. In Proceedings of the 21st ACM Conference on Economics and Computation, pages 647 648, 2020. [37] Kenneth L Judd and Michael H Riordan. Price and quality in a new product monopoly. The Review of Economic Studies, 61(4):773 789, 1994. [38] Emir Kamenica. Bayesian persuasion and information design. Annual Review of Economics, 11:249 272, 2019. [39] Emir Kamenica and Matthew Gentzkow. Bayesian persuasion. American Economic Review, 101(6):2590 2615, 2011. [40] Kei Kawai, Ken Onishi, and Kosuke Uetake. Signaling in online credit markets. Journal of Political Economy, 130(6):000 000, 2022. [41] N Bora Keskin and Assaf Zeevi. Dynamic pricing with an unknown demand model: Asymp- totically optimal semi-myopic policies. Operations research, 62(5):1142 1167, 2014. [42] Richard E Kihlstrom and Michael H Riordan. Advertising as a signal. Journal of Political Economy, 92(3):427 450, 1984. [43] Robert Kleinberg and Tom Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 594 605. IEEE, 2003. [44] Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Multi-armed bandits in metric spaces. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 681 690, 2008. [45] Anton Kolotilin, Tymofiy Mylovanov, Andriy Zapechelnyuk, and Ming Li. Persuasion of a privately informed receiver. Econometrica, 85(6):1949 1964, 2017. [46] Ilan Kremer, Yishay Mansour, and Motty Perry. Implementing the wisdom of the crowd . Journal of Political Economy, 122(5):988 1012, 2014. [47] Yingkai Li. Selling data to an agent with endogenous information. In Proceedings of the 23rd ACM Conference on Economics and Computation, pages 664 665, 2022. [48] Shuze Liu, Weiran Shen, and Haifeng Xu. Optimal pricing of information. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 693 693, 2021. [49] Puneet Manchanda, Jean-Pierre Dubé, Khim Yong Goh, and Pradeep K Chintagunta. The effect of banner advertising on internet purchasing. Journal of Marketing Research, 43(1): 98 108, 2006. [50] Yishay Mansour, Aleksandrs Slivkins, and Vasilis Syrgkanis. Bayesian incentive-compatible bandit exploration. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, pages 565 582, 2015. [51] Yishay Mansour, Alex Slivkins, Vasilis Syrgkanis, and Zhiwei Steven Wu. Bayesian explo- ration: Incentivizing exploration in bayesian games. Operations Research, 70(2):1105 1127, 2022. [52] Paul Milgrom and John Roberts. Price and advertising signals of product quality. Journal of political economy, 94(4):796 821, 1986. [53] Phillip Nelson. Information and consumer behavior. Journal of political economy, 78(2): 311 329, 1970. [54] Phillip Nelson. Advertising as information. Journal of political economy, 82(4):729 754, [55] Navdeep S Sahni and Harikesh S Nair. Does advertising serve as a signal? evidence from a field experiment in mobile search. The Review of Economic Studies, 87(3):1529 1564, 2020. [56] Aleksandrs Slivkins. Contextual bandits with similarity information. In Proceedings of the 24th annual Conference On Learning Theory, pages 679 702. JMLR Workshop and Conference Proceedings, 2011. [57] Wei Tang and Chien-Ju Ho. On the bayesian rational assumption in information design. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, volume 9, pages 120 130, 2021. [58] Jibang Wu, Zixuan Zhang, Zhe Feng, Zhaoran Wang, Zhuoran Yang, Michael I Jordan, and Haifeng Xu. Sequential information design: Markov persuasion process and its efficient reinforcement learning. In Proceedings of the 23th ACM Conference on Economics and Computation, 2022. [59] Shuran Zheng and Yiling Chen. Optimal advertising for information products. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 888 906, 2021. [60] Anton Zhiyanov and Alexey Drutsa. Bisection-based pricing for repeated contextual auctions against strategic buyer. In International Conference on Machine Learning, pages 11469 11480. PMLR, 2020. [61] You Zu, Krishnamurthy Iyer, and Haifeng Xu. Learning to persuade on the fly: Robustness against ignorance. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 927 928, 2021.