# pricing_with_contextual_elasticity_and_heteroscedastic_valuation__ae142566.pdf

Pricing with Contextual Elasticity and Heteroscedastic Valuation

Jianyu Xu 1 Yu-Xiang Wang 2

Abstract We study an online contextual dynamic pricing problem, where customers decide whether to purchase a product based on its features and price. We introduce a novel approach to modeling a customer s expected demand by incorporating feature-based price elasticity, which can be equivalently represented as a valuation with heteroscedastic noise. To solve the problem, we propose a computationally efficient algorithm called "Pricing with Perturbation (Pw P)", which enjoys an O( d T log T) regret while allowing arbitrary adversarial input context sequences. We also prove a matching lower bound at Ω(

d T) to show the optimality regarding d and T (up to log T factors). Our results shed light on the relationship between contextual elasticity and heteroscedastic valuation, providing insights for effective and practical pricing strategies.

1. Introduction Contextual pricing, a.k.a., Feature-based dynamic pricing, considers the problem of setting prices for a sequence of highly specialized or individualized products. With the growth of e-commerce and the increasing popularity of online retailers as well as customers, there has been a growing interest in this area (see, e.g., Amin et al., 2014; Qiang & Bayati, 2016; Javanmard & Nazerzadeh, 2019; Shah et al., 2019; Cohen et al., 2020; Xu & Wang, 2021; Bu et al., 2022).

Formulated as a learning problem, the seller has no prior knowledge of ideal prices but is expected to learn on the fly by exploring different prices and adjusting their pricing strategy after collecting every demand feedback from customers. Different from non-contextual dynamic pricing (Kleinberg & Leighton, 2003) where identical products are sold repeatedly, a contextual pricing agent is expected to

1University of California, Santa Barbara 2University of California, San Diego. Correspondence to: Jianyu Xu <xu_jy15@ucsb.edu>, Yu-Xiang Wang <yuxiangw@ucsd.edu>.

Proceedings of the 41 st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

generalize from one product to another in order to successfully price a previously-unseen product. A formal problem setup is described below:

Contextual pricing. For t = 1, 2, ..., T : 1. A product occurs, described by a context xt Rd. 2. The seller (we) proposes a price pt 0. 3. The customer reveals a demand 0 Dt 1. 4. The seller gets a reward rt = pt Dt.

Here T is the time horizon, and the (random) demand Dt is drawn from a distribution determined by context (or feature) xt and price pt. The sequence of contexts {xt} can be either independently and identically distributed (iids) or chosen arbitrarily by an adversary. The seller s goal is to minimize the cumulative regret against the sequence of optimal prices.

Existing works on contextual pricing usually assumes linearity on the demand, but they fall into two camps. On the one hand, the "linear demand" camp (Qiang & Bayati, 2016; Ban & Keskin, 2021; Bu et al., 2022) assumes the demand Dt as a (generalized) linear model. A typical model is Dt = λ(αpt + x T t β) + ϵt. Here α < 0 is a parameter closely related to the price elasticity. We will rigorously define a price elasticity in Appendix A.1 according to Parkin et al. (2002), where we also show that α is the coefficient of elasticity. Besides of α, other parameters like β Rd

captures the base demand of products with feature xt, ϵt is a zero-mean demand noise, and λ is a known monotonically increasing link function. With this model, we have a noisy observation on the expected demand, which is reasonable as the same product is offered many times in period t. On the other hand, the "linear valuation" camp (Cohen et al., 2020; Javanmard & Nazerzadeh, 2019; Xu & Wang, 2021) models a buyer s valuation yt as linear and assumes a binary demand Dt = 1[pt yt]. All three works listed above assume a linear-and-noisy model with yt = x t θ + Nt, where θ Rd is an unknown linear parameter that captures common valuations and Nt is an idiosyncratic noise assumed to be iid.

Interestingly, the seemingly different modeling principles are closely connected to each other. In the "linear valuation" camp, notice that a customer s probability of "buying" a product equals E[Dt], which is further given by

Pricing with Contextual Elasticity and Heteroscedastic Valuation

E[Dt|p] = P[yt p] := S(p x t θ ),

where S is the survival function of Nt (i.e. S(w) = 1 CDF(w) for w R). This recovers a typical linear demand model by taking λ(w) = S( w) with α = 1 and β = θ . In other words, the distribution of Nt completely characterizes the demand function λ( ) and vice versa.

However, the "linear demand" camp is not satisfied with a fixed α = 1, while the "linear valuation" camp are skeptical about an observable demand Dt even with zeromean iid noise. One common limitation to both models is that neither captures how feature xt affects the price elasticity.

Our model. To address this issue, we propose a natural model that unifies the perspectives of both groups. Also, we resolve the common limitation by modeling heteroscedasticity, where we assume that the elasticity coefficient α is linearly dependent on feature xt. This contextual modeling originates from the fact that different products have different price elasticities (Anderson et al., 1997), and a linear elasticity is widely adopt in literature (Bijmolt et al., 2005). A detailed discussion will be provided in Section 6.

In specific, we assume:

Dt Ber(S(x t η pt x t θ )), (1)

which adopts a generalized linear demand model (GLM) and a Boolean-censored feedback simultaneously. From the perspective of valuation model, it is equivalent to assume Dt = 1[pt yt]

where yt = 1 x t η (x t θ + Nt)

and CDFNt(w) = 1 S(w).

Although Eq. (1) seems more natural than Eq. (2), they are equivalent to each other (with reasonable assumptions on S). Notice that the random valuation yt is heteroscedastic, which means its variance is not the same constant across a variety of xt s. We provide a detailed interpretation of this linear fractional valuation model in appendix.

1.1. Contributions. Our main results are twofold.

1. We propose a new demand model that assumes a featuredependent price elasticity on every product. Equivalently, we model the heteroscedasticity on customers valuations among different products. This model unifies the linear demand and linear valuation camps.

2. We propose a Pricing with Perturbation (Pw P) algorithm that achieves O( d T log T) regret on this model, which is optimal up to log T factors. This regret upper bound holds for both i.i.d. and adversarial {xt} sequences.

1.2. Technical Novelty To the best of our knowledge, we are the first to study a contextual pricing problem with heteroscedastic valuation and Boolean-censored feedback. Some existing works, including Javanmard & Nazerzadeh (2019); Miao et al. (2019); Ban & Keskin (2021); Wang et al. (2021a), focus on related topics and achieve theoretical guarantees. However, their methodologies are not applicable to our settings due to substantial obstacles, which we propose novel techniques to overcome.

Randomized surrogate regret. Xu & Wang (2021) solves the problem with x t η = 1, by taking the negative loglikelihood as a surrogate regret and running an optimization oracle that achieves a fast rate (i.e. an O(log T) regret). However, the log-likelihood is no longer a surrogate regret in our setting, since it is not "convex enough" and therefore cannot provide sufficient (Fisher) information. In this work, we overcome this challenge by constructing a randomized surrogate loss function, whose expectation is "strongly convex" enough to upper bound the regret.

OCO for adversarial inputs. Javanmard & Nazerzadeh (2019) and Ban & Keskin (2021) study the problem with unknown or heterogeneous noise variances (i.e. elasticity coefficients), but their techniques highly rely on the distribution of the feature distributions. As a result, their algorithm could be easily attacked by an adversarial {xt} series. In our work, we settle this issue by conducting an online convex optimization (OCO) scheme while updating parameters. Instead of estimating from the history that requires sufficient randomness in the inputs, our algorithm can still work well for adversarial inputs.

In addition, our algorithm has more advanced properties such as computational efficiency and informationtheoretical optimality. For more highlights of our algorithm, please refer to Section 4.1.

2. Related Works Here we present a review of the pertinent literature on contextual pricing and heteroscedasticity in machine learning, aiming to position our work within the context of related studies. For a detailed comparison on closely-related problem settings and regret bounds, please refer to Table 1. For more related works on non-contextual pricing, contextual pricing, contextual searching and contextual bandits, please refer to Wang et al. (2021b), Xu & Wang (2021), Krishnamurthy et al. (2021) and Zhou (2015) respectively.

Pricing with Contextual Elasticity and Heteroscedastic Valuation

Pricing with (Generalized) Linear Demand. As we mentioned in Section 1.2, there are a large number of recent works on contextual dynamic pricing problems, and we refer to Ban & Keskin (2021) as a detailed introduction. On the one hand, Qiang & Bayati (2016); Nambiar et al. (2019); Miao et al. (2019); Wang et al. (2021a); Ban & Keskin (2021); Bu et al. (2022) assume a (generalized) linear demand model with noise, i.e. E[Dt] = g(αpt β xt). Among those papers, Miao et al. (2019) worksl with a fixed α while we assume α as context-dependent. Wang et al. (2021a) and Ban & Keskin (2021) are the closest to our problem setting, which consist of a generalized linear demand model and noisy observations. On the one hand, Ban & Keskin (2021) assumes independent add-on noises (while we allow binary martingale observations). With the development of a least-square estimator, they present an algorithm that achieves O(s

T) regret (with s being the sparsity factor). On the other hand, Wang et al. (2021a) further gets rid of the independence among noises and allow them to be idiosyncratic. They proposes a UCB-based algorithm with O(d

T) regret and another Thompson-Sampling-based algorithm with O(d 3 2

T) regret, both of which are suboptimal in d. Moreover, all works mentioned above assume the context sequence {xt} to be i.i.d., whereas we consider it "too good to be true" and work towards an algorithm adaptive to adversarial input sequences.

Linear Valuation. Golrezaei et al. (2019); Shah et al. (2019); Goyal & Perivier (2021) adopts the linear valuation model yt = x t θ + Nt, which is a special case of our model as x t η = 1. However, existing works diverge on the assumption whether the seller has precise knowledge on the noise distribution. On the one hand, Cohen et al. (2020); Javanmard & Nazerzadeh (2019); Xu & Wang (2021) assume Nt s drawn from a known distribution. Specifically, both Javanmard & Nazerzadeh (2019) and Xu & Wang (2021) achieve an O(d log T) regret for stochastic and adversarial {xt} sequences, respectively. Javanmard & Nazerzadeh (2019) also studies the setting when x t η is fixed but unknown (in our model) and achieves O(d

T) regret for stochastic {xt} sequences. In comparison, we achieve O( d T log T) on a more general problem and get rid of those assumptions. On the other hand, Fan et al. (2021); Luo et al. (2021); Xu & Wang (2022); Luo et al. (2022) study the problem with unspecified noise distributions. Fan et al. (2021) introduces an exploration-first algorithm with carefully-designed unbiased Method-of Moment estimators. Luo et al. (2021; 2022) proposes UCB-style algorithms with optimistic plug-in estimators, and Xu & Wang (2022) adopts an EXP-4 algorithm with hypothesis discretization. Their methodologies could potentially be extended to a generalized version of our problem setting, where the valuation noise (before scaling) Nt is drawn from an unknown distribution. However, it is worth noting that our problem setting

is not a special case of theirs: The expected valuation in their settings is still linear (E[yt] = x t θ ), while our paper assumes E[yt] = x t θ

x t η that is a linear fractional mapping. We have included more discussions in Appendix C (see "Algorithm and analysis for unknown link function S( )") regarding this aspect.

Nonparametric Valuation. There are various studies on nonparametric pricing problems. Specifically, Chen et al. (2019) studies the contextual pricing problem with nonparametric demand model and binary feedback, achieving a O(T

2+d 4+d ) regret for d-dimension covariates. Ye & Jiang (2024) studies the non-contextual non-parametric pricing problem with β-th order smooth demand curve but without pre-knowledge on β, recovering the minimax rate of O(T β+1

2β+1) given certain additional assumptions. perakis2023dynamic There are various studies on nonparametric pricing problems. Specifically, Chen et al. (2019) studies the contextual pricing problem with non-parametric demand model and binary feedback, achieving a O(T

2+d 4+d ) regret for d-dimension covariates. Ye & Jiang (2024) studies the non-contextual non-parametric pricing problem with β-th order smooth demand curve but without pre-knowledge on β, recovering the minimax rate of O(T β+1

2β+1) given certain additional assumptions. Perakis & Singhvi (2023) studies a non-contextual pricing problem with additive noises, where they improve the switching cost while also maintaining the optimal regret rate.

Heteroscedasticity. Since the valuation noise is scaled by a 1 x t η coefficient, the valuation is heteroscedastic, referring to a situation where the variance is not the same constant across all observations. Heteroscedasticity may lead to bias estimates or loss of sample information. There are several existing methods handling this problem, including weighted least squares method (Cunia, 1964), White s test (White, 1980) and Breusch-Pagan test (Breusch & Pagan, 1979). Furthermore, Anava & Mannor (2016) and Chaudhuri et al. (2017) study online learning problems with heteroscedastic variances and provide regret bounds. For a formal and detailed introduction, we refer the audience to the textbook of Kaufman (2013).

3. Problem Setup 3.1. Notations To formulate the problem, we firstly introduce necessary notations and symbols used in the following sections. The sales session contains T rounds with T known to the seller in advance1. At each time t = 1, 2, . . . , T, a product with

1Here we assume T known for simplicity. For unknown T, we may apply a doubling epoch trick as Javanmard & Nazerzadeh (2019) without affecting the regret rate.

Pricing with Contextual Elasticity and Heteroscedastic Valuation

Known α Unknown fixed α Heteroscedastic α = x t η

Features Stochastic & Adversarial Stochastic Adversarial Stochastic Adversarial

Upper Bound d log T [XW21]

d T This Work

T (independent noises) [BK21] d

T (idiosyncratic noises) [WTL21]

d T This Work

Lower Bound d log T [BR12]

d T This Work

Table 1. Existing related literature and results on regret bounds, with O( ) dropped. Note that each adversarial setting covers the stochastic setting under the same assumptions. Here [XW21] stands for Xu & Wang (2021), [JN19] for Javanmard & Nazerzadeh (2019), [BR12] for Broder & Rusmevichientong (2012), [BK21] for Ban & Keskin (2021), and [WTL21] for Wang et al. (2021a).

feature xt Rd occurs and we propose a price pt 0. Then the nature draws a demand Dt Ber(S(x t η pt x t θ )), where θ , η Rd are fixed unknown linear parameters and the link function S : R [0, 1] is non-increasing. By the end of time t, we receive a reward rt = pt Dt.

Equivalently, this customer has a valuation yt = x t θ +Nt

x t η with noise Nt R, and then make a decision 1t = 1[pt yt] = Dt after seeing the price pt. Similarly, we receive a reward rt = pt 1t. Assume Nt DF is independently and identically distributed (i.i.d.), with cumulative distribution function (CDF) F = 1 S. Denote s := S and f := F .

3.2. Definitions Here we define some key quantities. Firstly, we define an expected reward function.

Definition 3.1 (expected reward function). Define

r(u, β, p) := E[rt|x t θ = u, x t η = β, pt = p] = p S(β p u) (3) as the expected reward function.

Given this, we further define a greedy price function as the argmax of r(u, β, p) over p.

Definition 3.2 (greedy price function). Define J(u, β) as a greedy price function, i.e. the price that maximizes the expected reward given u = x t θ and β = x t η .

J(u, β) = argmax p R r(u, β, p) = argmax p R p S(β p u)

Notice that

J(u, β) = argmax p p S(βp u) = 1

β argmax βp βp S(βp u) = 1

(5) According to Xu & Wang (2021, Section B.1), we have the following properties.

Lemma 3.3. Denote φ(w) := S(w)

s(w) w = 1 F (w)

and we have J(u, β) = u+φ 1(u)

β . Also, for u 0 and

β > 0, we have J(u,β)

Then we define a negative log-likelihood function of parameter hypothesis (θ, η) given the results at time t.

Definition 3.4 (log-likelihood functions). Denote ℓt(θ, η) as the negative log-likelihood at time t, and define Lt(θ, η) as their summations:

ℓt(θ, η) =1t log S(xt η pt x t θ)

+ (1 1t) log(1 S(x t η pt x t θ)).

Finally, we define a round-t expected regret and a cumulative expected regret.

Definition 3.5 (regrets). Define Regt(pt) := r(x t θ , x t η , J(x t θ , x t η )) r(x t θ , x t η , pt) as the expected regret at round t, conditioning on price pt. Also, define the cumulative regret as follows

t=1 Regt(pt) (7)

3.3. Assumptions We establish three technical assumptions to make our analysis and presentation clearer. Firstly, we assume that all feature and parameter vectors are bounded within a unit ball in Euclidean norm. This assumption is without loss of generality as it only rescales the problem.

Assumption 3.6 (bounded feature and parameter spaces). Assume features xt Hx and parameters θ Hθ, η Hη. Denote U d p := {x Rd, x p 1} as an Lp-norm unit ball in Rd. Assume all Hx, Hθ, Hη U d p . Also, assume

Pricing with Contextual Elasticity and Heteroscedastic Valuation

x θ > 0, x Hx, θ Hθ and x η > Cβ > 0, x Hx, η Hη for some constant Cβ (0, 1).

The positiveness of elasticity coefficient x η > 0 comes from the Law of Demand (Gale, 1955; Hildenbrand, 1983), stating that the quantity purchased varies inversely with price. This is derived from the Law of Diminishing Marginal Utilities and has been widely accepted (Marshall, 2009). We will show the necessity of assuming an elasticity lower bound Cβ in Appendix C. In specific, we claim that any algorithm will suffer a regret of Ω( 1

Cβ ). For the simplicity of notation, we denote [θ; η] := [θ , η ] R2d as the combination of d-dimension column vectors θ and η. Since we know that x t θ [0, 1] and x t η [Cβ, 1], we have J(x t θ, x t η) [J(0, 1), J(1, Cβ)]. Later we will show that the price perturbation is no more than J(0,1)

10 . Therefore, we may have the following assumption.

Assumption 3.7 (bounded prices). For any price pt at each time t = 1, 2, . . . , T, we require pt [c1, c2], where c1 = J(0,1)

2 and c2 = 2J(1, Cβ).

Similar to Javanmard & Nazerzadeh (2019), we also assume a log-concavity on the noise CDF.

Assumption 3.8 (log-concavity). Every Dt is independently sampled according to Eq. (1), with S(ω) [0, 1] and s(ω) = S (ω) > 0, ω R. Equivalently, the valuation noise Nt DF is independently and identically distributed (i.i.d.), with CDF F = 1 S. Assume that S C2, and S and (1 S) are strictly log-concave.

4. Main Results To solve the contextual pricing problem with featurized elasticity, we propose our Pricing with Perturbation (Pw P) algorithm. In the following, we firstly describe the algorithm and highlight its properties, then analyze (and bound) its cumulative regret, and finally prove a regret lower bound to show its optimality.

4.1. Algorithm The pseudocode of Pw P is displayed as Algorithm 1, which calls an ONS oracle (Algorithm 2).

At each time t, it inherits parameters θt and ηt from (t 1) and takes in a context vector xt. By trusting in θt and ηt, it calculates a greedy price ˆpt and outputs a perturbed version pt = ˆpt + t. After seeing customer s decision 1t, Pw P calls an Online Newton Step (ONS) oracle (see Algorithm 2) to update the parameters as θt+1 and ηt+1 for future use.

Algorithm 1 Pricing with Perturbation (Pw P)

1: Input: parameter spaces Hθ, Hη, link function S, time horizon T, dimension d 2: Initialization:parameters θ1 Hθ, η1 Hη, price perturbation , cumulative likelihood L0 = 0, matrix A0 = ϵ I2d and parameter ϵ, γ 3: for t = 1, 2, . . . , T do 4: Observe xt; 5: Calculate greedy price ˆpt = J(x t θt, x t ηt) 6: Sample t = with Pr = 0.5 and t = with Pr = 0.5; 7: Propose price pt = ˆpt + t; 8: Receive the customer s decision 1t; 9: Construct negative log-likelihood ℓt(θ, η) and Lt(θ, η) as eq. (6); 10: Update parameters:

[θt+1; ηt+1] ONS([θt; ηt])

11: end for

Algorithm 2 Online Newton Step (ONS)

1: Input: current parameter [θt, ηt], likelihood ℓt(θ, η), matrix At, parameter γ, parameter spaces Hθ and Hη. 2: Calculate t = ℓt(θ, η); 3: Rank-1 update: At = At 1 + t t ; 4: Newton step: [ˆθt+1; ˆηt+1] = [ˆθt; ˆηt] 1

5: Projection: [θt+1; ηt+1] = ΠAt Hθ Hη([ˆθt+1; ˆηt+1]);

4.1.1. HIGHLIGHTS We highlight the achievements of the Pw P algorithm in the following three aspects.

In this pricing problem. As we mentioned in Section 1.2, the key to solving this contextual elasticity (or heteroscedastic valuation) pricing problem is to construct a surrogate loss function. Xu & Wang (2021) adopts negative log-likelihood in their setting, which does not work for ours since it is not "convex" enough. In our Pw P algorithm, we overcome this challenge by introducing a perturbation on the proposed greedy price. This idea originates from the observation that the variance of pt contributes positively to the "convexity" of the expected log-likelihood, which helps "re-build" the upper-bound inequality.

In online optimization. Pw P perturbs the greedy action (price) it should have taken. This idea is similar to a "Following the Perturbed Leader (FTPL)" algorithm (Hutter et al., 2005) that minimizes the summation of the empirical risk and a random loss function serving as a perturbation. However, this might lead to extra computational cost as the random perturbation is not necessarily smooth and therefore

Pricing with Contextual Elasticity and Heteroscedastic Valuation

hard to optimize. In this work, Pw P introduces a possible way to overcome this obstacle: Instead of perturbing the objective function, we may directly perturb the action to explore its neighborhood. Our regret analysis and results indicate the optimality of this method and imply a potentially wide application.

In information theory. We show the following fact in the regret analysis of Pw P: By adding perturbation on pt, we may lose O( 2) in reward but will gain O( 2) I in Fisher information (i.e. the expected Hessian of negative log-likelihood function) in return. By Cramer-Rao Bound, this leads to O( 1

2 ) estimation error. In this way, we quantify the information (observing from exploration) on the scale of reward, which shares the same idea with the Upper Confidence Bound (Lai & Robbins, 1985) method that always maximizes the summation of empirical reward and information-traded reward.

Besides, Pw P is computationally efficient as it only calls the ONS oracle for once. As for the ONS oracle, it updates an A 1 t = (At 1 + t t ) 1 at each time t, which is with O(d2) time complexity according to the following Woodbury matrix identity

(A+xx ) 1 = A 1 1 1 + x A 1x A 1x(A 1x) . (8)

4.2. Regret Upper Bound Now we analyze the regret of Pw P and propose an upper bound up to constant coefficients.

Theorem 4.1. Under Assumption 3.6, 3.7 and 3.8, by taking

= min d log T

, the algorithm Pw P

guarantees an expected regret at O( d T log T).

In the following, we prove Theorem 4.1 by stating a thread of key lemmas. We leave the detailed proof of those lemmas to Appendix A.

Proof. The proof overview can be displayed as the following roadmap of inequalities:

E[Regret] =

t=1 Regt(pt)

t=1 O (x t (θt θ ))2 + (x t (ηt η ))2 + 2 #

PT t=1 E [ℓt(θt, ηt) ℓt(θ , η )]

2 + T 2 = O( p

d T log T).

Here the first inequality is by the smoothness of regret function (see Lemma 4.2), the second inequality is by a special strong convexity of ℓt(θ, η) that contributes to the surrogate loss (see Lemma 4.3), the third inequality is by Online Newton Step (see Lemma 4.4), and the last equality is by the value of . A rigorous version of Eq. (9) can be found in Appendix A.4.

We firstly show the smoothness of Regt(pt):

Lemma 4.2 (regret smoothness). Denote p t := J(x t θ , x t η ). There exists constants Cr > 0 and CJ > 0 such that

Regt(pt) Cr (pt p t )2

Cr 2 CJ (x t (θt θ ))2 + (x t (ηt η ))2 + 2 . (10)

While the first inequality of Eq. (10) is from the smoothness, and the second inequality is by the Lipschitzness of function J(u, β). Please refer to Appendix A.2 for proof details. We then show the reason why the log-likelihood function can still be a surrogate loss with carefully randomized pt.

Lemma 4.3 (surrogate expected regret). There exists a constant Cl > 0 such that θ Hθ, η Hη, we have

E[ℓt(θ, η) ℓt(θ , η )|θt, ηt]

10 [(θ θ ) , (η η ) ] xtx t 0 0 xtx t

h x t (θ θ ) 2 + x t (η η ) 2i .

This lemma is crucial. We show a proof sketch here and defer the detailed proof to Appendix A.3.

Proof sketch of Lemma 4.3. We show that there exist constants Cl > 0, Cp > 0 such that

1. 2ℓt(θ, η) Cl xtx t ptxtx t ptxtx t p2 txtx t

2. E xtx t ptxtx t ptxtx t p2 txtx t |θt, ηt

Cp 2 xtx t 0 0 xtx t

The first property above relies on the exp-concavity of ℓt. Notice that the second property does not hold without the E notation, as the left hand side is a (a b)2 form while the right hand side is in a (a2 + b2) form. In general, there

Pricing with Contextual Elasticity and Heteroscedastic Valuation

exist no constant c > 0 such that (a b)2 c(a2 + b2). However, due to the randomness of pt, we have

E[p2 t|ˆpt] = E[pt|ˆpt]2 + 2. (12)

In this way, the conditional expectation of the left hand side turns to (a b)2 + λ b2 and we have

(a b)2 + λb2

2 b)2 + (1 1 1 + λ

2 (a2 + b2).

Similarly, we upper bound xtx t 0 0 xtx t

E[ 2ℓt(θ, η)|θt, ηt] up to a Cp 2 coefficient.

With those two properties above, along with a property of likelihood function that E[ ℓt(θ , η )] = 0, we can prove Lemma 4.3 by taking a Taylor expansion of ℓt at [θ ; η ].

Finally, we cite a theorem from Hazan (2016) as our Lemma 4.4 that reveals the surrogate regret rate on negative log-likelihood functions.

Lemma 4.4. With parameters G = supθ Hθ,η Hη lt(θ, η) 2, D = sup [θ1; η1] [θ2; θ2] 2, α = Ce, γ = 1

2 min{ 1 4GD, α} and ϵ = 1 γ2D2 and T > 4, Keep running Algorithm 2 for t = 1, 2, . . . , T guarantees:

t=1 ℓt(θt, ηt) min θ Hθ,η Hη

t=1 ℓt(θ, η)

α + GD)d log T.

With all these lemma above, we have proved every line of Eq. (9).

4.3. Lower Bounds We claim that Pw P is near-optimal in information theory, by proposing a matching regret lower bound in Theorem 4.5. We present the proof with valuation model to match with existing results.

Theorem 4.5. Consider the contextual pricing problem setting with Bernoulli demand model given in Eq. (1). With all assumptions in Section 3 hold, any pricing algorithm has to suffer a Ω(

d T) worst-case expected regret for T 2d2(1+log d), with T the time horizon and d the dimension of context.

Proof Sketch. We defer the proof details to Appendix A.5. The main idea is to reduce d numbers of 1-dimension problems to this problem setting. In fact, we may consider the following problem setting:

1. Construct set X = {ei := [0, . . . , 0, 1, 0, . . . , 0] Rd with only ith place being 1, i = 1, 2, . . . , d}.

2. Let θ = [ u1

σ3 , . . . , ud

σd ] , η = [ 1

σ3 , . . . , 1

σd ] , and therefore we have

e i η = ui + σi Nt.

3. At each time t = 1, 2, . . . , T, sample xt X independently and uniformly at random.

In this way, we divide the whole time series T into d separated sub-problems, where the Sub-Problem i has a valuation model yt(i) = ui + σi Nt, for i = 1, 2, . . . , d. Let Nt N(0, 1), t = 1, 2, . . . , T, and therefore yt(i) N(ui, σ2 i ) are independent Gaussian random variables. According to Hoeffding s Inequality, each Sub-Problem i has at least T

2d time periods with high probability. According to Xu & Wang (2021, Theorem 12) (originated from Broder & Rusmevichientong (2012, Theorem 3.1)), the regret lower

bound of each sub-problem is Ω( q

T 2d). Therefore, the total

expected regret lower bound is Ω(d q

5. Numerical Experiments Here we conduct numerical experiments to validate the low-regret performance of our algorithm Pw P. Since we are the first to study this heteroscadestic valuation model, we do not have a baseline algorithm working for exactly the same problem. However, we can modify the RMLP2 algorithm in Javanmard & Nazerzadeh (2019) by only replacing their max-likelihood estimator (MLE) for θ with a new MLE for both θ and η . This modified RMLP-2 algorithm does not have a regret guarantee in our setting, but it may still serve as a baseline to compare with. n the following part, we will compare the cumulative regrets of our ONSPP algorithm with the (modified) RMLP-2 in the following two scenarios:

1. The linear-fractional valuation yt = x t θ +Nt

2. A fully-linear valuation yt = x t θ + x t η Nt.

Notice that the second valuation model is also heteroscedastic, but the linear scalar only multiplies with the noise. We design the first experiment to show the regret performance of our algorithm, and the second experiment to imply the generality of our valuation model and our algorithm to model-misspecified settings. In order to show the regret dependence w.r.t. T, we plot all cumulative regret curves in log-log plots, where an α slope indicates an O(T α) dependence. We repeat each experiment for 20 times.

Pricing with Contextual Elasticity and Heteroscedastic Valuation

29 210 211 212 213 214 215 216 t=1,2,...,T

Pw P Pw P Linear fit, slope=0.557 RMLP-2 RMLP-2 Linear fit, slope=0.552

(a) Stochastic {xt} s

29 210 211 212 213 214 215 216 t=1,2,...,T

Pw P Pw P Linear fit, slope=0.513 RMLP-2 RMLP-2 Linear fit, slope=0.959

(b) Adversarial {xt} s

Figure 1. The regret of Pw P algorithm and a modified RMLP-2 algorithm on simulation data (generated according to Eq. (1)), plotted in log-log scales to indicate the regret dependence on T. Figure 1a and Figure 1b are for stochastic and adversarial {xt} sequences respectively. We also plot linear fits for those regret curves, where a slope-α line indicates an O(T α) regret. The error bands are drawn with 0.95 coverage using Wald s test. From the figures, we know that Pw P performs closely to its O( T log T) regret regardless of the types of input context sequences, whereas RMLP-2 fails in the attack of adversarial input.

5.1. Performances on well-assumed model In this part, we test our algorithm ONSPP and the modified RMLP-2 on the correct demand model (as assumed in Eq. (1)) with both stochastic and adversarial {xt} sequences, respectively.

We test Pw P and the modified RMLP-2 on the demand model assumed in Eq. (1) with both stochastic and adversarial {xt} sequences, respectively. Basically, we assume T = 216 d = 2, Nt N(0, σ2) with σ = 0.5, and we repeatedly run each algorithm for 20 times in each experiment setting. In order to show the regret dependence w.r.t. T, we plot all cumulative regret curves in log-log plots, where an α slope indicates an O(T α) dependence.

Stochastic {xt}. We implement and test Pw P and RMLP2 on stochastic {xt} s, where xt are iid sampled from N(µx, Σx) (for µx = [10, 10, . . . , 10] and some randomly sampled Σx) and then normalized s.t. xt 2 1. The numerical results are shown in Figure 1a. Numerical results show that both algorithms achieve O(T 0.56) regrets, which is close to the theoretic regret rate at O( T log T).

Adversarial {xt}. Here we design an adversarial {xt} sequence to attack both algorithms. Since RMLP-2 divides the whole time horizon T into epochs with length k = 1, 2, 3, . . . sequentially and then does pure exploration at the beginning of each epoch, we may directly attack those pureexploration rounds in the following way: (1) In each pureexploration round (i.e. when t = 1, 3, 6, . . . , k(k+1)

2 , . . .), let the context be xt = [1, 0] ; (2) In any other round, let the context be xt = [0, 1] . In this way, the RMLP-2 algorithm

will never learn θ [2] and η [2] since the inputs of pureexploration rounds do not contain this information. Under this oblivious adversarial context sequence, we implement Pw P and RMLP-2 and compare their performance. The results are shown in Figure 1b, indicating that Pw P can still guarantee O(T 0.513) regret (close to O( T log T)) while RMLP-2 runs into a linear regret.

As a high-level interpretation, the performance difference is because Pw P adopts a "distributed" exploration at every time t while RMLP-2 makes it more "concentrated". Although both Pw P and RMLP-2 take the same amount of exploration that optimally balance the reward loss and the information gain (and that is why they both perform well in stochastic inputs), randomly distributed exploration would save the algorithm from being "attacked" by oblivious adversary. In fact, this phenomenon is analog to ϵ-Greedy versus Exploration-first algorithms in multi-armed bandits. We will discuss more in Appendix C.

So far, we have presented the numerical results of running Pw P and a modified RMLP-2 on the well-assumed demand model as Eq. (1) (or Eq. (2) equivalently). Besides of that, we also conduct experiments on a model-misspecification setting to show the robustness, where the true demand (or valuation) distribution is not the same as Eq. (1) or Eq. (2). The numerical results are presented in Appendix B.

6. Discussion Here we discuss the motivation and the limitation of making Assumption 3.6. We leave the majority of discussion to Appendix C.

Assumption on linear elasticity. We assume that the price

Pricing with Contextual Elasticity and Heteroscedastic Valuation

elasticity α := x t η is linear on xt. The reasons are threefold. First, in accordance with the parsimony principle, we have opted for a linear model due to its simplicity and causality. This principle suggests that, all else being equal, the simpler model should be selected. Second, linear models provide a robust first-order approximation, are highly interpretable, and are computationally efficient for dynamic pricing where rapid decision-making is essential. Last but not least, the use of linear models to represent price elasticity is well-established in the literature, receiving empirical support from studies such as those by Bijmolt et al. (2005); Lijesen (2007) and Miller & Alberini (2016). While we recognize that price elasticity can exhibit complex behaviors, the linear approximation provides a strong foundation for initial analysis and has proven to be an effective tool in comparable research endeavors.

Necessity of lower-bounding x t η from 0. As we state in Assumption 3.6, the price elasticity coefficient x t η is lower bounded by a constant Cβ > 0. On the one hand, this is necessary since we cannot have an upper bound on the optimal price without this assumption. On the other hand, according to Eq. (3), we know that r(u, β, p) = r(u, 1, β p) 1

β , which indicates that the reward is rescaled by 1

β . As a result, the regret should be proportional to 1 Cβ . Although a larger (i.e. closer to 0) elasticity would lead to a more smooth demand curve, this actually reduce the information we could gather from customers feedback and slow down the learning process. We look forward to future researches getting rid of this assumption and achieve more adaptive regret rates. See more details in Appendix C.

Regret lower bounds for fixed unknown noise distributions. We claim a Ω(

d T) regret lower bound in Theorem 4.5 with customers demand model being Eq. (1). However, this result does not imply a Ω(

d T) lower bound for customers valuation being yt = x t θ + Nt adopted by Javanmard & Nazerzadeh (2019); Cohen et al. (2020); Xu & Wang (2021). This is because our problem setting is more general than theirs, and our construction of Ω(

d T) regret lower bounds are substantially beyond the scope of this specific subproblem. So far, the best existing regret lower bound for the linear noisy model (yt = x t θ + Nt) is still Ω(

T). However, we conjecture that this should also be Ω(

d T). The hardness of proving this lower bound comes from the fact that the noises are iid over time, which prevents a trivial separation into d independent sub-sequences as we do.

7. Conclusion In summary, our work focuses on the problem of contextual pricing with highly differentiated products. We propose a contextual elasticity model that unifies the linear demand and linear valuation camps and captures the price

effect and heteroscedasticity. To solve this problem, we develop an algorithm Pw P, which utilizes Online Newton Step (ONS) on a surrogate loss function and proposes perturbed prices for exploration. Our analysis show that it guarantees a O( d T log T) regret even for adversarial context sequences. We also provide a matching Ω(

d T) regret lower bound to show its optimality (up to log T factors). Besides, our numerical experiments also validate the regret bounds of Pw P and its advantage over existing method. We hope this work would shed lights on the research of contextual pricing as well as online decision-making problems.

Impact Statement This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here. Please refer to Appendix C for a detailed discussion.

Acknowledgement The authors are very grateful to Amazon Web Services for their gift funding support to this work.

Amin, K., Rostamizadeh, A., and Syed, U. Repeated contextual auctions with strategic buyers. In Advances in Neural Information Processing Systems (NIPS-14), pp. 622 630, 2014.

Anava, O. and Mannor, S. Heteroscedastic sequences: beyond gaussianity. In International Conference on Machine Learning, pp. 755 763. PMLR, 2016.

Anderson, P. L., Mc Lellan, R. D., Overton, J. P., and Wolfram, G. L. Price elasticity of demand. Mc Kinac Center for Public Policy. Accessed October, 13(2), 1997.

Baby, D., Xu, J., and Wang, Y.-X. Non-stationary contextual pricing with safety constraints. Transactions on Machine Learning Research, 2022.

Ban, G.-Y. and Keskin, N. B. Personalized dynamic pricing with machine learning: High-dimensional features and heterogeneous elasticity. Management Science, 67(9): 5549 5568, 2021.

Bijmolt, T. H., Van Heerde, H. J., and Pieters, R. G. New empirical generalizations on the determinants of price elasticity. Journal of marketing research, 42(2):141 156, 2005.

Breusch, T. S. and Pagan, A. R. A simple test for heteroscedasticity and random coefficient variation. Econometrica: Journal of the econometric society, pp. 1287 1294, 1979.

Pricing with Contextual Elasticity and Heteroscedastic Valuation

Broder, J. and Rusmevichientong, P. Dynamic pricing under a general parametric choice model. Operations Research, 60(4):965 980, 2012.

Bu, J., Simchi-Levi, D., and Wang, C. Context-based dynamic pricing with partially linear demand model. In Advances in Neural Information Processing Systems, 2022.

Chaudhuri, K., Jain, P., and Natarajan, N. Active heteroscedastic regression. In International Conference on Machine Learning, pp. 694 702. PMLR, 2017.

Chen, Q., Jasin, S., and Duenyas, I. Nonparametric selfadjusting control for joint learning and optimization of multiproduct pricing with finite resource capacity. Mathematics of Operations Research, 44(2):601 631, 2019.

Chen, X., Simchi-Levi, D., and Wang, Y. Utility fairness in contextual dynamic pricing with demand learning. ar Xiv preprint ar Xiv:2311.16528, 2023.

Chu, W., Li, L., Reyzin, L., and Schapire, R. Contextual bandits with linear payoff functions. In International Conference on Artificial Intelligence and Statistics (AISTATS11), pp. 208 214, 2011.

Cohen, M. C., Lobel, I., and Paes Leme, R. Feature-based dynamic pricing. Management Science, 66(11):4921 4943, 2020.

Cohen, M. C., Miao, S., and Wang, Y. Dynamic pricing with fairness constraints. Available at SSRN 3930622, 2021.

Cohen, M. C., Elmachtoub, A. N., and Lei, X. Price discrimination with fairness constraints. Management Science, 2022.

Cunia, T. Weighted least squares method and construction of volume tables. Forest Science, 10(2):180 191, 1964.

Fan, J., Guo, Y., and Yu, M. Policy optimization using semiparametric models for dynamic pricing. ar Xiv preprint ar Xiv:2109.06368, 2021.

Gale, D. The law of supply and demand. Mathematica scandinavica, pp. 155 169, 1955.

Golrezaei, N., Jaillet, P., and Liang, J. C. N. Incentiveaware contextual pricing with non-parametric market noise. ar Xiv preprint ar Xiv:1911.03508, 2019.

Goyal, V. and Perivier, N. Dynamic pricing and assortment under a contextual mnl demand. ar Xiv preprint ar Xiv:2110.10018, 2021.

Hazan, E. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157 325, 2016.

Hildenbrand, W. On the" law of demand". Econometrica: Journal of the Econometric Society, pp. 997 1019, 1983.

Hutter, M., Poland, J., et al. Adaptive online prediction by following the perturbed leader. 2005.

Javanmard, A. and Nazerzadeh, H. Dynamic pricing in highdimensions. The Journal of Machine Learning Research, 20(1):315 363, 2019.

Kaufman, R. L. Heteroskedasticity in regression: Detection and correction. Sage Publications, 2013.

Kleinberg, R. and Leighton, T. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In IEEE Symposium on Foundations of Computer Science (FOCS-03), pp. 594 605. IEEE, 2003.

Krishnamurthy, A., Lykouris, T., Podimata, C., and Schapire, R. Contextual search in the presence of irrational agents. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC-21), pp. 910 918, 2021.

Lai, T. L. and Robbins, H. Asymptotically efficient adaptive allocation rules. Advances in applied mathematics, 6(1): 4 22, 1985.

Leme, R. P., Sivan, B., Teng, Y., and Worah, P. Learning to price against a moving target. In International Conference on Machine Learning, pp. 6223 6232. PMLR, 2021.

Lijesen, M. G. The real-time price elasticity of electricity. Energy economics, 29(2):249 258, 2007.

Luo, Y., Sun, W. W., et al. Distribution-free contextual dynamic pricing. ar Xiv preprint ar Xiv:2109.07340, 2021.

Luo, Y., Sun, W. W., and Liu, Y. Contextual dynamic pricing with unknown noise: Explore-then-ucb strategy and improved regrets. In Advances in Neural Information Processing Systems, 2022.

Marshall, A. Principles of economics: unabridged eighth edition. Cosimo, Inc., 2009.

Miao, S., Chen, X., Chao, X., Liu, J., and Zhang, Y. Contextbased dynamic pricing with online clustering. ar Xiv preprint ar Xiv:1902.06199, 2019.

Miller, M. and Alberini, A. Sensitivity of price elasticity of demand to aggregation, unobserved heterogeneity, price trends, and price endogeneity: Evidence from us data. Energy Policy, 97:235 249, 2016.

Nambiar, M., Simchi-Levi, D., and Wang, H. Dynamic learning and pricing with model misspecification. Management Science, 65(11):4980 5000, 2019.

Pricing with Contextual Elasticity and Heteroscedastic Valuation

Parkin, M., Powell, M., and Matthews, K. Economics. Addison-Wesley, Harlow, 2002.

Perakis, G. and Singhvi, D. Dynamic pricing with unknown nonparametric demand and limited price changes. Operations Research, 2023.

Qiang, S. and Bayati, M. Dynamic pricing with demand covariates. ar Xiv preprint ar Xiv:1604.07463, 2016.

Shah, V., Johari, R., and Blanchet, J. Semi-parametric dynamic contextual pricing. Advances in Neural Information Processing Systems, 32, 2019.

Wang, H., Talluri, K., and Li, X. On dynamic pricing with covariates. ar Xiv preprint ar Xiv:2112.13254, 2021a.

Wang, Y., Chen, B., and Simchi-Levi, D. Multimodal dynamic pricing. Management Science, 2021b.

White, H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica: journal of the Econometric Society, pp. 817 838, 1980.

Xu, J. and Wang, Y.-X. Logarithmic regret in feature-based dynamic pricing. Advances in Neural Information Processing Systems, 34, 2021.

Xu, J. and Wang, Y.-X. Towards agnostic feature-based dynamic pricing: Linear policies vs linear valuation with unknown noise. International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.

Xu, J., Qiao, D., and Wang, Y.-X. Doubly fair dynamic pricing. In International Conference on Artificial Intelligence and Statistics, pp. 9941 9975. PMLR, 2023.

Ye, Z. and Jiang, H. Smoothness-adaptive dynamic pricing with nonparametric demand learning. In International Conference on Artificial Intelligence and Statistics, pp. 1675 1683. PMLR, 2024.

Zhou, L. A survey on contextual multi-armed bandits. ar Xiv preprint ar Xiv:1508.03326, 2015.

Pricing with Contextual Elasticity and Heteroscedastic Valuation

A. Definition and Proof Details Here we show the proof details of the lemmas we have stated in Section 4.2. Before that, let us clarify some terminologies we mentioned in the main paper.

A.1. Definitions Firstly, we rigorously define the concept of price elasticity occurring in Section 1.

Definition A.1 (Price Elasticity (Parkin et al., 2002)). Suppose D(p) is a demand function of price p. Then the price elasticity Ed of demand is defined as

ED := D(p)/D(p)

p p D(p). (15)

With this definition, along with our generalized linear demand model given in Eq. (1), the price elasticity for the expected demand S(x t η pt x t θ ) is

ED = S(x t η pt x t θ ) pt pt S(x t η pt x t θ )

=x t η s(x t η pt x t θ ) S(x t η pt x t θ ) pt. (16)

Here s( ) = S ( ). Therefore, despite the effect of the link function and the price pt, the price elasticity is proportional to the price coefficient x t η . This is why we call x t η (or α in the general model D(p) = λ(α p + x T t β)) the elasticity coefficient or coefficient of elasticity in Section 1.

A.2. Proof of Lemma 4.2 Proof. In order to prove Lemma 4.2, we show the following lemma that indicates the Lipschitzness of J(u, β):

Lemma A.2 (Lipschitz of optimal price). There exists a constant CJ > 0 such that

|J(u1, β1) J(u2, β2)| CJ (|u1 u2| + |β1 β2|). (17)

With this lemma, we get the second inequality of Eq. (10). We will prove this lemma later in this subsection. Now, we focus on the first inequality. Notice that

Regt(pt) =r(x t θ , x t η , p t ) r(x t θ , x t η , pt)

p |u=x t θ ,β=x t η ,p=p t (p t pt)

2 inf p [c1,c2],β [Cβ,1],u [0,1] 2r(u, β, p)

p2 |u=x t θ ,β=x t η ,p=p t (p t pt)2

2 sup p [c1,c2],β [Cβ,1],u [0,1] {|2s(β p u) β + p s (β p u) β2|}(p t pt)2.

Here the first line is by the definition of Regt(pt), the second line is by smoothness, the third line is by the optimality of p t , and the last line is by calculus. Since |2s(β p u) β+p s (β p u) β2| is continuous on p [c1, c2], β [Cβ, 1], u [0, 1], we denote this maximum as 2Cr, which proves the first inequality of Eq. (10).

Now we show the proof of Lemma A.2.

Proof of Lemma A.2. Since J(u, β) = u+φ 1(u)

β where φ(w) = S(w)

s(w) w. Notice that

φ (w) = d S(w)

s(w) dw 1 = d2 log(S(w))

s(w)2 1 < 1 (19)

Pricing with Contextual Elasticity and Heteroscedastic Valuation

since S(w) is log-concave (as is assumed in Assumption 3.8). Given Eq. (19), we know that dφ 1(u)

d(u) = 1 dφ(w)

dw |w=φ 1(u)

( 1, 0). Therefore, we have:

u =1 + dφ 1(u)

β β = J(u, 1)

Therefore, we know that J(u, β) is Lipschitz with respect to u and β respectively. Take CJ = max{ 1

Cβ } and we get Eq. (17).

A.3. Proof of Lemma 4.3 Proof. We firstly show the convexity (and exp-concavity) of ℓt(θ, η) by the following lemma.

Lemma A.3 (exp-concavity). ℓt(θ, η) is convex and Ce-exp-concave with respect to [θ; η], where Ce > 0 is a constant dependent on F and Cβ. Equivalently, 2ℓt(θ, η) Ce ℓt(θ, η) ℓt(θ, η) . Also, we have 2ℓt(θ, η) Cl xtx t pt xtx t pt xtx t v2 t xtx t

for some constant Cl > 0.

The proof of Lemma A.3 is mainly straightforward calculus, and we defer the proof to the end of this subsection. According

to Lemma A.3, we have 2ℓt(θ, η) Cl xtx t pt xtx t pt xtx t p2 t xtx t

. Therefore, we know that

ℓt(θ, η) ℓt(θ , η ) + ℓt(θ , η ) θ θ

+ [(θ θ ) , (η η ) ]Cl

xtx t ptxtx t ptxtx t p2 txtx t

(21) According to the property of likelihood, we have E[ ℓt(θ , η )|θt, ηt] = 0 for any θt and ηt. Combining this with Eq. (21), we get

E[ℓt(θ, η) ℓt(θ , η )|θt, ηt] Cl[(θ θ ) , (η η ) ]E xtx t ptxtx t ptxtx t p2 txtx t |θt, ηt

Recall that ˆpt = J(x t θt, x t ηt) and that pt = ˆpt + t. Therefore, we know that the conditional expectations E[pt|θt, ηt] = ˆpt and E[p2 t|θt, ηt] = ˆp2 t + 2. Given this, we have

E xtx t ptxtx t ptxtx t p2 txtx t |θt, ηt

= xtx t ˆptxtx t ˆptxtx t (ˆp2 t + 2)xtx t

x t , ˆptx t + 0 0 0 2xtx t

" (1 1 1+ 2

Since = min d log T

, we have 1 1 1+ 2

10 . As a result, we have

E xtx t ptxtx t ptxtx t p2 txtx t |θt, ηt

10 xtx t 0 0 xtx t

This proves the lemma.

Pricing with Contextual Elasticity and Heteroscedastic Valuation

Finally, we show the proof of Lemma A.3.

Proof of Lemma A.3. Recall that ℓt(θ, η) = 1t log(S(x t (ptη θ))) (1 1t) log(1 S(x t (ptη θ))). We first calculate the gradient and Hessian of ℓt(θ, η) with respect to [θ; η]. For notation simplicity, denote wt := x t (ptη θ).

ℓt = 1t s(wt)

S(wt) (1 1t) s(wt) 1 S(wt)

2ℓt = 1t s (wt)S(wt) s(wt)2

S(wt)2 + (1 1t) s (wt)(1 S(wt)) s(wt)2

x t , pt x t

= 1t s (wt)S(wt) s(wt)2

S(wt)2 + (1 1t) s (wt)(1 S(wt)) s(wt)2

xtx t pt xtx t pt xtx t p2 t xtx t

(26) According to Assumption 3.8, we know that S(w) and (1 S(w)) are strictly log-concave, which indicates

d2 log(1 S(w))

dw2 = s (w)(1 S(w)) s(w)2

(1 S(w))2 < 0

d2 log(S(w))

dw2 =s (w)S(w) s(w)2

S(w)2 < 0, w R. (27)

Since wt = pt x t η x t θ where pt [c1, c2], we know that wt [ 1, c2]. Since d2 log(S(w))

dw2 and d2 log(1 S(w))

dw2 are continuous on [ 1, c2], we know that

1t s (wt)S(wt) s(wt)2

S(wt)2 + (1 1t) s (wt)(1 S(wt)) s(wt)2

sup w [ 1,c2] max s (wt)S(wt) s(wt)2

S(wt)2 , s (wt)(1 S(wt)) s(wt)2

Denote Cl = supw [ 1,c2] max n s (wt)S(wt) s(wt)2

S(wt)2 , s (wt)(1 S(wt)) s(wt)2

(1 S(wt))2 o > 0, and we know that

2ℓt(θ, η) Cl xtx t pt xtx t pt xtx t p2 t xtx t

Similarly, we know that s(w) S(w) and s(w) 1 S(w) are continuous on [ 1, c2]. Therefore, we may denote CG =

supw [ 1,c2] max n | s(w)

S(w)|, | s(w)

1 S(w)| o > 0 and get

ℓt(θ, η) ℓt(θ, η) C2 G xt pt xt

x t , pt x t . (30)

Given all these above, we have

2ℓt(θ, η) Cl xtx t pt xtx t pt xtx t p2 t xtx t

C2 G C2 G xt pt xt

x t , pt x t

C2 G ℓt(θ, η) ℓt(θ, η) .

Denote Ce := Cl

C2 G and we prove the lemma.

Pricing with Contextual Elasticity and Heteroscedastic Valuation

A.4. Proof of Theorem 4.1 Proof. With all lemmas above, we have

E[Regret] =E[

t=1 E[Regt(pt)|θt, ηt]]

t=1 Cr 2 CJ E[(x t (θt θ ))2 + (x T (ηt η ))2|θt, ηt] + T Cr 2 2]

t=1 2Cr CJ 10 Cl 2 E[ℓt(θt, ηt) ℓt(θ , η )|θt, ηt] + 2Cr T 2]

t=1 ℓt(θt, ηt) ℓt(θ , η )] + 2Cr T 2

d T log T).

Here the first line is by the law of total expectation, the second line is by Lemma 4.2, the third line is by Lemma 4.3, the fourth line is by equivalent transformation, the fifth line is by Lemma 4.4, and the sixth line is by the fact that

= min d log T

. This holds the theorem.

A.5. Proof of Theorem 4.5 Proof. Denote θ = [θ1, θ2, . . . , θd] and η = [η1, η2, . . . , ηd] . We firstly construct a context set X as

X = {ei, i = 1, 2, . . . , d|ei = [0, . . . , 0, 1, 0, . . . , 0] Rd, ei[i] = 1, ei[j] = 1, j = i}. (33)

Then we sample each xt i.i.d. UX, where UX is a uniform distribution defined on each element of X (i.e. Pr[xt = ei] = 1 d, i [d], t [T]). Denote it := i if xt = ei. Now we decompose the indexes set [T] of series {xt}T t=1 into d subsets:

Si := {t|xt = ei, t = 1, 2, . . . , T}, i = 1, 2, . . . , d. (34)

From the perspective of customers valuations, we have yt = e i θ +Nt

e i η = θit

ηit + 1 ηit Nt where Nt is an i.i.d. noise with

known distribution. Let Nt i.i.d. N(0, 1) as standard Gaussian noises. Therefore, each i [d] determines a sub-problem (denoted as Pi) that only happens when t Si and has a fixed valuation distribution, i.e. yt = θi

ηi Nt i.i.d. N( θi

η2 i ). For any t / Si, neither yt nor 1t is dependent on θi or ηi, which enables us to separately consider each Pi. Denote Ti := |Si|. In the following, we bound the least possible regret of this sub-problem as Ω( Ti).

Let θi = ui

σi and ηi = 1 σi where ui, σi > 0 are unknown parameters to be determined. Given this, customers valuation distribution is yt N(ui, σ2 i ) for t Si. According to Theorem 12 of Xu & Wang (2021), let ui = p π

4 i }, and any algorithm has to suffer at least 1 24000 Ti regret.

Then we show that Ti T

2d with high probability. Notice that

t=1 1[xt = ei]]

t=1 Pr[xt = ei]

Pricing with Contextual Elasticity and Heteroscedastic Valuation

27 28 29 210 211 212 213 214 t=1,2,...,T

When RMLP-2 does not model the heteroscedasticity

Pw P Pw P Linear fit, slope=0.536 RMLP-2 RMLP-2 Linear fit, slope=0.681

Figure 2. Regrets of Pw P versus the original homoscedastic RMLP-2 algorithm. In this log-log diagram, a O(T α) regret curve is shown as a straight line with slope α. From the figure, we notice that Pw P is optimal while RMLP-2 is sub-optimal, indicating the necessity of modeling homoscedasticity to achieve optimal regrets.

According to Hoeffding s Inequality, we have:

t=1 1[xt = ei] T

2d] 2 exp{ 2 T 2

2d] 2 exp{ T

According to a union bound on the failure probability, with Pr 1 2d exp{ T

2d2 }, we have Ti T

2d, i [d]. Therefore, the expected regret satisfies

E[Regret] = E[

i=1 Regret(Pi)]

T 2d (1 2d exp{ T

Here the last inequality comes from the assumption that T 2d2(1 + log d) and therefore 1 2d exp{ T

B. More Experiments B.1. Model Adaptivity In this section, we show that it is necessary to model the heteroscedasticity. In specific, we compare Pw P with the original RMLP-2 algorithm from Javanmard & Nazerzadeh (2019) that ignores heteroscedasticity in a heteroscedastic environment. We conduct both experiments for T = 214 rounds and repeat them for 10 epochs. The numerical results are displayed in

Pricing with Contextual Elasticity and Heteroscedastic Valuation

29 210 211 212 213 214 215 216 t=1,2,...,T

Pw P Pw P Linear fit, slope=0.538 RMLP-2 RMLP-2 Linear fit, slope=0.97

Figure 3. Regrets of misspecified Pw P with expanded contexts, in comparison with a baseline RMLP-2 knowing the correct model. The results show that Pw P still have a sub-linear regret in a certain period of time with context expansions, indicating that our linear demand model as Eq. (1) can be generalized to a linear valuation model as Eq. (38) in practice.

the lower figure, plotted in log-log diagrams. From the figure, we notice that the regret of RMLP-2 is much larger than Pw P. Also, the slope of regrets of RMLP-2 is 0.681 >> 0.5, indicating that it does not guarantee a O(

T) regret. In comparison, Pw P still performs well as it achieves a O(T 0.536) regret. This indicates that the algorithmic adaptivity of Pw P to both homoscedastic and heteroscedastic environments is highly non-trivial, and a failure of capturing it would result in a substantial sub-optimality.

B.2. Model Misspecification In Section 5, we compare the cumulative regrets of our Pw P algorithm with the (modified) RMLP-2 on the linear demand model (as Eq. (1) or equivalently, the linear fractional valuation model as Eq. (2)). In this section, we consider a modelmisspecific setting, where customer s true valuation is given by the following equation

yt = x t θ + x t η Nt (38)

and the demand Dt = 1t = 1[pt yt]. As a result, Eq. (38) captures a linear valuation model with heteroscedastic valuation.

Now, we design an experiment to show the generalizability of both our Pw P algorithm and our demand model as Eq. (1). In specific, we run the Pw P algorithm that still models a customer s valuation as yt = x t θ + Nt x t η , where xt Rq is an

expanded version of the original context xt (i.e. xt = π(xt) for some fixed expanding policy π) and θ , η Rq are some fixed parameters2. Therefore, Pw P is trying to learn those misspecified θ and η although there does not exist such an underground truth.

We are curious whether the expansion of context (from xt to xt) would leverage the hardness of model misspecification. For x = [x1, x2, . . . , xd] , denote xn := [xn 1, xn 2, . . . , xn d] . Then for any context x Rd, we specify each context-expanding policy as follows: π(x; x0, a)

:=[x; (x x0)a1; (x x0)a2; . . . ; (x x0)am] R(m+1)d. (39)

The policy π in Eq. (39) is a polynomial expansion of x with index list a = [a1, a2, . . . , am] Zm, where x0 Rd is a fixed start point of this expansion.

Now we consider the baseline to compare with. We claim that it is very challenging to solve the contextual pricing problem with customers valuations being Eq. (38) with theoretic regret guarantees (although the Ω(

T) lower bound given by

2We may assume q d without loss of generality.

Pricing with Contextual Elasticity and Heteroscedastic Valuation

Javanmard & Nazerzadeh (2019) still holds), and there are no existing algorithms targeting at this problem setting. However, there are still some straightforward algorithms that might approach it: For example, a max-likelihood estimate (MLE) of θ

and η . In fact, we may still reuse the framework of RMLP-2 by replacing its MLE oracle according to the distribution given by Eq. (38). In the following, we will compare the performances of

1. Pw P algorithm with the misspecified linear demand model as Eq. (1), with expanded context {xt} s, and

2. RMLP-2 algorithm on the correct linear valuation model as Eq. (38), with original context {xt} s.

We implement Pw P and RMLP-2 on stochastic {xt} sequences (since RMLP-2 has already failed in the adversarial setting) and get numerical results shown as Figure 3. Here we choose x0 = [0.5, 0.5] and a = [0, 1]. For a model-misspecified online-learning algorithm, there generally exists an O(ϵ T) term in the regret rate, where ϵ is a parameter measuring the distance between the global optimal policy and the best proper policy (i.e. the best policy in the hypothesis set). However, our numerical results imply that Pw P may still achieve a sub-linear regret within a certain time horizon T, whereas the baseline RMLP-2 that takes the correct model has a much worse regret. It is worth mentioning that Pw P may still run into Ω(T) regret as T gets sufficiently large, due to model misspecification. These results imply that

1. Our linear demand model Eq. (1) can be generalized to a linear valuation model as Eq. (38) in practice.

2. Our Pw P algorithm can still perform well in model-misspecification settings, and even better than a baseline MLE algorithm with a correct model in a certain period of time.

For the first phenomenon that our demand model can be generalized with context expansion tricks, we may understand it as a Taylor expansion (and we take a linear approximation) at x0 = [0.5, 0.5] . For the second phenomenon that Pw P outperforms RMLP-2, it might be caused by the non-convexity of the log-likelihood function of the valuation model specified in Eq. (38). As a result, while RMLP-2 is solving a non-convex MLE and getting estimates far from the true parameters, Pw P instead works on an online convex optimization problem within a larger space (which probably contain the underground truth) due to context expansions. Unfortunately, we do not have a rigorous analysis of those two phenomenons.

C. More Discussions As supplementary to Section 6, here we discuss some potential extensions and impacts of our work with more details.

Assumption on lower-bounding elasticity as Cβ > 0. Here we claim that the regret lower bound should have an Ω( 1

Cβ ) dependence on Cβ. We prove this by contradiction. Without loss of generality, assume Cβ (0, 1). In specific, we construct a counter example to show it is impossible to have an O(C 1+α β ) regret for any α > 0:

Firstly, let β = Cβ. Suppose there exists an algorithm A that proposes a series of prices {pt}T t=1 which achieve O(C 1+α β ) regret in any pricing problem instance under our assumptions.

Now, we consider another specific problem setting where β = 1 while all other quantities θ , η , {xt}T t=1 stay unchanged. Notice that the reward function has the following property:

r(u, β, p) = p S(βp u) = 1

β (βp) S(βp u) = 1

β r(u, 1, βp) (40)

Therefore, we construct another algorithm A which proposes Cβ pt at t = 1, 2, . . . , T. According to the O(C 1+α β ) regret bound of A, we know that A will suffer Cβ O(C 1+α β ) = O(Cα β ) regret. Let Cβ 0+ and observe the regret of A on the latter problem setting (where β = 1). On the one hand, this is a fixed problem setting with information-theoretic lower regret bound at Ω(log T). On the other hand, the regret will be bounded by lim Cβ 0+ O(Cα β ) = 0. They are contradictory to each other. Given this, we know that there does not exist such an α > 0 such that there exists an algorithm that can achieve O(C 1+α β ). As a result, it is necessary to lower bound the elasticities by Cβ from 0.

Adversarial attacks. Our Pw P algorithm achieves near-optimal regret even for adversarial context sequences, while the baseline (modified) RMLP-2 algorithm fails in an oblivious adversary and suffer a linear regret. This is mainly caused by the fact that RMLP-2 takes a pure-exploration step at a fixed time series, i.e. t = 1, 1 + 2, 1 + 2 + 3, . . . , k(k+1)

2 . This issue might be leveraged by randomizing the position of pure-exploration steps: In each Epoch k = 1, 2, . . ., it may firstly sample one out of all k rounds in this epoch uniformly at random, and then propose a totally random price at this specific round.

Pricing with Contextual Elasticity and Heteroscedastic Valuation

However, RMLP-2 still requires E[xx ] c Id even with this trick.

Nonstationarity in Pricing Although our Pw P algorithm is applicable on heteroscedastic valuations, we still benchmark with an optimal fixed pricing policy that knows η and θ in advance. In reality, customers valuations and elasticities might fluctuate according to the market environment, causing θ t and η different over t [T]. Existing work such as Leme et al. (2021) and Baby et al. (2022) studies similar settings but assumes i.i.d. noises. It is worth to further investigate the setting when heteroscedasticity and nonstationarity occur simultaneously.

Algorithm and analysis for unknown link function S( ). Unfortunately, our algorithm is unable to be generalized to the online contextual pricing problem with linear valuation and unknown noise distribution that has been studied by Fan et al. (2021). Indeed, the problem becomes substantially harder when the noise distribution is unknown to the agent. Existing works usually adopt bandits or bandit-like algorithms to tackle that problem. For example, Fan et al. (2021) approaches it with a combination of exploration-first and kernel method (or equivalently, local polynomial), and Luo et al. (2021) uses a UCB-styled algorithm. However, none of them close the regret gap even under the homoscedastic elasticity environment as they assumed, and the known lower bound is at least Ω(T 2 3 ), or Ω(T

m+1 2m+1 ) for smooth ones (Wang et al., 2021b). On the other hand, we study a parametric model, and it is not quite suitable for a bandit algorithm to achieve optimality in regret. In a nutshell, these two problems (known vs unknown noise distributions), although seem similar to each other, are indeed substantially different.

Comparing with Linear Bandits We claim that no inclusion relationship exists between our setting and linear bandits (Chu et al., 2011), and neither setting s lower bound can imply the other. There are mainly two substantial differences. On the one hand, it is our expected demand model (instead of the expected linear reward for linear bandits) that falls in a generalized-linear model, with a link function bounded by [0, 1]. Consequently, the expected reward function in our model (i.e., the multiplication of price and demand) is neither linear nor generalized linear. On the other hand, while each action in linear bandits is represented by a vector of dimension d, our action, specifically the price p, is a scalar. This distinction further prevents us from adopting the dependence on d in the linear bandits lower bound to our model.

Linear demand model vs linear valuation model. In this work, we adopt a generalized linear demand model with Boolean feedback, as assumed in Eq. (1). As we have stated in Appendix B, there exists a heteroscedastic linear valuation model as Eq. (38) that also captures a customer s behavior. However, this linear valuation model is actually harder to learn, as its log-likelihood function is non-convex. It is still an open problem to determine the minimax regret of an online contextual pricing problem with a valuation model like Eq. (38).

Determination of price perturbation. In this work, we adopt a Rademacher-style perturbation with = 1

T , which optimally balances the exploration cost T 2 against the condition number 1 2 of the surrogate loss function. Notice that this requires a pre-knowledge on T, which may not be available in real-world scenarios.To address this, we apply a "doubling-epoch" trick as suggested by Javanmard & Nazerzadeh (2019). Specifically, we define each epoch k with a length of τk = 2k, and we treat each epoch as an independent pricing problem instance with T = τk and run our Pw P algorithm accordingly. This approach allows our algorithm to adapt to an unknown T while preserving the O(

T) regret rate. Equivalently, we may set adaptive perturbations as t = (2 log2 t ) 1

2 in practice. It is reasonable to see a milder perturbation imposed on the greedy price as the time index t goes larger.

Ethic issues. Since we study a dynamic pricing problem, we have to consider the social impacts that our methodologies and results could have. The major concern in pricing is fairness, which attracts increasing research interests in recent years (Cohen et al., 2021; 2022; Chen et al., 2023; Xu et al., 2023). In general, we did not enforce or quantify the fairness of our algorithm. In fact, we might not guarantee an individual fairness since Pw P proposes random prices, which means even the same input xt s would lead to different output prices. Despite the perturbations t we add to the prices, the pricing model (i.e. the parameters θ and η ) is updating adaptively over time. This indicates that customers arriving later would have relatively fairer prices, since the model is evolving drastically at the beginning rounds and is converging to (local) optimal after a sufficiently long time period. We claim that our Pw P algorithm is still fairer than the baseline RMLP-2 algorithm we compare with, since RMLP-2 takes pure explorations at some specific time. As a result, those customers who are given a totally random price would have a either much higher or much lower expected price than those occurring in exploitation rounds. However, it is still worth mentioning that RMLP-2 satisfies individual fairness within each pure-exploitation epoch, since it does not update parameters nor adding noises then.