Regret Minimization with Performative Feedback

Meena Jagadeesan¹  Tijana Zrnic¹  Celestine Mendler-Dünner²

¹University of California, Berkeley  ²Max Planck Institute for Intelligent Systems, Tübingen. Correspondence to: Meena Jagadeesan, Tijana Zrnic, Celestine Mendler-Dünner <cmendler@tuebingen.mpg.de>.

Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).

In performative prediction, the deployment of a predictive model triggers a shift in the data distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy a model to get feedback about the distribution it induces. We study the problem of finding near-optimal models under performativity while maintaining low regret. On the surface, this problem might seem equivalent to a bandit problem. However, it exhibits a fundamentally richer feedback structure that we refer to as performative feedback: after every deployment, the learner receives samples from the shifted distribution rather than bandit feedback about the reward. Our main contribution is regret bounds that scale only with the complexity of the distribution shifts and not that of the reward function. The key algorithmic idea is careful exploration of the distribution shifts that informs a novel construction of confidence bounds on the risk of unexplored models. The construction only relies on smoothness of the shifts and does not assume convexity. More broadly, our work establishes a conceptual approach for leveraging tools from the bandits literature for the purpose of regret minimization with performative feedback.

1. Introduction

Predictive models deployed in social settings are often performative. This means that the model's predictions, by means of being used to inform consequential decisions, influence the outcomes the model aims to predict in the first place. For example, travel time estimates influence routing decisions and thus realized travel times; stock price predictions influence trading activity and hence prices. Such feedback-loop behavior arises in a variety of domains, including public policy, trading, traffic predictions, and recommendation systems.

Perdomo et al. (2020) formalized this phenomenon under the name performative prediction. A key concept in this framework is the distribution map, which formalizes the dependence of the data distribution on the deployed predictive model. This object maps a model, encoded by a parameter vector θ, to a distribution D(θ) over instances. Naturally, in a performative environment, a model's performance is measured on the distribution that results from its deployment. That is, given a loss function ℓ(z;θ), which measures the learner's loss when they predict on instance z using model θ, we evaluate a model based on its performative risk:

    PR(θ) := E_{z∼D(θ)} ℓ(z;θ).    (1)

In contrast with the risk function studied in usual supervised learning, the performative risk takes an expectation over a model-dependent distribution. Importantly, this distribution is unknown ahead of time; for example, one can hardly anticipate the distribution of travel times induced by a traffic forecasting system without deploying the system first. Due to this inherent uncertainty about D(θ), it is not possible to find a model with low performative risk offline. The learner needs to interact with the environment and deploy models θ to explore the induced distributions D(θ).
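To make the objective in Equation (1) concrete, here is a minimal simulation sketch. The Gaussian location-shift distribution map, the clipped squared loss, and all names are illustrative assumptions rather than constructs from the paper; the point is only that the expectation defining PR(θ) is taken over a distribution that itself depends on θ, so the risk landscape cannot be evaluated without deploying models.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_from_distribution_map(theta, n, sensitivity=0.5):
    """Illustrative distribution map D(theta): a mean shift proportional to theta.

    In the paper D(theta) is unknown to the learner; this toy map exists only
    so that PR(theta) can be simulated.
    """
    return rng.normal(loc=sensitivity * theta, scale=1.0, size=n)

def loss(z, theta):
    """A bounded loss ell(z; theta) in [0, 1], chosen purely for illustration."""
    return np.clip((z - theta) ** 2, 0.0, 1.0)

def performative_risk(theta, n=200_000):
    """Monte Carlo approximation of PR(theta) = E_{z ~ D(theta)} ell(z; theta)."""
    z = sample_from_distribution_map(theta, n)
    return float(loss(z, theta).mean())

# The risk of each model is only revealed by (simulated) deployment:
for theta in (0.0, 0.5, 1.0):
    print(f"theta={theta:.1f}  PR(theta) ~= {performative_risk(theta):.3f}")
```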
Given the online nature of this task, we measure the loss incurred by deploying a sequence of models θ_1,...,θ_T by evaluating the performative regret:

    Reg(T) = Σ_{t=1}^{T} ( E[PR(θ_t)] − min_θ PR(θ) ),

where the expectation is taken over the possible randomness in the choice of {θ_t}_{t=1}^{T}. Performative regret measures the suboptimality of the deployed sequence of models relative to a performative optimum θ_PO ∈ argmin_θ PR(θ).

At first glance, performative regret minimization might seem equivalent to a classical bandit problem. Bandit solutions minimize regret while requiring only noisy zeroth-order access to the unknown reward function, in our case PR. The resulting regret bounds generally grow with some notion of complexity of the reward function.

However, a naive application of bandit baselines misses out on a crucial fact: performative regret minimization exhibits significantly richer feedback than bandit feedback. When deploying a model θ, the learner gains access to samples from the induced distribution D(θ), rather than only a noisy estimate of the risk PR(θ). We call this feedback model performative feedback. Together with the fact that the learner knows the loss ℓ(z;θ), performative feedback can be used to inform the reward of unexplored arms. For instance, it allows the computation of an unbiased estimate of E_{z∼D(θ)} ℓ(z;θ′) for any point θ′.

To illustrate the power of this feedback model, consider the limiting case in which the performative effects entirely vanish and the distribution map is constant, i.e. D(θ) ≡ D* for some fixed distribution D* independent of θ. With zeroth-order feedback, the learner would still need to deploy different models to explore the landscape of PR and find a point with low risk. However, with performative feedback, a single deployment gives samples from D*, thus resolving all uncertainty in the objective (1) apart from finite-sample uncertainty. This raises the question: with performative feedback, can one achieve regret bounds that scale only with the complexity of the distribution map, and not that of the performative risk?

1.1. Our Contribution

We study the problem of performative regret minimization based on performative feedback. Our main contribution is performative regret bounds that scale primarily with the complexity of the distribution map. The key conceptual idea is to apply bandits tools to carefully explore the distribution map, and then propagate this knowledge to the objective (1) in order to minimize performative regret.

Performative confidence bounds algorithm. Our main focus is on a setting where the distribution map is Lipschitz in an appropriate sense. We propose a new algorithm that takes advantage of performative feedback in order to construct non-trivial confidence bounds on the performative risk in unexplored regions of the parameter space and thus guide exploration. A crucial implication of these bounds is that the algorithm can discard highly suboptimal regions of the parameter space without ever deploying a model nearby. We summarize the regret guarantee of our performative confidence bounds algorithm:

Theorem 1.1 (Informal). Suppose that the distribution map D(θ) is ϵ-Lipschitz and that the loss ℓ(z;θ) is L_z-Lipschitz in z. Then, after T deployments, the performative confidence bounds algorithm achieves a regret bound of

    Reg(T) = O( √T + T^{(d+1)/(d+2)} (L_zϵ)^{d/(d+2)} ),

where d denotes the zooming dimension of the problem.

We compare the bound in Theorem 1.1 to a baseline Lipschitz bandits regret bound.
The concept of zooming dimension stems from the work of Kleinberg et al. (2008) and serves as an instance-dependent notion of dimensionality. Kleinberg et al. showed that sublinear regret O( T^{(d′+1)/(d′+2)} L^{d′/(d′+2)} ) can be achieved if the reward function is L-Lipschitz, where d′ is a zooming dimension. The performative risk can be guaranteed to be Lipschitz if the distribution map is Lipschitz and the loss ℓ(z;θ) is Lipschitz in both arguments.

The primary benefit of Theorem 1.1 is that our regret bound scales only with the Lipschitz constant of the distribution map, rather than the Lipschitz constant of the performative risk. In particular, our result allows PR(θ) to be highly irregular as a function of θ, seeing that ℓ(z;θ) as a function of θ is unconstrained. This difference becomes salient when ϵ → 0, meaning that the performative effects vanish: our regret bound grows as O(√T) in an essentially dimension-independent manner. More precisely, the dimension can only arise implicitly through a model class complexity term. On the other hand, the rate of classical Lipschitz bandits remains exponential in the dimension.

Another difference between our regret bound and that of Lipschitz bandits is in the zooming dimension. In particular, d′ is a zooming dimension no smaller than the zooming dimension d we obtain in Theorem 1.1. As we will elaborate on in later sections, the benefit we derive from the zooming dimension comes from the fact that it implicitly depends on the Lipschitz constant driving the objective, which is smaller when making full use of performative feedback.

Extension to location families. In addition, we study performative regret minimization for the special case where the distribution map has a location family form (Miller et al., 2021). We again prove regret bounds that scale only with the complexity of the distribution map, rather than the complexity of the performative risk. We adapt the LinUCB algorithm (Li et al., 2010) to learn the hidden parameters of the location family. This enables us to achieve O(√T) regret without placing any strong convexity assumptions on the performative risk that are required in prior work (Miller et al., 2021). In particular, our result again allows PR(θ) to be highly irregular as a function of θ. More broadly, our work establishes a connection between performative prediction and the bandits literature, which we believe is a worthwhile direction for future inquiry.

Consequences for finding performative optima. While we have contextualized our work within online regret minimization, our performative confidence bounds algorithm has the additional property that it converges to the set of performative optima. Thus, if run for sufficiently many time steps, it generates a model with near-minimal performative risk. From this perspective, our algorithm offers superior guarantees over retraining methods (e.g., (Perdomo et al., 2020)), which are unlikely to reach minima of the performative risk.

1.2. Related Work

Performative prediction. Prior work on performative prediction has largely studied gradient-based optimization methods (Perdomo et al., 2020; Mendler-Dünner et al., 2020; Drusvyatskiy and Xiao, 2020; Brown et al., 2022; Miller et al., 2021; Izzo et al., 2021; Maheshwari et al., 2022; Li and Wai, 2022; Ray et al., 2022; Dong and Ratliff, 2021). Many of the studied procedures only converge to performatively stable points, that is, points θ that satisfy the fixed-point condition θ ∈ argmin_{θ′} E_{z∼D(θ)} ℓ(z;θ′).
In general, stable points are not minimizers of the performative risk (Perdomo et al., 2020; Miller et al., 2021), which implies that procedures converging to stable points do not achieve sublinear performative regret. There are exceptions in the literature that focus on finding performative optima (Miller et al., 2021; Izzo et al., 2021), but those algorithms rely on proving or assuming convexity of the performative risk; in this work we make no convexity assumptions. In fact, it is known that the performative risk can be nonconvex even when the loss ℓ(z;θ) is convex and the performative effects are relatively weak (Perdomo et al., 2020; Miller et al., 2021). One other work that studies performative optimality, without imposing convexity, is that of Dong and Ratliff (2021), but they focus on optimization heuristics that are not guaranteed to minimize performative regret.

Learning in Stackelberg games. Performative prediction is closely related to learning in Stackelberg games: if D(θ) is thought of as a best response to the deployment of θ according to some unspecified utility function, then performative optima can be thought of as Stackelberg equilibria. There have been many works on learning dynamics in Stackelberg games in recent years (Balcan et al., 2015; Jin et al., 2020; Fiez et al., 2020; Fiez and Ratliff, 2020). Notably, Balcan et al. (2015) also study the benefit of a richer feedback model: they assume the agent's type is revealed after taking an action. When combined with a known agent-response model, this allows them to directly infer the loss of unexplored strategies. In contrast, performative feedback does not imply full-information feedback. One instance of performative prediction that has an explicit Stackelberg structure, meaning D(θ) is defined as a best response, is strategic classification (Hardt et al., 2016). Several works have studied learning dynamics in strategic classification (Dong et al., 2018; Chen et al., 2020; Bechavod et al., 2021; Zrnic et al., 2021); notably, Dong et al. (2018) and Chen et al. (2020) provide solutions that minimize Stackelberg regret, of which performative regret is an analog in the performative prediction context. However, all of these works rely on strong structural assumptions, such as linearity of the predictor or convexity of the risk function, which significantly reduce the amount of necessary exploration compared to the mild Lipschitzness conditions we impose in our work.

Continuum-armed bandits. Particularly inspiring for our work is the literature on continuum-armed bandits (Agrawal, 1995; Kleinberg, 2004; Auer et al., 2007; Kleinberg et al., 2008; Podimata and Slivkins, 2021). As we will elaborate on in Section 2, performative prediction can be cast as a Lipschitz continuum-armed bandit problem. However, while this means that one can use an off-the-shelf Lipschitz bandit algorithm to minimize performative regret, this would generally be a conservative solution. After pulling an arm θ in performative prediction, the learner observes samples from D(θ). As explained earlier, in combination with the structure of our objective, this feedback model is more powerful than classical bandit feedback, where a noisy version of the mean reward at θ is observed. Moreover, it is fundamentally different from partial-feedback and side-information models studied in the literature, e.g. (Mannor and Shamir, 2011; Kocák et al., 2014; Wu et al., 2015; Cohen et al., 2016).
1.3. Preliminaries

Performative prediction, set up as an online learning problem, can be formalized as follows. At every step the learner chooses a model θ in the parameter space Θ ⊆ R^{d_Θ}. We assume¹ max{‖θ‖ : θ ∈ Θ} ≤ 1 for simplicity. The expected loss of model θ is given by PR(θ) = E_{z∼D(θ)} ℓ(z;θ). We assume that the objective function is bounded so that ℓ(z;θ) ∈ [0,1] for all z and θ.

At every time step t, the learner chooses a model θ_t and observes a constant number m_0 of i.i.d. samples, {z_t^{(i)}}_{i∈[m_0]}, where z_t^{(i)} ∼ D(θ_t). The regret incurred by choosing θ_t at time step t is Δ(θ_t) := PR(θ_t) − PR(θ_PO), where θ_PO is the performative optimum. The constant m_0 quantifies how many samples the learner can collect in a time window determined by how often they incur regret. For example, at the beginning of each week the learner might update the model, and thus at the end of each week they incur regret for the model they chose to deploy. In that case, m_0 is the number of samples the learner collects per week. Note that a learner with larger m_0 collects an empirical distribution that more accurately reflects D(θ_t) and thus naturally minimizes regret at a faster rate.

To formally disentangle the effects of the parameter vector θ on the performative risk through the distribution map and the loss function, we use the notion of the decoupled performative risk (Perdomo et al., 2020):

    DPR(θ,θ′) := E_{z∼D(θ)} ℓ(z;θ′).

This object captures the risk incurred by a model θ′ on the distribution D(θ). Note that PR(θ) = DPR(θ,θ) by definition.

¹Throughout we use ‖·‖ to denote the ℓ2-norm for vectors and the operator norm for matrices.

To measure the complexity of the distribution map we consider how much the distribution D(θ) can change with changes in θ, as formalized by ϵ-sensitivity.

Assumption 1.2 (ϵ-sensitivity (Perdomo et al., 2020)). A distribution map D(·) is ϵ-sensitive if for any pair θ,θ′ ∈ Θ it holds that W(D(θ),D(θ′)) ≤ ϵ‖θ − θ′‖, where W denotes the Wasserstein-1 distance.

In the context of a traffic forecasting app, ϵ can be thought of as being proportional to the size of the user base of the app. When D(θ) arises from the aggregate behavior of strategic agents manipulating their features in response to a model θ, ϵ grows when features are more easily manipulable.

2. A Black-Box Bandits Approach

Performative regret minimization can be set up as a continuum-armed bandits problem where an arm corresponds to a choice of model parameters θ. Performative feedback is sufficient to simulate noisy zeroth-order feedback about the reward function, as assumed in bandits. When we deploy θ_t, the samples from D(θ_t) enable us to compute an unbiased estimate

    \widehat{PR}(θ_t) = (1/m_0) Σ_{i=1}^{m_0} ℓ(z_t^{(i)};θ_t)

of the risk PR(θ_t). Moreover, since we assume the loss function is bounded, the noise in the estimate \widehat{PR}(θ_t) is subgaussian, as typically required in bandits.
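As a small illustration of how performative feedback simulates bandit feedback, the helper below turns the m_0 samples observed after deploying θ_t into the plug-in estimate \widehat{PR}(θ_t); the bounded loss and the hard-coded samples in the example are assumptions for illustration, not the paper's.

```python
import numpy as np

def pr_hat(samples, theta, loss):
    """Plug-in estimate (1/m0) * sum_i loss(z_i; theta) of PR(theta),
    computed from the m0 samples z_i ~ D(theta) observed after deploying theta."""
    z = np.asarray(samples, dtype=float)
    return float(np.mean(loss(z, theta)))

# Illustrative bounded loss and fake observed samples (both assumptions):
bounded_loss = lambda z, theta: np.clip((z - theta) ** 2, 0.0, 1.0)
print(pr_hat([0.2, -0.1, 0.4, 0.9], theta=0.0, loss=bounded_loss))
```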
A standard condition that makes continuum-armed bandit problems tractable is a bound on how fast the reward can change when moving from one arm to a nearby arm. Formally, this regularity is ensured by assuming Lipschitzness of the reward function, in our case Lipschitzness of the performative risk. The dependence of PR(θ) on θ is twofold, as seen in Equation (1). Thus, the most natural way to ensure that PR(θ) is Lipschitz is to ensure that each of these two dependencies is Lipschitz. This yields the following bound:

Lemma 2.1 (Lipschitzness of PR). If the loss ℓ(z;θ) is L_z-Lipschitz in z and L_θ-Lipschitz in θ and the distribution map is ϵ-sensitive, then the performative risk is (L_θ + ϵL_z)-Lipschitz.

The intuition behind Lemma 2.1 is that PR(θ) is guaranteed to be Lipschitz if DPR(θ,θ′) is Lipschitz in each argument individually. Lipschitzness in the second argument follows from requiring that the loss be Lipschitz in θ. Lipschitzness in the first argument follows from combining Lipschitzness of the loss in z and ϵ-sensitivity of the distribution map.

2.1. Adaptive Zooming

Once we have established Lipschitzness of the performative risk, we can apply techniques from the Lipschitz bandits literature. Kleinberg et al. (2008) proposed a bandit algorithm that adaptively discretizes promising regions of the space of arms, using Lipschitzness of the reward function to bound the additional loss due to discretization. Their method, called the zooming algorithm, will serve as a baseline for our problem. The algorithm enjoys an instance-dependent regret that takes advantage of nice problem instances, while maintaining tight guarantees in the worst case. The rate depends on the zooming dimension, which is upper bounded in the worst case by the dimension of the full space d_Θ.

Proposition 2.2 (Zooming algorithm (Kleinberg et al., 2008)). Suppose m_0 = o(log T). Then, after T deployments, the zooming algorithm achieves a regret bound of

    Reg(T) = O( T^{(d+1)/(d+2)} ((log T)/m_0)^{1/(d+2)} (L_θ + ϵL_z)^{d/(d+2)} ),

where d denotes the (L_θ + ϵL_z)-zooming dimension.

The zooming dimension quantifies the niceness of a problem instance by measuring the size of a covering of near-optimal arms, instead of the entire parameter space. Roughly speaking, if the reward function is very flat in that there are many near-optimal points, then the zooming dimension is close to the dimension d_Θ of the parameter space. However, if the reward has sufficient curvature, then the zooming dimension can be much smaller than d_Θ. The zooming dimension is defined formally as follows:

Definition 2.3 (α-zooming dimension). A performative prediction problem instance has α-zooming dimension equal to d if any minimal s-cover of any subset of {θ : Δ(θ) ≤ 16αs} includes at most a constant multiple of (3/s)^d elements from {θ : 16αr ≤ Δ(θ) < 32αr}, for all 0 < r ≤ s ≤ 1.

For well-behaved instances, the definition intuitively requires every minimal s-cover of {θ : 16αr ≤ Δ(θ) < 32αr} to have size at most of order (3/s)^d. Definition 2.3 slightly differs from the definition presented in (Kleinberg et al., 2008) and makes the dependence on the Lipschitz constant explicit; we use Definition 2.3 to later ease the comparison to our new algorithm. The differences between the two definitions are minor technicalities that we do not expect to alter the zooming dimension in a meaningful way, neither formally nor conceptually. See Appendix E for a discussion.

Figure 1. Confidence bounds after deploying θ_1 and θ_2. (left) Confidence bounds via Lipschitzness, Equation (2). (right) Performative confidence bounds, Equation (3). The performative feedback model used for this illustration can be found in Appendix G.

3. Making Use of Performative Feedback

In this section, we illustrate how we can take advantage of performative feedback beyond computing a point estimate of the deployed model's risk. For now, we ignore finite-sample considerations and assume access to the entire distribution D(θ) after deploying a model θ.
We will address finite-sample uncertainty when presenting our main algorithm in the next section.

3.1. Constructing Performative Confidence Bounds

First, we demonstrate how performative feedback allows constructing tighter confidence bounds on the performative risk of unexplored models, compared to only relying on Lipschitzness of the risk function PR(θ). Suppose we deploy a set of models S ⊆ Θ and for each θ ∈ S we observe D(θ). Then, under the regularity conditions of Lemma 2.1, we can bound the risk of any θ′ ∈ Θ as

    max_{θ∈S} [ PR(θ) − (L_θ + L_zϵ)‖θ − θ′‖ ] ≤ PR(θ′) ≤ min_{θ∈S} [ PR(θ) + (L_θ + L_zϵ)‖θ − θ′‖ ].    (2)

These confidence bounds only use D(θ) for the purpose of computing PR(θ) and rely on Lipschitzness to construct confidence sets around the risk of unexplored models. However, in light of the structure of our objective function (1), the bounds in Equation (2) do not make full use of performative feedback; in particular, access to D(θ) actually allows us to evaluate DPR(θ,θ′) for any θ′. Importantly, this information can further reduce our uncertainty about PR(θ′), and we can bound:

    PR(θ′) = DPR(θ,θ′) + (DPR(θ′,θ′) − DPR(θ,θ′)) ≤ DPR(θ,θ′) + L_zϵ‖θ − θ′‖.

Thus we can get tighter bounds on the performative risk at an unexplored parameter θ′:

    max_{θ∈S} [ DPR(θ,θ′) − L_zϵ‖θ − θ′‖ ] ≤ PR(θ′) ≤ min_{θ∈S} [ DPR(θ,θ′) + L_zϵ‖θ − θ′‖ ].    (3)

We call the confidence bounds computed in (3) performative confidence bounds. In Figure 1, we visualize and contrast these confidence bounds with the confidence bounds obtained via Lipschitzness. We observe that by computing DPR we can significantly tighten the confidence regions.

Figure 2. Performative feedback allows discarding unexplored suboptimal models even in regions that have not been explored. A model θ is discarded if PR_LB(θ) > PR_min. The loss function and feedback model are the same as in Figure 1.

The tightness of the confidence bounds depends on the set S of deployed models. By choosing a cover of the parameter space, we can get an estimate of the performative risk that has low approximation error on the whole parameter space.

Proposition 3.1. Let S_γ be a γ-cover of Θ and suppose we deploy all models θ ∈ S_γ. Then, using performative feedback we can compute an estimate \widehat{PR}(θ) such that for any θ ∈ Θ it holds that |PR(θ) − \widehat{PR}(θ)| ≤ γL_zϵ.

Proposition 3.1 implies that after exploring the cover S_γ, we can find a model whose suboptimality is at most O(γL_zϵ). To contextualize the bound in Proposition 3.1, consider an approach that uses the same cover S_γ but only relies on zeroth-order feedback, that is, {PR(θ) : θ ∈ S_γ}. Then, the only feasible estimate of PR over the whole space is \widehat{PR}(θ) = PR(Π_{S_γ}(θ)), where Π_{S_γ}(θ) = argmin_{θ′∈S_γ} ‖θ − θ′‖ is the projection onto the cover S_γ. This zeroth-order approach only guarantees an accuracy of |PR(θ) − \widehat{PR}(θ)| ≤ (L_zϵ + L_θ)γ, a strictly weaker approximation than the one in Proposition 3.1.

3.2. Sequential Elimination of Suboptimal Models

Now we show how performative confidence bounds can guide exploration. Specifically, we show that every deployment informs the risk of unexplored models, which allows us to sequentially discard suboptimal regions of the space. To develop a formal procedure for discarding points, let PR_LB(θ) denote a lower confidence bound on PR(θ) and PR_min denote an upper confidence bound on PR(θ_PO) based on the information from the models deployed so far:

    PR_LB(θ) = max_{θ′ already deployed} ( DPR(θ′,θ) − L_zϵ‖θ − θ′‖ ),
    PR_min = min_{θ∈Θ} min_{θ′ already deployed} ( DPR(θ′,θ) + L_zϵ‖θ − θ′‖ ).
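The sketch below computes these quantities directly from Equation (3): it forms estimates of DPR(θ′,·) from the samples observed for each deployed model and applies the resulting bounds PR_LB and PR_min, together with the elimination rule PR_LB(θ) > PR_min discussed next. The finite candidate grid standing in for Θ, the one-dimensional parameters, and the interface of `deployed` (a mapping from deployed parameters to their samples) are all illustrative assumptions; finite-sample slack is ignored here, as in this section.

```python
import numpy as np

def dpr_hat(samples, theta_prime, loss):
    """Estimate of DPR(theta, theta') from samples z ~ D(theta)."""
    return float(np.mean(loss(np.asarray(samples, dtype=float), theta_prime)))

def performative_bounds(deployed, target, lz_eps, loss):
    """Performative confidence bounds of Equation (3) on PR(target).

    `deployed` maps each deployed parameter theta -> its observed samples.
    Returns (PR_LB(target), upper confidence bound on PR(target)).
    """
    lowers, uppers = [], []
    for theta, samples in deployed.items():
        d = dpr_hat(samples, target, loss)
        slack = lz_eps * abs(target - theta)  # plays the role of Lz*eps*||theta - theta'||
        lowers.append(d - slack)
        uppers.append(d + slack)
    return max(lowers), min(uppers)

def is_discardable(deployed, target, grid, lz_eps, loss):
    """Elimination rule: PR_LB(target) > PR_min, with the minimum over Theta
    approximated by a finite grid."""
    pr_lb, _ = performative_bounds(deployed, target, lz_eps, loss)
    pr_min = min(performative_bounds(deployed, g, lz_eps, loss)[1] for g in grid)
    return pr_lb > pr_min
```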
It is not difficult to see that the following lower bound on the suboptimality of model θ holds:

Proposition 3.2. We have Δ(θ) ≥ PR_LB(θ) − PR_min for all θ. In particular, models θ with PR_LB(θ) > PR_min cannot be optimal.

We recall our toy example from Figure 1 and illustrate in Figure 2 the parameter configurations we can discard after the deployment of two models, θ_1 and θ_2. We can see that access to DPR allows us to discard a large portion of the parameter space, and, in contrast to the baseline black-box approach, it is possible to discard regions of the space that have not been explored.

4. Performative Confidence Bounds Algorithm

We introduce our main algorithm that builds on the two insights from the previous section. We furthermore provide a rigorous, finite-sample analysis of its guarantees.

4.1. Algorithm Overview

Our performative confidence bounds algorithm, formally stated in Algorithm 1, takes advantage of performative feedback by assessing the risk of unexplored models and thus guiding exploration. We give an overview of the main steps. Inspired by the successive elimination algorithm (Even-Dar et al., 2002), the algorithm keeps track of and refines an active set of models A ⊆ Θ. Roughly speaking, active models are those that are estimated to have low risk and only they are admissible to deploy. To deal with finite-sample uncertainty, the algorithm proceeds in phases which progressively refine the precision of the finite-sample risk estimates. More precisely, in phase p the algorithm chooses an error tolerance γ_p and deploys a model for n_p steps. In each step, m_0 samples induced by the deployed model are collected, and n_p is chosen so that the inferred estimates of DPR are γ_p-accurate. Formally, if θ is deployed in phase p, we collect an empirical distribution \widehat{D}(θ) of n_p m_0 samples so that |\widehat{DPR}(θ,θ′) − DPR(θ,θ′)| ≤ γ_p for all θ′ with high probability, where \widehat{DPR}(θ,θ′) := E_{z∼\widehat{D}(θ)} ℓ(z;θ′). These estimates of DPR are used to construct performative confidence bounds and refine A.

Algorithm 1 Performative Confidence Bounds Algorithm
Require: time horizon T, number of samples m_0, sensitivity ϵ, Lipschitz constant L_z, complexity bound C
 1: Initialize A = Θ
 2: for phase p = 0,1,... do
 3:   Set error tolerance γ_p = 2^{-p} and net radius r_p = γ_p/(L_zϵ)
 4:   Let n_p = ⌈(2C + 3√(log T))² / (γ_p² m_0)⌉
 5:   Initialize S_p ← N_{r_p}(A), P_p ← ∅
 6:   while S_p ≠ ∅ do
 7:     Draw θ_net from S_p uniformly at random
 8:     Deploy θ_net for n_p steps to form \widehat{DPR}(θ_net, ·)
 9:     Update S_p ← S_p \ {θ_net} and P_p ← P_p ∪ {θ_net}
10:     Update estimate of upper bound on PR(θ_PO):
          PR_min = min_{θ∈Θ} min_{θ′∈P_p} ( \widehat{DPR}(θ′,θ) + L_zϵ‖θ − θ′‖ )
11:     Update lower bound for all models θ ∈ A:
          PR_LB(θ) = max_{θ′∈P_p} ( \widehat{DPR}(θ′,θ) − L_zϵ‖θ − θ′‖ )
12:     A ← A \ {θ ∈ A : PR_LB(θ) > PR_min + 2γ_p}
13:     S_p ← S_p \ {θ ∈ S_p : Ball_{r_p}(θ) ∩ A = ∅}
14:   end while
15: end for

Each phase begins by constructing a net of the current active set A. The points in the net are sequentially deployed in the phase, unless they are deemed to be suboptimal based on previous deployments in that phase, in which case they are eliminated. During phase p, we denote by P_p the running set of deployed points and by S_p the running set of net points that have not been discarded. We initialize S_p to a minimal r_p-net of the current set of active points A, denoted N_{r_p}(A), where r_p is proportional to γ_p. A net point θ gets eliminated from S_p if no point in Ball_{r_p}(θ) := {θ′ ∈ Θ : ‖θ − θ′‖ ≤ r_p} is active. This means that we may deploy suboptimal points in the net if they help inform active points nearby.
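For concreteness, here is a heavily simplified sketch of the control flow of Algorithm 1 over a finite grid that stands in for Θ. The environment interface `deploy(theta, n)` (returning n samples from D(θ)), the greedy net construction, the one-dimensional parameters, and the handling of the final phase boundary are all simplifying assumptions made for illustration; constants follow the listing above.

```python
import numpy as np

def greedy_net(points, radius):
    """Greedy radius-net of `points` (a stand-in for a minimal net N_r(A))."""
    net = []
    for th in points:
        if all(abs(th - c) > radius for c in net):
            net.append(th)
    return net

def performative_confidence_bounds_algorithm(grid, deploy, loss, lz_eps, m0, C, T, seed=0):
    """Simplified sketch of Algorithm 1; `grid` is a finite stand-in for Theta."""
    rng = np.random.default_rng(seed)
    active = list(grid)                                # A <- Theta
    steps = 0
    dpr = lambda s, th: float(np.mean(loss(np.asarray(s, dtype=float), th)))

    p = 0
    while True:
        gamma = 2.0 ** (-p)                            # error tolerance gamma_p
        r = gamma / lz_eps                             # net radius r_p
        n_p = max(1, int(np.ceil((2 * C + 3 * np.sqrt(np.log(T))) ** 2
                                 / (gamma ** 2 * m0))))
        net = greedy_net(active, r)                    # S_p
        deployed = {}                                  # P_p with its samples
        while net:
            theta_net = net.pop(rng.integers(len(net)))
            deployed[theta_net] = deploy(theta_net, n_p * m0)
            steps += n_p
            if steps >= T:
                return active
            # Performative confidence bounds from this phase's deployments;
            # abs(...) plays the role of ||theta - theta'|| in one dimension.
            upper = {th: min(dpr(s, th) + lz_eps * abs(th - td)
                             for td, s in deployed.items()) for th in grid}
            lower = {th: max(dpr(s, th) - lz_eps * abs(th - td)
                             for td, s in deployed.items()) for th in active}
            pr_min = min(upper.values())
            active = [th for th in active if lower[th] <= pr_min + 2 * gamma]
            net = [th for th in net if any(abs(th - a) <= r for a in active)]
        p += 1
```

The two dictionary comprehensions mirror lines 10 and 11 of the listing; everything else is bookkeeping for the phase structure and the net.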
4.2. Comparison with Adaptive Zooming Algorithm

While we borrow the idea of an instance-dependent zooming dimension from Kleinberg et al. (2008), Algorithm 1 and its analysis are substantially different from prior work. In particular, Kleinberg et al. (2008) study an adaptive zooming algorithm which combines a UCB-based approach with an arm activation step. Adapting this method to our setting encounters several obstacles that we describe below.

First, a naive application of the adaptive zooming algorithm proposed by Kleinberg et al. (2008) does not lead to sublinear regret in our setting, unless we assume Lipschitzness of PR. Their rule for activating new arms requires that the reward of arms within a given radius in Euclidean distance of the pulled arm is similar. However, without Lipschitzness of PR, there is no radius that would ensure this property.

Given the shortcomings of this exploration strategy, one might imagine that selecting a better distance between arms, e.g. one based on performative confidence bounds, would result in a better algorithm. A natural distance function would be d(θ,θ′) taken as (an empirical estimate of) PR(θ) − DPR(θ,θ′) + L_zϵ‖θ − θ′‖. The challenge is that the analysis in Kleinberg et al. (2008) explicitly requires symmetry of the distance function, which d(θ,θ′) violates. Therefore, to single out the L_zϵ dependence, it is necessary to disentangle learning the structure of the distribution map from the elimination of arms based on reward, which is in stark contrast with UCB-style adaptive zooming algorithms. Algorithm 1 achieves this by relying on a novel adaptation of successive elimination.

4.3. Regret Bound

Before we state the regret bound for Algorithm 1, let us comment on an important component in the analysis. Recall that throughout the algorithm we operate with finite-sample estimates of the decoupled performative risk to bound the risk of unexplored models. Specifically, for any deployed θ, we make use of \widehat{DPR}(θ,θ′) for all θ′. Since we need these estimates to be valid simultaneously for all θ′, we rely on uniform convergence. As such, the Rademacher complexity of the loss function class naturally enters the bound.

Definition 4.1 (Rademacher complexity). Given a loss function ℓ(z;θ), we define C(ℓ) to be:

    C(ℓ) = sup_{θ∈Θ} sup_{n∈ℕ} E[ sup_{θ′∈Θ} (1/√n) Σ_{j=1}^{n} ϵ_j ℓ(z_j^θ;θ′) ],

where ϵ_j ∼ Rademacher and z_j^θ ∼ D(θ), j ∈ [n], which are all independent of each other.

Now we can state our regret guarantee for Algorithm 1.

Theorem 4.2 (Main regret bound). Assume the loss ℓ(z;θ) is L_z-Lipschitz in z and let ϵ denote the sensitivity of the distribution map. Suppose that C is any value such that C(ℓ) ≤ C and m_0 = o(B²_{log T,C}), where B_{log T,C} := √(log T) + C. Then, after T time steps, Algorithm 1 achieves a regret bound of

    Reg(T) = O( ( (L_zϵ)^d B²_{log T,C} / m_0 )^{1/(d+2)} T^{(d+1)/(d+2)} + B_{log T,C} √(T/m_0) ),

where d is the (L_zϵ)-sequential zooming dimension.

The sequential zooming dimension, formally defined in Definition 4.4, accounts for the sequential elimination of models within each phase. It will become clear in the next section that the sequential zooming dimension is upper bounded by the usual zooming dimension from Definition 2.3. In Appendix F, we provide an example where the sequential zooming dimension is strictly smaller than the zooming dimension.

Proposition 4.3. For all α > 0, the α-zooming dimension is at least as large as the α-sequential zooming dimension.
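Since the constant B_{log T,C} in Theorem 4.2 is driven by the Rademacher complexity of Definition 4.1, the following sketch estimates the inner quantity of that definition by Monte Carlo before turning to the comparison below. The outer suprema over θ and n are not taken, and the data distribution, the parameter grid, and the loss are illustrative assumptions rather than objects from the paper.

```python
import numpy as np

def rademacher_complexity_mc(loss, theta_grid, sample_z, n=200, trials=500, seed=0):
    """Monte Carlo estimate of E[ sup_{theta'} (1/sqrt(n)) * sum_j eps_j * loss(z_j; theta') ]
    for z_j drawn from a single induced distribution D(theta)."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(trials):
        z = sample_z(rng, n)                      # z_j ~ D(theta), j in [n]
        eps = rng.choice([-1.0, 1.0], size=n)     # independent Rademacher signs
        sup_val = max(np.sum(eps * loss(z, tp)) for tp in theta_grid)
        values.append(sup_val / np.sqrt(n))
    return float(np.mean(values))

# Illustrative assumptions: standard normal induced data, clipped squared loss.
estimate = rademacher_complexity_mc(
    loss=lambda z, t: np.clip((z - t) ** 2, 0.0, 1.0),
    theta_grid=np.linspace(-1.0, 1.0, 41),
    sample_z=lambda rng, n: rng.normal(size=n),
)
print(f"estimated Rademacher term: {estimate:.3f}")
```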
The primary advantage of Theorem 4.2 over the Lipschitz bandit baseline can be seen by examining the first term in the regret bound. This term resembles the black-box regret bound from Section 2; however, the key difference is that the bound of Theorem 4.2 depends on the complexity of the distribution map rather than that of the performative risk. In particular, the Lipschitz constant is L_zϵ and not L_θ + L_zϵ. The advantage is pronounced when ϵ → 0, making the first term of the bound in Theorem 4.2 vanish so only the O(√T) term remains. On the other hand, the bound in Proposition 2.2 maintains an exponential dimension dependence.

Taking the limit as ϵ → 0 also reveals why the second term in the bound emerges. Even if the distribution map is constant, there is regret arising from finite-sample error. This is a key conceptual difference in the meaning of Lipschitzness of the distribution map versus that of the performative risk: L_θ + L_zϵ being 0 implies that PR is flat and thus all models are optimal, while performative regret minimization is nontrivial even if L_zϵ = 0. Unlike the first term, the second term due to finite samples is dimension-independent apart from any dependence implicit in the Rademacher complexity.

We note that the presence of the Rademacher complexity term C(ℓ) makes a direct comparison of the bound in Theorem 4.2 and the bound in Proposition 2.2 subtle. When the Rademacher complexity is very high, the regret bound in Theorem 4.2 may be worse. Nonetheless, for many natural function classes, the Rademacher complexity is polynomial in the dimension; in these cases, Theorem 4.2 can substantially outperform the regret bound in Proposition 2.2.

Another key feature of the regret bound in Theorem 4.2 worth highlighting is the zooming dimension. Definition 2.3 allows us to directly compare the dimension in Theorem 4.2 with the dimension in Proposition 2.2: the (L_zϵ)-zooming dimension of Algorithm 1 is no larger than, and most likely smaller than, the (L_θ + L_zϵ)-zooming dimension in the black-box approach. Moreover, the sequential variant of zooming dimension in Theorem 4.2 can further reduce the dimension.

Finally, the main assumption underpinning the bound in Theorem 4.2 is that DPR is (L_zϵ)-Lipschitz in its first argument. Assumption 1.2 coupled with Lipschitzness of the loss in the data achieves this. However, this property can hold with different regularity assumptions on the distribution map and loss function; e.g., if the loss is bounded and the distribution map is Lipschitz in total variation distance.

Figure 3. Sequential deployment of models allows Algorithm 1 to eliminate points from S_p, reducing the number of deployments. The deployment of θ_net,1 and θ_net,2 allows one to eliminate θ_net,3.

4.4. Sequential Zooming Dimension

The zooming dimension of Definition 2.3 does not take into account that, using performative feedback, our algorithm can eliminate unexplored models within a phase. We illustrate the benefits of this sequential exploration strategy in Figure 3, where the deployment of two models is sufficient to eliminate the remaining model in the cover. This motivates a sequential definition of zooming dimension that captures the benefits of sequential exploration. To set up this definition, we need to introduce some notation.
For a set of points S, an enumeration π : S → {1,...,|S|} that specifies an ordering on S, and a number k ∈ {1,...,|S|}, let PR_LB(θ;k) := max_{θ′∈S : π(θ′)