# Model Transferability with Responsive Decision Subjects

Yatong Chen 1, Zeyu Tang 2, Kun Zhang 2,3, Yang Liu 1,4

1 Department of Computer Science and Engineering, University of California, Santa Cruz, California, United States. 2 Department of Philosophy, Carnegie Mellon University, Pennsylvania, United States. 3 Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates. 4 ByteDance Research. Correspondence to: Yang Liu.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

Abstract: Given an algorithmic predictor that is accurate on some source population consisting of strategic human decision subjects, will it remain accurate if the population responds to it? In our setting, an agent or a user corresponds to a sample (X, Y) drawn from a distribution D and will face a model h and its classification result h(X). Agents can modify X to adapt to h, which will incur a distribution shift on (X, Y). Our formulation is motivated by applications where the deployed machine learning models are subject to human agents and will ultimately face responsive and interactive data distributions. We formalize the discussion of the transferability of a model by studying how the performance of a model trained on the available source distribution (data) translates to performance on the domain it induces. We provide upper bounds for the performance gap due to the induced domain shift, as well as lower bounds for the trade-offs that a classifier has to suffer on either the source training distribution or the induced target distribution. We further instantiate our analysis for two popular domain adaptation settings: covariate shift and target shift.

1. Introduction

Decision-makers are increasingly required to be transparent about their decision-making rules in order to offer the right to explanation (Goodman & Flaxman, 2017; Selbst & Powles, 2018; Ustun et al., 2019). Being transparent also invites potential adaptations from the population, leading to potential distribution shifts. We are motivated by settings where the deployed machine learning models interact with human agents and will ultimately face data distributions that reflect the agents' responses to the models. For instance, when a model is used to decide loan applications, candidates may adapt their features based on the model specification in order to maximize their chances of approval; the loan decision classifier therefore observes a new, shifted distribution caused by its own deployment (see Figure 1 for a demonstration). Similar observations can be made in the insurance sector, e.g., insurance companies may design policies such that customers adapt their behaviors to lower their premiums (Haghtalab et al., 2020), in the education sector, e.g., teachers may design courses in a way that leaves students less incentivized to cheat (Kleinberg & Raghavan, 2020), and so on.

| Feature | Weight | Original Value | Adapted Value |
|---|---|---|---|
| Income | +2 | $6,000 | $6,000 |
| Education Level | +3 | College | College |
| Debt | -10 | $40,000 | $20,000 |
| Savings | +5 | $20,000 | $0 |

Figure 1. An example of an agent who originally has both savings and debt, observes that the classifier penalizes debt (weight -10) more than it rewards savings (weight +5), and concludes that their most efficient adaptation is to use their savings to pay down debt.
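To make the adaptation in Figure 1 concrete, the toy calculation below scores the original and adapted profiles under the published linear weights. It is only an illustration of how a transparent linear rule invites this reallocation; the numeric encoding of "College" and the dollar-for-dollar transfer are our assumptions, not part of the paper's setup.

```python
# Published feature weights from Figure 1 and the agent's two candidate profiles.
weights = {"income": 2, "education": 3, "debt": -10, "savings": 5}

# "College" is encoded as 1 purely for illustration.
original = {"income": 6_000, "education": 1, "debt": 40_000, "savings": 20_000}
adapted  = {"income": 6_000, "education": 1, "debt": 20_000, "savings": 0}  # savings pay down debt

def score(profile):
    """Linear score under the transparent classifier's weights."""
    return sum(weights[k] * profile[k] for k in weights)

# Paying down debt trades a +5-per-dollar asset for removing a -10-per-dollar liability,
# so the adapted profile scores strictly higher.
print(score(original), score(adapted), score(adapted) - score(original))
```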
In this paper, we provide a general framework for quantifying the transferability of a decision rule when it faces responsive decision subjects. What we would like to achieve is a characterization of the performance guarantee of a classifier: given a model primarily trained on the source distribution $D_S$, how well or poorly will it perform on the distribution it induces, $D(h)$, which depends on the model $h$ itself? A key concept in our setting is the induced risk, defined as the error a model incurs on the distribution induced by itself:

$$\text{Induced Risk:}\quad \mathrm{Err}_{D(h)}(h) := \mathbb{P}_{D(h)}\big(h(X) \neq Y\big) \tag{1}$$

Most relevant to the above formulation are the literatures on strategic classification (Hardt et al., 2016a) and performative prediction (Perdomo et al., 2020). In strategic classification, agents are modeled as rational utility maximizers, and under a specific agent response model, game-theoretic solutions have been proposed to model the interactions between the agents and the decision-maker. In performative prediction, a similar notion of risk, called the performative prediction risk, is introduced to measure a given model's performance on the distribution it itself induces. Different from ours, one of their main focuses is to find the optimal classifier that achieves minimum induced risk after a sequence of model deployments and observations of the corresponding response datasets, which can be computationally expensive. In particular, our results are motivated by the following challenges in more general scenarios:

Modeling assumptions being restrictive. In many practical situations, it is hard to accurately characterize the agents' utilities. Furthermore, agents might not be fully rational when they respond. These uncertainties can lead to a far more complicated distribution change in (X, Y), compared to the often-made assumption that agents change only X but not Y (Hardt et al., 2016a; Chen et al., 2020a; Dong et al., 2018b).

Lack of access to response data. During training, machine learning practitioners may only have access to data from the source distribution, and even when they can anticipate changes in the population due to human agents' responses, they cannot observe the newly shifted distribution until the model is actually deployed.

Retraining being costly. Even when samples from the induced data distribution are available, retraining the model from scratch may be impractical due to computational constraints, and will trigger another round of agents' responses at its deployment.

The above observations motivate us to focus on understanding the transferability of a model before diving into finding the optimal solutions that achieve the minimum induced risk; the latter problem often requires more specific knowledge of the mapping between the model and its induced distribution, which might not be available during the training process. Another related research problem is to find models that perform well on both the source and the induced distribution. This question might be solved using techniques from domain generalization (Zhou et al., 2021; Sheth et al., 2022). We add detailed discussions on how our work relates to and differs from these fields in related work (Section 1.2), as well as on how to use existing techniques to solve these two questions in Appendix A.2 and Appendix F. We leave the full discussion of these topics to future work.
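To make the induced risk in Equation (1) concrete, the sketch below estimates it by Monte Carlo under a hypothetical, user-specified response model. The `respond` function, the linear classifier, and the relabeling rule are illustrative assumptions for this sketch, not the paper's prescribed procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def h(X, w, b=0.0):
    """A simple linear threshold classifier h(x) = sign(w.x + b)."""
    return np.where(X @ w + b >= 0, 1, -1)

def respond(X, w, b=0.0, step=0.5):
    """Hypothetical response model: rejected agents move their features a
    small step along w (the direction that most increases their score)."""
    rejected = (X @ w + b) < 0
    X_new = X.copy()
    X_new[rejected] += step * w / np.linalg.norm(w)
    return X_new

def true_label(X):
    """Assumed ground-truth labeling rule; here Y depends on the first feature."""
    return np.where(X[:, 0] >= 0.5, 1, -1)

# Source sample (X, Y) ~ D_S
X = rng.normal(size=(10_000, 2))
Y = true_label(X)
w = np.array([1.0, 0.3])

# Source risk: Err_{D_S}(h) = P_{D_S}(h(X) != Y)
source_risk = np.mean(h(X, w) != Y)

# Induced risk: evaluate h on the distribution D(h) it itself induces.
X_induced = respond(X, w)          # features shift in response to h
Y_induced = true_label(X_induced)  # labels may shift with the features
induced_risk = np.mean(h(X_induced, w) != Y_induced)

print(f"Err_DS(h)   = {source_risk:.3f}")
print(f"Err_D(h)(h) = {induced_risk:.3f}")
```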
1.1. Our Contributions

In this paper, we aim to answer the following fundamental questions:

Source risk → Induced risk: For a given model $h$, how different is $\mathrm{Err}_{D(h)}(h)$, the error on the distribution induced by $h$, from $\mathrm{Err}_{D_S}(h) := \mathbb{P}_{D_S}(h(X) \neq Y)$, the error on the source?

Induced risk → Minimum induced risk: How much higher is $\mathrm{Err}_{D(h)}(h)$, the error on the induced distribution, than $\min_{h'} \mathrm{Err}_{D(h')}(h')$, the minimum achievable induced error?

Induced risk of the source-optimal model → Minimum induced risk: Of particular interest, and as a special case of the above, how does $\mathrm{Err}_{D(h_S^*)}(h_S^*)$, the induced error of the optimal model trained on the source distribution, $h_S^* := \arg\min_h \mathrm{Err}_{D_S}(h)$, compare to that of $h_T^* := \arg\min_h \mathrm{Err}_{D(h)}(h)$?

Lower bound for learning tradeoffs: What is the minimum error a model must incur on either the source distribution, $\mathrm{Err}_{D_S}(h)$, or its induced distribution, $\mathrm{Err}_{D(h)}(h)$?

For the first three questions, we prove upper bounds on the additional error incurred when a model trained on a source distribution is transferred to its induced domain. We also provide lower bounds for the trade-offs a classifier has to suffer on either the source training distribution or the induced target distribution. We then show how to specialize our results to two popular domain adaptation settings: covariate shift (Shimodaira, 2000; Zadrozny, 2004; Sugiyama et al., 2007; 2008; Zhang et al., 2013) and target shift (Lipton et al., 2018; Guo et al., 2020; Zhang et al., 2013).

1.2. Related Work

Our work most closely relates to the fields of strategic classification, domain adaptation, and performative prediction. In particular, our work considers a setting similar to the studies of strategic classification (Hardt et al., 2016a; Chen et al., 2020a; Dong et al., 2018a; Chen et al., 2020b; Miller et al., 2020), which primarily focus on developing robust classifiers in the presence of strategic agents rather than on characterizing the transferability of a given model's performance on the distribution it itself induces. Our work also builds on efforts in domain adaptation (Jiang, 2008; Ben-David et al., 2010; Sugiyama et al., 2008; Zhang et al., 2019; Kang et al., 2019; Zhang et al., 2020). The major difference between our setting and those from previous works is that the change in distribution is not passively provided by the environment but rather is an active consequence of model deployment. We reference specific prior work in these two domains in Appendix A.2, and here provide a more detailed discussion of the existing work in performative prediction.

Performative Prediction. Performative prediction is a new type of supervised learning problem in which the underlying data distribution shifts in response to the deployed model (Perdomo et al., 2020; Mendler-Dünner et al., 2020; Brown et al., 2020; Drusvyatskiy & Xiao, 2020; Izzo et al., 2021; Li & Wai, 2022; Maheshwari et al., 2021). In particular, Perdomo et al. (2020) first propose the notion of the performative risk, defined as $\mathrm{PR}(\theta) := \mathbb{E}_{z \sim D(\theta)}[\ell(\theta; z)]$, where $\theta$ is the model parameter and $D(\theta)$ is the distribution induced by the deployment of $\theta$. Similar to our definition of induced risk, the performative risk also measures a given model's performance on the distribution it itself induces. The major difference between our work and performative prediction is that we focus on different aspects of the induced domain adaptation problem.
One of the primary focuses of performative prediction is to find the optimal model $\theta_{\mathrm{OPT}}$ that achieves the minimum performative prediction risk, or a performatively stable model $\theta_{\mathrm{ST}}$, which is optimal under its own induced distribution. In particular, one way to find a performatively stable model $\theta_{\mathrm{ST}}$ is to perform repeated retraining (Perdomo et al., 2020). In order to obtain meaningful theoretical guarantees for any proposed algorithm, works in this field generally require particular assumptions on the mapping between the model parameter and its induced distribution (e.g., the smoothness of the mapping), or require multiple rounds of deployments and observations of the corresponding induced distributions, which can be costly in practice (Jagadeesan et al., 2022; Mendler-Dünner et al., 2020). In contrast, our work's primary focus is to study the transferability of a particular model trained primarily on the source distribution and to provide theoretical bounds on its performance on its induced distribution, which is useful for estimating the effect of a given classifier when repeated retraining is unavailable. As a result, our work does not assume knowledge of the supervision/label information on the transferred domain. Also related are the recently developed lines of work on the multiplayer version of the performative prediction problem (Piliouras & Yu, 2022; Narang et al., 2022), and on the economic aspects of performative prediction (Hardt et al., 2022; Mendler-Dünner et al., 2022).

The details for reproducing our experimental results can be found at https://github.com/UCSC-REAL/Model_Transferability.

2. Notation and Formulation

All proofs of our results can be found in the Appendix. Suppose we are given a parametric model $h \in \mathcal{H}$ primarily trained on the training data set $S := \{x_i, y_i\}_{i=1}^N$, which is drawn from a source distribution $D_S$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$. However, $h$ will then be deployed in a setting where the samples come from a test or target distribution $D_T$ that can differ substantially from $D_S$. Therefore, instead of finding a classifier that minimizes the prediction error on the source distribution, $\mathrm{Err}_{D_S}(h) := \mathbb{P}_{D_S}(h(X) \neq Y)$, ideally the decision maker would like to find $h$ that minimizes $\mathrm{Err}_{D_T}(h) := \mathbb{P}_{D_T}(h(X) \neq Y)$. This is often referred to as the domain adaptation problem, where typically the transition from $D_S$ to $D_T$ is assumed to be independent of the model $h$ being deployed.

We consider a setting in which the distribution shift depends on $h$, or can be thought of as being induced by $h$. We will use $D(h)$ to denote the domain induced by $h$:

$$D_S \;\xrightarrow{\text{encounters model } h}\; D(h)$$

Strictly speaking, the induced distribution is a function of both $D_S$ and $h$ and would be better denoted by $D_S(h)$. To ease the notation, we stick with $D(h)$, but we shall keep in mind its dependency on $D_S$. For now, we do not specify the dependency of $D(h)$ on $D_S$ and $h$, but later in Sections 4 and 5 we further instantiate $D(h)$ under specific domain adaptation settings. The challenge in the above setting is that when training $h$, the learner needs to bear in mind that $D(h)$ is the distribution on which $h$ will be evaluated, and hence the distribution that training should care about. Formally, we define the induced risk of a classifier $h$ as the 0-1 error on the distribution $h$ induces (here ":=" defines the RHS as the probability measure for the LHS):

$$\text{Induced risk:}\quad \mathrm{Err}_{D(h)}(h) := \mathbb{P}_{D(h)}\big(h(X) \neq Y\big). \tag{2}$$

Denote by $h_T^* := \arg\min_{h \in \mathcal{H}} \mathrm{Err}_{D(h)}(h)$ the classifier with minimum induced risk.
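Finding $h_T^*$ directly generally requires knowledge of the mapping $h \mapsto D(h)$. For contrast with the one-shot transferability question studied here, the repeated retraining heuristic from the performative prediction literature mentioned above can be sketched as follows; the response map, labeling function, and choice of base learner are illustrative assumptions, not a prescription from this paper or from Perdomo et al. (2020).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def repeated_retraining(X0, label_fn, respond_fn, rounds=10):
    """Sketch of repeated retraining (Perdomo et al., 2020): deploy a model,
    observe the distribution it induces, refit on that data, and repeat.
    Assumes label_fn always yields both classes so the fit is well defined."""
    X = X0.copy()
    model = LogisticRegression(max_iter=1000).fit(X, label_fn(X))
    for _ in range(rounds):
        X = respond_fn(X, model)                       # population responds to the deployed model
        y = label_fn(X)                                # labels on the induced distribution
        model = LogisticRegression(max_iter=1000).fit(X, y)   # retrain on the observed response data
    return model
```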
More generally, when the loss may not be the 0-1 loss, we define the induced $\ell$-risk as

$$\text{Induced } \ell\text{-risk:}\quad \mathrm{Err}_{\ell, D(h)}(h) := \mathbb{E}_{z \sim D(h)}[\ell(h; z)].$$

The induced risks are the primary quantities we are interested in quantifying. The following additional notation will also help present our theoretical results in the next few sections:

Distribution of Y under a distribution D: $D_Y := \mathbb{P}_D(Y = y)$; in particular, $D_Y(h) := \mathbb{P}_{D(h)}(Y = y)$ and $D_{Y|S} := \mathbb{P}_{D_S}(Y = y)$.

Distribution of h's predictions under a distribution D: $D_h := \mathbb{P}_D(h(X) = y)$; in particular, $D_h(h) := \mathbb{P}_{D(h)}(h(X) = y)$ and $D_{h|S} := \mathbb{P}_{D_S}(h(X) = y)$.

Marginal distribution of X under a distribution D: $D_X := \mathbb{P}_D(X = x)$; in particular, $D_X(h) := \mathbb{P}_{D(h)}(X = x)$ and $D_{X|S} := \mathbb{P}_{D_S}(X = x)$. (For continuous X, the probability measure shall be read as the density function.)

Total variation distance (Ali & Silvey, 1966): $d_{\mathrm{TV}}(D, D') := \sup_{O} |\mathbb{P}_D(O) - \mathbb{P}_{D'}(O)|$.

2.1. Example Induced Domain Adaptation Settings

We provide two example models to demonstrate the use cases of the distribution shift models described in our paper. We provide more detailed descriptions of both settings, and instantiate our bounds for them, in Section 4.3 and Section 5.3, respectively.

Strategic Response. As mentioned before, one example of induced distribution shift arises when human agents respond strategically to a decision rule. In particular, it is natural to assume that the mapping between the feature vector X and the qualification Y before and after the agents' best responses satisfies covariate shift: the feature distribution P(X) will change, but P(Y|X), the mapping between Y and X, remains unchanged. Notice that this differs from the assumption made in the classic strategic classification setting (Hardt et al., 2016a), where any adaptation is considered malicious, meaning that changes in the feature vector X do not change the underlying true qualification Y. In this example, we assume that changes in the feature X could potentially lead to changes in the true qualification Y, and that the mapping between Y and X remains the same before and after the adaptation. This is a common assumption made in a recent line of work on incentivizing improvement behaviors from human agents (see, e.g., Chen et al., 2020b; Shavit et al., 2020). We use Figure 2 (top) as a demonstration of how the distribution might shift in the strategic response setting. In Section 4.3, we will use the strategic classification setup to verify our obtained results.

Figure 2. Example causal graph annotated to demonstrate covariate shift (top panel) and target shift (bottom panel) as a result of the deployment of h. Grey nodes indicate observable variables and transparent nodes are not observed at the training stage. Red arrows emphasize that h induces changes in certain variables.

Replicator Dynamics. Replicator dynamics is a commonly used model for studying the evolution of an adopted strategy in evolutionary game theory (Tuyls et al., 2006; Friedman & Sinervo, 2016; Taylor & Jonker, 1978; Raab & Liu, 2021). Its core notion is that the growth or decline of the population adopting each strategy depends on the strategy's "fitness".
Consider the label $Y \in \{-1, +1\}$ as the strategy, and the following behavioral response model to capture the induced target shift:

$$\frac{\mathbb{P}_{D(h)}(Y = +1)}{\mathbb{P}_{D_S}(Y = +1)} = \frac{\mathrm{Fitness}(Y = +1)}{\mathbb{E}_{D_S}[\mathrm{Fitness}(Y)]}$$

The intuition behind the above equation is that the change in the Y = +1 population depends on how well predicting Y = +1 fits a certain utility function. For instance, the fitness can take the form of the prediction accuracy of h for class +1, namely $\mathrm{Fitness}(Y = +1) := \mathbb{P}_{D_S}(h(X) = +1 \mid Y = +1)$. Intuitively, a higher fitness indicates more success for agents who adopted a certain strategy (Y = -1 or Y = +1). Agents will therefore imitate or replicate their successful peers by adopting the same strategy, resulting in an increase in that population ($\mathbb{P}_{D(h)}(Y)$). With the assumption that P(X|Y) stays unchanged, this instantiates one example of a specific induced target shift. We will provide detailed conditions for target shift in Section 5. We also use Figure 2 (bottom) as a demonstration of how the distribution might shift in the replicator dynamics setting. In Section 5.3, we will use a detailed replicator dynamics model to further instantiate our results.

3. General Bounds

In this section, we provide upper and lower bounds for any induced domain without specifying the particular type of distribution shift. In particular, we first provide upper bounds for the transfer error of any classifier h (that is, the difference between $\mathrm{Err}_{D(h)}(h)$ and $\mathrm{Err}_{D_S}(h)$), as well as between $\mathrm{Err}_{D(h)}(h)$ and the minimum induced risk $\mathrm{Err}_{D(h_T^*)}(h_T^*)$. We then provide lower bounds for $\max\{\mathrm{Err}_{D_S}(h), \mathrm{Err}_{D(h)}(h)\}$, that is, the minimum error a model h must incur on either the source distribution $D_S$ or the induced distribution $D(h)$.

3.1. Upper Bound

We first investigate the upper bounds for the transfer errors. We begin by showing generic bounds and further instantiate them for specific domain adaptation settings in Sections 4 and 5. We begin by answering the following question: how does a model h trained on its training data set fare on the induced distribution D(h)? To that end, we define the minimum and the h-dependent combined error of any two distributions D and D' as

$$\lambda_{D \to D'} := \min_{h' \in \mathcal{H}} \big[\mathrm{Err}_{D'}(h') + \mathrm{Err}_{D}(h')\big], \qquad \Lambda_{D \to D'}(h) := \mathrm{Err}_{D'}(h) + \mathrm{Err}_{D}(h),$$

and their corresponding $\mathcal{H}\Delta\mathcal{H}$-divergence as

$$d_{\mathcal{H}\Delta\mathcal{H}}(D, D') = 2 \sup_{h, h' \in \mathcal{H}} \big| \mathbb{P}_{D}(h(X) \neq h'(X)) - \mathbb{P}_{D'}(h(X) \neq h'(X)) \big|.$$

The $\mathcal{H}\Delta\mathcal{H}$-divergence is a celebrated measure proposed in the domain adaptation literature (Ben-David et al., 2010) and will be useful for bounding the difference in errors of any two classifiers. Following the classical arguments from Ben-David et al. (2010), we can easily prove the following:

Theorem 3.1 (Source risk → Induced risk). The difference between $\mathrm{Err}_{D(h)}(h)$ and $\mathrm{Err}_{D_S}(h)$ is upper bounded by:

$$\mathrm{Err}_{D(h)}(h) \le \mathrm{Err}_{D_S}(h) + \lambda_{D_S \to D(h)} + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D(h)).$$

The transferability of a model h between $\mathrm{Err}_{D(h)}(h)$ and $\mathrm{Err}_{D_S}(h)$ looks precisely the same as in the classical domain adaptation setting (Ben-David et al., 2010). An arguably more interesting quantity to understand in our setting is the difference between the induced error of any given model h and the error induced by the optimal model $h_T^*$: $\mathrm{Err}_{D(h)}(h) - \mathrm{Err}_{D(h_T^*)}(h_T^*)$. We get the following bound, which differs from the one in Theorem 3.1:

Theorem 3.2 (Induced risk → Minimum induced risk). The difference between $\mathrm{Err}_{D(h)}(h)$ and $\mathrm{Err}_{D(h_T^*)}(h_T^*)$ is upper bounded by:

$$\mathrm{Err}_{D(h)}(h) - \mathrm{Err}_{D(h_T^*)}(h_T^*) \le \frac{\lambda_{D(h) \to D(h_T^*)} + \Lambda_{D(h) \to D(h_T^*)}(h)}{2} + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(D(h_T^*), D(h)).$$
The above theorem tells us that the induced transfer error is generally bounded by the average achievable error on the two distributions $D(h)$ and $D(h_T^*)$, as well as the $\mathcal{H}\Delta\mathcal{H}$-divergence between them. The major benefit of Theorem 3.2 is that it provides the decision maker a way to estimate the minimum induced risk $\mathrm{Err}_{D(h_T^*)}(h_T^*)$ even when she only has access to the induced risk of some available classifier h, as long as she can characterize the statistical difference between the two induced distributions. The latter, however, may not be a trivial task in itself. Later, in Section 3.3, we briefly discuss how our bounds can still be useful even without an exact characterization of this quantity.

3.2. Lower Bound

We now provide a lower bound on the induced transfer error. In particular, we want to show that at least one of the two errors, $\mathrm{Err}_{D_S}(h)$ and $\mathrm{Err}_{D(h)}(h)$, must be lower bounded by a certain quantity.

Theorem 3.3 (Lower bound for learning tradeoffs). Any model h must incur the following error on either the source or the induced distribution:

$$\max\{\mathrm{Err}_{D_S}(h), \mathrm{Err}_{D(h)}(h)\} \;\ge\; \frac{d_{\mathrm{TV}}(D_{Y|S}, D_Y(h)) - d_{\mathrm{TV}}(D_{h|S}, D_h(h))}{2}.$$

The proof leverages the triangle inequality of $d_{\mathrm{TV}}$. This bound depends on h; however, by the data processing inequality for $d_{\mathrm{TV}}$ (and f-divergences in general) (Liese & Vajda, 2006), we have $d_{\mathrm{TV}}(D_{h|S}, D_h(h)) \le d_{\mathrm{TV}}(D_{X|S}, D_X(h))$. Applying this to Theorem 3.3 yields:

Corollary 3.4. For any model h,

$$\max\{\mathrm{Err}_{D_S}(h), \mathrm{Err}_{D(h)}(h)\} \;\ge\; \frac{d_{\mathrm{TV}}(D_{Y|S}, D_Y(h)) - d_{\mathrm{TV}}(D_{X|S}, D_X(h))}{2}.$$

The benefit of Corollary 3.4 is that the bound does not contain any quantities that involve the model's predictions on the induced distribution; as a result, for any classifier h, we can estimate the learning tradeoff between its source risk and its induced risk using values that are computable without actually deploying the classifier in the first place.

3.3. How to Use Our Bounds

The upper and lower bounds derived in the previous sections (Theorem 3.2 and Theorem 3.3) depend on the following two quantities, either explicitly or implicitly: (1) the distribution D(h) induced by the deployment of the model h in question, and (2) the optimal target classifier $h_T^*$ and the distribution $D(h_T^*)$ it induces. The bounds may therefore seem to be of only theoretical interest, since in reality we generally cannot compute D(h) without actual deployment, let alone compute $h_T^*$; thus, in general, it is unclear how to compute the value of these bounds. Nevertheless, our bounds can still be useful and informative in the following ways:

General modeling framework with flexible hypothetical shifting models. The bounds can be evaluated if the decision maker has a particular shift model in mind that specifies how the population would adapt to a model. A common special case is when the decision maker posits an individual-level agent response model (e.g., the strategic agent of Hardt et al. (2016a); we demonstrate how to evaluate the bounds in this case in Section 4.3). In these cases, the $\mathcal{H}\Delta\mathcal{H}$-divergence can be consistently estimated from finite samples of the population (Wang et al., 2005), allowing the decision maker to estimate the performance gap of a given h without deploying it. The general bounds can thus be viewed as a framework from which specialized, computationally tractable bounds can be derived.
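As a concrete illustration of this first use case, the sketch below estimates the two statistical ingredients the bounds rely on from finite samples of a posited induced distribution: a standard domain-classifier surrogate for the divergence term (in the spirit of the proxy A-distance of Ben-David et al. (2010), not the estimator of Wang et al. (2005)), and the empirical version of the Theorem 3.3 lower bound for binary labels. Both function names and the choice of logistic regression as domain classifier are our own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_divergence(X_source, X_induced, seed=0):
    """Surrogate for d_{H Delta H}(D_S, D(h)): train a domain classifier to tell
    source samples from (hypothetically) induced ones; return 2*(1 - 2*error)."""
    X = np.vstack([X_source, X_induced])
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_induced))])
    X_tr, X_te, d_tr, d_te = train_test_split(X, d, test_size=0.5, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, d_tr)
    domain_error = np.mean(clf.predict(X_te) != d_te)
    return float(np.clip(2.0 * (1.0 - 2.0 * domain_error), 0.0, 2.0))

def tv_lower_bound(y_source, y_induced, pred_source, pred_induced):
    """Empirical Theorem 3.3 lower bound for binary labels in {-1,+1}:
    (d_TV between label marginals - d_TV between prediction marginals) / 2."""
    d_tv_y = abs(np.mean(y_source == 1) - np.mean(y_induced == 1))
    d_tv_h = abs(np.mean(pred_source == 1) - np.mean(pred_induced == 1))
    return max(0.0, (d_tv_y - d_tv_h) / 2.0)
```

Both helpers take samples from a hypothesized induced distribution (for example, generated with a posited response model such as the `respond` sketch earlier), so no actual deployment is required.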
Estimate the optimal target classifier $h_T^*$ from a set of imperfect models. Secondly, when the decision maker has access to a set of imperfect models $\hat{h}_1, \hat{h}_2, \ldots, \hat{h}_t \in \hat{\mathcal{H}}_T$ that predict a range of possible shifted distributions $D(\hat{h}_1), \ldots, D(\hat{h}_t) \in \hat{\mathcal{D}}_T$ and a range of possibly optimal target classifiers $h_T^* \in \hat{\mathcal{H}}_T$, the bounds involving $h_T^*$ can be further instantiated by taking the worst case over this predicted set (Upper Bound and Lower Bound denote the RHS expressions of Theorem 3.2 and Theorem 3.3, respectively):

$$\mathrm{Err}_{D(h)}(h) - \mathrm{Err}_{D(h_T^*)}(h_T^*) \le \max_{D' \in \hat{\mathcal{D}}_T,\, h' \in \hat{\mathcal{H}}_T} \text{Upper Bound}(D', h'),$$
$$\max\{\mathrm{Err}_{D_S}(h), \mathrm{Err}_{D(h_T^*)}(h_T^*)\} \ge \min_{D' \in \hat{\mathcal{D}}_T,\, h' \in \hat{\mathcal{H}}_T} \text{Lower Bound}(D', h').$$

We provide discussions on the tightness of our bounds in Appendix H.

4. Covariate Shift

In this section, we focus on a particular distribution shift model known as covariate shift, in which the distribution of features changes but the distribution of labels conditioned on features remains the same:

$$\mathbb{P}_{D(h)}(Y = y \mid X = x) = \mathbb{P}_{D_S}(Y = y \mid X = x) \tag{3}$$
$$\mathbb{P}_{D(h)}(X = x) \neq \mathbb{P}_{D_S}(X = x) \tag{4}$$

Thus, with covariate shift, we have

$$\mathbb{P}_{D(h)}(X = x, Y = y) = \mathbb{P}_{D(h)}(Y = y \mid X = x) \cdot \mathbb{P}_{D(h)}(X = x) = \mathbb{P}_{D_S}(Y = y \mid X = x) \cdot \mathbb{P}_{D(h)}(X = x).$$

Let $\omega_x(h) := \frac{\mathbb{P}_{D(h)}(X = x)}{\mathbb{P}_{D_S}(X = x)}$ be the importance weight at x, which characterizes the amount of adaptation induced by h at instance x. Then for any loss function $\ell$ we have:

Proposition 4.1 (Expected loss on D(h) under covariate shift). $\mathbb{E}_{D(h)}[\ell(h; X, Y)] = \mathbb{E}_{D_S}[\omega_X(h) \cdot \ell(h; X, Y)]$.

The above derivation is a classic trick and forms the basis for importance reweighting when learning under covariate shift (Sugiyama et al., 2008). This particular form tells us that $\omega_x(h)$ controls the generation of D(h), encodes its dependency on both $D_S$ and h, and is critical for deriving our results below.

4.1. Upper Bound

We now derive an upper bound for transferability under covariate shift. We particularly focus on the optimal model trained on the source data $D_S$, denoted $h_S^* := \arg\min_{h \in \mathcal{H}} \mathrm{Err}_{D_S}(h)$. Recall that the classifier with minimum induced risk is denoted $h_T^* := \arg\min_{h \in \mathcal{H}} \mathrm{Err}_{D(h)}(h)$. We can upper bound the difference between $h_S^*$ and $h_T^*$ as follows:

Theorem 4.2 (Suboptimality of $h_S^*$). Let X be distributed according to $D_S$. We have:

$$\mathrm{Err}_{D(h_S^*)}(h_S^*) - \mathrm{Err}_{D(h_T^*)}(h_T^*) \le \sqrt{\mathrm{Err}_{D_S}(h_T^*)} \cdot \Big( \sqrt{\mathrm{Var}(\omega_X(h_S^*))} + \sqrt{\mathrm{Var}(\omega_X(h_T^*))} \Big).$$

This result can be interpreted as follows: $h_T^*$ incurs an irreducible amount of error on the source data set, represented by $\sqrt{\mathrm{Err}_{D_S}(h_T^*)}$. Moreover, the difference in induced risks between $h_S^*$ and $h_T^*$ is largest when the two classifiers induce adaptations in opposite directions; this is captured by the sum of the standard deviations of their importance weights, $\sqrt{\mathrm{Var}(\omega_X(h_S^*))} + \sqrt{\mathrm{Var}(\omega_X(h_T^*))}$.

4.2. Lower Bound

Recall that in Theorem 3.3, for the general setting, it is unclear whether the lower bound is strictly positive. In this section, we characterize when the lower bound $\frac{d_{\mathrm{TV}}(D_{Y|S}, D_Y(h)) - d_{\mathrm{TV}}(D_{h|S}, D_h(h))}{2}$ is indeed positive under covariate shift. Under the following assumptions, the lower bound in Theorem 3.3 is strictly positive with covariate shift.

Assumption 4.3. $\big|\mathbb{E}_{X \in \mathcal{X}_+(h), Y=+1}[1 - \omega_X(h)]\big| \ge \big|\mathbb{E}_{X \in \mathcal{X}_-(h), Y=+1}[1 - \omega_X(h)]\big|$, where $\mathcal{X}_+(h) = \{x : \omega_x(h) \ge 1\}$ and $\mathcal{X}_-(h) = \{x : \omega_x(h) < 1\}$. This assumption states that points with increased $\omega_x(h)$ values are more likely to have positive labels.

Assumption 4.4. $\big|\mathbb{E}_{X \in \mathcal{X}_+(h), h(X)=+1}[1 - \omega_X(h)]\big| \ge \big|\mathbb{E}_{X \in \mathcal{X}_-(h), h(X)=+1}[1 - \omega_X(h)]\big|$. This assumption states that points with increased $\omega_x(h)$ values are more likely to be classified as positive.
Assumption 4.5. $\mathrm{Cov}\big(\mathbb{P}_{D_S}(Y = +1 \mid X = x) - \mathbb{P}_{D_S}(h(x) = +1 \mid X = x),\; \omega_x(h)\big) > 0$. This assumption states that for a classifier h, within all of h(X) = +1 or h(X) = -1, a higher $\mathbb{P}_D(Y = +1 \mid X = x)$ is associated with a higher $\omega_x(h)$.

Theorem 4.6. Under Assumptions 4.3-4.5, the following lower bound is strictly positive under covariate shift:

$$\max\{\mathrm{Err}_{D_S}(h), \mathrm{Err}_{D(h)}(h)\} \ge \frac{d_{\mathrm{TV}}(D_{Y|S}, D_Y(h)) - d_{\mathrm{TV}}(D_{h|S}, D_h(h))}{2} > 0.$$

4.3. Covariate Shift via Strategic Response

As introduced in Section 2.1, we consider a setting caused by strategic responses, in which agents are classified by, and adapt to, a binary threshold classifier. In particular, each agent is associated with a d-dimensional continuous feature $x \in \mathbb{R}^d$ and a binary true qualification $y(x) \in \{-1, +1\}$, where y(x) is a function of the feature vector x. Consistent with the literature on strategic classification (Hardt et al., 2016a), we consider a simple case where, after seeing the threshold binary decision rule $h(x) = 2 \cdot \mathbb{1}[x \ge \tau_h] - 1$, the agents best respond to it by maximizing the following utility function:

$$u(x, x') = h(x') - h(x) - c(x, x'),$$

where c(x, x') is the cost for decision subjects to modify their feature from x to x'. We assume all agents are rational utility maximizers: they will only attempt to change their features when the benefit of manipulation outweighs the cost (i.e., when $c(x, x') \le 2$), and an agent will not change their feature if they are already accepted (i.e., h(x) = +1). For a given threshold $\tau_h$ and manipulation budget B, the theoretical best response of an agent with original feature x is:

$$\Delta(x) = \arg\max_{x'}\; u(x, x') \quad \text{s.t.} \quad c(x, x') \le B.$$

To make the problem tractable and meaningful, we further specify the following setups:

Setup 1 (Initial Feature). Agents' initial features are uniformly distributed on $[0, 1] \subset \mathbb{R}$.

Setup 2 (Agent's Cost Function). The cost of changing from x to x' is proportional to the distance between them: $c(x, x') = |x - x'|$. Setup 2 implies that only agents whose features lie in $[\tau_h - B, \tau_h)$ will attempt to change their feature.

We also assume that feature updates are probabilistic, such that agents with features closer to the decision boundary $\tau_h$ have a greater chance of updating their feature, and each updated feature x' is sampled from a uniform distribution depending on $\tau_h$, B, and x (see Setups 3 and 4):

Setup 3 (Agent's Successful Manipulation Probability). For agents who attempt to update their features, the probability of a successful feature update is $\mathbb{P}(X' \neq X) = 1 - |x - \tau_h|$. Intuitively, this setup means that the closer the agent's original feature x is to the decision boundary $\tau_h$, the more likely they can successfully change their feature to cross the decision boundary.

Setup 4 (Adapted Feature's Distribution). An agent's updated feature x', given the original feature x, manipulation budget B, and classification boundary $\tau_h$, is sampled as $X' \sim \mathrm{Unif}(\tau_h, \tau_h + |B - x|)$. Setup 4 aims to capture the fact that even though an agent aims to move their feature just to the decision boundary $\tau_h$ (the least-cost action that yields a favorable prediction), they may end up with a feature that lies beyond the decision boundary.

With the above setups, we can specialize the bound in Theorem 4.2 to the strategic response setting as follows:

Proposition 4.7. For the strategic response setting described above, Theorem 4.2 implies

$$\mathrm{Err}_{D(h_S^*)}(h_S^*) - \mathrm{Err}_{D(h_T^*)}(h_T^*) \le \sqrt{3\,\mathrm{Err}_{D_S}(h_T^*)}.$$
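Before interpreting this bound, here is a minimal simulation of Setups 1-4, which can be used to sanity-check Proposition 4.7 empirically. The ground-truth qualification rule, the specific threshold, and the histogram-based estimate of $\omega_x(h)$ are illustrative assumptions of this sketch, not part of the setups above.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_response(x, tau, B):
    """Simulate agents' adaptations under Setups 1-4 for threshold tau and budget B."""
    x_new = x.copy()
    attempts = (x >= tau - B) & (x < tau)                     # Setup 2: only these agents try
    succeed = rng.random(len(x)) < (1.0 - np.abs(x - tau))     # Setup 3: success probability
    movers = attempts & succeed
    # Setup 4: the updated feature lands uniformly in [tau, tau + |B - x|)
    x_new[movers] = tau + np.abs(B - x[movers]) * rng.random(movers.sum())
    return x_new

def qualification(x, threshold=0.6):
    """Assumed ground-truth qualification y(x); covariate shift keeps P(Y|X) fixed."""
    return np.where(x >= threshold, 1, -1)

n, B, tau = 50_000, 0.2, 0.55
x = rng.random(n)                                              # Setup 1: X ~ Unif[0, 1]
x_ind = simulate_response(x, tau, B)

h = lambda z: np.where(z >= tau, 1, -1)
err_source = np.mean(h(x) != qualification(x))
err_induced = np.mean(h(x_ind) != qualification(x_ind))

# Rough importance weights omega_x(h) via histogram density ratios (illustrative only).
bins = np.linspace(0, 1, 51)
p_src, _ = np.histogram(x, bins=bins, density=True)
p_ind, _ = np.histogram(x_ind, bins=bins, density=True)
omega = p_ind / np.clip(p_src, 1e-8, None)
print(err_source, err_induced, np.var(omega))
```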
We can see from Proposition 4.7 that the upper bound for strategic response depends on the manipulation budget B and on the error the ideal classifier makes on the source distribution, $\mathrm{Err}_{D_S}(h_T^*)$. This aligns with the intuition that the smaller the manipulation budget, the fewer agents will change their features, leading to a tighter upper bound on the difference between $\mathrm{Err}_{D(h_S^*)}(h_S^*)$ and $\mathrm{Err}_{D(h_T^*)}(h_T^*)$. This expression also allows us to provide bounds even without knowledge of the mapping between D(h) and h, since we can directly compute $\mathrm{Err}_{D_S}(h_T^*)$ from the source distribution and an estimated optimal classifier $h_T^*$.

5. Target Shift

We consider another popular domain adaptation setting known as target shift, in which the distribution of labels changes, but the distribution of features conditioned on the label remains the same:

$$\mathbb{P}_{D(h)}(X = x \mid Y = y) = \mathbb{P}_{D_S}(X = x \mid Y = y) \tag{5}$$
$$\mathbb{P}_{D(h)}(Y = y) \neq \mathbb{P}_{D_S}(Y = y) \tag{6}$$

For binary classification, let $p(h) := \mathbb{P}_{D(h)}(Y = +1)$, so that $\mathbb{P}_{D(h)}(Y = -1) = 1 - p(h)$. Notice that p(h) encodes the full adaptation information from $D_S$ to D(h), since the mapping between Y and X, $\mathbb{P}(X = x \mid Y = y)$, is known and remains unchanged under target shift. For any proper loss function $\ell$ we have:

$$\mathbb{E}_{D(h)}[\ell(h; X, Y)] = p(h) \cdot \mathbb{E}_{D(h)}[\ell(h; X, Y) \mid Y = +1] + (1 - p(h)) \cdot \mathbb{E}_{D(h)}[\ell(h; X, Y) \mid Y = -1]$$
$$= p(h) \cdot \mathbb{E}_{D_S}[\ell(h; X, Y) \mid Y = +1] + (1 - p(h)) \cdot \mathbb{E}_{D_S}[\ell(h; X, Y) \mid Y = -1].$$

We adopt the following shorthands: $\mathrm{Err}_+(h) := \mathbb{E}_{D_S}[\ell(h; X, Y) \mid Y = +1]$ and $\mathrm{Err}_-(h) := \mathbb{E}_{D_S}[\ell(h; X, Y) \mid Y = -1]$. Note that $\mathrm{Err}_+(h)$ and $\mathrm{Err}_-(h)$ are both defined on the conditional source distribution, which is invariant under the target shift assumption.

5.1. Upper Bound

We first characterize the upper bound on the transferability of $h_S^*$ under target shift. Denote by $D_+$ the positive-label distribution on $D_S$ ($\mathbb{P}_{D_S}(X = x \mid Y = +1)$) and by $D_-$ the negative-label distribution on $D_S$ ($\mathbb{P}_{D_S}(X = x \mid Y = -1)$). Let $p := \mathbb{P}_{D_S}(Y = +1)$.

Theorem 5.1. Under target shift, the difference between $\mathrm{Err}_{D(h_S^*)}(h_S^*)$ and $\mathrm{Err}_{D(h_T^*)}(h_T^*)$ is bounded as:

$$\mathrm{Err}_{D(h_S^*)}(h_S^*) - \mathrm{Err}_{D(h_T^*)}(h_T^*) \le |\omega(h_S^*) - \omega(h_T^*)| + (1 + p)\Big(d_{\mathrm{TV}}\big(D_+(h_S^*), D_+(h_T^*)\big) + d_{\mathrm{TV}}\big(D_-(h_S^*), D_-(h_T^*)\big)\Big).$$

The above bound consists of two components. The first quantity captures the difference between the two induced distributions $D(h_S^*)$ and $D(h_T^*)$. The second quantity characterizes the difference between the two classifiers $h_S^*$ and $h_T^*$ on the source distribution.

5.2. Lower Bound

Now we discuss lower bounds. Denote by $\mathrm{TPR}_S(h)$ and $\mathrm{FPR}_S(h)$ the true positive and false positive rates of h on the source distribution $D_S$. We prove the following:

Theorem 5.2. Under target shift, any model h must incur the following error on either $D_S$ or D(h):

$$\max\{\mathrm{Err}_{D_S}(h), \mathrm{Err}_{D(h)}(h)\} \ge |p - p(h)| \cdot \big(1 - |\mathrm{TPR}_S(h) - \mathrm{FPR}_S(h)|\big).$$

The proof extends the bound of Theorem 3.3 by further explicating $d_{\mathrm{TV}}(D_{Y|S}, D_Y(h))$ and $d_{\mathrm{TV}}(D_{h|S}, D_h(h))$ under the assumption of target shift. Since $|\mathrm{TPR}_S(h) - \mathrm{FPR}_S(h)| < 1$ unless h is a trivial classifier with either $\mathrm{TPR}_S(h) = 1, \mathrm{FPR}_S(h) = 0$ or $\mathrm{TPR}_S(h) = 0, \mathrm{FPR}_S(h) = 1$, the lower bound is strictly positive. Taking a closer look, the lower bound scales linearly with how much the label distribution shifts, $|p - p(h)|$. It is further determined by the performance of h on the source distribution through $1 - |\mathrm{TPR}_S(h) - \mathrm{FPR}_S(h)|$. For instance, when $\mathrm{TPR}_S(h) > \mathrm{FPR}_S(h)$, this quantity becomes $\mathrm{FNR}_S(h) + \mathrm{FPR}_S(h)$; that is, the more error h makes, the larger the lower bound.
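The Theorem 5.2 lower bound is directly computable once the induced positive rate p(h) is specified; under the replicator dynamics model instantiated in Section 5.3 below, p(h) itself follows from source-only statistics. A minimal sketch, with helper names and toy numbers of our own choosing:

```python
import numpy as np

def target_shift_lower_bound(p_source, p_induced, tpr_source, fpr_source):
    """Theorem 5.2: lower bound on max{Err_DS(h), Err_D(h)(h)} under target shift."""
    return abs(p_source - p_induced) * (1.0 - abs(tpr_source - fpr_source))

def replicator_induced_positive_rate(p_source, tpr_source, err_source):
    """Induced positive rate p(h) under the replicator dynamics model of
    Section 5.3: p(h) / p = TPR_S(h) / (1 - Err_DS(h))."""
    return p_source * tpr_source / (1.0 - err_source)

# Toy numbers: every input is computable on the source distribution alone.
p, tpr, fpr, err = 0.4, 0.8, 0.3, 0.25
p_h = replicator_induced_positive_rate(p, tpr, err)
print(f"p(h) = {p_h:.3f}")
print(f"lower bound = {target_shift_lower_bound(p, p_h, tpr, fpr):.3f}")
```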
5.3. Target Shift via Replicator Dynamics

We now further instantiate our theoretical bound for target shift (Theorem 5.1) using a particular replicator dynamics model previously used in Raab & Liu (2021). In particular, the fitness function is specified as the prediction accuracy of h for class y:

$$\mathrm{Fitness}(Y = y) := \mathbb{P}_{D_S}(h(X) = y \mid Y = y) \tag{7}$$

Then we have $\mathbb{E}[\mathrm{Fitness}(Y)] = 1 - \mathrm{Err}_{D_S}(h)$, and

$$\frac{p(h)}{\mathbb{P}_{D_S}(Y = +1)} = \frac{\mathbb{P}_{D_S}(h(X) = +1 \mid Y = +1)}{1 - \mathrm{Err}_{D_S}(h)}.$$

Plugging this back into Theorem 5.1, we get the following bound for the above replicator dynamics setting:

Proposition 5.3. Under the replicator dynamics model described in Equation (7), $|\omega(h_S^*) - \omega(h_T^*)|$ is bounded as:

$$|\omega(h_S^*) - \omega(h_T^*)| \le \mathbb{P}_{D_S}(Y = +1) \cdot \frac{\big|\mathrm{Err}_{D_S}(h_S^*) - \mathrm{Err}_{D_S}(h_T^*)\big| + \big|\mathrm{TPR}_S(h_S^*) - \mathrm{TPR}_S(h_T^*)\big|}{\big(1 - \mathrm{Err}_{D_S}(h_S^*)\big)\big(1 - \mathrm{Err}_{D_S}(h_T^*)\big)}.$$

The above result shows that the difference between the induced risks $\mathrm{Err}_{D(h_S^*)}(h_S^*)$ and $\mathrm{Err}_{D(h_T^*)}(h_T^*)$ depends only on the difference between the two classifiers' performances on the source data $D_S$. This offers the decision maker an opportunity to evaluate the performance gap using evaluations on the source data alone, without observing the corresponding induced distributions.

6. Experiments

We present synthetic experimental results on both simulated and real-world data sets.

Synthetic experiments using simulated data. We generate synthetic data sets from the structural equation models described by the simple causal DAGs in Figure 2 for covariate shift and target shift. To generate the induced distribution D(h), we posit a specific adaptation function $\Delta: \mathbb{R}^d \times \mathcal{H} \to \mathbb{R}^d$, so that when an input x encounters classifier $h \in \mathcal{H}$, its induced features are precisely $x' = \Delta(x, h)$. We provide details of the data generation processes and adaptation functions in Appendix D. We take our training data set $\{x_1, \ldots, x_n\}$ and learn a base logistic regression model $h(x) = \sigma(w \cdot x)$, where $\sigma(\cdot)$ is the logistic function and $w \in \mathbb{R}^3$ denotes the weights. We then consider the hypothesis class $\mathcal{H} := \{h_\tau \mid \tau \in [0, 1]\}$, where $h_\tau(x) := 2 \cdot \mathbb{1}[\sigma(w \cdot x) > \tau] - 1$. To compute $h_S^*$, the model that performs best on the source distribution, we simply vary $\tau$ and take the $h_\tau$ with the lowest prediction error. Then, we posit a specific adaptation function $\Delta(x, h_\tau)$. Finally, to compute $h_T^*$, we vary $\tau$ from 0 to 1 and find the classifier $h_\tau$ that minimizes the prediction error on its induced data set $\{\Delta(x_1, h_\tau), \ldots, \Delta(x_n, h_\tau)\}$.

We report our results in Figure 4. For all four datasets, we observe positive gaps $\mathrm{Err}_{D(h_S^*)}(h_S^*) - \mathrm{Err}_{D(h_T^*)}(h_T^*)$, indicating the suboptimality of training on $D_S$. The gaps are well bounded by the theoretical results. For the lower bound, the empirical observations and the theoretical bounds are roughly within the same magnitude, except for one target shift dataset, indicating the effectiveness of our theoretical result. Regarding the upper bound, for target shift the empirical observations are well within the same magnitude as the theoretical bounds, while the results for covariate shift are relatively loose.

Figure 4. Results for synthetic experiments on simulated and real-world data. Diff := $\mathrm{Err}_{D(h_S^*)}(h_S^*) - \mathrm{Err}_{D(h_T^*)}(h_T^*)$, Max := $\max\{\mathrm{Err}_{D_S}(h_T^*), \mathrm{Err}_{D(h_T^*)}(h_T^*)\}$, UB := upper bound specified in Theorem 4.2, and LB := lower bound specified in Theorem 4.6.
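The simulated-data procedure above (sweep the threshold family, score each candidate on the data it itself induces) can be sketched as follows. The data generator, labeling rule, and adaptation function here are simplified stand-ins, not the Appendix D specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def h_tau(X, w, tau):
    """Threshold classifier from the family H = {h_tau}: 2*1[sigma(w.x) > tau] - 1."""
    return np.where(sigma(X @ w) > tau, 1, -1)

def delta(X, w, tau, step=0.4):
    """Stand-in adaptation function Delta(x, h_tau): rejected points move toward
    the decision boundary along w (not the Appendix D specification)."""
    rejected = h_tau(X, w, tau) == -1
    X_new = X.copy()
    X_new[rejected] += step * w / np.linalg.norm(w)
    return X_new

def label(X):
    """Assumed ground-truth labeling used to re-label adapted points."""
    return np.where(X[:, 0] + 0.5 * X[:, 1] > 0.3, 1, -1)

X = rng.normal(size=(5000, 3))
Y = label(X)
w = np.array([1.0, 0.5, 0.0])
taus = np.linspace(0.01, 0.99, 99)

# h_S*: the best threshold on the source data.
src_err = [np.mean(h_tau(X, w, t) != Y) for t in taus]
tau_S = taus[int(np.argmin(src_err))]

# h_T*: the best threshold on the data each candidate itself induces.
ind_err = []
for t in taus:
    X_ind = delta(X, w, t)
    ind_err.append(np.mean(h_tau(X_ind, w, t) != label(X_ind)))
tau_T = taus[int(np.argmin(ind_err))]

# Err_D(h_S*)(h_S*) - Err_D(h_T*)(h_T*): the gap reported as "Diff" in Figure 4.
gap = ind_err[int(np.argmin(src_err))] - min(ind_err)
print(tau_S, tau_T, gap)
```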
Synthetic experiments using real-world data. We also perform synthetic experiments using real-world data to demonstrate our bounds. In particular, we use the FICO credit score data set (Board of Governors of the Federal Reserve System (US), 2007), which contains more than 300k records of TransUnion credit scores of clients from different demographic groups. For our experiment on the preprocessed FICO data set (Hardt et al., 2016b), we convert the cumulative distribution function (CDF) of the TransRisk score among different groups into group-wise credit score densities, from which we generate a balanced sample to represent a population where groups have equal representation. We demonstrate the application of our results in a series of resource allocations. Similar to the synthetic experiments on simulated data, we consider the hypothesis class of threshold classifiers and treat the classification outcome as the decision received by individuals. For each time step K = k, we compute $h_S^*$, the statistically optimal classifier on the source distribution (i.e., the current reality at step K = k), and update the credit score of each individual according to the received decision as the new reality for time step K = k + 1. Details of the data generation are again deferred to Appendix D.

We report our results in Figure 3. We observe positive gaps $\mathrm{Err}_{D(h_S^*)}(h_S^*) - \mathrm{Err}_{D(h_T^*)}(h_T^*)$, indicating the suboptimality of training on $D_S$. The gaps are well bounded by the theoretical upper bound (UB). Our lower bounds (LB) do return meaningful positive gaps, demonstrating the trade-offs that a classifier has to suffer on either the source distribution or the induced target distribution. We also provide additional experimental results using synthetic datasets generated according to the causal graphs defined in Figure 2. Due to page limits, we defer the detailed discussion of these results to Appendix D.2.2.

Figure 3. Results for synthetic experiments on real-world data. Diff := $\mathrm{Err}_{D(h_S^*)}(h_S^*) - \mathrm{Err}_{D(h_T^*)}(h_T^*)$, Max := $\max\{\mathrm{Err}_{D_S}(h_T^*), \mathrm{Err}_{D(h_T^*)}(h_T^*)\}$, UB := upper bound specified in Theorem 4.2, and LB := lower bound specified in Theorem 4.6. For each time step K = k, we compute and deploy the source-optimal classifier $h_S^*$ and update the credit score of each individual according to the received decision as the new reality for time step K = k + 1. Details of the data generation are deferred to Appendix D.

7. Conclusions and Future Directions

Unawareness of the potential distribution shift might lead to unintended consequences when training a machine learning model. One goal of our paper is to raise awareness of this issue for the safe deployment of machine learning methods in high-stakes scenarios. We also provide a general framework for characterizing the performance difference for a fixed, trained classifier when the decision subjects respond to it. Our contributions are mostly theoretical. A natural extension of our work is to collect real human experimental data to verify the usefulness and tightness of our bounds. Another potential future direction is to develop algorithms that find an optimal model achieving minimum induced risk, which has been an exciting ongoing research problem in the field of performative prediction. Furthermore, using techniques from general domain adaptation to find robust classifiers that perform well on both the source and induced distributions is another promising direction.

Acknowledgements

Y. Chen is partially supported by the National Science Foundation (NSF) under grants IIS-2143895 and IIS-2040800.
The work is also supported in part by the NSF-Convergence Accelerator Track-D award #2134901, by the National Institutes of Health (NIH) under Contract R01HL159805, by grants from Apple Inc., KDDI Research, Quris AI, and IBT, and by generous gifts from Amazon, Microsoft Research, and Salesforce. Model Transferability with Responsive Decision Subjects Ali, S. M. and Silvey, S. D. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society: Series B (Methodological), 28(1):131 142, 1966. Azizzadenesheli, K., Liu, A., Yang, F., and Anandkumar, A. Regularized learning for domain adaptation under label shifts. ar Xiv preprint ar Xiv:1903.09734, 2019. Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., and Vaughan, J. A theory of learning from different domains. Machine Learning, 2010. Board of Governors of the Federal Reserve System (US). Report to the congress on credit scoring and its effects on the availability and affordability of credit. Board of Governors of the Federal Reserve System, 2007. Brown, G., Hod, S., and Kalemaj, I. Performative prediction in a stateful world, 2020. Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A., and Mukhopadhyay, D. Adversarial attacks and defences: A survey, 2018. Chen, Y., Liu, Y., and Podimata, C. Learning strategy-aware linear classifiers, 2020a. Chen, Y., Wang, J., and Liu, Y. Strategic recourse in linear classification. ar Xiv preprint ar Xiv:2011.00355, 2020b. Crammer, K., Kearns, M., and Wortman, J. Learning from multiple sources. Journal of Machine Learning Research, 9(8), 2008. Dong, J., Roth, A., Schutzman, Z., Waggoner, B., and Wu, Z. S. Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation, EC 18, New York, NY, USA, 2018a. Association for Computing Machinery. Dong, J., Roth, A., Schutzman, Z., Waggoner, B., and Wu, Z. S. Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation, 2018b. Drusvyatskiy, D. and Xiao, L. Stochastic optimization with decision-dependent distributions. ar Xiv preprint ar Xiv:2011.11173, 2020. Flaxman, A. D., Kalai, A. T., and Mc Mahan, H. B. Online convex optimization in the bandit setting: Gradient descent without a gradient. In The Sixteenth Annual ACMSIAM Symposium on Discrete Algorithms, pp. 385 394, 2005. Friedman, D. and Sinervo, B. Evolutionary games in natural, social, and virtual worlds. Oxford University Press, 2016. Gong, M., Zhang, K., Liu, T., Tao, D., Glymour, C., and Sch olkopf, B. Domain adaptation with conditional transferable components. In International conference on machine learning, pp. 2839 2848. PMLR, 2016. Goodman, B. and Flaxman, S. European union regulations on algorithmic decision-making and a right to explanation . AI Magazine, 38(3):50 57, Oct 2017. Gretton, A., Smola, A., Huang, J., Schmittfull, M., Borgwardt, K., and Sch olkopf, B. Covariate shift by kernel mean matching. Dataset shift in machine learning, 3(4): 5, 2009. Guo, J., Gong, M., Liu, T., Zhang, K., and Tao, D. Ltf: A label transformation framework for correcting label shift. In International Conference on Machine Learning, pp. 3843 3853. PMLR, 2020. Haghtalab, N., Immorlica, N., Lucier, B., and Wang, J. Z. Maximizing welfare with incentive-aware evaluation mechanisms. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. 
International Joint Conferences on Artificial Intelligence Organization, 2020. Hardt, M., Megiddo, N., Papadimitriou, C., and Wootters, M. Strategic classification. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, New York, NY, USA, 2016a. Association for Computing Machinery. Hardt, M., Price, E., and Srebro, N. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pp. 3315 3323, 2016b. Hardt, M., Jagadeesan, M., and Mendler-D unner, C. Performative power. ar Xiv preprint ar Xiv:2203.17232, 2022. Huang, L., Joseph, A. D., Nelson, B., Rubinstein, B. I., and Tygar, J. D. Adversarial machine learning. In ACM Workshop on Security and Artificial Intelligence, 2011. Izzo, Z., Ying, L., and Zou, J. How to learn when data reacts to your model: performative gradient descent. In International Conference on Machine Learning, pp. 4641 4650. PMLR, 2021. Jagadeesan, M., Zrnic, T., and Mendler-D unner, C. Regret minimization with performative feedback. In International Conference on Machine Learning, pp. 9760 9785. PMLR, 2022. Jiang, J. A literature survey on domain adaptation of statistical classifiers. URL: http://sifaka. cs. uiuc. edu/jiang4/domainadaptation/survey, 3:1 12, 2008. Model Transferability with Responsive Decision Subjects Kang, G., Jiang, L., Yang, Y., and Hauptmann, A. G. Contrastive adaptation network for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4893 4902, 2019. Kleinberg, J. and Raghavan, M. How do classifiers induce agents to invest effort strategically? ACM Transactions on Economics and Computation (TEAC), 8(4):1 23, 2020. Li, D., Yang, Y., Song, Y.-Z., and Hospedales, T. M. Learning to generalize: Meta-learning for domain generalization, 2017. Li, Q. and Wai, H.-T. State dependent performative prediction with stochastic approximation. In International Conference on Artificial Intelligence and Statistics, pp. 3164 3186. PMLR, 2022. Liese, F. and Vajda, I. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10):4394 4412, 2006. Lipton, Z., Wang, Y.-X., and Smola, A. Detecting and correcting for label shift with black box predictors. In International conference on machine learning, pp. 3122 3130. PMLR, 2018. Liu, Y. and Liu, M. An online learning approach to improving the quality of crowd-sourcing. ACM SIGMETRICS Performance Evaluation Review, 2015. Long, M., Zhu, H., Wang, J., and Jordan, M. I. Unsupervised domain adaptation with residual transfer networks. ar Xiv preprint ar Xiv:1602.04433, 2016. Lowd, D. and Meek, C. Adversarial learning. In ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, 2005. Maheshwari, C., Chiu, C.-Y., Mazumdar, E., Sastry, S. S., and Ratliff, L. J. Zeroth-order methods for convexconcave minmax problems: Applications to decisiondependent risk minimization, 2021. Mendler-D unner, C., Perdomo, J., Zrnic, T., and Hardt, M. Stochastic optimization for performative prediction. In Advances in Neural Information Processing Systems, pp. 4929 4939. Curran Associates, Inc., 2020. Mendler-D unner, C., Ding, F., and Wang, Y. Anticipating performativity by predicting from predictions. In Advances in Neural Information Processing Systems, 2022. Miller, J., Milli, S., and Hardt, M. Strategic classification is causal modeling in disguise. In International Conference on Machine Learning, pp. 6917 6926. PMLR, 2020. 
Muandet, K., Balduzzi, D., and Sch olkopf, B. Domain generalization via invariant feature representation, 2013. Nado, Z., Padhy, S., Sculley, D., D Amour, A., Lakshminarayanan, B., and Snoek, J. Evaluating prediction-time batch normalization for robustness under covariate shift, 2021. Narang, A., Faulkner, E., Drusvyatskiy, D., Fazel, M., and Ratliff, L. J. Multiplayer performative prediction: Learning in decision-dependent games. ar Xiv preprint ar Xiv:2201.03398, 2022. Papernot, N., Mc Daniel, P., and Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples, 2016. Perdomo, J., Zrnic, T., Mendler-D unner, C., and Hardt, M. Performative prediction. In International Conference on Machine Learning, pp. 7599 7609. PMLR, 2020. Piliouras, G. and Yu, F.-Y. Multi-agent performative prediction: From global stability and optimality to chaos. ar Xiv preprint ar Xiv:2201.10483, 2022. Raab, R. and Liu, Y. Unintended selection: Persistent qualification rate disparities and interventions. Advances in Neural Information Processing Systems, 2021. Selbst, A. and Powles, J. meaningful information and the right to explanation. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, Proceedings of Machine Learning Research. PMLR, 2018. Shavit, Y., Edelman, B., and Axelrod, B. Causal strategic linear regression. International Conference on Machine Learning, pp. 8676 8686, 2020. Sheth, P., Moraffah, R., Candan, K. S., Raglin, A., and Liu, H. Domain generalization a causal perspective. ar Xiv preprint ar Xiv:2209.15177, 2022. Shimodaira, H. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of statistical planning and inference, 90(2):227 244, 2000. Song, C., He, K., Wang, L., and Hopcroft, J. E. Improving the generalization of adversarial training with domain adaptation, 2019. Sugiyama, M., Krauledat, M., and M uller, K.-R. Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research, 8(5), 2007. Sugiyama, M., Suzuki, T., Nakajima, S., Kashima, H., von B unau, P., and Kawanabe, M. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60(4):699 746, 2008. Model Transferability with Responsive Decision Subjects Taylor, P. D. and Jonker, L. B. Evolutionary stable strategies and game dynamics. Mathematical Biosciences, 1978. Tuyls, K., Hoen, P. J., and Vanschoenwinkel, B. An evolutionary dynamical analysis of multi-agent learning in iterated games. Autonomous Agents and Multi-Agent Systems, 2006. Ustun, B., Spangher, A., and Liu, Y. Actionable recourse in linear classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 10 19, 2019. Varsavsky, T., Orbes-Arteaga, M., Sudre, C. H., Graham, M. S., Nachev, P., and Cardoso, M. J. Test-time unsupervised domain adaptation, 2020. Vorobeychik, Y. and Kantarcioglu, M. Adversarial Machine Learning. Morgan & Claypool Publishers, 2018. Wang, D., Shelhamer, E., Liu, S., Olshausen, B., and Darrell, T. Tent: Fully test-time adaptation by entropy minimization, 2021a. Wang, J., Lan, C., Liu, C., Ouyang, Y., Qin, T., Lu, W., Chen, Y., Zeng, W., and Yu, P. S. Generalizing to unseen domains: A survey on domain generalization, 2021b. Wang, Q., Kulkarni, S., and Verdu, S. Divergence estimation of continuous distributions based on data-dependent partitions. 
IEEE Transactions on Information Theory, 51 (9):3064 3074, 2005. doi: 10.1109/TIT.2005.853314. Xie, R., Wei, H., Feng, L., and An, B. Gearnet: Stepwise dual learning for weakly supervised domain adaptation. AAAI Conference on Artificial Intelligence, 2022. Zadrozny, B. Learning and evaluating classifiers under sample selection bias. In Proceedings of the twenty-first international conference on Machine learning, pp. 114, 2004. Zhang, K., Sch olkopf, B., Muandet, K., and Wang, Z. Domain adaptation under target and conditional shift. In International Conference on Machine Learning, pp. 819 827. PMLR, 2013. Zhang, K., Gong, M., Stojanov, P., Huang, B., LIU, Q., and Glymour, C. Domain adaptation as a problem of inference on graphical models. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 4965 4976. Curran Associates, Inc., 2020. Zhang, Y., Liu, T., Long, M., and Jordan, M. Bridging theory and algorithm for domain adaptation. In International Conference on Machine Learning, pp. 7404 7413. PMLR, 2019. Zhou, K., Liu, Z., Qiao, Y., Xiang, T., and Loy, C. C. Domain generalization in vision: A survey. ar Xiv preprint ar Xiv:2103.02503, 2021. Supplement to Model Transferability with Responsive Decision Subjects We arrange the appendix as follows: Appendix A.1 provides some real-life scenarios where transparent models are useful or required. Appendix A.2 provides additional related work on strategic classification and domain adaptation, as well as a detailed comparison of our setting and other sub-areas in domain adaptation. Appendix B.1 provides proof for Theorem 3.1. Appendix B.2 provides proof for Theorem 3.2. Appendix B.3 provides proof of Theorem 3.3. Appendix B.4 provides proof for Proposition 4.1. Appendix B.5 provides proof for Theorem 4.2. Appendix B.6 provides proof for Theorem 4.6. Appendix B.7 provides omitted assumptions and proof for Section 4.3. Appendix B.8 provides proof for Theorem 5.1. Appendix B.9 provides proof for Theorem 5.2. Appendix B.10 provides proof for Proposition 5.3. Appendix C provides additional lower bound and examples for the target shift setting. Appendix D provides missing experimental details. Appendix E discusses challenges in minimizing induced risk. Appendix F provides discussions on how to directly minimize the induced risk. Appendix G provides discussions on adding regularization to the objective function. Appendix H provides discussions on the tightness of our theoretical bounds. A. Additional Discussions A.1. Example Usages of Transparent Models As we mentioned in Section 1, there is an increasing requirement of making the decision rule to be transparent due to its potential consequences impacts to individual decision subject. Here we provide the following reasons for using transparent models: Government regulation may require the model to be transparent, especially in public services; In some cases, companies may want to disclose their models so users will have explanations and are incentivized to better use the provided services. Regardless of whether models are published voluntarily, model parameters can often be inferred via well-known query attacks . In addition, we name some concrete examples of some real-life applications: Consider the Medicaid health insurance program in the United States, which serves low-income people. 
There is an Model Transferability with Responsive Decision Subjects obligation to provide transparency/disclose the rules (model to automate the decisions) that decide whether individuals qualify for the program in fact, most public services have terms that are usually set in stone and explained in the documentation. Agents can observe the rules and will adapt their profiles to be qualified if needed. For instance, an agent can decide to provide additional documentation they need to guarantee approval. For more applications along these lines, please refer to this report.5 Credit score companies directly publish their criteria for assessing credit risk scores. In loan application settings, companies actually have the incentive to release criteria to incentivize agents to meet their qualifications and use their services.Furthermore, making decision models transparent will gain the trust of users. It is also known that it is possible to steal model parameters, if agents have incentives to do so.6 For instance, spammers frequently infer detection mechanisms by sending different email variants; they then adjust their spam content accordingly. A.2. Additional Related Work Strategic Classification Strategic Classification focuses on the problem of how to make predictions in the presence of agents who behave strategically in order to obtain desirable outcomes (Hardt et al., 2016a; Chen et al., 2020a; Dong et al., 2018a; Chen et al., 2020b; Miller et al., 2020). In particular, (Hardt et al., 2016a) first formalizes strategic classification tasks as a two-player sequential game (i.e., a Stackelberg game) between a model designer and strategic agents. Agent best response behavior is typically viewed as malicious in the traditional setting; as a result, the model designer seeks to disincentivize this behavior or limit its impact by publishing classifiers that are robust to any agent s adaptations. In our work, the agents strategic behaviors are not necessarily malicious; instead, we aim to provide a general framework that works for any distribution shift resulting from the human agency. Most existing work in strategic classification assumes that human agents are fully rational and will always perform best response to any given classifier. As a result, their behaviors can be fully characterized based on pre-specified human response models (Hardt et al., 2016a; Chen et al., 2020a). While we are also interested in settings where agents respond to a decision rule, we focus on the distribution shift of human agents at a population level and characterize the induced distribution as a function of the deployed model. Instead of specifying a particular individual-level agent s response model, we only require the knowledge of the source data DS, as well as some characterizations of the relationship between the source and the induced distribution, e.g., they satisfy some particular distribution shift models, like covariate shift (see Section 4), or target shift (see Section 5), or we have access to some data points from the induced distribution so we can estimate their statistical differences like H-divergence (see Section 3). In addition, the focus of our work is different from strategic classification. Instead of designing models robust to strategic behavior, we primarily study the transferability of a given model s performance on the distribution itself induces. 
Domain Adaptation. There has been tremendous work in domain adaptation studying different distribution shifts and learning from shifting distributions (Jiang, 2008; Ben-David et al., 2010; Sugiyama et al., 2008; Zhang et al., 2019; Kang et al., 2019; Zhang et al., 2020; Xie et al., 2022). Our results differ from these previous works since, in our setting, changes in the distribution are not passively provided by the environment but rather are an active consequence of model deployment. Part of our technical contribution is inspired by the transferability results in domain adaptation (Ben-David et al., 2010; Zadrozny, 2004; Gretton et al., 2009; Sugiyama et al., 2008; Lipton et al., 2018; Azizzadenesheli et al., 2019). Our work, at first sight, looks similar to several sub-areas within the domain adaptation literature, e.g., domain generalization, adversarial attack, and test-time adaptation, to name a few. For instance, the notion of observing an induced distribution resembles that of the adversarial machine learning literature (Lowd & Meek, 2005; Huang et al., 2011; Vorobeychik & Kantarcioglu, 2018). One of the major differences between our setting and adversarial machine learning is that in adversarial machine learning, the true label Y stays the same for the attacked feature, while in our paper, both X and Y might change in the induced distribution D(h). Below, we provide detailed comparisons between our setting and the other subfields of domain adaptation mentioned above.

Comparisons of Our Setting with Some Areas in Domain Adaptation. We compare our setting (we refer to it as IDA, for induced domain adaptation) with the following areas:

Adversarial attack (Chakraborty et al., 2018; Papernot et al., 2016; Song et al., 2019): in an adversarial attack, the true label Y stays the same for the attacked feature, while in IDA we allow the true label to change as well. One can think of an adversarial attack as a specific form of IDA where the induced distribution has a specific target, namely maximizing the classifier's error by only perturbing/modifying the features. Our transferability bound does, however, provide insights into how standard training results transfer to the attack setting.

Domain generalization (Wang et al., 2021b; Li et al., 2017; Muandet et al., 2013): the goal of domain generalization is to learn a model that can generalize to any unseen distribution. Similar to our setting, one of the biggest challenges in domain generalization is also the lack of the target distribution during training. The major difference, however, is that our focus is to understand how the performance of a classifier trained on the source distribution degrades when evaluated on the induced distribution (which depends on how the population of decision subjects responds); this degradation depends on the classifier itself.

Test-time adaptation (Varsavsky et al., 2020; Wang et al., 2021a; Nado et al., 2021): test-time adaptation falls into the classical domain adaptation setting, where the adaptation is independent of the model being deployed. Applying this technique to solve our problem requires accessing data (either unsupervised or supervised) drawn from D(h) for each h being evaluated during different training epochs.
Remark. We believe that techniques from domain adaptation can potentially be applied to our setting when the decision-maker is interested in producing a classifier that performs well on the induced distribution. In general, we suspect that this will require the decision-maker to know certain information about how the induced distribution and the source distribution differ, e.g., some characterization of their statistical differences. Here we provide two possible directions:

Data Augmentation: The basic idea of data augmentation is to augment the original data points (x, y) with new pairs (A(x), B(x)), where A(·) and B(·) denote a pair of transformations, and then add the new pairs to the training dataset. A(·) and B(·) are usually seen as a way of simulating domain shift, and the design of A(·) and B(·) is key to performance. In our case, A and B are functions of the classifier h, and they capture how the classifier influences the potential responses from the decision subjects. This requires the decision-maker to have a specific response model in mind when designing the augmented data points.

Learning Disentangled Representations: Instead of forcing the entire model or all features to be domain invariant, which is challenging, one can relax this constraint by allowing some parts to be domain-specific, essentially learning disentangled representations. In our induced domain adaptation setting, this means separating features into two sets: one set contains features that are invariant to the deployment of a classifier, and the other set contains features that will potentially be affected by it. Then we can decompose a classifier into domain-specific biases and domain-agnostic weights, and only keep the latter when dealing with the unseen induced domain.

B. Proof of Results

B.1. Proof of Theorem 3.1

Proof. We first establish two lemmas that will be helpful for bounding the errors of a pair of classifiers. Both are standard results from the domain adaptation literature (Ben-David et al., 2010).

Lemma B.1. For any hypotheses $h, h' \in \mathcal{H}$ and distributions $\mathcal{D}, \mathcal{D}'$,
$$|\mathrm{Err}_{\mathcal{D}}(h, h') - \mathrm{Err}_{\mathcal{D}'}(h, h')| \le \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}, \mathcal{D}').$$

Proof. Define the cross-prediction disagreement between two classifiers $h, h'$ on a distribution $\mathcal{D}$ as $\mathrm{Err}_{\mathcal{D}}(h, h') := P_{\mathcal{D}}(h(X) \neq h'(X))$. By the definition of the $\mathcal{H}\Delta\mathcal{H}$-divergence,
$$d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}, \mathcal{D}') = 2 \sup_{h, h' \in \mathcal{H}} \left| P_{\mathcal{D}}(h(X) \neq h'(X)) - P_{\mathcal{D}'}(h(X) \neq h'(X)) \right| = 2 \sup_{h, h' \in \mathcal{H}} \left| \mathrm{Err}_{\mathcal{D}}(h, h') - \mathrm{Err}_{\mathcal{D}'}(h, h') \right| \ge 2 \left| \mathrm{Err}_{\mathcal{D}}(h, h') - \mathrm{Err}_{\mathcal{D}'}(h, h') \right|.$$

Another helpful lemma for us is the well-known fact that the 0-1 error obeys the triangle inequality (see, e.g., Crammer et al. (2008)):

Lemma B.2. For any distribution $\mathcal{D}$ over instances and any labeling functions $f_1, f_2, f_3$, we have $\mathrm{Err}_{\mathcal{D}}(f_1, f_2) \le \mathrm{Err}_{\mathcal{D}}(f_1, f_3) + \mathrm{Err}_{\mathcal{D}}(f_2, f_3)$.

Denote by $h^*$ the ideal joint hypothesis, which minimizes the combined error:
$$h^* := \arg\min_{h' \in \mathcal{H}} \; \mathrm{Err}_{\mathcal{D}(h)}(h') + \mathrm{Err}_{\mathcal{D}_S}(h').$$
Then
$$\begin{aligned}
\mathrm{Err}_{\mathcal{D}(h)}(h) &\le \mathrm{Err}_{\mathcal{D}(h)}(h^*) + \mathrm{Err}_{\mathcal{D}(h)}(h, h^*) && \text{(Lemma B.2)}\\
&= \mathrm{Err}_{\mathcal{D}(h)}(h^*) + \mathrm{Err}_{\mathcal{D}_S}(h, h^*) + \mathrm{Err}_{\mathcal{D}(h)}(h, h^*) - \mathrm{Err}_{\mathcal{D}_S}(h, h^*) &&\\
&\le \mathrm{Err}_{\mathcal{D}(h)}(h^*) + \mathrm{Err}_{\mathcal{D}_S}(h) + \mathrm{Err}_{\mathcal{D}_S}(h^*) + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}(h)) && \text{(Lemmas B.1 and B.2)}\\
&= \mathrm{Err}_{\mathcal{D}_S}(h) + \lambda_{\mathcal{D}_S, \mathcal{D}(h)} + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}(h)). && \text{(definition of } h^*\text{)}
\end{aligned}$$

B.2. Proof of Theorem 3.2

Proof.
Invoking Theorem 3.1, and replacing h with h T and S with D(h T ), we have Err D(h)(h T ) Err D(h T )(h T ) + λD(h) D(h T ) + 1 2d H H(D(h T ), D(h)) (8) Now observe that Err D(h)(h) Err D(h)(h T ) + Err D(h)(h, h T ) Err D(h)(h T ) + Err D(h T )(h, h T ) + Err D(h)(h, h T ) Err D(h T )(h, h T ) Err D(h)(h T ) + Err D(h T )(h, h T ) + 1 2d H H(D(h T ), D(h)) (by Lemma B.1) Err D(h)(h T ) + Err D(h T )(h) + Err D(h T )(h T ) + 1 2d H H(D(h T ), D(h)) (by Lemma B.2) Err D(h T )(h T ) + λD(h) D(h T ) + 1 2d H H(D(h T ), D(h)) (by equation 8) + Err D(h T )(h) + Err D(h T )(h T ) + 1 2d H H(D(h T ), D(h)) Adding Err D(h)(h) to both sides and rearranging terms yields 2Err D(h)(h) 2Err D(h T )(h T ) Err D(h)(h) + Err D(h T )(h) + λD(h) D(h T ) + d H H(D(h T ), D(h)) = ΛD(h) D(h T )(h) + λD(h) D(h T ) + d H H(D(h T ), D(h)) Dividing both sides by 2 completes the proof. B.3. Proof of Theorem 3.3 Proof. Using the triangle inequality of d TV, we have d TV(DY |S, DY (h)) d TV(DY |S, Dh|S) + d TV(Dh|S, Dh(h)) + d TV(Dh(h), DY (h)) (9) Model Transferability with Responsive Decision Subjects and by the definition of d TV, the divergence term d TV(DY |S, DY (h)) becomes d TV(DY |S, Dh|S) = |PDS(Y = +1) PDS(h(x) = +1)| = EDS[Y ] + 1 2 EDS[h(X)] + 1 2 EDS[h(X)] 2 EDS [|Y h(X)|] = Err DS(h) Similarly, we have d TV(Dh(h), DY (h)) Err D(h)(h) As a result, we have Err DS(h) + Err D(h)(h) d TV(DY |S, Dh|S) + d TV(Dh(h), DY (h)) d TV(DY |S, DY (h)) d TV(Dh|S, Dh(h)) (by equation 9) which implies max{Err DS(h), Err D(h)(h)} d TV(DY |S, DY (h)) d TV(Dh|S, Dh(h)) B.4. Proof of Proposition 4.1 ED(h)[ℓ(h; X, Y )] = Z PD(h)(X = x, Y = y)ℓ(h; x, y) dxdy = Z PDS(Y = y|X = x) PD(h)(X = x)ℓ(h; x, y) dxdy = Z PDS(Y = y|X = x) PDS(X = x) PD(h)(X = x) PDS(X = x) ℓ(h; x, y) dxdy = Z PDS(Y = y|X = x) PDS(X = x) ωx(h) ℓ(h; x, y) dxdy =EDS[ωx(h) ℓ(h; x, y)] B.5. Proof of Theorem 4.2 Proof. We start from the error induced by h S. Let the average importance weight induced by h S be ω(h S) = EDS[ωx(h S)]; we add and subtract this from the error: Err D(h S)(h S) = EDS [ωx(h S) 1(h S(x) = y)] = EDS [ ω(h S) 1(h S(x) = y)] + EDS [(ωx(h S) ω(h S)) 1(h S(x) = y)] In fact, ω(h S) = 1, since ω(h S) =EDS[ωx(h S)] = Z ωx(h S)PDS(X = x)dx = Z PD(h)(X = x) PDS(X = x) PDS(X = x)dx = Z PD(h)(X = x)dx = 1 Model Transferability with Responsive Decision Subjects Now consider any other classifier h. We have Err D(h S)(h S) = EDS [1(h S(x) = y)] + EDS [(ωx(h S) ω(h S)) 1(h S(x) = y)] EDS [1(h(x) = y)] + EDS [(ωx(h S) ω(h S)) 1(h S(x) = y)] (by optimality of h S on DS) = EDS [ ω(h) 1(h(x) = y)] + EDS [(ωx(h S) ω(h S)) 1(h S(x) = y)] (multiply by ω(h S) = 1) = EDS [ωx(h) 1(h(x) = y)] + EDS [( ω(h) ωx(h)) 1(h(x) = y)] (add and subtract ω(h S)) + EDS [(ωx(h S) ω(h S)) 1(h S(x) = y)] = Err D(h)(h) + Cov(ωx(h S), 1(h S(x) = y)) Cov(ωx(h), 1(h(x) = y)) Moving the error terms to one side, we have Err D(h S)(h S) Err D(h)(h) Cov(ωx(h S), 1(h S(x) = y)) Cov(ωx(h), 1(h(x) = y)) Var(ωx(h S)) Var(1(h S(x) = y)) (|Cov(X, Y )| p Var(X) Var(Y )) Var(ωx(h)) Var(1(h(x) = y)) Var(ωx(h S)) Err S(h S)(1 Err S(h S)) + p Var(ωx(h)) Err DS(h)(1 Err DS(h)) Var(ωx(h S)) Err S(h S) + p Var(ωx(h)) Err DS(h) (1 Err DS(h) 1) Err DS(h) q Var(ωx(h S)) + p Since this holds for any h, it certainly holds for h = h T . B.6. Omitted Assumptions and Proof of Theorem 4.6 Denote X+(h) = {x : ωx(h) 1} and X (h) = {x : ωx(h) < 1}. First, we observe that Z X+(h) PDS(X = x)(1 ωx(h))dx X (h) PDS(X = x)(1 ωx(h))dx = 0 This is simply because of R x PDS(X = x) ωx(h)dx = R x PD(h)(X = x)dx = 1. 
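As an aside, the reweighting identity of Proposition 4.1 and the normalization fact above are easy to verify numerically. The following is a minimal Monte Carlo sketch; the Gaussian source distribution, the particular induced distribution, and the threshold classifier below are arbitrary choices made only for this illustration and are not part of our setup:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative (assumed) setup: 1-D feature, source D_S = N(0, 1),
# induced D(h) = N(0.5, 1); both are choices for this sanity check only.
mu_s, mu_h = 0.0, 0.5
h = lambda x: np.sign(x - 0.3)              # an arbitrary threshold classifier
y_given_x = lambda x: np.sign(x)            # P(Y | X) shared by both domains (covariate shift)
loss = lambda x: (h(x) != y_given_x(x)).astype(float)   # 0-1 loss

n = 1_000_000
x_s = rng.normal(mu_s, 1.0, n)              # samples from D_S
x_h = rng.normal(mu_h, 1.0, n)              # samples from D(h)

# Importance weight omega_x(h) = p_{D(h)}(x) / p_{D_S}(x)
omega = norm.pdf(x_s, mu_h, 1.0) / norm.pdf(x_s, mu_s, 1.0)

print("E_{D_S}[omega]        ~", omega.mean())                # close to 1
print("E_{D(h)}[loss]        ~", loss(x_h).mean())            # induced risk
print("E_{D_S}[omega * loss] ~", (omega * loss(x_s)).mean())  # reweighted source risk
```

Up to Monte Carlo error, the last two estimates coincide and the mean importance weight is 1, as the identities above require.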
Proof. Notice that in the setting of binary classification, we can write the total variation distance between DY |S and DY (h) as the difference between the probability of Y = +1 and the probability of Y = 1: d TV(DY |S, DY (h)) = PDS(Y = +1) PD(h)(Y = +1) Z PDS(Y = +1|X = x)PDS(X = x)dx Z PDS(Y = +1|X = x)PDS(X = x)ωx(h)dx Z PDS(Y = +1|X = x)PDS(X = x) (1 ωx(h))dx (10) Similarly we have d TV(Dh|S, Dh(h)) = Z PDS(h(x) = +1|X = x)PDS(X = x) (1 ωx(h))dx (11) Model Transferability with Responsive Decision Subjects We can further expand the total variation distance between DY |S and DY (h) as follows: d TV(DY |S, DY (h)) Z PDS(Y = +1|X = x)PDS(X = x) (1 ωx(h))dx X+(h) PD(Y = +1|X = x)PDS(X = x) (1 ωx(h))dx X (h) PDS(Y = +1|X = x)PDS(X = x) (1 ωx(h))dx X+(h) PDS(Y = +1|X = x)PDS(X = x) (1 ωx(h))dx X (h) PDS(Y = +1|X = x)PDS(X = x) (1 ωx(h))dx (by Assumption 4.3) X+(h) PDS(Y = +1|X = x)PDS(X = x) (ωx(h) 1)dx X (h) PDS(Y = +1|X = x)PDS(X = x) (ωx(h) 1)dx (by equation 10) = Z PDS(Y = +1|X = x)PDS(X = x) (ωx(h) 1)dx Similarly, by assumption 4.4 and equation equation 11, we have d TV(Dh|S, Dh(h)) = Z PDS(h(x) = +1|X = x)PDS(X = x) (ωx(h) 1)dx Thus we can bound the difference between d TV(DY |S, DY (h)) and d TV(Dh|S, Dh(h)) as follows: d TV(DY |S, DY (h)) d TV(Dh|S, Dh(h)) = Z PDS(Y = +1|X = x)PDS(X = x) (ωx(h) 1)dx Z PD(h(x) = +1|X = x)PDS(X = x) (ωx(h) 1)dx = Z [PDS(Y = +1|X = x) PDS(h(x) = +1|X = x)]PDS(X = x) (ωx(h) 1)dx = EDS[(PDS(Y = +1|X = x) PDS(h(x) = +1|X = x)) (ωx(h) 1)] (by Assumption 4.5) > EDS[PDS(Y = +1|X = x) PDS(h(x) = +1|X = x)]EDS[ωx(h) 1] Combining the above with Theorem 3.3, we have max{Err DS(h), Err D(h)(h)} d TV(DY |S, DY (h)) d TV(Dh|S, Dh(h)) B.7. Omitted details for Section 4.3 With Setup 2 - Setup 4, we can further specify the important weight wx(h) for the strategic response setting: Model Transferability with Responsive Decision Subjects Lemma B.3. Recall the definition for the covariate shift important weight coefficient ωx(h) := PD(h)(X=x) PDS (X=x) , for our strategic response setting, we have, 1, x [0, τh B) τh x B , x [τh B, τh) 1 B ( x + τh + 2B), x [τh, τh + B) 1, x [τh + B, 1] Proof for Lemma B.3: Proof. We discuss the induced distribution D(h) by cases: For the features distributed between [0, τh B]: since we assume the agents are rational, under assumption 2, agents with feature that is smaller than [0, τh B] will not perform any kinds of adaptations, and no other agents will adapt their features to this range of features either, so the distribution between [0, τh B] will remain the same as before. For the target distribution between [τh B, τh] can be directly calculated from assumption 3. For distribution between [τh, τh + B], consider a particular feature x [τh, τh + B], under Setup 4, we know its new distribution becomes: PD(h)(x = x ) = 1 + Z τh B B τh + z dz B ( x + τh + 2B) For the target distribution between [τh + B, 1]: under assumption 2 and 4, we know that no agents will change their feature to this feature region. So the distribution between [τh + B, 1] remains the same as the source distribution. Recall the definition for the covariate shift important weight coefficient ωx(h) := PD(h)(X=x) PDS (X=x) , the distribution of ωx(h) after agents strategic responding becomes: 1, x [0, τh B) and x [τh + B, 1] τh x B , x [τh B, τh) 1 B ( x + τh + 2B), x [τh, τh + B) 0, otherwise Proof for Proposition 4.7: Proof. According to Lemma B.3, we can compute the variance of wx(h) as Var(wx(h)) = E(wx(h)2) E(wx(h)2) = 2 3B. 
Then plugging it into the general bound for Theorem 4.2 gives us the desired result. B.8. Proof of Theorem 5.1 Proof. Defining p := PDS(Y = +1), p(h) = PD(h)(Y = +1), we have Err D(h S)(h S) = p(h S) Err+(h S) + (1 p(h S)) Err (h S) (by definitions of p(h S), Err+(h S), and Err (h S)) = p Err+(h S) + (1 p) Err (h S) | {z } (I) +(p(h S) p)[Err+(h S) Err (h S)] (14) Model Transferability with Responsive Decision Subjects We can expand (I) as follows: p Err+(h S) + (1 p) Err (h S) p Err+(h T ) + (1 p) Err (h T ) (by optimality of h S on DS) = p(h T ) Err+(h T ) + (1 p(h T )) Err (h T ) + (p p(h T )) [Err+(h T ) Err (h T )] = Err D(h T )(h T ) + (p p(h T )) [Err+(h T ) Err (h T )] . Plugging this back into equation 14, we have Err D(h S)(h S) Err D(h T )(h T ) (p(h S) p)[Err+(h S) Err (h S)] + (p p(h T )) [Err+(h T ) Err (h T )] Notice that 0.5(Err+(h) Err (h)) = 0.5 1 0.5 P(h(X) = +1|Y = +1) 0.5 P(h(X) = +1|Y = 1) = 0.5 PDu(h(X) = +1) where Du is a distribution with a uniform prior. Then (p(h S) p)[Err+(h S) Err (h S)] = 2(p(h S) p) (0.5 PDu(h(X) = +1)) (p p(h T ))[Err+(h T ) Err (h T )] = 2(p p(h T )) (0.5 PDu(h(X) = +1)) Adding together these two equations yields (p(h S) p)[Err+(h S) Err (h S)] + (p p(h T )) [Err+(h T ) Err (h T )] = 2(p(h S) p) (0.5 PDu(h S(X) = +1)) + 2(p p(h T )) (0.5 PDu(h T (X) = +1)) = (p(h S) p(h T )) 2 (p(h S)PDu(h S(X) = +1) p(h T )PDu(h T (X) = +1)) + 2p (PDu(h S(X) = +1) PDu(h T (X) = +1)) |p(h S) p(h T )| (1 + 2|PDu(h S(X) = +1) PDu(h T (X) = +1)|) + 2p |PDu(h S(X) = +1) PDu(h T (X) = +1)| (15) |PDu(h S(X) = +1) PDu(h T (X) = +1)| 0.5 |PD|Y =+1(h S(X) = +1) PD|Y =+1(h T (X) = +1)| + 0.5 |PD|Y = 1(h S(X) = +1) PD|Y = 1(h T (X) = +1)| = 0.5 (d TV(D+(h S), D+(h T )) + d TV(D (h S), D (h T )) (16) Combining equation 15 and equation 16 gives |p(h S) p(h T )| (1 + 2 |PDu(h S(X) = +1) PDu(h T (X) = +1)|) + 2p |PDu(h S(X) = +1) PDu(h T (X) = +1)| |p(h S) p(h T )| (1 + d TV(D+(h S), D+(h T )) + d TV(D (h S), D (h T )) + p (d TV(D+(h S), D+(h T )) + d TV(D (h S), D (h T )) |p(h S) p(h T )| + (1 + p) (d TV(D+(h S), D+(h T )) + d TV(D (h S), D (h T )) . B.9. Proof of Theorem 5.2 We will make use of the following fact: Lemma B.4. Under label shift, TPRS(h) = TPRh(h) and FPRS(h) = FPRh(h). Model Transferability with Responsive Decision Subjects Proof. We have TPRh(h) =PD(h)(h(X) = +1|Y = +1) = Z PD(h)(h(X) = +1, X = x|Y = +1)dx = Z PD(h)(h(X) = +1|X = x, Y = +1)PD(h)(X = x|Y = +1)dx = Z 1(h(x) = +1)PD(h)(X = x|Y = +1)dx = Z 1(h(x) = +1)PDS(X = x|Y = +1)dx (by definition of label shift) = Z PDS(h(X) = +1|X = x, Y = +1)PDS(X = x|Y = +1)dx The argument for TPRh(h) = TPRS(h) is analogous. Now we proceed to prove the theorem. Proof of Theorem 5.2. In section 3.2 we showed a general lower bound on the maximum of Err DS(h) and Err D(h)(h): max{Err DS(h), Err D(h)(h)} d TV(DY |S, DY (h)) d TV(Dh|S, Dh(h)) In the case of label shift, and by the definitions of p and p(h), d TV(DY |S, DY (h)) = |PDS(Y = +1) PD(h)(Y = +1)| = |p p(h)| (17) In addition, we have Dh|S = PS(h(X) = +1) = p TPRS(h) + (1 p) FPRS(h) (18) Dh(h) = PD(h)(h(X) = +1) = p(h) TPRh(h) + (1 p(h)) FPRh(h) = p(h) TPRS(h) + (1 p(h)) FPRS(h) (by Lemma B.4) (19) d TV(Dh|S, Dh(h)) =|PDS(h(X) = +1) PD(h)(h(X) = +1)| =|(p p(h)) TPRS(h) + (p(h) p) FPRS(h)| (By equation 19 and equation 18) =|p p(h)| |TPRS(h) FPRS(h)| (20) which yields: d TV(DY |S, DY (h)) d TV(Dh|S, Dh(h)) = |p p(h)|(1 |TPRS(h) FPRS(h)|) (By equation 17 and equation 20) completing the proof. Model Transferability with Responsive Decision Subjects B.10. 
Proof of Proposition 5.3 |p(h S) p(h T )| 1 PDS(Y = +1) =|(1 Err DS(h S))TPRS(h S) (1 Err DS(h T ))TPRS(h T )| (1 Err DS(h S)) (1 Err DS(h T )) |Err DS(h S) Err DS(h T )| |TPRS(h S) TPRS(h T )| (1 Err DS(h S)) (1 Err DS(h T )) (21) The inequality above is due to Lemma 7 of (Liu & Liu, 2015). C. Lower Bound and Example for Target Shift C.1. Lower Bound Now we discuss lower bounds. Denote by TPRS(h) and FPRS(h) the true positive and false positive rates of h on the source distribution DS. We prove the following: Theorem C.1. Under target shift, any model h must incur the following error on either the DS or D(h): max{Err DS(h), Err D(h)(h)} |p p(h)| (1 |TPRS(h) FPRS(h)|) The proof extends the bound of Theorem 3.3 by further explicating each of d TV(DY |S, DY (h)), d TV(Dh|S, and Dh(h)) under the assumption of target shift. Since |TPRS(h) FPRS(h)| < 0 unless we have a trivial classifier that has either TPRS(h) = 1, FPRS(h) = 0 or TPRS(h) = 0, FPRS(h) = 1, the lower bound is strictly positive. Taking a closer look, the lower bound is determined linearly by how much the label distribution shifts: p p(h). The difference is further determined by the performance of h on the source distribution through 1 |TPRS(h) FPRS(h)|. For instance, when TPRS(h) > FPRS(h), the quality becomes FNRS(h) + FPRS(h), that is the more error h makes, the larger the lower bound will be. C.2. Example Using Replicator Dynamics Let us instantiate the discussion using a specific fitness function for the replicator dynamics model (Section 2.1), which is the prediction accuracy of h for class +1: [Fitness of Y = +1] := PDS(h(X) = +1|Y = +1) (22) Then we have E [Fitness of Y ] = Err DS(h), and p(h) PDS(Y = +1) = PDS(h(X) = +1|Y = +1) Plugging the result back to our Theorem 5.1 we have Proposition C.2. Under the replicator dynamics model in Eqn. (22), |p(h S) p(h T )| further bounds as: |p(h S) p(h T )| PDS(Y = +1) |Err DS(h S) Err DS(h T )| |TPRS(h S) TPRS(h T )| Err DS(h S) Err DS(h T ) . That is, the difference between Err D(h S)(h S) and Err D(h T )(h T ) is further dependent on the difference between the two classifiers performances on the source data DS. This offers an opportunity to evaluate the possible error transferability using the source data only. Model Transferability with Responsive Decision Subjects D. Missing Experimental Details D.1. Synthetic Experiments Using DAG Here we provide details in terms of the data-generating process for the simulated dataset. Covariate Shift We specify the causal DAG for covariate shift setting in the following way: X1 Unif( 1, 1) X2 1.2X1 + N(0, σ2 2) X3 X2 1 + N(0, σ2 3) Y := 2sign(X2 > 0) 1 where σ2 2 and σ2 3 are parameters of our choices. Adaptation function We assume the new distribution of feature X 1 will be generated in the following way: X 1 = (X) = X1 + c (h(X) 1) where c R1 > 0 is the parameter controlling how much the prediction h(X) affect the generating of X 1, namely the magnitude of distribution shift. Intuitively, this adaptation function means that if a feature x is predicted to be positive (h(x) = +1), then decision subjects are more likely to adapt to that feature in the induced distribution; Otherwise, decision subjects are more likely to be moving away from x since they know it will lead to a negative prediction. 
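To make this data-generating process concrete, here is a minimal simulation sketch of the covariate-shift DAG and the adaptation function above. The noise scales, the shift strength c, and the stand-in classifier h are arbitrary choices for illustration; we also assume, as one plausible reading of the setup, that the descendants of X1 are re-generated from the same structural equations after the adaptation, so that P(Y | X) stays unchanged as required by covariate shift:

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA2, SIGMA3 = 0.3, 0.3   # assumed noise scales (left unspecified in the text)

def gen_from_x1(x1):
    """Generate (X, Y) from X1 via the structural equations of the DAG."""
    n = x1.shape[0]
    x2 = 1.2 * x1 + rng.normal(0.0, SIGMA2, n)
    x3 = x1 ** 2 + rng.normal(0.0, SIGMA3, n)
    y = 2 * (x2 > 0).astype(int) - 1            # Y := 2 * 1[X2 > 0] - 1
    return np.column_stack([x1, x2, x3]), y

def sample_source(n):
    return gen_from_x1(rng.uniform(-1.0, 1.0, n))   # X1 ~ Unif(-1, 1)

def induce(X, h, c=0.5):
    """Adaptation X1' = X1 + c * (h(X) - 1); descendants re-generated."""
    return gen_from_x1(X[:, 0] + c * (h(X) - 1.0))

# A hypothetical linear-threshold classifier standing in for the deployed model h.
h = lambda X: np.sign(X[:, 1] + 0.2)

X_s, y_s = sample_source(100_000)   # source distribution D_S
X_t, y_t = induce(X_s, h)           # induced distribution D(h)

print("error on D_S  :", np.mean(h(X_s) != y_s))
print("error on D(h) :", np.mean(h(X_t) != y_t))
```

Comparing the two printed error rates gives a quick empirical counterpart to the covariate-shift bounds studied in Section 4.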
Target Shift We specify the causal DAG for target shift setting in the following way: (Y + 1)/2 Bernoulli(α) X1|Y = y N[0,1](µy, σ2) X2 = 0.8X1 + N(0, σ2 2) X3 = 0.2Y + N(0, σ2 3) where N[0,1] represents a truncated Gaussian distribution taken value between 0 and 1. α, µy, σ2,σ2 2 and σ2 3 are parameters of our choices. Adaptation function We assume the new distribution of the qualification Y will be updated in the following way: P(Y = +1|h(X) = h, Y = y) = chy, where {h, y} { 1, +1} where 0 chy R1 1 represents the likelihood for a person with original qualification Y = y and get predicted as h(X) = h to be qualified in the next step (Y = +1). D.2. Synthetic Experiments Using Real-world Data On the preprocessed FICO credit score data set (Board of Governors of the Federal Reserve System (US), 2007; Hardt et al., 2016b), we convert the cumulative distribution function (CDF) of Trans Risk score among demographic groups (denoted as A, including Black, Asian, Hispanic, and White) into group-dependent densities of the credit score. We then generate a balanced sample where each group has equal representation, with credit scores (denoted as Q) initialized by sampling from the corresponding group-dependent density. The value of attributes for each data point is then updated under a specified dynamics (detailed in Appendix D.2.1) to model the real-world scenario of repeated resource allocation (with decision denoted as D). Model Transferability with Responsive Decision Subjects D.2.1. PARAMETERS FOR DYNAMICS Since we are considering the dynamic setting, we further specify the data generating process in the following way (from time step T = t to T = t + 1): Xt,1 1.5Qt + U[ ϵ1, ϵ1] Xt,2 0.8At + U[ ϵ2, ϵ2] Xt,3 At + N(0, σ2) Yt Bernoulli(qt) for a given value of Qt = qt Dt = ft(At, Xt,1, Xt,2, Xt,3) Qt+1 = {Qt [1 + αD(Dt) + αY (Yt)]}(0,1] At+1 = At (fixed population) where { }(0,1] represents truncated value between the interval (0, 1], ft( ) represents the decision policy from input features, and ϵ1, ϵ2, σ are parameters of choices. In our experiments, we set ϵ1 = ϵ2 = σ = 0.1. Within the same time step, i.e., for variables that share the subscript t, Qt and At are root causes for all other variables (Xt,1, Xt,2, Xt,3, Dt, Yt). At each time step T = t, the institution first estimates the credit score Qt (which is not directly visible to the institution, but is reflected in the visible outcome label Yt) based on (At, Xt,1, Xt,2, Xt,3), then produces the binary decision Dt according to the optimal threshold (in terms of the accuracy). For different time steps, e.g., from T = t to T = t + 1, the new distribution at T = t + 1 is induced by the deployment of the decision policy Dt. Such impact is modeled by a multiplicative update in Qt+1 from Qt with parameters (or functions) αD( ) and αY ( ) that depend on Dt and Yt, respectively. In our experiments, we set αD = 0.01 and αY = 0.005 to capture the scenario where one-step influence of the decision on the credit score is stronger than that for ground truth label. D.2.2. ADDITIONAL EXPERIMENTAL RESULTS In this section, we present additional experimental results on the real-world FICO credit score data set. 
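Before turning to those results, the following is a minimal simulation sketch of the dynamics specified above in Appendix D.2.1. The numeric group encoding, the uniform initial scores, the fixed decision threshold standing in for the accuracy-optimal policy f_t, and the sign convention inside the multiplicative score update are simplifying assumptions made only for illustration; the actual experiments initialize scores from the group-dependent FICO densities and re-fit the decision policy at every step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters from Appendix D.2.1; the sign convention in the Q-update and the
# simple threshold policy standing in for f_t are assumptions of this sketch.
EPS1 = EPS2 = SIGMA = 0.1
ALPHA_D, ALPHA_Y = 0.01, 0.005

def step(Q, A, theta=0.5):
    """One round of the dynamics: features, decision, outcome, score update."""
    n = Q.shape[0]
    X1 = 1.5 * Q + rng.uniform(-EPS1, EPS1, n)
    X2 = 0.8 * A + rng.uniform(-EPS2, EPS2, n)
    X3 = A + rng.normal(0.0, SIGMA, n)
    Y = rng.binomial(1, Q)                      # Y_t ~ Bernoulli(q_t)
    D = (X1 >= theta).astype(int)               # stand-in threshold policy f_t
    # Multiplicative update of the latent credit score, truncated to (0, 1].
    Q_next = Q * (1 + ALPHA_D * (2 * D - 1) + ALPHA_Y * (2 * Y - 1))
    Q_next = np.clip(Q_next, 1e-6, 1.0)
    return Q_next, (X1, X2, X3, Y, D)

# Balanced sample over four groups (encoded numerically here for simplicity);
# initial scores drawn uniformly, unlike the FICO-density initialization used
# in the actual experiments.
n_per_group = 2500
A = np.repeat(np.arange(4), n_per_group).astype(float)
Q = rng.uniform(0.2, 0.9, A.shape[0])

for k in range(5):
    Q, _ = step(Q, A)
    print(f"step {k}: mean credit score = {Q.mean():.3f}")
```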
With the initialization of the distribution of the credit score Q and the specified dynamics, we present results comparing the influence of vanilla regularization terms in decision-making (when estimating the credit score Q) on the calculation of bounds for the induced risks (regularization that involves induced-risk considerations is discussed in Appendix G). In particular, we consider L1-norm (Figure 5) and L2-norm (Figure 6) regularization terms when optimizing decision-making policies on the source domain. As we can see from the results, applying vanilla regularization terms (e.g., the L1 norm and L2 norm) on the source domain, without specifically accounting for the mechanism that induces the distribution shift, does not provide a significant performance improvement in terms of a smaller induced risk. For example, there is no significant decrease of the term Diff as the regularization strength increases, for either the L1-norm (Figure 5) or the L2-norm (Figure 6) regularization term.

Figure 5. Results of applying the L1 penalty with different strengths (strong, medium, weak; panels (a)-(f)) when constructing $h_S$. The left column, consisting of panels (a), (c), and (e), compares Max := $\max\{\mathrm{Err}_{\mathcal{D}_S}(h_T), \mathrm{Err}_{\mathcal{D}(h_T)}(h_T)\}$ and LB := the lower bound specified in Theorem 4.6. The right column, consisting of panels (b), (d), and (f), compares Diff := $\mathrm{Err}_{\mathcal{D}(h_S)}(h_S) - \mathrm{Err}_{\mathcal{D}(h_T)}(h_T)$ and UB := the upper bound specified in Theorem 4.2. For each time step K = k, we compute and deploy the source-optimal classifier $h_S$ and update the credit score for each individual according to the received decision as the new reality for time step K = k + 1.

E. Challenges in Minimizing Induced Risk

In this section, we provide discussions on the challenges in performing induced domain adaptation.

E.1. Computational Challenges

The domain adaptation literature has provided solutions for minimizing the risk on the target distribution via a nicely developed set of results (Sugiyama et al., 2008; 2007; Shimodaira, 2000). This allows us to extend those solutions to minimize the induced risk too. Nonetheless, we highlight additional computational challenges. We focus on the covariate shift setting; the scenario for target shift is similar. For covariate shift, recall that earlier we derived the following fact:
$$\mathbb{E}_{\mathcal{D}(h)}[\ell(h; X, Y)] = \mathbb{E}_{\mathcal{D}_S}[\omega_x(h)\, \ell(h; X, Y)].$$
This formula suggests a promising solution: use $\omega_x(h)$ to perform reweighted ERM. Of course, the primary challenge that stands in the way is how to obtain $\omega_x(h)$. Different methods have been proposed in the literature to estimate $\omega_x(h)$ when one has access to $\mathcal{D}(h)$ (Zhang et al., 2013; Long et al., 2016; Gong et al., 2016). How any of these specific techniques work in our induced domain adaptation setting is left for a more thorough future study. In this section, we focus on explaining the computational challenges even when such knowledge of $\omega_x(h)$ can be obtained for each model h being considered during training. Though $\omega_x(h)$ and $\ell(h; x, y)$ might both be convex with respect to (the output of) the classifier h, their product is not necessarily convex. Consider the following example.
Example 1 ($\omega_x(h)\,\ell(h; x, y)$ is generally non-convex). Let $X = (0, 1]$. Let the true label of each $x \in X$ be $y(x) = \mathbb{1}[x \ge \tfrac{1}{2}]$. Let $\ell(h; x, y) = \tfrac{1}{2}(h(x) - y)^2$, and let $h(x) = x$ (a simple linear model). Notice that $\ell$ is convex in h. Let $\mathcal{D}$ be the uniform distribution, whose density function is $f_{\mathcal{D}}(x) = 1$ for $0 < x \le 1$ and $0$ otherwise. Notice that if the training data is drawn from $\mathcal{D}$, then h is the linear classifier that minimizes the expected loss. Suppose that, since h rewards large values of x, it induces decision subjects to shift towards higher feature values. In particular, let $\mathcal{D}(h)$ have density function $f_{\mathcal{D}(h)}(x) = 2x$ for $0 < x \le 1$ and $0$ otherwise. Then for all $x \in X$, $\omega_x(h) = f_{\mathcal{D}(h)}(x)/f_{\mathcal{D}}(x) = 2x$. Notice that $\omega_x(h) = 2x$ is convex in $h(x) = x$. Then
$$\omega_x(h)\,\ell(h; x, y) = 2x \cdot \tfrac{1}{2}(x - y)^2 = x(x - y)^2 = \begin{cases} x^3, & 0 < x < \tfrac{1}{2}, \\ x(x-1)^2, & \tfrac{1}{2} \le x \le 1, \end{cases}$$
which is clearly non-convex.

Figure 6. Results of applying the L2 penalty with different strengths (strong, medium, weak; panels (a)-(f)) when constructing $h_S$. The left column, consisting of panels (a), (c), and (e), compares Max := $\max\{\mathrm{Err}_{\mathcal{D}_S}(h_T), \mathrm{Err}_{\mathcal{D}(h_T)}(h_T)\}$ and LB := the lower bound specified in Theorem 4.6. The right column, consisting of panels (b), (d), and (f), compares Diff := $\mathrm{Err}_{\mathcal{D}(h_S)}(h_S) - \mathrm{Err}_{\mathcal{D}(h_T)}(h_T)$ and UB := the upper bound specified in Theorem 4.2. For each time step K = k, we compute and deploy the source-optimal classifier $h_S$ and update the credit score for each individual according to the received decision as the new reality for time step K = k + 1.

Nonetheless, we provide sufficient conditions under which $\omega_x(h)\,\ell(h; x, y)$ is in fact convex:

Proposition E.1. Suppose $\omega_x(h)$ and $\ell(h; x, y)$ are both convex in h, and $\omega_x(h)$ and $\ell(h; x, y)$ satisfy, for all $h, h', x, y$: $(\omega_x(h) - \omega_x(h')) \cdot (\ell(h; x, y) - \ell(h'; x, y)) \ge 0$. Then $\omega_x(h)\,\ell(h; x, y)$ is convex.

Proof. Let us use the shorthand $\omega(h) := \omega_x(h)$ and $\ell(h) := \ell(h; x, y)$. To show that $\omega(h)\,\ell(h)$ is convex, it suffices to show that for any $\alpha \in [0, 1]$ and any two hypotheses $h, h'$ we have
$$\omega(\alpha h + (1-\alpha) h')\, \ell(\alpha h + (1-\alpha) h') \le \alpha\, \omega(h)\ell(h) + (1-\alpha)\, \omega(h')\ell(h').$$
By the convexity of $\omega$, $\omega(\alpha h + (1-\alpha) h') \le \alpha\, \omega(h) + (1-\alpha)\, \omega(h')$, and by the convexity of $\ell$, $\ell(\alpha h + (1-\alpha) h') \le \alpha\, \ell(h) + (1-\alpha)\, \ell(h')$. Since both $\omega$ and $\ell$ are non-negative, it therefore suffices to show that
$$[\alpha\, \omega(h) + (1-\alpha)\, \omega(h')]\,[\alpha\, \ell(h) + (1-\alpha)\, \ell(h')] \le \alpha\, \omega(h)\ell(h) + (1-\alpha)\, \omega(h')\ell(h')$$
$$\iff \alpha(\alpha - 1)\, \omega(h)\ell(h) - \alpha(\alpha - 1)\, [\omega(h)\ell(h') + \omega(h')\ell(h)] + \alpha(\alpha - 1)\, \omega(h')\ell(h') \le 0$$
$$\iff \alpha(\alpha - 1)\, [\omega(h) - \omega(h')]\,[\ell(h) - \ell(h')] \le 0$$
$$\iff [\omega(h) - \omega(h')]\,[\ell(h) - \ell(h')] \ge 0.$$
By the assumed condition, the left-hand side is indeed non-negative, which proves the claim.

This condition is intuitive when each x belongs to a rational agent who responds to a classifier h to maximize her chance of being classified as +1: for y = +1, the higher-loss points are the ones close to the decision boundary, so more y = -1 (negative-label) points might shift to them, resulting in a larger $\omega_x(h)$; for y = -1, the higher-loss points are the ones likely to be misclassified as +1, which attracts instances to deviate toward them.
E.2. Challenges Due to the Lack of Access to Data

In standard domain adaptation settings, one often assumes access to a sample set of X, which already poses challenges when there is no access to the label Y after the adaptation. Nonetheless, the literature has seen a fruitful development of solutions (Sugiyama et al., 2008; Zhang et al., 2013; Gong et al., 2016). One might think the above idea can be applied to our IDA setting rather straightforwardly by assuming that we observe samples from D(h), the induced distribution under each model h, during training. However, we often do not know precisely how the distribution will shift under a model h until we deploy it. This is particularly true when the distribution shifts are caused by humans responding to a model. Therefore, the ability to accurately predict how samples react to h plays a very important role (Ustun et al., 2019). Indeed, the strategic classification literature enables this capability by assuming fully rational human agents. For a more general setting, building robust domain adaptation tools that are resistant to the above prediction error is also going to be a crucial criterion.

F. Discussions on Performing Direct Induced Risk Minimization

In this section, we provide discussions on how to directly perform induced risk minimization in our induced domain adaptation setting. We first provide a gradient-descent-based method for a particular label shift setting where the underlying dynamics are the replicator dynamics described in Section 5.3. Then we propose a solution for a more general induced domain adaptation setting where we do not make any particular assumptions on the underlying distribution shift model.

F.1. Gradient-Descent-Based Method

Here we provide a toy example of performing direct induced risk minimization under the assumption of label shift, with the underlying dynamics being the replicator dynamics described in Section 5.3.

Setting. Consider a simple setting in which each decision subject is associated with a 1-dimensional continuous feature $x \in \mathbb{R}$ and a binary true qualification $y \in \{-1, +1\}$. We assume the label shift setting, and the underlying population evolves according to the replicator dynamics described in Section 5.3. We consider a simple threshold classifier $\hat{Y} = h(X) = \mathbb{1}[X \ge \theta]$, meaning that the classifier is completely characterized by the threshold parameter $\theta$. Below we will use $\hat{Y}$ and $h(X)$ interchangeably to represent the classification outcome.

Figure 7. Experimental results of directly optimizing for the induced risk under the assumption of replicator dynamics. The X-axis denotes $\mathrm{Err}_{\mathcal{D}(h_S)}(h_S)$, where $h_S$ is the source-optimal classifier in each setting. The Y-axis is the percentage of performance improvement from using the classifier $h_T = \arg\min_h \mathrm{Err}_{\mathcal{D}(h)}(h)$, for which the decision maker takes into account the underlying response dynamics (according to the replicator dynamics in Equation (23)) of the decision subjects. Different colors represent different utility functions, reflected by the specifications of the values in $U_{y,\hat{y}}$; within each color, different dots represent different initial qualification rates.

Recall that the replicator dynamics are specified as follows:
$$\frac{P_{\mathcal{D}(h)}(Y = y)}{P_{\mathcal{D}_S}(Y = y)} = \frac{\text{Fitness}(Y = y)}{\mathbb{E}_{\mathcal{D}_S}[\text{Fitness}(Y)]} \qquad (23)$$
Fitness(Y = y) is the fitness of strategy Y = y, which is further defined in terms of the expected utility Uy,ˆy of each qualification-classification outcome pair (y, ˆy): Fitness(Y = y) := X ˆy P[ ˆY = ˆy|Y = y] Uy,ˆy where Uy,ˆy is the utility (or reward) for each qualification-classification outcome combination.P(X|Y = y) is sampled according to a Gaussian distribution, and will be unchanged since we consider a label shift setting. We initialize the distributions we specify the initial qualification rate PDS(Y = +1). To test different settings, we vary the specification of the utility matrix Uy,ˆy and generate different dynamics. Formulate the induced risk as a function of h To minimize the induced risk, we first formulate the induced risk as a function of the classifier h s parameter θ taking into account of the underlying dynamic, and then perform gradient descent to solve for locally optimal classifier h T . Recall from Section 5, under label shift, we can rewrite the induced risk as the following form: ED(h)[ℓ(h; X, Y )] =p(h) EDS[ℓ(h; X, Y )|Y = +1] + (1 p(h)) EDS[ℓ(h; X, Y )|Y = 1] where p(h) = PD(h)(Y = +1). Since EDS[ℓ(h; X, Y )|Y = +1] and EDS[ℓ(h; X, Y )|Y = 1] are already functions of both h and DS, it suffices to show that the accuracy on D(h), p(h) = PD(h)(Y = +1), can also be expressed as a function of θ and DS. To see this, recall that for a threshold classifier ˆY = 1[X > θ], it means that the prediction accuracy can be written as a Model Transferability with Responsive Decision Subjects function of the threshold θ and target distribution D(h): PD(h)(Y = +1) = PD(h)( ˆY = +1, Y = +1) + PD(h)( ˆY = 1, Y = 1) = PD(h)(X θ, Y = +1) + PD(h)(X θ, Y = 1) θ PD(h)(Y = +1) P(X = x|Y = 1) | {z } unchanged because of label shift PD(h)(Y = 1) P(X = x|Y = 1) | {z } unchanged because of label shift where P(X|Y = y) remains unchanged over time, and PD(h)(Y = y) evolves over time according to Equation (23), namely PD(h)(Y = y) =PDS(Y = y) Fitnessg(Y = y) EDS[Fitnessg(Y )] =PDS(Y = y) ˆy PDS[ ˆY = ˆy|Y = y, G = g] Uˆy,y P ˆy PDS[ ˆY = ˆy|Y = y, G = g] Uˆy,y)PDS[Y = y] (25) Notice that ˆY is only a function of θ, and Uy,ˆy are fixed quantities, the above derivation indicates that we can express PD(h)(Y = y) as a function of θ and DS. Plugging it back to Equation (24), we can see that the accuracy can also be expressed as a function of the classifier s parameter θ, indicating that the induced risk can be expressed as a function of θ. Thus we can use gradient descent using automatic differentiation w.r.t θ to find a optimal classifier h T that minimize the induced risk. Experimental Results Figure 7 shows the experimental results for this toy example. We can see that for each setting, compared to the baseline classifier h S, the proposed gradient based optimization procedure returns us a classifier that achieves a better prediction accuracy (thus lower induced risk) compared to the accuracy of the source optimal classifier. F.2. General Setting: Induced Risk Minimization with Bandit Feedback In general, finding the optimal classifier that achieves the optimal induced risk h T is a hard problem due to the interactive nature of the problem (see, e.g. the literature of performative prediction (Perdomo et al., 2020) for more detailed discussions). 
Without making any assumptions on the mapping between h and D(h), one can only potentially rely on the bandit feedbacks from the decision subjects to estimate the influence of h on D(h): when the induced risk is a convex function of the classifier h s parameter θ, one possible approach is to use the standard techniques from bandit optimization (Flaxman et al., 2005) to iteratively find induced optimal classifier h T . The basic idea is: at each step t = 1, , T, the decision maker deploy a classifier ht, then observe data points sampled from D(ht) and their losses, and use them to construct an approximate gradient for the induced risk as a function of the model parameter θt. When the induced risk is a convex function in the model parameter θ, the above approach guarantees to converge to h T , and have sublinear regret in the total number of steps T. The detailed description of the algorithm for finding h T is as follows: Model Transferability with Responsive Decision Subjects Algorithm 1 One-point bandit gradient descent for performative prediction Result: return θT after T rounds θ1 0 foreach time step t 1, . . . , T do Sample a unit vector ut Unif(S) θ+ t θt + δut Observe data points z1, . . . , znt D(θ+ t ) e IR(θ+ t ) 1 nt Pnt i=1 ℓ(zi; θ+ t ) δ e IR(θ+ t ) ut gt(θt) is an approximation of θ b IR(θt) θt+1 Π(1 δ)Θ(θt η gt(θt)) Take gradient step; project onto (1 δ)Θ := {(1 δ)θ | θ Θ} end G. Regularized Training In this section, we discuss the possibility that indeed minimizing regularized risk will lead to a tighter upper bound. Consider the target shift setting. Recall that p(h) := PD(h)(Y = +1) and we have for any proper loss function ℓ: ED(h)[ℓ(h; X, Y )] = p(h) EDS[ℓ(h; X, Y )|Y = +1] + (1 p(h)) EDS[ℓ(h; X, Y )|Y = 1] Suppose p < p(h T ), now we claim that minimizing the following regularized/penalized risk leads to a smaller upper bound. EDS[ℓ(h; X, Y )] + α EDuniform||h(X) + 1 where in above Duniform is a distribution with uniform prior for Y . We impose the following assumption: The number of predicted +1 for examples with Y = +1 and for examples with Y = 1 are monotonic with respect to α. Consider the easier setting with ℓ= 0-1 loss. Then EDuniform||h(X)|| = 0.5 (PX|Y =+1[h(X) = +1] + PX|Y = 1[h(X) = +1]) 0.5 = 0.5 (EX|Y =+1[ℓ(h(X), +1)] EX|Y = 1[ℓ(h(X), 1]) The above regularized risk minimization problem is equivalent to (p + 0.5 α) EX|Y =+1[ℓ(h(X), +1)] + (p 0.5 α) EX|Y = 1[ℓ(h(X), 1] Recall the upper bound in Theorem 5.1: Err D(h S)(h S) Err D(h T )(h T ) |p(h S) p(h T )| | {z } Term 1 + (1 + p) (d TV(D+(h S), D+(h T )) + d TV(D (h S), D (h T )) | {z } Term 2 With a properly specified α > 0, this leads to a distribution with a smaller gap of |p( h S) p(h T )|, where h S denotes the optimal classifier of the penalized risk minimization - this leads to a smaller Term 1 in the bound of Theorem 5.1. Furthermore, the induced risk minimization problem will correspond to an α s.t. α = p(h T ) p 0.5 , and the original h S corresponds to a distribution of α = 0. Using the monotonicity assumption, we will establish that the second term in Theorem 5.1 will also smaller when we tune a proper α. Model Transferability with Responsive Decision Subjects H. Discussion on the tightness of our theoretical bounds General Bounds in Section 3 For the general bounds reported in Section 3, it is not trivial to fully quantify the tightness without further quantifying the specific quantities of the terms, e.g. 
the $\mathcal{H}$-divergence between the source and the induced distribution, and the average error a classifier has to incur on both distributions. This part of our results is adapted from the classical literature on learning from multiple domains (Ben-David et al., 2010); the tightness of using the $\mathcal{H}$-divergence and the other terms seems to be partially validated therein.

Bounds in Section 4 and Section 5. For the more specific bounds provided in Section 4 (covariate shift) and Section 5 (target shift), however, it is relatively easier to argue about tightness: the proofs there are more transparent, and it is easier to back out the conditions under which the inequalities are loose. For example, in Theorem 5.1, the inequalities in our bound are introduced primarily in two places: 1) using the optimality of $h_S$ on the source distribution, and 2) bounding the statistical difference between $h_S$'s and $h_T$'s predictions on the positive and negative examples. Both suggest that if the differences between the two classifiers' predictions are confined to a small range, then the result in Theorem 5.1 is relatively tight.