# Stateful Strategic Regression

Keegan Harris, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213. keeganh@cmu.edu
Hoda Heidari, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213. hheidari@cmu.edu
Zhiwei Steven Wu, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213. zstevenwu@cmu.edu

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Abstract

Automated decision-making tools increasingly assess individuals to determine if they qualify for high-stakes opportunities. A recent line of research investigates how strategic agents may respond to such scoring tools to receive favorable assessments. While prior work has focused on the short-term strategic interactions between a decision-making institution (modeled as a principal) and individual decision-subjects (modeled as agents), we investigate interactions spanning multiple time-steps. In particular, we consider settings in which the agent's effort investment today can accumulate over time in the form of an internal state, impacting both his future rewards and those of the principal. We characterize the Stackelberg equilibrium of the resulting game and provide novel algorithms for computing it. Our analysis reveals several intriguing insights about the role of multiple interactions in shaping the game's outcome: First, we establish that in our stateful setting, the class of all linear assessment policies remains as powerful as the larger class of all monotonic assessment policies. While recovering the principal's optimal policy requires solving a non-convex optimization problem, we provide polynomial-time algorithms for recovering both the principal's and the agent's optimal policies under common assumptions about the process by which effort investments convert to observable features. Most importantly, we show that with multiple rounds of interaction at her disposal, the principal is more effective at incentivizing the agent to accumulate effort in her desired direction. Our work addresses several critical gaps in the growing literature on the societal impacts of automated decision-making by focusing on longer time horizons and accounting for the compounding nature of decisions individuals receive over time.

1 Introduction

Automated decision-making tools increasingly assess individuals to determine whether they qualify for life-altering opportunities in domains such as lending [27], higher education [32], employment [41], and beyond. These assessment tools have been widely criticized for the blatant disparities they produce through their scores [43, 3]. This overwhelming body of evidence has led to a remarkably active area of research into understanding the societal implications of algorithmic/data-driven automation. Much of the existing work on the topic has focused on the immediate or short-term societal effects of automated decision-making. (For example, a thriving line of work in Machine Learning (ML) addresses the unfairness that arises when ML predictions inform high-stakes decisions [18, 22, 31, 8, 1, 16, 11] by defining it as a form of predictive disparity, e.g., inequality in false-positive rates [22, 3] across social groups.) With the exception of several noteworthy recent articles (which we discuss shortly), prior work has largely ignored the processes through which algorithmic decision-making systems can induce, perpetuate, or amplify undesirable choices and behaviors.
Our work takes a long-term perspective toward modeling the interactions between individual decision-subjects and algorithmic assessment tools. We are motivated by two key observations: First, algorithmic assessment tools often provide predictions about the latent qualities of interest (e.g., creditworthiness, mastery of course material, or job productivity) by relying on imperfect but observable proxy attributes that can be directly evaluated about the subject (e.g., past financial transactions, course grades, peer evaluation letters). Moreover, their design ignores the compounding nature of the advantages/disadvantages individual subjects accumulate over time in pursuit of receiving favorable assessments (e.g., debt, knowledge, job-related skills).

To address how individuals respond to decisions made about them by modifying their observable characteristics, a growing line of work has recently initiated the study of the strategic interactions between decision-makers and decision-subjects (see, e.g., [15, 26, 36, 30, 21]). This existing work has focused mainly on the short-term implications of strategic interactions with algorithmic assessment tools, e.g., by modeling them as a single round of interaction between a principal (the decision-maker) and agents (the decision-subjects) [30]. In addition, existing work that studies interactions over time assumes that agents are myopic in responding to the decision-maker's policy [4, 42, 38, 15]. We expand this line of inquiry to multiple rounds of interaction, accounting for the impact of actions today on the outcomes players can attain tomorrow.

Our multi-round model of principal-agent interactions. We take the model proposed by Kleinberg and Raghavan [30] as our starting point. In Kleinberg and Raghavan's formulation, a principal interacts with an agent once, where the interaction takes the form of a Stackelberg game. The agent receives a score y = f(θ, o), in which θ is the principal's choice of assessment parameters and o is the agent's observable characteristics. The score is used to determine the agent's merit with respect to the quality the principal is trying to assess. (As concrete examples, y could correspond to the grade a student receives for a class, or the FICO credit score of a loan applicant.) The principal moves first, publicly announcing the assessment rule used to evaluate the agent. The agent then best responds to this assessment rule by deciding how to invest a fixed amount of effort into producing a set of observable features o that maximize his score y. Kleinberg and Raghavan characterize the assessment rules that can incentivize the agent to invest in specific types of effort (e.g., those that lead to real improvements in the quality of interest as opposed to gaming the system).

We generalize the above setting to T > 1 rounds of interaction between the principal and the agent and allow for the possibility of certain effort types rolling over from one step to the next. Our key finding is that a longer time horizon provides the principal additional latitude in the range of effort sequences she can incentivize the agent to produce. To build intuition as to why repeated interactions expand the set of incentivizable efforts, consider the following stylized example:

Example 1.1. Consider the classroom example of Kleinberg and Raghavan, where a teacher (modeled as a principal) assigns a student (modeled as an agent) an overall grade y based on his observable features; in this case, his test and homework scores.
Assume that the teacher chooses an assessment rule and assigns a score y = θ_TE · TE + θ_HW · HW, where TE is the student's test score, HW is his homework score, and θ_TE, θ_HW ∈ ℝ are the weights of each score in the student's overall grade. The student can invest effort into any of three activities: copying answers on the test, studying, and looking up homework answers online. In a one-round setting where the teacher only evaluates the student once, the student may be more inclined to copy answers on the test or look up homework answers online, since these actions immediately improve the score with relatively less effort. However, these two actions do not improve the student's knowledge (which impacts the student's future grades as well), and so their effects do not carry over to future time-steps. When there are multiple rounds of interaction, the student will therefore be incentivized to invest effort into studying, as knowledge accumulation over time takes less effort in the long run compared to cheating every time. We revisit this example in further detail in Appendix A.

Summary of our findings and techniques. We formalize settings in which the agent's effort investment today can accumulate over time in the form of an internal state impacting both his future rewards and those of the principal. We characterize the Stackelberg equilibrium of the resulting game and provide novel algorithmic techniques for computing it. We begin by establishing that, for the principal, the class of all linear assessment policies remains as powerful as the larger class of all monotonic assessment policies. In particular, we prove that if there exists an assessment policy that can incentivize the agent to produce a particular sequence of effort profiles, there also exists a linear assessment policy which can incentivize the exact same effort sequence.

Figure 1: Average effort spent studying vs. average effort spent cheating over time for the example in Appendix A. The line x + y = 1 represents the set of all possible Pareto-optimal average effort profiles. The shaded region under the line represents the set of average effort profiles which can be incentivized with a certain time horizon. Darker shades represent longer time horizons. In the case where T = 1, it is not possible to incentivize the agent to spend any effort studying. Arrows are used to demonstrate the additional set of Pareto-optimal average effort profiles that can be incentivized with each time horizon. As the time horizon increases, it becomes possible to incentivize a wider range of effort profiles.

We then study the equilibrium computation problem, which in general involves optimizing non-convex objectives. Despite the initial non-convexity, we observe that when the problem is written as a function of the agent's incentivized efforts, the principal's non-convex objective becomes convex. Moreover, under a common assumption on the agent's conversion mapping from efforts to observable features, the set of incentivizable effort policies is also convex. Given this structure, we provide a polynomial-time algorithm that directly optimizes the principal's objective over the set of incentivizable effort policies, which subsequently recovers the agent's and the principal's equilibrium strategies. Even though prior work [39, 40] has also taken this approach for solving other classes of non-convex Stackelberg games, our work has to overcome an additional challenge: the agent's set of incentivizable efforts is not known a priori.
We resolve this challenge by providing a membership oracle (which determines whether a sequence of agent efforts can be incentivized by any assessment policy), which allows us to leverage the convex optimization method due to Kalai and Vempala [28]. Our analysis reveals several intriguing insights about the role of repeated interactions in shaping the long-term outcomes of decision-makers and decision-subjects: For example, we observe that with multiple rounds of assessments, both parties can be better off employing dynamic/time-sensitive strategies as opposed to static/myopic ones. Perhaps our most significant finding is that by accounting for the effects of multiple time-steps, the principal is significantly more effective at incentivizing the agent to accumulate effort in her desired direction (as demonstrated in Figure 1 for a stylized teacher-student example). In conclusion, our work addresses two critical gaps in the growing literature on the societal impacts of automated decision-making by focusing on longer time horizons and accounting for the compounding nature of decisions individuals receive over time.

1.1 Related work

A growing line of work at the intersection of computer science and the social sciences investigates the impacts of algorithmic decision-making models on people (see, e.g., [25, 44, 34, 15]). As we outline below, significant attention has been devoted to settings in which decision-subjects are strategic and respond to the decision-maker's choice of assessment rules. Liu et al. [34] and Kannan et al. [29] study how a utility-maximizing decision-maker may respond to the predictions made by a predictive rule (e.g., the decision-maker may interpret/utilize the predictions a certain way or decide to update the model entirely). Mouzannar et al. [37] and Heidari et al. [23] propose several dynamics for how individuals within a population may react to predictive rules by changing their qualifications. Dong et al. [15], Hu et al. [26], and Milli et al. [36] address strategic classification, a setting in which decision-subjects are assumed to respond strategically, and potentially untruthfully, to the choice of the predictive model, and the goal is to design classifiers that are robust to strategic manipulation. Generalizing strategic classification, Perdomo et al. [38] propose a risk-minimization framework for performative predictions, which broadly refers to settings in which the act of making a prediction influences the prediction target. Incentive-aware learning [45, 2] is another generalization that, at a high level, seeks to characterize the conditions under which one can train predictive rules that are robust to training-data manipulations.

Two additional topics that are conceptually related to our work but differ in their motivating problems and goals are adversarial prediction and strategyproof regression. The adversarial prediction problem [6, 13] is motivated by settings (e.g., spam detection) in which an adversary actively manipulates data to increase the false-negative rate of the classifier. Adversarial prediction has been modeled and analyzed as a zero-sum game [13] or a Stackelberg competition [6]. Strategyproof/truthful linear regression [14, 12, 9] offers mechanisms for incentivizing agents to report their data truthfully. As mentioned earlier, many of our modeling choices closely follow Kleinberg and Raghavan [30].
Below, we provide a summary of Kleinberg and Raghavan's results and briefly mention some of the recent contributions following in their footsteps. While much of the prior work on strategic classification views all feature manipulation as undesirable [15, 26, 36], Kleinberg and Raghavan make a distinction between feature manipulation via gaming (investing effort to change observable features in a way that has no positive impact on the quality the principal is trying to measure) and feature manipulation via improvement (investing effort in such a way that the underlying characteristics the principal is trying to measure are improved). Their model consists of a single round of interaction between a principal and an agent, and their results establish the optimality and limits of linear assessment rules in incentivizing desired effort profiles. Several papers since then have studied similar settings (see, e.g., Miller et al. [35], Frankel and Kartik [19]) with goals that are distinct from ours. (For example, Frankel and Kartik find a fixed-point assessment rule that improves accuracy by under-utilizing the observable data and flattening the assessment rule.)

Finally, we mention that principal-agent games [33] are classic economic tools to model interactions in which a self-interested entity (the agent) responds to the policy/contract enacted by another (the principal) in ways that may be contrary to the principal's intentions. The principal must, therefore, choose his/her strategy accounting for the agent's strategic response. Focusing on linear strategies is a common practice in this literature [24, 7, 17]. For simplicity, we present our analysis for linear assessment rules, but later show that the class of all linear assessment policies is equally as powerful as the class of all monotone assessment policies (Theorem 3.4).

2 Problem formulation

In our stateful strategic regression setting, a principal interacts with the same agent over the course of T time-steps, modeled via a Stackelberg game.¹ The principal moves first, announcing an assessment policy, which consists of a sequence of assessment rules given by parameters {θ_t}_{t=1}^T. Each θ_t is used for evaluating the agent at round t = 1, ..., T. The agent then best responds to this assessment policy by investing effort in different activities, which in turn produce a series of observable features {o_t}_{t=1}^T that maximize his overall score. In each assessment round t ∈ {1, ..., T}, the agent receives a score y_t = f(θ_t, o_t), where θ_t is the principal's assessment parameters for round t, and o_t is the agent's observable features at that time. Following Kleinberg and Raghavan, we focus on monotone assessment rules.

¹ To improve readability, we adopt the convention of referring to the principal as she/her and the agent as he/him throughout the paper.

Definition 2.1 (Monotone assessment rules). An assessment rule f(θ, ·) : ℝ^n → ℝ is monotone if f(θ, o) ≥ f(θ, o') whenever o_k ≥ o'_k for all k ∈ {1, ..., n}. Additionally, there exists some k ∈ {1, ..., n} such that strictly increasing o_k strictly increases f(θ, o).

For convenience, we assume the principal's assessment rules are linear, that is, y_t = f(θ_t, o_t) = θ_t^⊤ o_t. Later we show that the linearity assumption is without loss of generality. We also restrict θ_t to lie in the n-dimensional probability simplex; that is, we require each component of θ_t to be at least 0 and the sum of the n components to equal 1.

From effort investments to observable features and internal states. The agent can modify his observable features by investing effort in various activities.
While these effort investments are private to the agent and the principal cannot directly observe them, they lead to features that the principal can observe. In response to the principal's assessment policy, the agent plays an effort policy, consisting of a sequence of effort profiles {e_t}_{t=1}^T, where each individual coordinate of e_t (denoted by e_{t,j}) is a function of the principal's assessment policy {θ_t}_{t=1}^T. Specifically, the agent chooses his policy (e_1, ..., e_T) so that it is a best response to the principal's assessment policy (θ_1, ..., θ_T).

Next, we specify how effort investment translates into observable features. We assume an agent's observable features in the first round take the form o_1 = o_0 + σ_W(e_1), where o_0 ∈ ℝ^n is the initial value of the agent's observable features before any modification, e_1 ∈ ℝ^d is the effort the agent expends to modify his features in his first round of interaction with the principal, and σ_W : ℝ^d → ℝ^n is the effort conversion function, parameterized by W. The effort conversion function is some concave mapping from effort expended to observable features. (For example, if the observable features in the classroom setting are test and homework scores, expending effort studying will affect both an agent's test and homework scores, although it may require more studying to improve test scores from 90% to 100% than from 50% to 60%.)

Over time, effort investment can accumulate. (For example, small businesses accumulate wealth over time by following good business practices. Students learn as they study and accumulate knowledge.) This accumulation takes the form of an internal state s_t = s_0 + Δ Σ_{i=1}^{t−1} e_i. Here Δ ∈ ℝ^{d×d} is a diagonal matrix in which Δ_{j,j}, j ∈ {1, ..., d}, determines how much one unit of effort (e.g., in the j-th effort coordinate, e_j) rolls over from one time-step to the next, and s_0 is the agent's initial internal state. An agent's observable features are, therefore, a function of both the effort he expends and his internal state. Specifically, o_t = σ_W(s_t + e_t) (here σ_W(s_0) is analogous to o_0 in the single-shot setting). Note that while, for simplicity, we assume the accumulating effort types are socially desirable, our results apply as well to settings where undesirable efforts can similarly accumulate.

Utility functions for the agent and the principal. Given the above mapping, the agent's goal is to pick his effort profiles so that the observable features they produce maximize the sum of his scores over time; that is, the agent's utility is Σ_{t=1}^T y_t = Σ_{t=1}^T θ_t^⊤ o_t. Our focus on the sum of scores over time is a conventional choice and is motivated by real-world examples. (A small business owner who applies for multiple loans cares about the cumulative amount of loans he/she receives. A student taking a series of exams cares about his/her average score across all of them.) The principal's goal is to choose her assessment rules over time so as to maximize cumulative effort investments according to her preferences, captured by a matrix Λ. Specifically, the principal's utility is ‖Λ Σ_{t=1}^T e_t‖_1. The principal's utility can be thought of as a weighted ℓ1 norm of the agent's cumulative effort, where Λ ∈ ℝ^{d×d} is a diagonal matrix whose element Λ_{jj} denotes how much the principal wants to incentivize the agent to invest in effort component e_j.²

² Note that while we only consider diagonal Δ, Λ ∈ ℝ^{d×d}_+, our results readily extend to general Δ, Λ ∈ ℝ^{d×d}_+. By focusing on diagonal matrices we have a one-to-one mapping between state and effort components; a non-diagonal Δ corresponds to cases where different effort components can contribute to multiple state components.
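To make the mapping from efforts to features, internal states, and utilities concrete, here is a minimal sketch of one round-by-round rollout. It assumes a linear effort conversion σ_W(x) = Wx (the special case analyzed in Section 4), and the specific numbers for W, Δ, Λ, and the assessment rules are toy, hypothetical values rather than anything taken from the paper.

```python
import numpy as np

# Toy, hypothetical parameters: n = 2 observable features, d = 3 effort actions.
W = np.array([[0.5, 1.0, 0.0],    # n x d effort conversion matrix (sigma_W(x) = W x)
              [0.5, 0.0, 1.0]])
Delta = np.diag([0.8, 0.0, 0.0])  # d x d carry-over rates: only action 0 accumulates
Lam = np.diag([1.0, 0.0, 0.0])    # d x d principal preferences over effort components
s0 = np.zeros(3)                  # agent's initial internal state
T = 4

thetas = np.full((T, 2), 0.5)                 # one assessment rule theta_t per round
efforts = np.tile([1.0, 0.0, 0.0], (T, 1))    # e.g., the agent invests all effort in action 0

state = s0.copy()
agent_utility = 0.0
for t in range(T):
    o_t = W @ (state + efforts[t])            # o_t = sigma_W(s_t + e_t)
    agent_utility += thetas[t] @ o_t          # y_t = theta_t^T o_t, summed over rounds
    state = state + Delta @ efforts[t]        # s_{t+1} = s_0 + Delta (e_1 + ... + e_t)

principal_utility = np.abs(Lam @ efforts.sum(axis=0)).sum()   # ||Lam sum_t e_t||_1
print(agent_utility, principal_utility)
```

With these toy numbers, swapping `efforts` for a policy that only ever plays a non-accumulating action leaves the internal state at s_0 and, once T is large enough, yields a lower cumulative score, which is the effect Example 1.1 illustrates.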
Constraints on agent effort. As was the case in the single-shot setting of Kleinberg and Raghavan, we assume that the agent's choice of effort e_t at each time t is subject to a fixed budget B (with respect to the ℓ1 norm). Without loss of generality, we consider the case where B = 1. We explore the consequences of an alternative agent effort formulation, namely a quadratic cost penalty, in Appendix G.

Proposition 2.2. It is possible to incentivize a wider range of effort profiles by modeling the principal-agent interaction over multiple time-steps, compared to a model which only considers one-shot interactions.

See Appendix A for an example which illustrates this phenomenon.

3 Equilibrium characterization

The following optimization problem captures the agent's best response to an arbitrary sequence of assessment rules.³ (Recall that d refers to the dimension of the effort vectors, the e_t's, and n refers to the number of observable features, i.e., the dimension of the o_t's.) The set of agent best responses to a linear assessment policy {θ_t}_{t=1}^T is given by the following optimization procedure:

{e_t^*}_{t=1}^T = arg max_{e_1,...,e_T} Σ_{t=1}^T θ_t^⊤ o_t,  s.t.  e_{t,j} ≥ 0 and Σ_j e_{t,j} ≤ 1 for all t, j.

³ Throughout this section, when it improves readability, we denote the dimensions of matrices in their subscript (e.g., X_{a×b} means X is an a × b matrix).

The goal of the principal is to pick an assessment policy {θ_t}_{t=1}^T in order to maximize the total magnitude of the effort components she cares about, i.e., ‖Λ Σ_{t=1}^T e_t^*‖_1:

{θ_t^*}_{t=1}^T = arg max_{θ_1,...,θ_T} ‖Λ Σ_{t=1}^T e_t^*(θ_t, ..., θ_T)‖_1,  s.t.  θ_t lies in the n-dimensional probability simplex for all t,

where we abuse notation by treating e_t^* as a function of (θ_t, ..., θ_T). Substituting the agent's optimal effort policy into the above expression, we obtain the following formalization of the principal's assessment policy:

Proposition 3.1 (Stackelberg Equilibrium). Suppose the principal's strategy space consists of all sequences of linear monotonic assessment rules. The Stackelberg equilibrium of the stateful strategic regression game can be specified as the following bilevel multiobjective optimization problem. As is standard throughout the literature, we assume that the agent breaks ties in favor of the principal. Moving forward, we omit the constraints on the agent's and the principal's action spaces for brevity.

{θ_t^*}_{t=1}^T = arg max_{θ_1,...,θ_T} ‖Λ Σ_{t=1}^T e_t^*(θ_t, ..., θ_T)‖_1,  s.t.  {e_t^*}_{t=1}^T = arg max_{e_1,...,e_T} Σ_{t=1}^T θ_t^⊤ o_t.

3.1 Linear assessment policies are optimal

Throughout our formalization of the Stackelberg equilibrium, we have assumed that the principal deploys linear assessment rules, even though a priori it is not obvious why the principal would play assessment rules of this form. We now show that the linear assessment policy assumption is without loss of generality. We start by defining the concept of incentivizability for an effort policy, and we characterize it through the notion of a dominated effort policy.

Definition 3.2 (Incentivizability). An effort policy {e_t}_{t=1}^T is incentivizable if there exists an assessment policy {f(θ_t, ·)}_{t=1}^T for which playing {e_t}_{t=1}^T is a best response. (Note: {e_t}_{t=1}^T need not be the only best response.)

Definition 3.3 (Dominated Effort Policy).
We say that an effort policy {e_t}_{t=1}^T is dominated by another effort policy if the agent can achieve the same or higher observable feature values by playing some other effort policy {a_t}_{t=1}^T that does not spend the full effort budget in at least one time-step. Note that an effort policy which is dominated by another effort policy will never be played by a rational agent, no matter what set of decision rules is deployed by the principal, since a better outcome for the agent will always be achievable.

Theorem 3.4. For any effort policy {e_t}_{t=1}^T that is not dominated by another effort policy, there exists a linear assessment policy that can incentivize it.

See Appendix C for the complete proof. We characterize whether an effort policy {e_t}_{t=1}^T is dominated or not via a linear program, and show that a subset of the dual variables corresponds to a linear assessment policy which can incentivize it. Kleinberg and Raghavan present a similar proof for their setting, defining a linear program to characterize whether an effort profile e_t is dominated or not. They then show that if an effort profile is not dominated, the dual variables of their linear program correspond to a linear assessment rule which can incentivize it. While the proof idea is similar, their results do not extend to our setting because our linear program must include an additional constraint for every time-step to ensure that the budget constraint is always satisfied. We show that by examining the complementary slackness conditions, we can upper-bound the gradient of the agent's cumulative score with respect to a subset of the dual variables {λ_t}_{t=1}^T (where each upper bound depends on the extra term γ_t introduced by the linear budget constraint for that time-step). Finally, we show that when an effort policy is not dominated, all of these bounds hold with equality and, because of this, the subset of dual variables {λ_t}_{t=1}^T satisfies the definition of a linear assessment policy which can incentivize the effort policy {e_t}_{t=1}^T.

4 Equilibrium computation for linear effort conversions

While the optimization in Proposition 3.1 is non-convex in general, we provide polynomial-time algorithms for settings in which the agent's effort conversion function can reasonably be viewed as linear, i.e., σ_W(e) = We, where W ∈ ℝ^{n×d} is the agent's effort conversion matrix. Each component w_ij of W is a nonnegative term which represents how much one unit of effort in action j translates into an increase in observable feature i. While this assumption may not be realistic in some settings, it may work well for others and is a common assumption in the strategic classification literature (e.g., [42, 15, 4]).

Overview of our solution. Under settings in which the effort conversion function is linear, we can rewrite the game's Stackelberg equilibrium in a simplified form (Proposition 4.1). Under this formulation, the agent's optimal effort policy can be computed by solving a sequence of linear programs, but computing the principal's optimal assessment policy is a non-convex optimization problem. However, when we write the principal's objective in terms of the agent's efforts (incentivized by the principal's policy), the function becomes convex. Given this observation, we design an algorithm to optimize the principal's objective over the set of incentivizable effort profiles (instead of over the principal's policy space).
To perform the optimization via convex optimization methods, we first establish that the set of incentivizable effort profiles is convex and provide a membership oracle that determines whether an effort profile belongs to this set. Given the membership oracle, we leverage the convex optimization method of Kalai and Vempala [28] to find the (approximately) optimal incentivizable effort profile with high probability. Given this effort policy, we can use the dual of our membership oracle to recover a linear assessment policy which can incentivize it. We begin by characterizing the Stackelberg equilibrium in this setting.

Proposition 4.1 (Stackelberg Equilibrium). Suppose the agent's effort conversion function σ_W is linear. The Stackelberg equilibrium of the stateful strategic regression game can then be specified as follows: for each t,

e_t^* = arg max_{e_t} (W^⊤ θ_t + Δ W^⊤ Σ_{τ=t+1}^T θ_τ)^⊤ e_t,  s.t.  e_{t,j} ≥ 0 and Σ_j e_{t,j} ≤ 1,

and

{θ_t^*}_{t=1}^T = arg max_{θ_1,...,θ_T} ‖Λ Σ_{t=1}^T e_t^*‖_1.

Proof Sketch. We show that under linear effort conversion functions, the agent's best-response problem is linearly separable across time, and the agent's effort profile at each time is given by a linear program. We then plug each expression for the agent's optimal effort profile at time t into the principal's optimization problem to obtain our final result. See Appendix D for the full proof.

Given the principal's assessment policy {θ_t}_{t=1}^T, it is possible to recover the agent's optimal effort policy by solving the linear program for e_t at each time t.
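As a concrete illustration of this agent-side computation, here is a minimal sketch built on the separable per-round objective above. It assumes the linear conversion σ_W(x) = Wx, and the values of W, Δ, and the assessment rules are toy, hypothetical numbers; it is a sketch under those assumptions, not the paper's implementation.

```python
import numpy as np

def agent_best_response(thetas, W, Delta):
    """Per-round best responses of the agent under a linear effort conversion
    o_t = W (s_t + e_t). `thetas` is a (T, n) array of assessment rules, W is
    (n, d), and Delta is a (d, d) diagonal carry-over matrix.
    Returns a (T, d) array of effort profiles."""
    T, n = thetas.shape
    d = W.shape[1]
    efforts = np.zeros((T, d))
    for t in range(T):
        # Effective per-round reward of one unit of effort in each action:
        # the immediate contribution W^T theta_t, plus the carried-over
        # contribution Delta W^T (theta_{t+1} + ... + theta_T) via the state.
        future = thetas[t + 1:].sum(axis=0)
        reward = W.T @ thetas[t] + Delta @ (W.T @ future)
        # A linear objective over the unit effort budget is maximized at a
        # vertex, so the agent puts the whole budget on a single action.
        # (The paper assumes ties are broken in the principal's favor; here
        # we simply take argmax.)
        efforts[t, np.argmax(reward)] = 1.0
    return efforts

# Toy, hypothetical example: 2 features, 3 actions, uniform assessment rules.
W = np.array([[0.5, 1.0, 0.0],
              [0.5, 0.0, 1.0]])
Delta = np.diag([0.8, 0.0, 0.0])
thetas = np.full((5, 2), 0.5)      # each theta_t lies in the probability simplex
print(agent_best_response(thetas, W, Delta))
```

Because each per-round objective is linear over the unit effort budget, a maximizer always exists at a vertex, which is why the sketch simply places the entire budget on one action per round.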
On the other hand, recovering the principal's optimal assessment policy is more difficult. The principal's optimal policy takes the form of a multiobjective bilevel optimization problem, a class of problems which is NP-hard in general [10]. However, we are able to exploit the following proposition to give a polynomial-time algorithm for recovering the principal's optimal assessment policy.

Proposition 4.2. The set of incentivizable effort policies is convex if the effort conversion function is linear.

Proof Sketch. In order to show that the set of incentivizable effort policies is convex, we assume that it is not and proceed by contradiction. We construct an effort policy {z_t}_{t=1}^T by taking the element-wise average of two incentivizable effort policies {x_t}_{t=1}^T and {y_t}_{t=1}^T, and assume it is not incentivizable. Since {z_t}_{t=1}^T is not incentivizable, there exists some effort policy which dominates it. We show that if this is the case, then that policy must dominate either {x_t}_{t=1}^T or {y_t}_{t=1}^T. This is a contradiction, since both are incentivizable. See Appendix E.1 for the full proof.

Note that the linear program from Theorem 3.4 can serve as a membership oracle for this set. To see this, note that given an effort policy {e_t}_{t=1}^T, the LP returns a value of T if and only if {e_t}_{t=1}^T is incentivizable. We now show how to leverage this membership oracle to recover the principal's optimal assessment policy in polynomial time. Define CvxOracle(f, M, R, r, x_0, ε, δ) to be the membership-oracle method of Kalai and Vempala [28], which, for a convex set C, takes as input a linear function f over C, a membership oracle M for C, an initial point x_0 inside of C, a radius R of a ball containing C, and a radius r of a ball contained in C and centered at x_0, and returns a member of the convex set which minimizes f up to some O(ε) term, with probability at least 1 − δ. We now present an informal version of their main theorem, followed by our algorithm.

Theorem 4.3 (Main Theorem of Kalai and Vempala [28], Informal). For any convex set C ⊆ ℝ^n, given a membership oracle M, a starting point x_0, an upper bound R on the radius of a ball containing C, and a lower bound r on the radius of a ball contained in C, the algorithm of Kalai and Vempala [28] returns a point x_I such that f(x_I) ≤ min_{x ∈ C} f(x) + ε with probability 1 − δ, where the number of iterations is I = O(√n · log(Rn/(rεδ))), and O(n^4) calls to the membership oracle are made at each iteration.

Algorithm 1: Assessment Policy Recovery
Result: An assessment policy {θ_t^*}_{t=1}^T
1. Define C to be the set of incentivizable effort policies.
2. Let f({e_t}_{t=1}^T) = −‖Λ Σ_{t=1}^T e_t‖_1, where {e_t}_{t=1}^T is an incentivizable effort policy.
3. Define M to be the linear program from Theorem 3.4.
4. Fix an arbitrary assessment policy {θ_{t,0}}_{t=1}^T and solve for an initial incentivizable effort policy {e_{t,0}}_{t=1}^T as in Proposition 4.1.
5. Set R = √(T(d−1) / (2(T(d−1)+1))).
6. {e_t^*}_{t=1}^T = CvxOracle(f, M, R, r, {e_{t,0}}_{t=1}^T, ε, δ).
7. Set the primal variables of M equal to {e_t^*}_{t=1}^T, and solve a system of linear equations to recover the dual variables {θ_t^*}_{t=1}^T.

Theorem 4.4 (Optimal Assessment Policy). Let C be the set of incentivizable effort policies. Assuming that C contains a ball with radius at least r centered at {e_{t,0}}_{t=1}^T, the assessment policy {θ_t^*}_{t=1}^T recovered by Algorithm 1 is an ε-optimal assessment policy, with probability at least 1 − δ.

Before proceeding to the proof sketch for Theorem 4.4, we remark that the assumption that C contains a ball of radius r is commonplace within the membership-oracle-based convex optimization literature, both in theory [20, 28] and in practice (e.g., [5]). The assumption implies that if it is possible to incentivize an agent to play effort policy {e_{t,0}}_{t=1}^T, then it is also possible to incentivize him to play other effort policies within a small margin of {e_{t,0}}_{t=1}^T.

Proof Sketch. The proof consists of several steps. First, note that the agent's effort policy space consists of T d-dimensional probability simplexes, which form a T(d−1)-dimensional simplex. The circumradius (i.e., the minimum radius of a ball containing the T(d−1)-dimensional simplex) is R = √(T(d−1) / (2(T(d−1)+1))). Next, we observe that we can use the linear program defined in the proof of Theorem 3.4 as a membership oracle for the set of incentivizable effort policies. Finally, we observe that the function we are trying to optimize is linear and that it is possible to identify an initial point {e_{t,0}}_{t=1}^T within the convex set C. We can then use a membership-oracle-based convex optimization procedure such as that of Kalai and Vempala [28] to recover the incentivizable effort policy which is most desirable to the principal (up to some O(ε) term, with high probability) in polynomial time. Given this effort policy, we can use the complementary slackness conditions of our membership-oracle linear program to recover the corresponding dual variables, a subset of which will correspond to an assessment policy which can incentivize the agent to play this effort policy. See Appendix E for full details.

The existence of such a membership-oracle-based method shows that tractable algorithms exist to recover the principal's optimal assessment policy, and that heuristics need not be resorted to in a large class of settings, despite the bilevel multiobjective optimization problem which must be solved.

4.1 How many rounds are necessary to implement a desired effort profile?

In the classroom example, we saw that a wider range of effort profiles can be incentivized by extending the fixed-budget setting of Kleinberg and Raghavan to multiple time-steps.
But how long does the time horizon have to be in order to incentivize a desired effort profile, if the principal can pick the time horizon? Additionally, what conditions are sufficient for an effort profile to be incentivizable? We formalize the notion of (T, t)-implementability in the linear effort conversion function setting with these questions in mind.

Definition 4.5 ((T, t)-Implementability). A basis vector b_j is said to be (T, t)-implementable if a rational agent can be motivated to spend his entire effort budget on b_j for all times 1 ≤ t' ≤ t.

Figure 2: Contribution of studying vs. T − t. We plot Δ_S (x-axis) vs. T − t (y-axis) for our classroom example in Appendix A. Note that Δ_S is assumed to be 1 in the appendix. Lighter colors indicate settings in which the student has more incentive to cheat. As long as Δ_S > 0, there exists some time horizon under which the student is incentivized to study. As Δ_S increases, the number of extra time-steps required to incentivize studying decreases.

Theorem 4.6. If T ≥ t + max_c min_m max{0, W_mc − W_mj} / (Δ_jj W_mj − Δ_cc W_mc) and Δ_jj W_mj − Δ_cc W_mc > 0, then b_j is (T, t)-implementable.

See Appendix F for the full derivation. This bound shows that any basis vector is incentivizable if it accumulates faster than other effort profiles. In the worst case, the space of incentivizable effort profiles is the same as in Kleinberg and Raghavan (just set T = 1). However, if an effort component accumulates faster than other effort components, there will always exist a time horizon T for which it can be incentivized. In our classroom example, as long as the student retains some knowledge from studying, there will always exist a time horizon for which it is possible to incentivize the student to study (see Figure 2). Note that while the principal may be interested in incentivizing more than just basis vectors, there does not appear to be a closed-form lower bound on T for non-basis effort profiles.
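To give a rough numerical feel for how the remaining horizon T − t drives this effect, the following sketch mirrors the qualitative message of Figure 2 rather than evaluating the bound of Theorem 4.6 directly. It uses the separable per-round comparison from Section 4 together with toy, hypothetical values for the conversion matrix, the carry-over rate Δ_S, and a fixed assessment rule; none of these numbers come from the paper's appendix.

```python
import numpy as np

# Toy, hypothetical classroom numbers: action 0 = study (carries over at rate
# Delta_S), action 1 = copy test answers, action 2 = look up homework answers.
W = np.array([[0.4, 1.0, 0.0],   # effect of each action on the test score
              [0.4, 0.0, 1.0]])  # effect of each action on the homework score
Delta_S = 0.5                    # fraction of studying effort retained each round
theta = np.array([0.5, 0.5])     # a fixed assessment rule, reused every round

# Under the separable per-round objective, at time t studying beats a
# non-accumulating action c once
#   (theta^T W b_study) * (1 + Delta_S * (T - t)) >= theta^T W b_c.
w = W.T @ theta                  # immediate score contribution of each action
study, best_cheat = w[0], w[1:].max()

remaining = 0
while study * (1 + Delta_S * remaining) < best_cheat:
    remaining += 1
print("extra rounds (T - t) needed before studying is preferred:", remaining)
```

With these numbers a single extra future round suffices; shrinking Δ_S toward 0 pushes the required horizon toward infinity, consistent with the worst case discussed above.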
5 Concluding discussion

We proposed a simple and tractable model in which a principal assesses an agent over a series of time-steps to steer him in the direction of investment in desirable but unobservable types of activities. Our work addresses three crucial gaps in the existing literature, stemming from its restricted focus on (1) short-term interactions, (2) with myopic agents, (3) ignoring the role of earlier effort investments (i.e., the state) on future rewards. We observe that within our stateful strategic regression setting, the principal is capable of implementing a more expansive space of average effort investments. Our main results consisted of algorithms for computing the equilibrium of the principal-agent interactions, and characterizations of several interesting properties of the equilibrium. There are several natural extensions and directions for future work suggested by our basic model and findings.

Alternative cost functions. Following Kleinberg and Raghavan [30], we assumed throughout our analysis that the agent has a fixed effort budget in each round. One natural extension of our model is to explore alternative cost formulations for the agent. In Appendix G, we provide the analysis for one natural alternative, namely a cost term which scales quadratically with the total effort expended. Our findings generally remain unaltered. The main qualitative difference between the equilibria of the fixed-budget and quadratic-cost settings is the following: While under the fixed-budget setting the agent's optimal effort policy is a sequence of basis vectors and the principal's optimal assessment policy generally is not, we find that the opposite is true under the quadratic-cost setting. We believe the case study of quadratic costs provides reassuring evidence for the robustness of our results to the choice of cost function; however, we leave a more systematic study of the equilibrium's sensitivity to the agent's cost function as an interesting direction for future work.

Bounded rationality. While we assumed that the principal and the agent in our model respond rationally and optimally to each other's strategies, in real-world scenarios, people and institutions are often not fully rational. Therefore, it would be interesting to consider models where the players' rationality is bounded, e.g., by designing assessment policies that are robust to suboptimal effort policies and are capable of implementing desired investments despite the agent's bounded rationality.

Unknown model parameters and learning. We assumed the fundamental parameters of our model (e.g., σ_W, Δ, and T) are public knowledge. It would be interesting to extend our work to settings where not all of these parameters are known. Can we design learning algorithms that allow the players to learn their optimal policies over time as they interact with their counterparts?

Other simplifying assumptions. Finally, we made several simplifying assumptions to gain the insights offered by our analysis. In particular, our algorithms for recovering the optimal principal and agent policies relied on the agent having a linear effort conversion function. It would be interesting to explore algorithms which work for a wider range of effort conversion functions. Additionally, we assumed that effort expended toward some action is time-independent (e.g., one hour spent studying today is equivalent to one hour spent studying yesterday). It would be interesting to relax this assumption and study settings in which the accumulation of effort is subject to a discount factor.

6 Acknowledgements

This research is supported in part by the NSF FAI Award #1939606. The authors would like to thank Anupam Gupta and Gabriele Farina for helpful discussions about convex optimization techniques, and Gokul Swamy for helpful comments and suggestions.

References

[1] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. M. Wallach. A reductions approach to fair classification. In J. G. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 60-69. PMLR, 2018. URL http://proceedings.mlr.press/v80/agarwal18a.html.

[2] S. Ahmadi, H. Beyhaghi, A. Blum, and K. Naggita. The strategic perceptron. In P. Biró, S. Chawla, and F. Echenique, editors, EC '21: The 22nd ACM Conference on Economics and Computation, Budapest, Hungary, July 18-23, 2021, pages 6-25. ACM, 2021. doi: 10.1145/3465456.3467629. URL https://doi.org/10.1145/3465456.3467629.

[3] J. Angwin, J. Larson, S. Mattu, and L. Kirchner. Machine bias. ProPublica, 2016. URL https://www.propublica.org/series/machine-bias.

[4] Y. Bechavod, K. Ligett, Z. S. Wu, and J. Ziani. Gaming helps! Learning from strategic interactions in natural dynamics. In A. Banerjee and K.
Fukumizu, editors, The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021, April 13-15, 2021, Virtual Event, volume 130 of Proceedings of Machine Learning Research, pages 1234 1242. PMLR, 2021. URL http://proceedings.mlr.press/v130/bechavod21a.html. [5] A. Blum, N. Haghtalab, and A. D. Procaccia. Learning optimal commitment to overcome insecurity. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 1826 1834, 2014. URL https://proceedings.neurips.cc/paper/2014/hash/ cc1aa436277138f61cda703991069eaf-Abstract.html. [6] M. Brückner and T. Scheffer. Stackelberg games for adversarial prediction problems. In C. Apté, J. Ghosh, and P. Smyth, editors, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011, pages 547 555. ACM, 2011. doi: 10.1145/2020408.2020495. URL https://doi.org/ 10.1145/2020408.2020495. [7] G. Carroll. Robustness and linear contracts. American Economic Review, 105(2):536 63, February 2015. doi: 10.1257/aer.20131159. URL https://www.aeaweb.org/articles? id=10.1257/aer.20131159. [8] L. E. Celis, L. Huang, V. Keswani, and N. K. Vishnoi. Classification with fairness constraints: A meta-algorithm with provable guarantees. In danah boyd and J. H. Morgenstern, editors, Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* 2019, Atlanta, GA, USA, January 29-31, 2019, pages 319 328. ACM, 2019. doi: 10.1145/3287560. 3287586. URL https://doi.org/10.1145/3287560.3287586. [9] Y. Chen, C. Podimata, A. D. Procaccia, and N. Shah. Strategyproof linear regression in high dimensions: an overview. SIGecom Exch., 17(1):54 60, 2018. doi: 10.1145/3331033.3331038. URL https://doi.org/10.1145/3331033.3331038. [10] B. Colson, P. Marcotte, and G. Savard. An overview of bilevel optimization. Ann. Oper. Res., 153(1):235 256, 2007. doi: 10.1007/s10479-007-0176-2. URL https://doi.org/10.1007/ s10479-007-0176-2. [11] S. Corbett-Davies and S. Goel. The measure and mismeasure of fairness: A critical review of fair machine learning. Co RR, abs/1808.00023, 2018. URL http://arxiv.org/abs/1808. 00023. [12] R. Cummings, S. Ioannidis, and K. Ligett. Truthful linear regression. In P. Grünwald, E. Hazan, and S. Kale, editors, Proceedings of The 28th Conference on Learning Theory, COLT 2015, Paris, France, July 3-6, 2015, volume 40 of JMLR Workshop and Conference Proceedings, pages 448 483. JMLR.org, 2015. URL http://proceedings.mlr.press/v40/Cummings15.html. [13] N. N. Dalvi, P. M. Domingos, Mausam, S. K. Sanghai, and D. Verma. Adversarial classification. In W. Kim, R. Kohavi, J. Gehrke, and W. Du Mouchel, editors, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 22-25, 2004, pages 99 108. ACM, 2004. doi: 10.1145/1014052. 1014066. URL https://doi.org/10.1145/1014052.1014066. [14] O. Dekel, F. A. Fischer, and A. D. Procaccia. Incentive compatible regression learning. J. Comput. Syst. Sci., 76(8):759 777, 2010. doi: 10.1016/j.jcss.2010.03.003. URL https: //doi.org/10.1016/j.jcss.2010.03.003. [15] J. Dong, A. Roth, Z. Schutzman, B. Waggoner, and Z. S. Wu. Strategic classification from revealed preferences. In É. Tardos, E. Elkind, and R. 
Vohra, editors, Proceedings of the 2018 ACM Conference on Economics and Computation, Ithaca, NY, USA, June 18-22, 2018, pages 55 70. ACM, 2018. doi: 10.1145/3219166.3219193. URL https://doi.org/10.1145/ 3219166.3219193. [16] M. Donini, L. Oneto, S. Ben-David, J. Shawe-Taylor, and M. Pontil. Empirical risk minimization under fairness constraints. In S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, Neur IPS 2018, December 3-8, 2018, Montréal, Canada, pages 2796 2806, 2018. URL https://proceedings.neurips. cc/paper/2018/hash/83cdcec08fbf90370fcf53bdd56604ff-Abstract.html. [17] P. Dütting, T. Roughgarden, and I. Talgam-Cohen. Simple versus optimal contracts. In A. Karlin, N. Immorlica, and R. Johari, editors, Proceedings of the 2019 ACM Conference on Economics and Computation, EC 2019, Phoenix, AZ, USA, June 24-28, 2019, pages 369 387. ACM, 2019. doi: 10.1145/3328526.3329591. URL https://doi.org/10.1145/3328526.3329591. [18] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. S. Zemel. Fairness through awareness. In S. Goldwasser, editor, Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, January 8-10, 2012, pages 214 226. ACM, 2012. doi: 10.1145/2090236.2090255. URL https://doi.org/10.1145/2090236.2090255. [19] A. Frankel and N. Kartik. Improving Information from Manipulable Data. Journal of the European Economic Association, 06 2021. ISSN 1542-4766. doi: 10.1093/jeea/jvab017. URL https://doi.org/10.1093/jeea/jvab017. jvab017. [20] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimiza- tion, volume 2 of Algorithms and Combinatorics. Springer, 1988. ISBN 978-3-642-97883-8. doi: 10.1007/978-3-642-97881-4. URL https://doi.org/10.1007/978-3-642-97881-4. [21] M. Hardt, N. Megiddo, C. H. Papadimitriou, and M. Wootters. Strategic classification. In M. Sudan, editor, Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, Cambridge, MA, USA, January 14-16, 2016, pages 111 122. ACM, 2016. doi: 10.1145/2840728.2840730. URL https://doi.org/10.1145/2840728.2840730. [22] M. Hardt, E. Price, and N. Srebro. Equality of opportunity in supervised learning. In D. D. Lee, M. Sugiyama, U. von Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 3315 3323, 2016. URL https://proceedings.neurips.cc/paper/2016/hash/ 9d2682367c3935defcb1f9e247a97c0d-Abstract.html. [23] H. Heidari, V. Nanda, and K. P. Gummadi. On the long-term impact of algorithmic decision policies: Effort unfairness and feature segregation through social learning. In K. Chaudhuri and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 2692 2701. PMLR, 2019. URL http://proceedings. mlr.press/v97/heidari19a.html. [24] B. Holmstrom and P. Milgrom. Aggregation and linearity in the provision of intertemporal incentives. Econometrica, 55(2):303 328, 1987. ISSN 00129682, 14680262. URL http: //www.jstor.org/stable/1913238. [25] L. Hu and Y. Chen. A short-term intervention for long-term fairness in the labor market. In P. Champin, F. Gandon, M. Lalmas, and P. G. 
Ipeirotis, editors, Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, pages 1389 1398. ACM, 2018. doi: 10.1145/3178876.3186044. URL https://doi.org/10.1145/ 3178876.3186044. [26] L. Hu, N. Immorlica, and J. W. Vaughan. The disparate effects of strategic manipulation. In danah boyd and J. H. Morgenstern, editors, Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* 2019, Atlanta, GA, USA, January 29-31, 2019, pages 259 268. ACM, 2019. doi: 10.1145/3287560.3287597. URL https://doi.org/10.1145/ 3287560.3287597. [27] J. Jagtiani and C. Lemieux. The roles of alternative data and machine learning in fintech lending: Evidence from the lendingclub consumer platform. Financial Management, 48(4):1009 1029, 2019. doi: https://doi.org/10.1111/fima.12295. URL https://onlinelibrary.wiley.com/ doi/abs/10.1111/fima.12295. [28] A. T. Kalai and S. S. Vempala. Simulated annealing for convex optimization. Math. Oper. Res., 31(2):253 266, 2006. doi: 10.1287/moor.1060.0194. URL https://doi.org/10.1287/ moor.1060.0194. [29] S. Kannan, A. Roth, and J. Ziani. Downstream effects of affirmative action. In danah boyd and J. H. Morgenstern, editors, Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* 2019, Atlanta, GA, USA, January 29-31, 2019, pages 240 248. ACM, 2019. doi: 10.1145/3287560.3287578. URL https://doi.org/10.1145/3287560.3287578. [30] J. M. Kleinberg and M. Raghavan. How do classifiers induce agents to invest effort strategically? ACM Trans. Economics and Comput., 8(4):19:1 19:23, 2020. doi: 10.1145/3417742. URL https://doi.org/10.1145/3417742. [31] J. M. Kleinberg, S. Mullainathan, and M. Raghavan. Inherent trade-offs in the fair determination of risk scores. In C. H. Papadimitriou, editor, 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, January 9-11, 2017, Berkeley, CA, USA, volume 67 of LIPIcs, pages 43:1 43:23. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. doi: 10.4230/LIPIcs. ITCS.2017.43. URL https://doi.org/10.4230/LIPIcs.ITCS.2017.43. [32] D. Kuˇcak, V. Juriˇci c, and G. Ðambi c. Machine learning in education - a survey of current research trends. Annals of DAAAM & Proceedings, 29:406 410, 2018. doi: 10.2507/29th. daaam.proceedings.059. [33] J.-J. Laffont and D. Martimort. The Theory of Incentives: The Principal-Agent Model. Princeton University Press, 2009. ISBN 9781400829453. doi: doi:10.1515/9781400829453. URL https://doi.org/10.1515/9781400829453. [34] L. T. Liu, S. Dean, E. Rolf, M. Simchowitz, and M. Hardt. Delayed impact of fair machine learning. In S. Kraus, editor, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 6196 6200. ijcai.org, 2019. doi: 10.24963/ijcai.2019/862. URL https://doi.org/10.24963/ijcai. 2019/862. [35] J. Miller, S. Milli, and M. Hardt. Strategic classification is causal modeling in disguise. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 6917 6926. PMLR, 2020. URL http://proceedings.mlr.press/v119/miller20b.html. [36] S. Milli, J. Miller, A. D. Dragan, and M. Hardt. The social cost of strategic classification. In danah boyd and J. H. Morgenstern, editors, Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* 2019, Atlanta, GA, USA, January 29-31, 2019, pages 230 239. 
ACM, 2019. doi: 10.1145/3287560.3287576. URL https://doi.org/10.1145/ 3287560.3287576. [37] H. Mouzannar, M. I. Ohannessian, and N. Srebro. From fair decision making to social equality. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 359 368. ACM, 2019. [38] J. C. Perdomo, T. Zrnic, C. Mendler-Dünner, and M. Hardt. Performative prediction. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 7599 7609. PMLR, 2020. URL http://proceedings.mlr.press/v119/perdomo20a.html. [39] A. Roth, J. Ullman, and Z. S. Wu. Watch and learn: Optimizing from revealed preferences feedback. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, STOC 16, page 949 962, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450341325. doi: 10.1145/2897518.2897579. URL https://doi.org/10.1145/ 2897518.2897579. [40] A. Roth, A. Slivkins, J. Ullman, and Z. S. Wu. Multidimensional dynamic pricing for welfare maximization. ACM Trans. Econ. Comput., 8(1), Apr. 2020. ISSN 2167-8375. doi: 10.1145/ 3381527. URL https://doi.org/10.1145/3381527. [41] J. Sánchez-Monedero, L. Dencik, and L. Edwards. What does it mean to solve the problem of discrimination in hiring?: social, technical and legal perspectives from the UK on automated hiring systems. In M. Hildebrandt, C. Castillo, L. E. Celis, S. Ruggieri, L. Taylor, and G. Zanfir-Fortuna, editors, FAT* 20: Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, January 27-30, 2020, pages 458 468. ACM, 2020. doi: 10.1145/3351095. 3372849. URL https://doi.org/10.1145/3351095.3372849. [42] Y. Shavit, B. L. Edelman, and B. Axelrod. Causal strategic linear regression. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 8676 8686. PMLR, 2020. URL http://proceedings.mlr.press/v119/shavit20a.html. [43] L. Sweeney. Discrimination in online ad delivery. Commun. ACM, 56(5):44 54, 2013. doi: 10.1145/2447976.2447990. URL https://doi.org/10.1145/2447976.2447990. [44] B. Ustun, A. Spangher, and Y. Liu. Actionable recourse in linear classification. In danah boyd and J. H. Morgenstern, editors, Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* 2019, Atlanta, GA, USA, January 29-31, 2019, pages 10 19. ACM, 2019. doi: 10.1145/3287560.3287566. URL https://doi.org/10.1145/3287560.3287566. [45] H. Zhang and V. Conitzer. Incentive-aware PAC learning. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pages 5797 5804. AAAI Press, 2021. URL https://ojs.aaai.org/index.php/AAAI/article/view/16726.