Published as a conference paper at ICLR 2025

STRATEGIC CLASSIFICATION WITH EXTERNALITIES

Safwan Hossain¹, Evi Micha², Yiling Chen¹, Ariel Procaccia¹
¹ Harvard University, ² University of Southern California
shossain@g.harvard.edu, evi.micha@usc.edu
Corresponding authors; equal contribution.

We propose a new variant of the strategic classification problem: a principal reveals a classifier, and $n$ agents report their (possibly manipulated) features to be classified. Motivated by real-world applications, our model crucially allows the manipulation of one agent to affect another; that is, it explicitly captures inter-agent externalities. The principal-agent interactions are formally modeled as a Stackelberg game, with the resulting agent manipulation dynamics captured as a simultaneous game. We show that under certain assumptions, the pure Nash Equilibrium of this agent manipulation game is unique and can be efficiently computed. Leveraging this result, PAC learning guarantees are established for the learner: informally, we show that it is possible to learn classifiers that minimize loss on the distribution, even when a random number of agents are manipulating their way to a pure Nash Equilibrium. We also comment on the optimization of such classifiers through gradient-based approaches. This work sets the theoretical foundations for a more realistic analysis of classifiers that are robust against multiple strategic actors interacting in a common environment.

1 INTRODUCTION

Machine learning algorithms are increasingly deployed in high-stakes decision making, including loan applications, school admissions, hiring, and insurance claims (Bejarano Carbo, 2021; Harwell, 2022; Kumar et al., 2022). Relying on past data as a reference, these algorithms use features of a candidate to determine their merit for the given task.
The interactions in these settings, however, often involve strategic agents who may manipulate their features if doing so yields a more favorable outcome. This presents a significant challenge to the algorithm if it is not trained to anticipate such behavior, since the training and test distributions no longer match. Correspondingly, a large and growing literature on strategic classification has emerged to understand the dynamics of this behavior and propose strategies for learning in such settings (Hardt et al., 2016). By and large, the existing literature roughly models the interaction as follows: the learner (or principal) deploys a classifier, and an agent with feature $x$ observes this and may choose to report a manipulated feature $x'$ to obtain a better outcome. Manipulation is not considered free since it may involve some additional effort or risk. While the classifier may be deployed for any number of agents, agent interactions with the learner are crucially assumed to be independent; in other words, one agent's action has no influence on another. We posit that this ignores a critical aspect of multi-agent interactions in a shared environment: agents' actions exert externality on one another. This is not a new observation: indeed, the notion that agents in a system are affected by a common resource has been a core component of economic models for over a century (Pigou, 1920; Coase, 1960). It has not, however, been formally studied in the strategic classification setting despite its relevance and applicability. Consider the example Hardt et al. (2016) used to initiate this literature: the number of books in the parent's household may be a valid feature for university admissions given its correlation with student success (Evans et al., 2010). Since books are relatively "cheap", this allows parents to game the system by buying an attic full of unread books (Hardt et al., 2016).
It is immediate, however, that if all parents do this, then the price of books no longer stays cheap. This is the externality induced by everyone's actions. Externality, however, can not only model demand for lucrative features but also the additional risk or burden associated with manipulation. When a university is deciding which students to accept from a high school using a classification algorithm, applicants may report features such as class rank, GPA, volunteering affiliations, and so on. Naturally, only a handful of students can be at the top of their class or be affiliated with a specific organization. If only a handful of students manipulate their features, it may not be statistically discernible. However, if many claim to be at the top of their class or work with the same organization, this can stand out and lead the university to audit the applications further. Similarly, a few applicants inflating their credit score on a loan application may be treated as an anomaly; a large pool of applicants doing so becomes a systematic problem that demands a reexamination of the loan process, causing a burden on all. The preceding examples illustrate that the demand for books, the risk faced by a student, or the burden on the borrower is not solely determined by an individual's action but is also influenced by the interactions of others within the same system. This naturally affects how, what, and if an agent manipulates. This phenomenon applies to most settings where strategic classification is pertinent. As such, it is imperative to correctly model and understand this phenomenon to ensure that classifiers deployed in sensitive settings act as intended. We specifically study the following questions: How should we jointly model the strategic interactions between agents alongside those with the learner?
What is the appropriate notion of equilibrium in this setting, and what properties does it have? What learning guarantees can be given for classifiers in such settings?

1.1 OUR CONTRIBUTIONS

We study the problem of deploying classifiers in strategic multi-agent settings with inter-agent externalities from a theoretical perspective. Consistent with prior literature, the interaction between the learner and the agents is modeled as a Stackelberg game: the learner first commits to a classifier to which agents can respond. The resulting inter-agent interactions are captured as a simultaneous game, and the Stackelberg-Nash Equilibrium is proposed as the solution concept for all interactions. We precisely define these aspects of the multi-agent strategic classification game in Section 2. Learning in this setting is challenging, not least due to the possible multiplicity of equilibria, their computation, and their dynamics under a changing classifier. Section 3 motivates a set of structural assumptions on the cost and externality, under which the inter-agent simultaneous game has a unique pure Nash Equilibrium that is efficiently computable. Building on this insight, Section 4 comments on the regularity of the Nash Equilibrium under changing classifiers, and provides probably approximately correct (PAC) learning guarantees for computing the Stackelberg-Nash Equilibrium. Intuitively, it is possible to learn loss-minimizing classifiers that generalize to a random number of agents manipulating to achieve a Nash Equilibrium. We also differentiate through the equilibrium solution to explicitly characterize the loss gradient, illustrating the feasibility of gradient-based optimization algorithms. Section 5 uses our motivating examples to model some externality functions that can be captured by our general framework. Lastly, in Section 7, we discuss some limitations of our work alongside technical and conceptual extensions.
1.2 RELATED WORK

Our work builds on the growing literature on strategic classification (Hardt et al., 2016; Brückner et al., 2012; Brückner & Scheffer, 2009). In the basic setting, a learner seeks to release a classifier that accounts for the fact that strategic agents may misreport their true features to maximize their utility, which is determined by the likelihood of a positive outcome and the cost incurred for misreporting. Recent papers have studied extensions of the basic strategic classification setting. For example, Levanon & Rosenfeld (2022) introduce a setting where agents' utility functions capture different intentions beyond simply maximizing for the positive outcome. Others aim to account for limited information (Ghalme et al., 2021; Harris et al., 2022; Bechavod et al., 2022), unknown utilities (Dong et al., 2018), and causal effects (Miller et al., 2020; Horowitz & Rosenfeld, 2023). These extensions do not handle externalities or multi-agent behaviour in general. While motivated by a different aspect of strategic classification, the work of Eilat et al. (2023), which studies strategic classification in a graph setting, has conceptual similarities with ours. Node classification on graphs naturally depends on neighboring nodes' features, which strategic agents can exploit. While this implicitly models agent interactions as not wholly independent, there are several fundamental differences from our work. First, agents in their model are interrelated because the classifier outcome of one agent depends on neighboring nodes; in contrast, we consider the classifier outcome for an agent to depend only on their own feature, with the externality explicitly capturing additional risk or cost to agents due to others also interacting in the system. This better aligns with our motivation to capture classification dynamics in competitive high-stakes settings.
Furthermore, they consider agents myopically best responding over a sequence of rounds, whereas we consider the performance of a classifier at a Nash Equilibrium. Further afield, machine learning researchers have investigated the effect of strategic behavior on social welfare (Milli et al., 2019; Haghtalab et al., 2020; Kleinberg & Raghavan, 2020) or on different groups of agents (Hu et al., 2019). However, none of these papers consider the direct impact that manipulation by others can have on each individual agent. The impact of such externalities has, nonetheless, been studied in computational problems like auctions (Agarwal et al., 2020), data markets (Hossain & Chen, 2024), facility location (Li et al., 2019), and for long-term fairness accountability (Heidari et al., 2019).

2 PRELIMINARIES

For $k \in \mathbb{N}$, let $[k] = \{1, \ldots, k\}$. We consider $k$ strategic agents interacting with a learner who releases a classifier parametrized by weights $\omega$. Using one of our running examples, this corresponds to $k$ students applying for admission to a university, which decides to accept or reject using classifier $f_\omega$. Let $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$ denote the feature vector and true class of agent $i$ respectively, with $(x_i, y_i) \sim D$. Let $X \in \mathbb{R}^{k \times d}$ denote the $k$ agents' feature matrix, and $y \in \{-1, +1\}^k$ the vector of their true class labels¹, sampled independently from $D$; we use the shorthand $(X, y) \sim D^k$. We use the notation $X_{i:}$ and $X_{:j}$ to respectively refer to the $i$th row and $j$th column of the matrix $X$, and $[X_1; X_2]$ to denote two matrices concatenated along rows. Our focus will be on binary linear classifiers due to their widespread practical usage and popularity within the strategic classification literature² (Hardt et al., 2016; Dong et al., 2018; Bechavod et al., 2022); that is, $\omega \in \Omega \subseteq \mathbb{R}^d$ and $f_\omega(x) = \langle x, \omega \rangle$ denotes the score for positive classification.
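The notation above can be grounded in a short numpy sketch; all concrete values are purely illustrative:

```python
import numpy as np

k, d = 3, 4
X = np.arange(k * d, dtype=float).reshape(k, d)  # the k agents' feature matrix
w = np.ones(d)                                   # classifier weights, omega

row_i = X[0, :]              # X_{i:} : one agent's feature vector (a row of X)
col_j = X[:, 1]              # X_{:j} : one feature across all agents (a column of X)
stacked = np.vstack([X, X])  # [X1; X2] : two matrices concatenated along rows
score = X[0] @ w             # f_w(x) = <x, w>, the score for positive classification

assert row_i.shape == (d,) and col_j.shape == (k,)
assert stacked.shape == (2 * k, d)
```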
The learner may use any Lipschitz loss function³ to compute the loss and maximize the accuracy of their deployed classifier with respect to the true labels.

Utility and Externality: Following the standard strategic classification model, we consider a Stackelberg interaction between the learner and the agents (Hardt et al., 2016). That is, the learner first releases a classifier $f_\omega$, and thereafter, agents submit their features to be classified. An agent need not be truthful and may instead submit a manipulated feature vector $x'_i$ to receive a higher score for the positive class. Consistent with the strategic classification literature (Dong et al., 2018; Bechavod et al., 2022), we assume agent utilities are proportional to the score: $f_\omega(x'_i)\, g_+(x_i)$, where $g_+ : \mathbb{R}^d \to \mathbb{R}$ is an agent-specific gain for positive classification⁴. In line with prior work, we assume agent features lie within a bounded region (without loss of generality, $[0,1]^d$) and that agents incur a cost $c(x_i, x'_i)$ in modifying their true feature vector. However, in contrast to these earlier models, we also consider agents' decisions causing externality to others.

¹ In general, we use bold capital letters to refer to matrices, and bold lower-case letters to refer to vectors.
² Our results can be generalized to multi-class linear classifiers.
³ This includes nearly any reasonable loss function, including hinge loss, cross-entropy loss, MSE, etc.
⁴ It may be common to associate a type vector $\theta_i$ with an agent and express gain in terms of $g_+(\theta_i, x_i)$. This is equivalent to our utility model, as the feature vector can include the type. This type of dependence can be extended to our cost and externality functions as well.

Conceptually, externality is the impact agents' actions have on one another when interacting within a common system. Formally, the negative externality suffered by agent $i$ due to an agent $j$ is captured by the function $t(x_i, x'_i, x_j, x'_j)$.
We use $T(x_i, x'_i, X, X') = \sum_{j \neq i} t(x_i, x'_i, x_j, x'_j)$ to denote the total externality suffered by agent $i$ due to all other participating agents' decisions. In summary, the total utility of agent $i$ in reporting $x'_i$ in our multi-agent strategic classification game is given by:

$$u_i(X, X', f_\omega) = f_\omega(x'_i)\, g_+(x_i) - c(x_i, x'_i) - T(x_i, x'_i, X, X') \qquad (1)$$

The standard literature on strategic classification assumes the principal knows the cost function (Hardt et al., 2016; Milli et al., 2019; Levanon & Rosenfeld, 2021). We extend this assumption to externality functions as well.

Game-Theoretic Model: Since an agent's total utility depends on others' reports, the interaction between the agents is naturally modeled as a game. Taking inspiration from prior work on strategic regression (Chen et al., 2018; Hossain & Shah, 2021) and related settings (Nakamura, 2015), we model this inter-agent interaction as a simultaneous game. Correspondingly, the best response of agent $i$ given classifier $f_\omega$ and the reports of others is $\arg\max_{x'_i \in [0,1]^d} u_i(X, X', f_\omega)$. For a fixed classifier $f_\omega$, the natural solution concept for the agent game is a pure Nash Equilibrium (PNE). Formally, a set of reported values $X^q = [x^q_1; \ldots; x^q_k]$ is a PNE if for each agent $i$, fixing the reported values of others, $x^q_i$ is a best response for agent $i$. Intuitively, no agent can unilaterally improve their total utility at this equilibrium. Since a PNE is not necessarily unique, let $NE(X, f_\omega)$ denote the correspondence from the true features to the set of features reported at an equilibrium under classifier $f_\omega$; that is, $X^q \in NE(X, f_\omega)$.

Learning Problem: While the relationship between the $k$ agents corresponds to a simultaneous game, the principal-agent interactions still induce a Stackelberg game since the principal first commits to a classifier $f_\omega$.
The Stackelberg equilibrium in the classical setting is with respect to a single agent best responding under the released classifier; however, our multi-agent setting implies the principal must now react according to the Nash Equilibrium induced by their chosen classifier. Thus, the principal's goal is to compute the Stackelberg-Nash Equilibrium (Xu et al., 2016), which corresponds to a classifier that performs optimally when sampled features reach a pure Nash Equilibrium with respect to the chosen classifier. Since the number of participants in the classification setting (e.g., the number of students applying for admission) is not fixed, let $k_{max}$ denote the maximum number of such participants, and let $K$ denote a distribution over $[k_{max}]$. The principal has access to a dataset of $n$ labeled samples $S = \{(X_1, y_1, k_1), \ldots, (X_n, y_n, k_n)\}$, where sample $j$ consists of $k_j \sim K$ and $(X_j, y_j) \sim D^{k_j}$. That is, the $j$th sample consists of the features and labels of the $k_j$ participating agents. The principal uses this training set $S$ to learn a classifier that performs well in practice. The optimal classifier is formally defined as follows:

Definition 1. The optimal classifier parameter $\omega^*$ in the multi-agent strategic classification game with loss function $\ell$, participant distribution $K$, and data distribution $D$ is given by:

$$\omega^* = \arg\min_{\omega \in \Omega} \; \mathbb{E}_{k \sim K} \, \mathbb{E}_{(X, y) \sim D^k} \left[ \max_{X^q \in NE(X, f_\omega)} \sum_{i=1}^{k} \ell(y_i, f_\omega(x^q_i)) \right]$$

The learning problem in this Stackelberg-Nash game is non-trivial for several reasons. First, it requires optimizing over the PNE of the sampled feature vectors. We discuss the efficient computation of this equilibrium in Section 3 under structural assumptions on the cost and externality. Second, we must not only generalize with respect to the equilibrium of sampled agents but also when the number of such agents is random. We address this and the optimization of $f_\omega$ from a dataset $S$ in Section 4.
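Before analyzing equilibria, the utility model in Eq. (1) can be sketched directly; the gain, cost, and externality functions passed in below are illustrative placeholders rather than the specific choices analyzed later:

```python
import numpy as np

def utility(i, X, Xp, w, g_plus, cost, t):
    """Total utility of agent i (Eq. 1): score times gain, minus cost,
    minus the total externality T summed over all other agents j != i."""
    k = X.shape[0]
    score = Xp[i] @ w                           # f_w(x'_i) = <x'_i, w>
    ext = sum(t(X[i], Xp[i], X[j], Xp[j])       # T(x_i, x'_i, X, X')
              for j in range(k) if j != i)
    return score * g_plus(X[i]) - cost(X[i], Xp[i]) - ext

# Illustrative check: with zero externality, utility is score*gain minus cost.
X, Xp, w = np.zeros((2, 2)), np.ones((2, 2)), np.ones(2)
u0 = utility(0, X, Xp, w,
             g_plus=lambda x: 1.0,
             cost=lambda x, xp: np.sum((xp - x) ** 2),
             t=lambda xi, xpi, xj, xpj: 0.0)
```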
3 EQUILIBRIUM PROPERTIES

Core to our problem is computing the PNE achieved by the sampled agents since it informs the loss suffered by the principal under their chosen classifier. Thus, understanding equilibrium properties such as existence, uniqueness, and computation is crucial to any learning algorithm. However, such properties cannot be precisely stated without a structured model of the agent utilities, which in our game is largely predicated upon the cost and externality. We formally state our assumptions below:

1. The cost is given by the squared $\ell_2$ norm of the manipulation vector: $c(x_i, x'_i) = \alpha \|x_i - x'_i\|_2^2$.
2. Externalities are pairwise symmetric: $\forall i, j, \; t(x_i, x'_i, x_j, x'_j) = t(x_j, x'_j, x_i, x'_i)$.
3. The total externality faced by any agent $i$, $T(x_i, x'_i, X, X')$, is smooth and convex in the manipulation variables $(x'_1, \ldots, x'_k)$.

Initial work on strategic classification, including Hardt et al. (2016), assumed the cost function to be separable, a somewhat restrictive assumption. More recent work models cost as the norm of the manipulation vector $x' - x$, as it satisfies natural axioms associated with being a metric (Levanon & Rosenfeld, 2022; Horowitz & Rosenfeld, 2023). The most common choice is the $\ell_2$ norm, which we adopt (our results do hold for a larger class of cost functions; see the end of this section). Structural assumptions on externality models are routine in the economics literature. In competitive settings, which describe many high-stakes classification tasks, symmetric externalities are natural and commonplace (Goeree et al., 2005; Hossain & Chen, 2024). Settings such as data markets (Agarwal et al., 2020) and facility location (Li et al., 2019) further capture externality under a linear model, which our model subsumes. To give an example of a non-linear but convex externality, consider the loan application setting where a bank (the learner) is screening candidates for approval.
The more candidates manipulate their reports, the more likely it is that the bank's approval process will become stricter, adversely affecting all candidates. The function $t(x_i, x'_i, x_j, x'_j) = (\|x'_i - x_i\|_2 + \|x'_j - x_j\|_2)^2$, which increases as the manipulation increases, can model this setting. It is convex since the square of a non-negative convex function is convex. While this is one example, Section 5 contains a more detailed exploration of externality models, and shows that our results hold for a broader class of functions, including non-convex ones. Returning to the PNE analysis of the manipulation game, we first show in Theorem 1 (proof in Appendix B) that when externalities are pairwise symmetric, the resulting interactions can be characterized by a potential game⁵. Such games use a single function, the potential function $\Phi$, to encapsulate the difference in utility to any agent $i$ due to their unilateral change in action (Monderer & Shapley, 1996). We formally define this below for convenience.

Definition 2. An $n$-player simultaneous game is a potential game if there exists a potential function $\Phi : A^n \to \mathbb{R}$ mapping from the joint action space to the reals such that for any agent $i$: $u_i(a_i, a_{-i}) - u_i(a'_i, a_{-i}) = \Phi(a_i, a_{-i}) - \Phi(a'_i, a_{-i})$, where $u_i$ is the $i$th agent's utility.

Theorem 1. For any classifier $f_\omega$, if the externalities are pairwise symmetric (Condition 2), the agent interactions constitute a potential game with the potential function:

$$\Phi(X, X', f_\omega) = \sum_{i=1}^{k} f_\omega(x'_i)\, g_+(x_i) - \sum_{i=1}^{k} c(x_i, x'_i) - \sum_{i=1}^{k} \sum_{j > i} t(x_i, x'_i, x_j, x'_j).$$

Since all player incentives map to the potential function, PNE strategies exactly correspond to local maxima of the potential function (Roughgarden, 2010).⁶ Indeed, if an agent could improve their utility by modifying their action, the value of the potential function at this updated action also increases. This has several implications.
First, finding an equilibrium is equivalent to finding such local maxima, and second, properties of the local maxima of $\Phi$ directly inform the multiplicity or uniqueness of the equilibria. Lastly, suppose agents sequentially play their best response from any starting point. In that case, the potential function value is strictly increasing, giving a simple local algorithm for arriving at a PNE. Our next result shows that the potential function is strictly concave under our set of assumptions. For such functions, any local optimum is also the global optimum, such an optimum is unique, and the PNE must be at this optimum (Neyman, 1997). Correspondingly, the Nash Equilibrium is unique and can be written as the outcome of a convex optimization problem⁷. We present this formally below (proof in Appendix B):

Theorem 2. Under Conditions 1, 2, and 3, the potential function specified in Theorem 1 is strictly concave. Further, the Nash Equilibrium is unique and is given by the function $\arg\max_{X' \in [0,1]^{k \times d}} \Phi(X, X', f_\omega)$.

⁵ Appendix B also shows that a type of global externality can also be modeled as a potential game.
⁶ In this setting, a local maximum refers to a point that is optimal with respect to changes in any one direction.
⁷ Maximizing a concave objective under convex constraints is a convex optimization problem.

This implies that $NE(X, f_\omega)$ is now a function mapping the true values to the unique PNE. This resolves the ambiguity of learning in the presence of multiple equilibria. Further, this function is the outcome of a parameterized convex optimization problem (parameterized in terms of the classifier $f_\omega$). The result above hinges on the strict concavity of the potential function.
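Theorem 2 reduces equilibrium computation to maximizing a concave potential over the box $[0,1]^{k \times d}$. A minimal sketch, assuming the illustrative symmetric externality $t = \beta(\|x'_i - x_i\|_2^2 + \|x'_j - x_j\|_2^2)$ and gain $g_+ \equiv 1$ (chosen so that $\Phi$ is strictly concave and separable, with a closed-form maximizer to check against):

```python
import numpy as np

rng = np.random.default_rng(1)
k, d, alpha, beta = 4, 3, 2.0, 0.5
w = 0.5 * rng.normal(size=d)                 # classifier weights (illustrative)
X = rng.uniform(0.2, 0.8, size=(k, d))       # true features

# With t = beta*(||x'_i - x_i||^2 + ||x'_j - x_j||^2), the potential collapses to
# Phi(X') = sum_i <x'_i, w> - c * sum_i ||x'_i - x_i||^2 with c = alpha + beta*(k-1).
c = alpha + beta * (k - 1)

def grad_phi(Xp):
    return w - 2.0 * c * (Xp - X)            # row-wise gradient of Phi

Xp = X.copy()
for _ in range(2000):                        # projected gradient ascent on concave Phi
    Xp = np.clip(Xp + 0.05 * grad_phi(Xp), 0.0, 1.0)

# Phi is separable per coordinate, so the box-constrained maximizer (the unique
# PNE) is the clipped unconstrained optimum: x'_i = clip(x_i + w/(2c), 0, 1).
closed_form = np.clip(X + w / (2.0 * c), 0.0, 1.0)
assert np.allclose(Xp, closed_form, atol=1e-6)
```

This also illustrates the best-response remark above: each ascent step increases the potential until the unique PNE is reached.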
While our stated conditions imply this strict concavity, we note that Conditions 1 and 3 can be replaced with a weaker one:

(1′) $\sum_i c(x_i, x'_i) + \sum_i \sum_{j>i} t(x_i, x'_i, x_j, x'_j)$ is smooth and strictly convex.

We refer to this quantity as the cumulative impact of manipulation: the total cost plus the sum over all pairs of externalities. Due to our pairwise symmetry condition, $\sum_i \sum_{j>i} t(x_i, x'_i, x_j, x'_j) = \frac{1}{2} \sum_{i=1}^{k} T(x_i, x'_i, X, X')$; since convexity is preserved under summation, for any strictly convex cost (i.e., Condition 1's squared $\ell_2$ cost) and any externality for which $T(x_i, x'_i, X, X')$ is convex (Condition 3), the updated condition above is immediately satisfied. This condition, however, is more general and allows us to capture a larger class of possibly non-convex externalities within our framework. We have a detailed discussion of some of these richer models in Section 5.

4 LEARNING AND GENERALIZATION

We now tackle the problem of learnability in our setting. Broadly speaking, the learner's goal is to ensure that a classifier trained on a finite dataset to anticipate agents misreporting to an equilibrium performs well on the broader population distribution, which exhibits similar behaviour. A unique aspect of our setting is that the number of agents participating in a given instance need not be fixed. Taking university admissions as an example, the number of applicants in a given year is not constant but rather a random variable. Our generalization result should accommodate this variability in the number of players in each instance, alongside the equilibrium they reach. Recall from Section 2 that $k_{max}$ denotes the largest number of simultaneous participants, with $K$ a distribution over $[k_{max}]$ and $D$ the data distribution that we sample from, i.e., $(x_i, y_i) \sim D$. The learner has access to a training dataset of $n$ instances, where each instance involves a sample $k_i \sim K$, and then $k_i$ independent samples from the distribution $D$.
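This training-set construction can be sketched as follows; the specific choices of $K$ (uniform over $[k_{max}]$) and $D$ (uniform features, random labels) are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
kmax, d, n = 5, 2, 4

def sample_instance():
    # k_i ~ K, then k_i i.i.d. draws (x, y) from D.
    k = int(rng.integers(1, kmax + 1))
    X = rng.uniform(size=(k, d))             # features in [0,1]^d
    y = rng.choice([-1, 1], size=k)          # true labels
    return X, y, k

S = [sample_instance() for _ in range(n)]    # dataset of n instances
assert all(X.shape == (k, d) and 1 <= k <= kmax for X, y, k in S)
```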
Observe that the following is an equivalent sampling procedure: sample $k_i \sim K$, draw $k_{max}$ independent samples from $D$, and then only consider the first $k_i$ elements. We use this latter procedure for the generalization result; formally, we define a joint distribution $\bar{D} = D^{k_{max}} \times K$. The learner has access to $n$ samples from this distribution, representing their training set: $S = \{(X_1, y_1, k_1), \ldots, (X_n, y_n, k_n)\}$, where $X_i \in \mathbb{R}^{k_{max} \times d}$, $y_i \in \mathbb{R}^{k_{max}}$, and $k_i \in [k_{max}]$. Note that each sample/instance in this dataset can be seen as holding the attributes of $k_i$ individuals. For any chosen classifier $f_\omega$, the participating agents reach a PNE and the classifier suffers a resulting loss based on this. In the preceding section, we showed this equilibrium to be the solution of a convex optimization problem. To provide formal learning guarantees, however, it is important to understand how the outcome of this optimization (i.e., the PNE) behaves as the classifier changes. Indeed, if the PNE drastically changes due to small changes in $\omega$, learning can be challenging. Let $NE(X, f_\omega, k) : \mathbb{R}^{k_{max} \times d} \times \mathbb{R}^d \times [k_{max}] \to \mathbb{R}^{k_{max} \times d}$, where the first $k$ rows of the output correspond to the equilibrium solution when $k$ agents are participating. We prove in the following result that the potential function optimum (and thus the PNE strategy) is Lipschitz continuous in $\omega$. The theorem statement treats the inputs to $\Phi$ and $NE$ as vectors. This is without loss of generality since our feature matrix $X \in \mathbb{R}^{k_{max} \times d}$ is equivalent to a vector $x \in \mathbb{R}^{d k_{max}}$, with the vector $\ell_2$ norm $\|x\|_2$ equivalent to the matrix Frobenius norm $\|X\|_F$.

Lemma 1. Let $\Phi(x, x', f_\omega)$ be the potential function for $k_{max}$ agents ($x, x' \in \mathbb{R}^{d k_{max}}$). Then for $\eta = \frac{\gamma}{c}$, where $c = \frac{1}{2} \min_{x', \omega} \lambda_{min}(-\nabla^2_{x'} \Phi(x, x', f_\omega))$ and $\gamma = \max_{x', \omega} \|\nabla^2_{\omega x'} \Phi(x, x', f_\omega)\| + 1$, the function $NE(x, f_\omega, k)$ is $\eta$-Lipschitz in $\omega$ (under the $\|\cdot\|_2$ norm) for any $k$.
The proof (deferred to Appendix C) is technical and in fact shows Lipschitz continuity for a general class of parametrized convex optimization problems. Noting that local Lipschitzness suffices, we lower bound the difference between an optimal and a sub-optimal solution using properties of strict concavity and smoothness of the potential function. The remainder of the proof leverages the continuity of the objective in both $x'$ and $\omega$, along with the insights above, to establish the Lipschitzness of the maximizer of $\Phi$. To relate this to $NE(X, f_\omega, k)$, we define the optimization objective as summing the potential function over only the first $k$ participants with a trivial function involving the remaining agents. Next, for a loss function $\ell(f_\omega(x_i), y_i)$ defined on classifying an individual's reported features, the average loss on a sample consisting of $k$ participating strategic agents is given by $L(X, y, f_\omega, k) = \frac{1}{k} \sum_{i=1}^{k} \ell(f_\omega(NE(X, f_\omega, k)_{i:}), y_i)$⁸. We now define the standard learning theory notions using this:

Definition 3. For a classifier $f_\omega$, the true risk $R(f_\omega)$ on distribution $\bar{D}$ and the empirical risk $\hat{R}(f_\omega)$ on a dataset $S$ consisting of $n$ independent samples from $\bar{D}$ are given respectively by:

$$R(f_\omega) = \mathbb{E}_{(X, y, k) \sim \bar{D}} \, L(X, y, f_\omega, k) \quad \text{and} \quad \hat{R}(f_\omega) = \frac{1}{n} \sum_{i=1}^{n} L(X_i, y_i, f_\omega, k_i).$$

We want to show that a classifier chosen to minimize the risk on a finite dataset $S$, denoted by $\hat{f}_\omega$, will be close in a PAC sense to the true risk minimizer of the population, denoted by $f^*_\omega$. Recall that we focus on linear classifiers and, as such, our function class is the set of norm-constrained linear functions $\{\langle \omega, x_i \rangle \text{ s.t. } \|\omega\|_2 \le r\}$. We denote $\Omega = \{\omega \text{ s.t. } \|\omega\|_2 \le r\}$ as the corresponding parameter space. We now give our generalization guarantee:

Theorem 3. For any $\lambda$-Lipschitz loss function $\ell$, linear function class $\Omega = \{\omega$ s.t.
$\|\omega\|_2 \le r\}$, agent dynamics captured by a strictly concave potential function $\Phi(x, x', f_\omega)$, and any $\varepsilon > 0$, $\delta > 0$:

$$n = O\!\left(\frac{1}{\varepsilon^2} \left( \ln \frac{1}{\delta} + d \ln \frac{16 \lambda (d + \eta r) \gamma}{\varepsilon} \right)\right) \implies P\left( |R(\hat{f}_\omega) - R(f^*_\omega)| \ge \varepsilon \right) \le \delta,$$

where $\eta$ is the Lipschitz constant from Lemma 1.

The proof (deferred to Appendix C) notes that the average loss on a sample $(X, y, k)$ is a scalar regardless of the randomness of the number of participants. Further, due to the Lipschitzness of the $NE$ function, the loss changes predictably under a changing classifier. This allows us to apply a covering argument and bound the discretization error. While the sample complexity itself does not directly depend on $k_{max}$ (due mainly to the sample loss being scaled by the number of participants, $\frac{1}{k}$), it does depend on the Lipschitz constant $\eta$. This constant is defined with respect to the potential function over the maximum number of $k_{max}$ participants (see Lemma 1). Sample complexity, thus, has an implicit dependence on $k_{max}$.

4.1 MINIMIZING EMPIRICAL RISK

While we have shown that PAC learning is possible in our multi-agent strategic classification game, a cornerstone of learning is minimizing the empirical risk; that is, finding a classifier that minimizes the loss incurred on the training set $S$. Even for a convex loss function $\ell$ and linear classifier $f_\omega$, minimizing this across the samples of $S$ is non-trivial since the $NE$ function is the solution to an optimization problem. In general, there is no closed-form expression for the $NE$ function and we cannot hope for this loss minimization problem to be convex. Machine learning nonetheless has a vast literature on optimizing non-convex loss functions; these, however, are largely gradient-based and require computing the loss gradient with respect to the learning parameter $\omega$. This loss is computed with respect to the equilibrium outcome/score of a participating agent $i$: $f_\omega(NE(X, f_\omega)_{i:})$⁹. When $f_\omega$ is linear, this is simply $\langle NE(X, f_\omega)_{i:}, \omega \rangle$. We now show when and how this gradient can be explicitly computed.
Let $v_i = NE(X, f_\omega)_{i:}$ and $z_i = f_\omega(NE(X, f_\omega)_{i:}) = f(v_i, \omega)$ for $i \in [k]$, noting that $z_i \in \mathbb{R}$ and $v_i \in \mathbb{R}^d$. The loss gradient is given as follows:

$$\nabla_\omega L(X, y, f_\omega) = \frac{1}{k} \sum_{i=1}^{k} \nabla_\omega \ell(f(NE(X, f_\omega)_{i:}, \omega), y_i) = \frac{1}{k} \sum_{i=1}^{k} \nabla_\omega \ell(z_i, y_i) = \frac{1}{k} \sum_{i=1}^{k} \frac{\partial \ell}{\partial z_i} \left( \nabla_{v_i} f^T J_\omega(NE(X, f_\omega)_{i:}) + \nabla_\omega f \right)$$

⁸ The notation $NE(X, f_\omega)_{i:}$ denotes the equilibrium report of the $i$th agent.
⁹ Since we are concerned with the loss on a sample, we assume $k = k_{max}$ and drop the dependence on $k$.

We use the chain rule here, with $J_\omega$ representing the Jacobian with respect to $\omega$. The term $\frac{\partial \ell}{\partial z_i}$ is the derivative of the loss with respect to the score, a feature of the loss function. Further, the gradients $\nabla_{v_i} f^T$ and $\nabla_\omega f$ depend only on the function class we are learning. Indeed, for a linear function class, the desired gradient is given by:

$$\nabla_\omega L(X, y, f_\omega) = \frac{1}{k} \sum_{i=1}^{k} \frac{\partial \ell}{\partial z_i} \left( \omega^T J_\omega(NE(X, f_\omega)_{i:}) + NE(X, f_\omega)_{i:} \right) \qquad (2)$$

The remaining and crucial term is the Jacobian of the outcome of the PNE optimizer with respect to the weights: $J_\omega(NE(X, f_\omega)_{i:})$. The theorem below proves exactly when this Jacobian exists, and explicitly characterizes it. Once again, for ease of exposition we interpret the input and output of the $NE$ function as vectors.

Theorem 4. For $NE(x, f_\omega) = \arg\max_{x' \in [0,1]^{kd}} \Phi(x', x, f_\omega)$ where $\Phi$ is strictly concave and smooth, let $\lambda^*$ and $\mu^*$ denote the dual variables corresponding to the upper- and lower-bound constraints. Then $x^*(\omega) = NE(x, f_\omega)$ is differentiable with respect to $\omega$ everywhere except possibly if there exists an $x^*_i(\omega) \in \{0, 1\}$ with the corresponding dual variable set to 0; i.e., if there exists $i$ such that $(x^*_i(\omega) = 1 \wedge \lambda^*_i = 0) \vee (x^*_i(\omega) = 0 \wedge \mu^*_i = 0)$.

The proof (deferred to Appendix C) introduces primal and dual slack variables to capture the KKT conditions, which are necessary and sufficient in the optimization above, in an implicit equation. We show that the Jacobian of a slightly modified but representative version of this implicit function has full rank except possibly at the specified points.
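As a numerical sanity check on Eq. (2), consider a toy separable model (an illustrative instance, not the general setting) in which the interior PNE has the closed form $NE_i = x_i + \omega/(2c)$, so the Jacobian is $J_\omega = I/(2c)$; the analytic gradient then matches central finite differences:

```python
import numpy as np

rng = np.random.default_rng(3)
k, d, c = 3, 2, 4.0                          # c: curvature of the separable potential
X = rng.uniform(0.3, 0.6, size=(k, d))       # true features (interior, so no clipping)
y = rng.choice([-1.0, 1.0], size=k)

def ne(w):                                   # closed-form interior PNE of the toy model
    return X + w / (2.0 * c)

def loss(w):                                 # squared loss on equilibrium scores
    z = ne(w) @ w
    return np.mean((z - y) ** 2)

def grad_loss(w):                            # Eq. (2): (1/k) sum_i dl/dz_i (w^T J + NE_i)
    Xq = ne(w)
    z = Xq @ w
    J = np.eye(d) / (2.0 * c)                # Jacobian of NE_i with respect to w
    dldz = 2.0 * (z - y)
    return np.mean(dldz[:, None] * (w @ J + Xq), axis=0)

w = 0.1 * rng.normal(size=d)
eps, fd = 1e-6, np.zeros(d)
for a in range(d):                           # central finite differences
    e = np.zeros(d); e[a] = eps
    fd[a] = (loss(w + e) - loss(w - e)) / (2 * eps)
assert np.allclose(fd, grad_loss(w), atol=1e-5)
```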
Aside from these degenerate points, we can use the implicit function theorem to explicitly compute the gradient. With this in hand, the learner is free to use standard gradient-descent-based algorithms to solve the empirical risk minimization problem. We also note that this result may be insightful even for settings where the potential function is non-concave. As we have affine constraints, the KKT conditions are still necessary at local optima (Boyd & Vandenberghe, 2004). Since our approach relies on implicitly characterizing the KKT conditions, we can still use it to compute the derivative at local optima, provided we can find them effectively. Game-theoretically, this would correspond to a local Nash Equilibrium (Balduzzi et al., 2018).

5 MODELS OF EXTERNALITY

The presented results leverage the strict concavity of the potential function to assert that the PNE is unique and efficiently computable, and that PAC learning guarantees are possible for the learning problem. Section 3 concluded by noting a more general Condition (1′) on the cost and externality that ensures this. This allows us to capture possibly non-convex externality; we give two such examples motivated by the settings we highlighted.

Proportional Externality: Externality in strategic classification can model an increased risk of detection/audit due to others also manipulating. In our admissions example, the risk of the university (classifier) suspecting something is amiss is much higher when many students wrongfully claim to be top of their class, as opposed to a handful. As such, it is natural that one's externality due to manipulation increases in proportion to the extent to which others in the system also manipulate. We express the externality suffered by agent i in such a setting as:

t(x′ᵢ, xᵢ, x′ⱼ, xⱼ) = (β/(k−1)) Σℓ₌₁ᵈ (x′ᵢℓ − xᵢℓ)² (x′ⱼℓ − xⱼℓ)²

We scale by k−1 since each agent pays a single cost c(·) but suffers externality from k−1 agents; this allows α (the scale constant for the cost) and β to be on the same scale.
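A quick numeric sanity check of this model (the random instance and constants below are ours): with the squared ℓ2 cost and β < α, the combined cost-plus-externality term behaves convexly (this is established formally below), and agents who do not manipulate incur zero cumulative impact.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d, alpha, beta = 4, 2, 1.0, 0.5             # beta < alpha
X = rng.uniform(size=(k, d))                   # true features (hypothetical)

def cumulative_impact(Xp):
    """Total cost plus total pairwise proportional externality."""
    D = (Xp - X) ** 2                          # squared per-coordinate manipulations
    total = alpha * D.sum()                    # sum of individual costs
    for i in range(k):
        for j in range(i + 1, k):              # each unordered pair counted once
            total += beta / (k - 1) * (D[i] * D[j]).sum()
    return total

# Midpoint-convexity spot check over random pairs of reported profiles.
worst = -np.inf
for _ in range(2000):
    A, B = rng.uniform(size=(2, k, d))
    gap = cumulative_impact((A + B) / 2) \
        - 0.5 * (cumulative_impact(A) + cumulative_impact(B))
    worst = max(worst, gap)
# worst <= 0 (up to float error): no convexity violation found
```

The check is only a spot test on random profiles, not a proof; the formal argument is deferred to Appendix D in the paper.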
Next, observe that for individuals who do not manipulate, the total externality they incur is 0. This is consistent with one interpretation of the university admissions setting: even if many claim to be top of their class, those who truly are have nothing to fear. We note this externality is pairwise symmetric, and we show below that it satisfies the updated condition (proof in Appendix D).

Proposition 1. Under the cost c(x, x′) = α‖x − x′‖²₂, the cumulative impact of manipulation under the proportional externality model is smooth and strictly convex for β < α.

Congestion Externality: In many decision-making scenarios, reported features map to real resources and have downstream consequences beyond classification. Externality can thus model an increased cost for manipulated features due to demand from others. Many countries, for example, deploy immigration policies that favour candidates who pledge to settle in under-populated areas (Picot et al., 2023). Naturally, an influx of applicants may report such intentions and initially move to these areas (with many reneging on this soon after and relocating¹⁰); this can significantly increase housing and living costs for new immigrants in these under-populated communities. In such cases, individuals suffer from others manipulating even if they are being honest. Externalities of this nature can be modelled as follows:

t(x′ᵢ, xᵢ, x′ⱼ, xⱼ) = (β/(k−1)) Σℓ₌₁ᵈ exp(−(x′ᵢℓ − x′ⱼℓ)²)

Under this externality, when reported values (regardless of their veracity) become more similar, indicating usage of common resources, the externality to agents increases, with the exponential function ensuring this is smooth. Again, it is pairwise symmetric and satisfies our updated convexity condition (proof in Appendix D).

Proposition 2.
Under the cost c(x, x′) = α‖x − x′‖²₂, the cumulative impact of manipulation under the congestion externality model is smooth and strictly convex for β < α.

We note that while these functions capture the spirit of each setting, they are by no means definitive. Indeed, there may be other representations of these settings that satisfy our desiderata. Choosing the right model for the right context is an important design goal for the decision-maker.

6 EXPERIMENTS

Figure 1: Strategic training losses (red), corresponding strategic validation losses (blue), and strategic validation loss of a baseline classifier trained with truthful/non-strategic reporting (green). We plot the mean losses over 15 random datasets (with 90% confidence intervals) vs. training epochs.

We experimentally validate the possibility of training classifiers to be strategy-aware with respect to k agents misreporting to a PNE. We use a 2-class synthetic dataset generated by scikit-learn and use the cvxpylayers library to compute PNE and update gradients (Pedregosa et al., 2011; Agrawal et al., 2019). We use a linear classifier that returns the probability of assigning the positive class. Agent utilities are computed as per Equation (1), and we use the squared ℓ2 norm for the cost, c(xᵢ, x′ᵢ) = α‖xᵢ − x′ᵢ‖²₂, and the convex externality discussed in Section 3: t(xᵢ, x′ᵢ, xⱼ, x′ⱼ) = β(‖x′ᵢ − xᵢ‖₂ + ‖x′ⱼ − xⱼ‖₂)², with T(xᵢ, x′ᵢ, X, X′) = (β/(k−1)) Σⱼ≠ᵢ t(xᵢ, x′ᵢ, xⱼ, x′ⱼ); we normalize by k−1 to make the β term interpretable since each agent faces externality from k−1 others. During training, we sample a set of k agents from the training set, compute their PNE under the current classifier, use these PNE features to determine the outcome and loss, and then update gradients. We denote this as strategic training, and the corresponding loss curve is shown in red in Figure 1.

¹⁰Canada uses such a policy and reported provincial retention rates as low as 39% (Picot et al., 2023).
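The PNE-computation step can be mimicked without cvxpylayers. The sketch below (a simplified one-dimensional instance of our own, not the experiment code) runs projected gradient ascent on the potential with the externality above and illustrates the uniqueness guaranteed by strict concavity: every random start reaches the same equilibrium.

```python
import numpy as np

rng = np.random.default_rng(5)
k, alpha, beta, w = 4, 1.0, 0.4, 0.7           # hypothetical constants
x = rng.uniform(size=k)                        # true 1-d features (hypothetical)

def pne(x0, steps=4000, lr=0.05):
    """Projected gradient ascent on the potential with the pairwise
    externality t = beta*(|x'_i - x_i| + |x'_j - x_j|)^2 / (k-1)."""
    xp = x0
    for _ in range(steps):
        m, s = np.abs(xp - x), np.sign(xp - x)
        # gradient of gain + cost + externality terms of the potential
        grad = w - 2 * alpha * (xp - x) \
             - (2 * beta / (k - 1)) * s * ((k - 1) * m + m.sum() - m)
        xp = np.clip(xp + lr * grad, 0.0, 1.0)
    return xp

sols = [pne(rng.uniform(size=k)) for _ in range(5)]
spread = max(np.max(np.abs(sol - sols[0])) for sol in sols)
# spread is tiny: all starts converge to the same (unique) PNE
```

This is a caricature of the solver used in the experiments; the actual pipeline differentiates through a convex program via cvxpylayers.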
After each epoch, we run our model on the validation/test dataset, wherein we carry out the same procedure: sample k agents, compute the PNE, and then classify. The resulting loss is plotted in blue in Figure 1. We ablate this experiment over different values of k and different strengths of cost and externality. Given that the training and validation curves decrease in lockstep, our experiments indicate that the model is indeed generalizing. Further, the generalization error seems to decrease as the intensity of the cost and externality increases. As a baseline, we train a truthful/non-strategic classifier to optimality and consider the implications of using this classifier at test time where agents are strategic (captured in the green curve). This is meant to simulate the deployment of non-strategy-aware classifiers in a strategic setting. We note that for nearly all configurations, this baseline performs poorly, as expected. That said, while it is exceedingly poor for lower intensities of cost and externality, as the intensity increases, the baseline's performance improves in relative terms. This trend is somewhat expected since at higher intensities, the negative impact of manipulation is higher, incentivizing many agents to stay close to their true reports. More interestingly, we note that as k increases, the non-strategic baseline's performance also improves. While this is less intuitive, as k increases, any single agent's influence on the total externality diminishes, and this quantity becomes an average over a larger set of values. This has a stabilizing effect on the externality an agent experiences, which may lead to more moderate equilibrium strategies. Nonetheless, it would be intriguing future work to better understand how aggressively the equilibrium reports shift from true values as k grows larger.
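A minimal numpy-only caricature of the strategic-training loop just described (our own simplification: d = 1, squared loss, and finite-difference gradients standing in for the cvxpylayers implicit gradient; all constants are hypothetical):

```python
import numpy as np

k, alpha, beta = 3, 2.0, 0.5
x = np.array([0.2, 0.5, 0.7])                  # true features (hypothetical)
y = np.array([0.0, 1.0, 1.0])                  # labels (hypothetical)

def pne(w, steps=1500, lr=0.05):
    """Equilibrium reports via projected gradient ascent on the potential."""
    xp = x.copy()
    for _ in range(steps):
        m, s = np.abs(xp - x), np.sign(xp - x)
        grad = w - 2 * alpha * (xp - x) \
             - (2 * beta / (k - 1)) * s * ((k - 1) * m + m.sum() - m)
        xp = np.clip(xp + lr * grad, 0.0, 1.0)
    return xp

def loss(w):                                   # squared loss on equilibrium scores
    return float(np.mean((w * pne(w) - y) ** 2))

w, lr, h = 0.0, 0.3, 1e-4
losses = [loss(w)]
for _ in range(30):                            # strategic training epochs
    g = (loss(w + h) - loss(w - h)) / (2 * h)  # FD gradient *through* the PNE
    w -= lr * g
    losses.append(loss(w))
# losses decrease: the classifier adapts to equilibrium manipulation
```

Each gradient step re-solves the agents' equilibrium under the current classifier, mirroring the training loop above; only the differentiation mechanism is simplified.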
In Appendix E, we compare against another baseline where the model is trained considering only cost and not externality, mimicking the classic setup of Hardt et al. (2016).

7 DISCUSSION

This paper studies a fundamental question within the strategic classification paradigm: what is the effect of inter-agent externalities on both the agents and the classifier? It is no longer reasonable to assume agents simply best respond to the classifier; rather, the Nash Equilibrium of the induced game becomes the natural solution concept, and we provide a set of conditions under which this equilibrium is unique and efficiently computable. The classifier, on the other hand, must learn an optimal model with respect to such an induced equilibrium. We show that this Stackelberg-Nash Equilibrium can be learned in a PAC sense and that its loss gradients can be computed. This paper thus shows the possibility of deploying loss-minimizing classifiers robust against rich manipulation dynamics. A limitation of our work is the structural assumptions on externality. Their pairwise nature gives rise to a potential game, and the convexity assumption ensures this is efficiently computable and satisfies Lipschitz regularity conditions. This leads to an intriguing open question: what learning guarantees, if any, can be given for non-concave potential functions where equilibria may not be unique or well-behaved? Can we precisely specify the necessary conditions (our results give a set of sufficient conditions) on externality to ensure both learnability and robustness? Understanding this is both technically and practically interesting. Another important direction is assuming the learner does not know the cost and externality model and must learn them by observing equilibrium reports over multiple interactions. This closely parallels the literature on online strategic classification (Dong et al., 2018; Chen et al., 2020) and learning in games (Cesa-Bianchi & Lugosi, 2006).
Given the high stakes, characterizing the welfare properties of multi-agent strategic interaction is also imperative. This can involve parametrizing the price of anarchy (Roughgarden, 2010) in terms of a given classifier or simultaneously optimizing for welfare alongside robustness. In settings with transferable utility, it may also be pertinent to consider coalition dynamics within the strategic model. Lastly, engaging with pertinent stakeholders to accurately capture real costs and externalities and to experimentally validate our model is necessary. This paper lays the groundwork for these important questions that arise when deploying classifiers in strategic multi-agent settings.

ACKNOWLEDGEMENTS

Ariel Procaccia was partially supported by the National Science Foundation under grants IIS-2147187 and IIS-2229881; by the Office of Naval Research under grants N00014-24-1-2704 and N00014-25-1-2153; and by a grant from the Cooperative AI Foundation. Yiling Chen was partially supported by Amazon and the National Science Foundation under grant IIS-2147187.

REFERENCES

Anish Agarwal, Munther Dahleh, Thibaut Horel, and Maryann Rui. Towards data auctions with externalities. arXiv preprint arXiv:2003.08345, 2020.

A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and Z. Kolter. Differentiable convex optimization layers. In Advances in Neural Information Processing Systems, 2019.

David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, and Thore Graepel. The mechanics of n-player differentiable games. In International Conference on Machine Learning, pp. 354-363, 2018.

Yahav Bechavod, Chara Podimata, Steven Wu, and Juba Ziani. Information discrepancy in strategic learning. In International Conference on Machine Learning, pp. 1691-1715, 2022.

Maria Patricia Bejarano Carbo. Machine learning applications in the United States criminal justice system: A critical content analysis of the COMPAS recidivism risk assessment. 2021.
Claude Berge. Topological spaces. Oliver & Boyd, 1963.

Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.

Michael Brückner and Tobias Scheffer. Nash equilibria of static prediction games. Advances in Neural Information Processing Systems, 22, 2009.

Michael Brückner, Christian Kanzow, and Tobias Scheffer. Static prediction games for adversarial learning problems. The Journal of Machine Learning Research, 13(1):2617-2654, 2012.

Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.

Yiling Chen, Chara Podimata, Ariel D. Procaccia, and Nisarg Shah. Strategyproof linear regression in high dimensions. In Proceedings of the 2018 ACM Conference on Economics and Computation, pp. 9-26, 2018.

Yiling Chen, Yang Liu, and Chara Podimata. Learning strategy-aware linear classifiers. Advances in Neural Information Processing Systems, 33:15265-15276, 2020.

R. H. Coase. The problem of social cost. Journal of Law and Economics, pp. 1-44, 1960.

Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner, and Zhiwei Steven Wu. Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation, pp. 55-70, 2018.

Itay Eilat, Ben Finkelshtein, Chaim Baskin, and Nir Rosenfeld. Strategic classification with graph neural networks. In Proceedings of the 11th International Conference on Learning Representations (ICLR), 2023.

Mariah D. R. Evans, Jonathan Kelley, Joanna Sikora, and Donald J. Treiman. Family scholarly culture and educational success: Books and schooling in 27 nations. Research in Social Stratification and Mobility, 28(2):171-197, 2010.

Ganesh Ghalme, Vineet Nair, Itay Eilat, Inbal Talgam-Cohen, and Nir Rosenfeld. Strategic classification in the dark. In Proceedings of the 38th International Conference on Machine Learning (ICML), pp. 3672-3681, 2021.
Jacob K. Goeree, Emiel Maasland, Sander Onderstal, and John L. Turner. How (not) to raise money. Journal of Political Economy, 113(4):897-918, 2005.

Nika Haghtalab, Nicole Immorlica, Brendan Lucier, and Jack Z. Wang. Maximizing welfare with incentive-aware evaluation mechanisms. In Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), pp. 160-166, 2020.

Moritz Hardt, Nimrod Megiddo, Christos Papadimitriou, and Mary Wootters. Strategic classification. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, pp. 111-122, 2016.

Keegan Harris, Dung Daniel T. Ngo, Logan Stapleton, Hoda Heidari, and Steven Wu. Strategic instrumental variable regression: Recovering causal relationships from strategic responses. In International Conference on Machine Learning, pp. 8502-8522, 2022.

Drew Harwell. A face-scanning algorithm increasingly decides whether you deserve the job. In Ethics of Data and Analytics, pp. 206-211. Auerbach Publications, 2022.

Hoda Heidari, Vedant Nanda, and Krishna P. Gummadi. On the long-term impact of algorithmic decision policies: Effort unfairness and feature segregation through social learning. In International Conference on Machine Learning, pp. 2692-2701, 2019.

Guy Horowitz and Nir Rosenfeld. Causal strategic classification: A tale of two shifts. In International Conference on Machine Learning, pp. 13233-13253, 2023.

Safwan Hossain and Yiling Chen. Equilibrium of data markets with externality. In International Conference on Machine Learning, 2024.

Safwan Hossain and Nisarg Shah. The effect of strategic noise in linear regression. Autonomous Agents and Multi-Agent Systems, 35(2):21, 2021.

Lily Hu, Nicole Immorlica, and Jennifer Wortman Vaughan. The disparate effects of strategic manipulation. In Proceedings of the 2nd Conference on Fairness, Accountability, and Transparency (FAccT), pp. 259-268, 2019.
Meena Jagadeesan, Celestine Mendler-Dünner, and Moritz Hardt. Alternative microfoundations for strategic classification. In International Conference on Machine Learning, pp. 4687-4697, 2021.

Jon Kleinberg and Manish Raghavan. How do classifiers induce agents to invest effort strategically? ACM Transactions on Economics and Computation (TEAC), 8(4):1-23, 2020.

I. Elizabeth Kumar, Keegan E. Hines, and John P. Dickerson. Equalizing credit opportunity in algorithms: Aligning algorithmic fairness research with US fair lending regulation. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, pp. 357-368, 2022.

Sagi Levanon and Nir Rosenfeld. Strategic classification made practical. In International Conference on Machine Learning, pp. 6243-6253, 2021.

Sagi Levanon and Nir Rosenfeld. Generalized strategic classification and the case of aligned incentives. In International Conference on Machine Learning, pp. 12593-12618, 2022.

Minming Li, Lili Mei, Yi Xu, Guochuan Zhang, and Yingchao Zhao. Facility location games with externalities. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, pp. 1443-1451, 2019.

John Miller, Smitha Milli, and Moritz Hardt. Strategic classification is causal modeling in disguise. In Proceedings of the 37th International Conference on Machine Learning (ICML), pp. 6917-6926, 2020.

Smitha Milli, John Miller, Anca D. Dragan, and Moritz Hardt. The social cost of strategic classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAccT), pp. 230-239, 2019.

Dov Monderer and Lloyd S. Shapley. Potential games. Games and Economic Behavior, 14(1):124-143, 1996.

Tomoya Nakamura. One-leader and multiple-follower Stackelberg games with private information. Economics Letters, 127:27-30, 2015.

Abraham Neyman. Correlated equilibrium and potential games. International Journal of Game Theory, 26(2):223-227, 1997.
Jorge Nocedal and Stephen J. Wright. Numerical optimization. Springer, 1999.

Elinor Ostrom. Tragedy of the commons. The New Palgrave Dictionary of Economics, 2:1-4, 2008.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

W. G. Picot, Eden Crossman, and Feng Hou. The Provincial Nominee Program: Retention in Province of Landing. Statistics Canada / Statistique Canada, 2023.

A. C. Pigou. The economics of welfare. 1920.

Tim Roughgarden. Algorithmic game theory. Communications of the ACM, 53(7):78-86, 2010.

Georg Still. Lectures on parametric optimization: An introduction. Optimization Online, pp. 2, 2018.

Stephen J. Wright and Benjamin Recht. Optimization for data analysis. Cambridge University Press, 2022.

Tian Xie and Xueru Zhang. Non-linear welfare-aware strategic learning. arXiv preprint arXiv:2405.01810, 2024.

Jiuping Xu, Zongmin Li, and Zhimiao Tao. Methodology from an equilibria viewpoints. Random-Like Bi-level Decision Making, pp. 365-386, 2016.

A IMPERFECT INFORMATION SETTING

Our model considers the agent interaction induced by a classifier as a simultaneous game with the Pure Nash Equilibrium (PNE) solution concept. In general, PNE are defined for complete information settings, since an agent needs to evaluate their best response in order to verify an equilibrium. Nonetheless, recent literature on strategic classification has considered settings wherein agents have uncertainty about some relevant information (Jagadeesan et al., 2021; Bechavod et al., 2022; Xie & Zhang, 2024). In this vein, we show an interesting relationship between our complete information PNE and the equilibrium under such uncertainty.
More formally, consider a variant of our model where each agent i does not exactly know the true features X₋ᵢ of the remaining agents, but instead has an estimate X̂₋ᵢ of them, with X₋ᵢ = X̂₋ᵢ + bᵢ, where bᵢ captures the bias/error of the estimate¹¹. Each agent then computes their utility with respect to this possibly biased estimate and evaluates joint strategies based on it. Formally:

uᵢ(xᵢ, x′ᵢ, X̂₋ᵢ, X′₋ᵢ, fω) = fω(x′ᵢ)g⁺(xᵢ) − c(xᵢ, x′ᵢ) − T(xᵢ, x′ᵢ, X̂₋ᵢ, X′₋ᵢ)  (3)

We note that only the externality term is affected by the uncertainty. We now show that at the complete information PNE strategy X^q = (x^q₁, …, x^q_k), no agent can improve their biased/imperfect utility by more than ε. Moreover, this ε depends linearly on the bias/error bᵢ.

Proposition 3. At the complete information PNE strategy X^q = (x^q₁, …, x^q_k), no agent believes they can increase their biased/imperfect utility (Equation 3) by more than ε = 2λ‖bᵢ‖₂, where λ is the Lipschitz constant of the externality function.

Proof. Recall that the total externality function Tᵢ is smooth (twice differentiable), and since the domain of features is bounded, this function is also λ-Lipschitz for some λ ≥ 0. Formally, for any fixed value of xᵢ, x′ᵢ, X′₋ᵢ:

|Tᵢ(xᵢ, x′ᵢ, X₋ᵢ, X′₋ᵢ) − Tᵢ(xᵢ, x′ᵢ, X̂₋ᵢ, X′₋ᵢ)| ≤ λ‖X₋ᵢ − X̂₋ᵢ‖₂ = λ‖bᵢ‖₂  (4)

Next, suppose that X^q = (x^q₁, …, x^q_k) is the complete information PNE strategy. Then by definition, the following holds for any x′ᵢ:

fω(x^qᵢ)g⁺(xᵢ) − c(xᵢ, x^qᵢ) − T(xᵢ, x^qᵢ, X₋ᵢ, X^q₋ᵢ) ≥ fω(x′ᵢ)g⁺(xᵢ) − c(xᵢ, x′ᵢ) − T(xᵢ, x′ᵢ, X₋ᵢ, X^q₋ᵢ)

Using the relation established in Equation 4, we have that:

fω(x^qᵢ)g⁺(xᵢ) − c(xᵢ, x^qᵢ) − T(xᵢ, x^qᵢ, X̂₋ᵢ, X^q₋ᵢ) + λ‖bᵢ‖₂ ≥ fω(x′ᵢ)g⁺(xᵢ) − c(xᵢ, x′ᵢ) − T(xᵢ, x′ᵢ, X̂₋ᵢ, X^q₋ᵢ) − λ‖bᵢ‖₂

The inequality above implies that at the PNE, agent i cannot gain more than 2λ‖bᵢ‖₂ with respect to their biased/imperfect utility function.

¹¹For simplicity, it is easier to think of X̂₋ᵢ and X₋ᵢ as vectors of size (k−1)d.
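Proposition 3 can be illustrated numerically. The sketch below (a hypothetical instance of ours with k = 2, d = 1, and the proportional externality) computes the complete-information PNE by ascending the potential, then searches over agent 1's deviations under a biased estimate and confirms the gain never exceeds 2λ|b|:

```python
import numpy as np

w, alpha, beta = 0.6, 1.0, 0.4                 # hypothetical constants
x = np.array([0.3, 0.5])                       # true features (hypothetical)
b = 0.1                                        # agent 1's estimation bias
x2_hat = x[1] - b                              # agent 1's estimate of x_2

# Complete-information PNE via projected gradient ascent on the potential.
xp = x.copy()
for _ in range(5000):
    g = np.array([
        w - 2 * alpha * (xp[0] - x[0]) - 2 * beta * (xp[0] - x[0]) * (xp[1] - x[1]) ** 2,
        w - 2 * alpha * (xp[1] - x[1]) - 2 * beta * (xp[1] - x[1]) * (xp[0] - x[0]) ** 2,
    ])
    xp = np.clip(xp + 0.05 * g, 0.0, 1.0)

def u1_biased(x1p):
    """Agent 1's utility computed under its (biased) estimate of x_2."""
    return (w * x1p - alpha * (x1p - x[0]) ** 2
            - beta * (x1p - x[0]) ** 2 * (xp[1] - x2_hat) ** 2)

grid = np.linspace(0.0, 1.0, 10001)            # brute-force deviation search
gain = np.max(u1_biased(grid)) - u1_biased(xp[0])
lam = 2 * beta * (1 + abs(b))                  # Lipschitz bound on T in x_2 here
# gain <= 2 * lam * |b|, as Proposition 3 guarantees
```

The constant `lam` is a valid Lipschitz bound for this instance because |∂T/∂x₂| = 2β(x′₁ − x₁)²|x′₂ − x₂| ≤ 2β(1 + |b|) over the relevant domain.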
B SECTION 3 PROOFS

Proof of Theorem 1

Proof. A potential function encapsulates the change in utility to any agent i due to their unilateral deviation. For the proposed function Φ to be a potential function, the following must hold:

∀i: uᵢ(X, [X′₋ᵢ; x′ᵢ], fω) − uᵢ(X, [X′₋ᵢ; x″ᵢ], fω) = Φ(X, [X′₋ᵢ; x′ᵢ], fω) − Φ(X, [X′₋ᵢ; x″ᵢ], fω)

Observe that the expression on the left is given by:

[fω(x′ᵢ) − fω(x″ᵢ)]g⁺(xᵢ) − [c(x′ᵢ, xᵢ) − c(x″ᵢ, xᵢ)] − Σⱼ≠ᵢ [t(xᵢ, x′ᵢ, xⱼ, x′ⱼ) − t(xᵢ, x″ᵢ, xⱼ, x′ⱼ)]

We claim this is equivalent to Φ(X, [X′₋ᵢ; x′ᵢ], fω) − Φ(X, [X′₋ᵢ; x″ᵢ], fω). Since only x′ᵢ changes to x″ᵢ, the terms involving gain and cost in the potential function difference match the above. Similarly, all externality terms not involving x′ᵢ remain the same, and only the externalities involving agent i remain. There are exactly k−1 such terms and, due to symmetry, their difference equals Σⱼ≠ᵢ [t(xᵢ, x′ᵢ, xⱼ, x′ⱼ) − t(xᵢ, x″ᵢ, xⱼ, x′ⱼ)].

A note on global externality: There are scenarios wherein the externality suffered by an agent is global and cannot be factorized in a pairwise fashion. For a set of reported values X′ = (x′₁, …, x′ₖ), let T(X, X′) denote the externality suffered by any agent due to the global/total set of decisions of everyone. This sort of externality is spiritually akin to a tragedy-of-the-commons scenario (common in modeling pollution, public health, etc.), where all agents equally suffer due to the combined actions of the whole (Ostrom, 2008). For such settings, the agent dynamics can be captured by the following potential function:

Φ(X, X′, fω) = Σᵢ₌₁ᵏ fω(x′ᵢ)g⁺(xᵢ) − Σᵢ₌₁ᵏ c(xᵢ, x′ᵢ) − T(X, X′).  (5)

Indeed, any agent i shifting their reported value from x′ᵢ to x″ᵢ observes a change in utility exactly equal to:

[fω(x′ᵢ) − fω(x″ᵢ)]g⁺(xᵢ) − [c(x′ᵢ, xᵢ) − c(x″ᵢ, xᵢ)] − [T(X, [X′₋ᵢ; x′ᵢ]) − T(X, [X′₋ᵢ; x″ᵢ])].

It is immediate that this is equal to Φ(X, [X′₋ᵢ; x′ᵢ], fω) − Φ(X, [X′₋ᵢ; x″ᵢ], fω).
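The unilateral-deviation identity underlying Theorem 1 is easy to verify numerically; below is a sketch on a random instance of our own, using a pairwise-symmetric externality (the proportional model) and per-agent gain terms:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d, alpha, beta = 3, 2, 1.0, 0.4             # hypothetical constants
omega = rng.uniform(size=d)                    # linear classifier weights
X = rng.uniform(size=(k, d))                   # true features (hypothetical)
gplus = rng.uniform(size=k)                    # gain terms g+(x_i)

def t(i, j, Xp):
    """Pairwise symmetric externality (proportional model)."""
    return beta * np.sum((Xp[i] - X[i]) ** 2 * (Xp[j] - X[j]) ** 2)

def u(i, Xp):
    ext = sum(t(i, j, Xp) for j in range(k) if j != i)
    return gplus[i] * (omega @ Xp[i]) - alpha * np.sum((Xp[i] - X[i]) ** 2) - ext

def Phi(Xp):
    gain = sum(gplus[i] * (omega @ Xp[i]) for i in range(k))
    cost = alpha * np.sum((Xp - X) ** 2)
    ext = sum(t(i, j, Xp) for i in range(k) for j in range(i + 1, k))
    return gain - cost - ext

ok = True
for _ in range(200):                           # random unilateral deviations
    Xp = rng.uniform(size=(k, d))
    i = int(rng.integers(k))
    Xq = Xp.copy()
    Xq[i] = rng.uniform(size=d)
    ok &= np.isclose(u(i, Xq) - u(i, Xp), Phi(Xq) - Phi(Xp))
# ok: every deviation changes utility by exactly the change in Phi
```

Note the potential counts each unordered pair once, while each agent's utility counts every pair it belongs to; the identity holds precisely because the externality is pairwise symmetric.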
As such, insofar as T(X, X′) is convex and smooth (the assumptions made at the start of Section 3), all results in the paper hold for this global model of externality.

Proof of Theorem 2

Proof. Due to the pairwise symmetry condition, we know that Σᵢ₌₁ᵏ T(xᵢ, x′ᵢ, X, X′) = 2 Σᵢ₌₁ᵏ Σⱼ>ᵢ t(xᵢ, x′ᵢ, xⱼ, x′ⱼ). Hence, if each T(xᵢ, x′ᵢ, X, X′) is convex, then the last term in the potential function, Σᵢ₌₁ᵏ Σⱼ>ᵢ t(xᵢ, x′ᵢ, xⱼ, x′ⱼ), is also convex. Combined with the squared ℓ2 norm cost, which is strictly convex, and fω, which is linear, it immediately follows that the potential function is strictly concave. The optimization program for the Nash Equilibrium follows immediately from this. For concave optimization problems, any local optimum is a global optimum. Suppose by contradiction that there are two global optima X′, X″, with Φ(X, X′, fω) = Φ(X, X″, fω) = m. Then strict concavity implies: Φ(X, λX′ + (1−λ)X″, fω) > λΦ(X, X′, fω) + (1−λ)Φ(X, X″, fω) = m. Since the feasible region is convex, the points on the segment λX′ + (1−λ)X″ are feasible, and thus neither X′ nor X″ could be a maximizer.

C SECTION 4 PROOFS

Proof of Lemma 1

Proof. We begin by proving the Lipschitz continuity of the maximizer of the following generic optimization problem:

x∗(ω) = arg max_{x′ ∈ X} Ψ(x′, ω)  (6)

where Ψ is strictly concave in x′ for any ω ∈ Ω, and X is convex. We know there is always a unique maximizer, and thus the optimal value x∗(ω) is a well-defined function. We also note that our optimization problem satisfies the conditions of Berge's Maximum Theorem (Berge, 1963). Thus, we can immediately conclude that the correspondence from any ω to the set of maximizers is upper-hemicontinuous. Since our maximizer is unique, i.e., the set is a singleton, it suffices to observe that any set-valued map that is a singleton is upper-hemicontinuous if and only if the corresponding function is continuous. Let Bε(ω) refer to an open ball of radius ε centered at ω.
For Lipschitz-ness, we first note that since Ω is bounded and compact, it suffices to prove this locally. That is, we wish to show that for any ω, there exist constants L, εω > 0 such that for all ω′ ∈ Bεω(ω), ‖x∗(ω′) − x∗(ω)‖₂ ≤ L‖ω′ − ω‖₂. Indeed, this means that for any two ω′, ω″ ∈ Ω, we can consider a sequence of distinct points ω₁ = ω′, ω₂, …, ωₙ₋₁, ωₙ = ω″ lying on the line αω′ + (1−α)ω″ such that ‖ωᵢ − ωᵢ₋₁‖₂ ≤ εωᵢ₋₁. Then we have the following, which means it suffices to focus on local Lipschitz-ness:

‖x∗(ω′) − x∗(ω″)‖₂ ≤ Σᵢ₌₂ⁿ L‖ωᵢ − ωᵢ₋₁‖₂ = L‖ω′ − ω″‖₂

We next prove a property that holds for the maximizer of our problem under any ω. Fix a ω, and let x∗ be the maximizer. Next, consider a Taylor expansion of Ψ at x∗. One formulation of this, presented in Chapter 2 of Wright & Recht (2022), states that for some γ ∈ (0, 1) we have (the dependence on ω is dropped for now as it is unchanged):

Ψ(x′, ·) = Ψ(x∗, ·) + (x′ − x∗)ᵀ∇ₓΨ(x∗, ·) + ½(x′ − x∗)ᵀ∇²ₓΨ(x∗ + γ(x′ − x∗), ·)(x′ − x∗)
≤ Ψ(x∗, ·) + ½(x′ − x∗)ᵀ∇²ₓΨ(x∗ + γ(x′ − x∗), ·)(x′ − x∗)

where the second line follows from Lemma 2.7 in Still (2018), which states that for any x′ ∈ X, ∇ₓΨ(x∗, ω)ᵀ(x′ − x∗) ≤ 0¹². We also note that ∇²ₓΨ(x′, ·) is the Hessian matrix, which is (1) always symmetric and (2) has all strictly negative eigenvalues since Ψ(x′, ·) is always strictly concave. Since any symmetric matrix can be diagonalized as QΛQᵀ, where Q is the matrix of orthonormal eigenvectors and Λ the diagonal matrix of eigenvalues, we have (defining p = x′ − x∗):

(x′ − x∗)ᵀ∇²ₓΨ(x∗ + γ(x′ − x∗), ·)(x′ − x∗) = pᵀQΛQᵀp = Σᵢ λᵢ(vᵢᵀp)²

where vᵢ is the i-th eigenvector.
Since all eigenvalues are strictly negative and Q is an orthonormal matrix (and thus does not affect the norm of the vector it multiplies), we have, where λ̄(∇²ₓΨ) denotes the largest (i.e., smallest-magnitude, still strictly negative) eigenvalue of the Hessian across ω ∈ Ω and x′ ∈ X:

(x′ − x∗)ᵀ∇²ₓΨ(x∗ + γ(x′ − x∗), ·)(x′ − x∗) ≤ λ̄(∇²ₓΨ)‖Qᵀp‖²₂ = λ̄(∇²ₓΨ)‖p‖²₂
⟹ Ψ(x′, ·) ≤ Ψ(x∗, ·) + ½λ̄(∇²ₓΨ)‖x′ − x∗‖²₂

Since λ̄ must be strictly negative on the compact domain, there is a strictly positive constant c, namely c = −½λ̄(∇²ₓΨ) = −½ maxₓ′,ω λ̄(∇²ₓΨ(x′, ω)), such that:

Ψ(x∗, ω) − Ψ(x′, ω) ≥ c‖x′ − x∗‖²₂

We now turn back to local Lipschitz-ness. Fix a ω̄ ∈ Ω and denote the unique maximizer by x∗(ω̄) = x̄. From the relation derived above, there exist constants δ₁, c > 0 such that:

Ψ(x̄, ω̄) − Ψ(x′, ω̄) ≥ c‖x′ − x̄‖²₂  ∀x′ ∈ X, ‖x′ − x̄‖₂ ≤ δ₁  (7)

Let Ψ∗ = Ψ(x̄, ω̄) denote the optimal value at ω̄. For δ such that 0 < δ < δ₁, let 2α = min_{x′: ‖x′−x̄‖₂=δ} [Ψ(x̄, ω̄) − Ψ(x′, ω̄)]. Since x̄ is the global and strict optimizer at ω̄, it holds that 2α > 0. Therefore, we have that:

Ψ(x′, ω̄) ≤ Ψ∗ − 2α  for all x′ such that ‖x′ − x̄‖₂ = δ  (8)

We now leverage the continuity of Ψ(·) with respect to ω at (x̄, ω̄). For the chosen value α/2, there must exist an ε₁ such that for all ω ∈ Bε₁(ω̄), we have |Ψ(x̄, ω) − Ψ(x̄, ω̄)| < α/2. Similarly, the continuity of Ψ(·) with respect to ω at (x′, ω̄), for any x′ such that ‖x′ − x̄‖₂ = δ, implies that for any such x′ and the chosen value α/2, there exists εₓ′ such that for all ω ∈ Bεₓ′(ω̄), |Ψ(x′, ω) − Ψ(x′, ω̄)| < α/2. Letting ε = min(ε₁, min_{x′: ‖x′−x̄‖=δ} εₓ′), the following hold:

∀ω ∈ Bε(ω̄): Ψ(x̄, ω) ≥ Ψ∗ − α/2  (9)
∀ω ∈ Bε(ω̄), ∀x′ s.t. ‖x′ − x̄‖ = δ: Ψ(x′, ω) ≤ Ψ(x′, ω̄) + α/2  (10)

Combining Equations 8 and 10, we have that for any ω ∈ Bε(ω̄) and any x′ such that ‖x′ − x̄‖₂ = δ:

Ψ(x′, ω) ≤ Ψ∗ − 2α + α/2 ≤ Ψ∗ − α  (11)

¹²Technically the lemma is for convex functions with a ≥, but it can be easily massaged for concave functions.

Fix a ω₀ ∈ Bε(ω̄).
We aim to show Lipschitz continuity between ω̄ and ω₀, following a similar structure to Theorem 6.2a in Still (2018). We know from Equation 9 that Ψ(x̄, ω₀) ≥ Ψ∗ − α/2. We also know from Equation 11 that for any x′ on the boundary of the ball Bδ(x̄), Ψ(x′, ω₀) ≤ Ψ∗ − α. Since Ψ(·, ω₀) is concave in x′, this implies that the (Ψ∗ − α/2) super-level set of Ψ(·, ω₀) (for the fixed ω₀) must lie within the ball Bδ(x̄). In other words, the maximizer at ω₀, x∗(ω₀) ≡ x₀, satisfies ‖x₀ − x̄‖₂ ≤ δ. Noting that Ψ(x̄, ω₀) − Ψ(x₀, ω₀) ≤ 0, we have that:

Ψ(x̄, ω̄) − Ψ(x₀, ω̄) = [Ψ(x̄, ω̄) − Ψ(x̄, ω₀)] − [Ψ(x₀, ω̄) − Ψ(x₀, ω₀)] + [Ψ(x̄, ω₀) − Ψ(x₀, ω₀)]
≤ [Ψ(x̄, ω̄) − Ψ(x̄, ω₀)] (call this A) − [Ψ(x₀, ω̄) − Ψ(x₀, ω₀)] (call this B)

Consider next the function g(x′) ≡ Ψ(x′, ω̄) − Ψ(x′, ω₀) between x̄ and x₀. Indeed, g(x̄) − g(x₀) = A − B ≥ Ψ(x̄, ω̄) − Ψ(x₀, ω̄). By the mean value theorem, there exists an ᾱ ∈ [0, 1] such that:

Ψ(x̄, ω̄) − Ψ(x₀, ω̄) ≤ g(x̄) − g(x₀) = ∇ₓ[Ψ(x̄ + ᾱ(x₀ − x̄), ω̄) − Ψ(x̄ + ᾱ(x₀ − x̄), ω₀)]ᵀ(x̄ − x₀)

Recall that for a differentiable function f(ω), we can approximate it near a value ω̄ using the gradient: f(ω) = f(ω̄) + ∇ωf(ω̄)ᵀ(ω − ω̄) + o(‖ω − ω̄‖). So for any x′ ∈ cl(Bδ(x̄)) (cl denotes the closure), define h(ω) = ∇ₓΨ(x′, ω). Thus, we have that:

h(ω₀) = h(ω̄) + ∇ωh(ω̄)(ω₀ − ω̄) + o(‖ω₀ − ω̄‖₂)
⟹ ∇ₓΨ(x′, ω₀) = ∇ₓΨ(x′, ω̄) + ∇²ωₓΨ(x′, ω̄)(ω₀ − ω̄) + o(‖ω₀ − ω̄‖₂)
⟹ ∇ₓ[Ψ(x′, ω̄) − Ψ(x′, ω₀)] = ∇²ωₓΨ(x′, ω̄)(ω̄ − ω₀) + o(‖ω₀ − ω̄‖₂).

Going back to Ψ(x̄, ω̄) − Ψ(x₀, ω̄), we have the following:

Ψ(x̄, ω̄) − Ψ(x₀, ω̄) ≤ ∇ₓ[Ψ(x̄ + ᾱ(x₀ − x̄), ω̄) − Ψ(x̄ + ᾱ(x₀ − x̄), ω₀)]ᵀ(x̄ − x₀)
≤ max_{x′ ∈ cl Bδ(x̄)} ‖∇ₓ[Ψ(x′, ω̄) − Ψ(x′, ω₀)]‖₂ ‖x̄ − x₀‖₂
≤ max_{x′ ∈ cl Bδ(x̄)} (‖∇²ωₓΨ(x′, ω̄)‖₂ + 1) ‖ω̄ − ω₀‖₂ ‖x̄ − x₀‖₂.

Let γ = max_{x′,ω} ‖∇²ωₓΨ(x′, ω)‖₂ + 1.
Then, using the relation outlined in Equation 7, and noting the fact that x₀ ∈ Bδ(x̄), we have that:

c‖x̄ − x₀‖²₂ ≤ Ψ(x̄, ω̄) − Ψ(x₀, ω̄) ≤ γ‖ω̄ − ω₀‖₂ ‖x̄ − x₀‖₂  ⟹  ‖x̄ − x₀‖₂ ≤ (γ/c)‖ω̄ − ω₀‖₂

To relate this to the Nash Equilibrium of an arbitrary k participants, observe that the following is a characterization of the NE(x, fω, k) function (again we treat x′ as an ℝ^{d·kmax}-dimensional vector):

NE(x, fω, k) = arg max_{x′ ∈ [0,1]^{d·kmax}} Φ(x′₀:ₖ, x₀:ₖ, fω) − Σⱼ₌ₖ₊₁^{kmax} ‖x′ⱼ − xⱼ‖²₂

The first term, the potential function, fits the function signature of Ψ (see the objective in Equation 6), since x (the true features) is simply a constant. Second, we note that the optimization problem here is separable, since the potential function only uses the first k feature vectors and the sum of norms only involves the remaining kmax − k feature vectors. Further, the latter is independent of ω and will always be set to x′ⱼ = xⱼ for k < j ≤ kmax in the maximization program. Thus, we can appeal to the result above for the maximization of Φ and claim the function NE(x, fω, k) to be (γ/c)-Lipschitz continuous, where γ, c are as above but for the largest possible value of k, which is kmax.

Proof of Theorem 3

Proof. Let S = {(k₁, X₁, y₁), …, (kₙ, Xₙ, yₙ)} be the training set, where Xᵢ ∈ ℝ^{kmax×d} and yᵢ ∈ ℝ^{kmax}. Let L(Xᵢ, yᵢ, fω, kᵢ) denote the loss on the i-th sample, where L(Xᵢ, yᵢ, fω, kᵢ) = (1/kᵢ) Σⱼ₌₁^{kᵢ} ℓ(fω(NE(Xᵢ, fω, kᵢ)ⱼ:), yᵢⱼ). Note that the following holds, where the expectation is over samples from D:

R̂(fω) = (1/n) Σᵢ₌₁ⁿ L(Xᵢ, yᵢ, fω, kᵢ),  E[R̂(fω)] = (1/n) Σᵢ₌₁ⁿ E[L(X, y, fω, k)] = R(fω)  (12)

Our goal is to show that |R(f̂ω) − R(f∗ω)| ≥ ε holds with low probability. To that end, we use the fact that uniform convergence implies generalization. Formally:

R(f̂ω) − R(f∗ω) ≤ |R̂(f̂ω) − R(f̂ω)| + |R̂(f∗ω) − R(f∗ω)| ≤ 2 sup_{fω ∈ F} |R̂(fω) − R(fω)|  (13)

where the first inequality follows from the triangle inequality and the fact that R̂(f̂ω) − R̂(f∗ω) ≤ 0.
Since the function space is continuous, we fix a $\zeta$-cover $N^\zeta_\omega$ of the parameter space $\Omega$. That is, for any $\omega \in \Omega$, there exists $\omega' \in N^\zeta_\omega$ such that $\|\omega - \omega'\| \le \zeta$. Since our parameter space is a $d$-dimensional ball of norm $r$, it is well known that such a covering can be achieved with $|N^\zeta_\omega| \le \big(\frac{2r\sqrt d}{\zeta}\big)^d$. For a sample $i$ in our dataset, let $z_i$ denote the vector of scores received; that is, $z_{ij} = \langle \mathrm{NE}(X_i, f_\omega, k_i)_{j:}, \omega\rangle$, and let $z'_{ij}$ denote the score on this sample when classifier $f_{\omega'}$ is used. Since our loss function $\ell$ is $\lambda$-Lipschitz in the score $z_{ij}$, we have that:

$$
|L(X_i,y_i,f_\omega,k_i) - L(X_i,y_i,f_{\omega'},k_i)| = \frac{1}{k_i}\Big|\sum_{j=1}^{k_i} \ell(z_{ij},y_{ij}) - \ell(z'_{ij},y_{ij})\Big| \le \frac{\lambda}{k_i}\|z_i - z'_i\|_1
$$

Next, by making use of the triangle inequality, we have the following:

$$
\begin{aligned}
\frac{\lambda}{k_i}\|z_i-z'_i\|_1 &= \frac{\lambda}{k_i}\sum_{j=1}^{k_i}\big|\langle \mathrm{NE}(X_i,f_\omega,k_i)_{j:},\omega\rangle - \langle \mathrm{NE}(X_i,f_{\omega'},k_i)_{j:},\omega'\rangle\big|\\
&= \frac{\lambda}{k_i}\sum_{j=1}^{k_i}\big|\langle \mathrm{NE}(X_i,f_{\omega'},k_i)_{j:},\omega\rangle + \langle \mathrm{NE}(X_i,f_\omega,k_i)_{j:} - \mathrm{NE}(X_i,f_{\omega'},k_i)_{j:},\omega\rangle - \langle \mathrm{NE}(X_i,f_{\omega'},k_i)_{j:},\omega'\rangle\big|\\
&\le \frac{\lambda}{k_i}\sum_{j=1}^{k_i}\big|\langle \mathrm{NE}(X_i,f_{\omega'},k_i)_{j:},\omega-\omega'\rangle\big| + \big|\langle \mathrm{NE}(X_i,f_\omega,k_i)_{j:}-\mathrm{NE}(X_i,f_{\omega'},k_i)_{j:},\omega\rangle\big|\\
&\le \frac{\lambda}{k_i}\sum_{j=1}^{k_i}\|\mathrm{NE}(X_i,f_{\omega'},k_i)_{j:}\|_1\,\|\omega-\omega'\|_1 + \frac{\lambda}{k_i}\sum_{j=1}^{k_i}\big|\langle \mathrm{NE}(X_i,f_\omega,k_i)_{j:}-\mathrm{NE}(X_i,f_{\omega'},k_i)_{j:},\omega\rangle\big|\\
&\le \lambda d\,\|\omega-\omega'\|_1 + \frac{\lambda}{k_i}\big\|\big(\mathrm{NE}(X_i,f_\omega,k_i)-\mathrm{NE}(X_i,f_{\omega'},k_i)\big)\,\omega\big\|_1
\end{aligned}
$$

where the last inequality follows from the assumption that the feature vectors are bounded in the region $[0, 1]^d$.
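The arithmetic of the covering step can be illustrated concretely. The sketch below uses hypothetical constants (`d`, `r`, `lam`, `eta`, `eps` are all illustrative) and one consistent reading of the cover-radius choice made later in the proof:

```python
import math

# Illustrative constants (not from the paper's experiments).
d, r = 5, 2.0          # parameter dimension and norm bound of the ball
lam, eta = 1.0, 3.0    # loss Lipschitz constant and NE Lipschitz constant
eps = 0.1

# Cover radius (one consistent reading of the proof's choice of zeta).
zeta = eps / (8 * lam * (d + eta * r))

# Cover-size bound for a radius-r ball in R^d: |N| <= (2 r sqrt(d) / zeta)^d.
cover_size_bound = (2 * r * math.sqrt(d) / zeta) ** d

# Moving omega to its nearest cover point perturbs each per-sample loss by at
# most lam * (d + eta * r) * zeta, which equals eps / 8 under this zeta.
perturbation = lam * (d + eta * r) * zeta
assert abs(perturbation - eps / 8) < 1e-12
assert cover_size_bound >= 1
```

The cover size grows polynomially in $1/\varepsilon$ (with exponent $d$), which is what later feeds the union bound.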
Next, we make use of the following three relations that hold for any matrix $V$ and vector $\omega \in \mathbb{R}^d$: (1) $\|\omega\|_2 \le \|\omega\|_1 \le \sqrt d\,\|\omega\|_2$; (2) submultiplicativity of matrix norms, $\|V\omega\|_2 \le \|V\|\,\|\omega\|_2$; and (3) $\|V\|_2 \le \|V\|_F$, where $\|V\|_F$ denotes the matrix Frobenius norm:

$$
\begin{aligned}
\big\|\big(\mathrm{NE}(X_i,f_\omega,k_i)-\mathrm{NE}(X_i,f_{\omega'},k_i)\big)\,\omega\big\|_1 &\le \sqrt{k_i}\,\big\|\big(\mathrm{NE}(X_i,f_\omega,k_i)-\mathrm{NE}(X_i,f_{\omega'},k_i)\big)\,\omega\big\|_2\\
&\le \sqrt{k_i}\,\big\|\mathrm{NE}(X_i,f_\omega,k_i)-\mathrm{NE}(X_i,f_{\omega'},k_i)\big\|_F\,\|\omega\|_2\\
&\le r\sqrt{k_i}\,\big\|\mathrm{NE}(X_i,f_\omega,k_i)-\mathrm{NE}(X_i,f_{\omega'},k_i)\big\|_F
\end{aligned}
$$

Next, we appeal to Theorem 1, which establishes the $\eta$-Lipschitz continuity of the NE function in $\omega$ under the $\|\cdot\|_F$ norm, to state the following:

$$
r\sqrt{k_i}\,\big\|\mathrm{NE}(X_i,f_\omega,k_i)-\mathrm{NE}(X_i,f_{\omega'},k_i)\big\|_F \le \eta r \sqrt{k_i}\,\|\omega-\omega'\|_1
$$

Thus, combining the results here, we can state the following, where the last inequality follows from the definition of a $\zeta$-cover:

$$
|L(X_i,y_i,f_\omega,k_i) - L(X_i,y_i,f_{\omega'},k_i)| \le \lambda d\,\|\omega-\omega'\|_1 + \frac{\lambda\eta r}{\sqrt{k_i}}\,\|\omega-\omega'\|_1 \le \lambda(d+\eta r)\,\zeta
$$

This essentially bounds the change in empirical and true risk due to using classifier parameters $\omega$ rather than the nearest parameters in the finite cover $N^\zeta_\omega$. Formally, we expand Equation 13 to state the following ($\omega'$ is the closest element of $N^\zeta_\omega$ to $\omega$):

$$
\sup_{\omega\in\Omega}|R(f_\omega)-\hat R(f_\omega)| = \sup_{\omega\in\Omega}\big|R(f_{\omega'})-\hat R(f_{\omega'}) + R(f_\omega)-R(f_{\omega'}) + \hat R(f_{\omega'})-\hat R(f_\omega)\big| \le \max_{\omega'\in N^\zeta_\omega}|R(f_{\omega'})-\hat R(f_{\omega'})| + 2\lambda(d+\eta r)\zeta.
$$

By choosing $\zeta = \frac{\varepsilon}{8\lambda(d+\eta r)}$, we get that

$$
\mathbb{P}\Big(\sup_{\omega\in\Omega}|R(f_\omega)-\hat R(f_\omega)| \ge \varepsilon\Big) \le \mathbb{P}\Big(\max_{\omega'\in N^\zeta_\omega}|R(f_{\omega'})-\hat R(f_{\omega'})| \ge \frac{\varepsilon}{2}\Big)
$$

Lastly, applying Hoeffding's inequality (valid due to Equation 12) together with a union bound over $N^\zeta_\omega$ on the right-hand side of the inequality above, we have that

$$
\mathbb{P}\Big(\max_{\omega'\in N^\zeta_\omega}|R(f_{\omega'})-\hat R(f_{\omega'})| \ge \frac{\varepsilon}{2}\Big) \le 2\,|N^\zeta_\omega|\,\exp\!\left(-\frac{n\varepsilon^2}{2\big(16\lambda(d+\eta r)\big)^2}\right)
$$

and the theorem follows by choosing $n$ accordingly.

Proof of Theorem 4

Proof. Note that NE is the solution to a strictly convex optimization problem. Let $z = kd$.
We express the Lagrangian of this problem as follows, with $\lambda \in \mathbb{R}^z$ denoting the dual variables for the upper-bound constraints and $\mu \in \mathbb{R}^z$ the duals for the lower bounds (recall our feasible region is the box $[0,1]^z$):

$$
\mathcal L(x',\lambda,\mu,\omega; x) = \Phi(x', x, f_\omega) - \sum_{i=1}^{z}\lambda_i(x'_i - 1) + \sum_{i=1}^{z}\mu_i x'_i
$$

Since our feasible region is always strictly satisfiable, Slater's condition is always satisfied. When our objective is concave, as modeled throughout the paper, the KKT conditions are both sufficient and necessary for the optimal solution $x^* = \mathrm{NE}(x, \omega)$. Since the constraints are affine, in the non-concave setting the KKT conditions are necessary at a local optimum. Define vectors $s_{x+} \in \mathbb{R}^z$, $s_{x-} \in \mathbb{R}^z$, $s_\lambda \in \mathbb{R}^z$, $s_\mu \in \mathbb{R}^z$, which will be used to denote the primal and dual slacks. We now define the following implicit function $G(\cdot): \mathbb{R}^{7z+d} \to \mathbb{R}^{7z}$:

$$
G(x',\lambda,\mu,s_{x+},s_{x-},s_\lambda,s_\mu,\omega) = \begin{pmatrix}
\nabla_{x'}\mathcal L(x',\lambda,\mu,\omega;x)\\
x'_i - 1 + s_{x+,i}^2 \quad i\in[z]\\
x'_i - s_{x-,i}^2 \quad i\in[z]\\
\operatorname{diag}(\lambda)(\mathbf 1 - x')\\
\operatorname{diag}(\mu)\,x'\\
\lambda_i - s_{\lambda,i}^2 \quad i\in[z]\\
\mu_i - s_{\mu,i}^2 \quad i\in[z]
\end{pmatrix}
$$

Let $G(\cdot) = 0$. Under this implicit equation, the first row of equations corresponds to the KKT stationarity condition. The second and third rows enforce that each $x'_i$ is at most 1 and at least 0, respectively — the KKT primal feasibility conditions. The fourth and fifth rows correspond to the complementary slackness constraints. The last two rows enforce that the dual variables are nonnegative, the KKT dual feasibility condition. Thus, when $G(\cdot) = 0$, the KKT conditions are satisfied, and with a concave objective this also implies the solution is optimal. Similarly, for any optimal $(x^*, \lambda^*, \mu^*)$, since it is feasible, we can always find corresponding slacks so as to satisfy $G(\cdot) = 0$. Therefore, solutions to this implicit equation fully characterize the optimal solution. Let $p = (x', \lambda, \mu, s_{x+}, s_{x-}, s_\lambda, s_\mu)$, so we can simplify our equation to $G(p, \omega) = 0$. Any $p$ that satisfies this is the optimal solution for $\omega$.
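A one-dimensional instance of this slack-variable system can be checked directly. The sketch below maximizes a hypothetical concave objective $\Phi(x') = wx' - x'^2$ over $[0, 1]$ (the constant `w` is illustrative) and verifies that the residual $G(\cdot)$ vanishes at the box-projected maximizer:

```python
import numpy as np

# Hypothetical concave objective Phi(x) = w*x - x**2 maximized over [0, 1].
w = 3.0
x = min(max(w / 2.0, 0.0), 1.0)   # box-projected maximizer; upper bound active
lam = max(w - 2.0 * x, 0.0)       # dual for x <= 1, from stationarity
mu = max(2.0 * x - w, 0.0)        # dual for x >= 0
s_xp = np.sqrt(1.0 - x)           # primal slack for x <= 1
s_xm = np.sqrt(x)                 # primal slack for x >= 0
s_lam, s_mu = np.sqrt(lam), np.sqrt(mu)

G = np.array([
    (w - 2.0 * x) - lam + mu,     # stationarity: d L / dx
    x - 1.0 + s_xp**2,            # primal feasibility, upper bound
    x - s_xm**2,                  # primal feasibility, lower bound
    lam * (1.0 - x),              # complementary slackness (upper)
    mu * x,                       # complementary slackness (lower)
    lam - s_lam**2,               # dual feasibility, lambda >= 0
    mu - s_mu**2,                 # dual feasibility, mu >= 0
])
assert np.allclose(G, 0.0)
```

With $w = 3$, the unconstrained maximizer $3/2$ is clipped to $x' = 1$, the upper-bound dual becomes $\lambda = 1$, and every row of $G$ is zero, as the characterization requires.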
We aim to use the Implicit Function Theorem, and to that end, we first compute the Jacobian $J_p(G)$ as follows (the columns represent the derivatives with respect to $x'$, $\lambda$, $\mu$, $s_{x+}$, $s_{x-}$, $s_\lambda$, $s_\mu$):

$$
J_p(G) = \begin{pmatrix}
\nabla^2_{x'}\Phi(x',x,f_\omega) & -I & I & 0 & 0 & 0 & 0\\
I & 0 & 0 & \operatorname{diag}(2s_{x+}) & 0 & 0 & 0\\
I & 0 & 0 & 0 & -\operatorname{diag}(2s_{x-}) & 0 & 0\\
-\operatorname{diag}(\lambda) & \operatorname{diag}(\mathbf 1 - x') & 0 & 0 & 0 & 0 & 0\\
\operatorname{diag}(\mu) & 0 & \operatorname{diag}(x') & 0 & 0 & 0 & 0\\
0 & I & 0 & 0 & 0 & -\operatorname{diag}(2s_\lambda) & 0\\
0 & 0 & I & 0 & 0 & 0 & -\operatorname{diag}(2s_\mu)
\end{pmatrix}
$$

We note that the first $z$ columns are always linearly independent owing to the block of identity matrices in the second and third block rows. Similarly, the second and third sets of $z$ columns are linearly independent owing to the identities in the first block row. The remaining columns correspond to slack variables, which we now address. A constraint $i$ is active if $x'_i \in \{0, 1\}$. Under the KKT conditions, the Lagrange multiplier for an inactive constraint must be 0. Let $I \subseteq [2z]$ denote a set of active constraints, and let $I(\omega) = \{i \mid x'_i(\omega) = 0\} \cup \{i + z \mid x'_i(\omega) = 1\}$. We define the inverse mapping $\omega(I)$ as follows: $\omega(I) = \{\omega \in \Omega \mid I(\omega) = I\}$. It is immediate that the set $\{\omega(I) \mid I \subseteq [2z]\}$ is a partition of the parameter space $\Omega$. Fix an $I$ and, without loss of generality, let $i$ constraints from $\lambda$ and $j$ constraints from $\mu$ be active. Then for $\omega \in \omega(I)$, it suffices to consider $\lambda_I \in \mathbb{R}^i$ and $\mu_I \in \mathbb{R}^j$ — i.e., the dual variables corresponding to the active constraints — since the others will be 0 under the KKT complementary slackness condition. Similarly, we need only consider the slacks for the inactive constraints, since those are the only free ones. Thus, for $\omega \in \omega(I)$, we can simplify the general function $G$ to a function $G': \mathbb{R}^{3z+i+j} \times \mathbb{R}^d \to \mathbb{R}^{3z+i+j}$ by focusing only on the inactive primal slacks and the active dual variables with their corresponding slacks. We note that the primal slack for an inactive constraint cannot be 0 by definition (otherwise that constraint would be active); hence the columns corresponding to those slacks are also independent. Thus, if the dual slacks for the active constraints are nonzero, then the Jacobian of $G'$ has full rank.
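The implicit-function machinery can be illustrated in the simplest case, where no box constraints are active and $G$ reduces to the stationarity condition alone. The sketch below uses a hypothetical concave potential $\Phi(x', \omega) = \omega^\top x' - \frac{1}{2}x'^\top A x'$ (the matrix `A` is illustrative) and checks the implicit-function Jacobian against finite differences:

```python
import numpy as np

# Interior case: with no active constraints, G reduces to stationarity
# d Phi / dx (x*, w) = 0, and the implicit function theorem gives
# dx*/dw = -(H_xx)^{-1} H_wx.
# Hypothetical potential: Phi(x, w) = w @ x - x @ A @ x / 2.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])       # positive definite -> strictly concave Phi

def x_star(w):
    # Stationarity: w - A x = 0  =>  x* = A^{-1} w.
    return np.linalg.solve(A, w)

# H_xx = -A and H_wx = I, so the implicit-function Jacobian is A^{-1}.
J_ift = np.linalg.inv(A)

# Finite-difference check of dx*/dw.
w0 = np.array([0.5, -0.2])
eps = 1e-6
J_fd = np.column_stack([(x_star(w0 + eps * e) - x_star(w0 - eps * e)) / (2 * eps)
                        for e in np.eye(2)])
assert np.allclose(J_ift, J_fd, atol=1e-6)
```

The agreement between the two Jacobians is exactly what the full-rank argument above licenses; handling active constraints requires the reduced system $G'$ instead.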
This allows us to apply the implicit function theorem and state the following:

$$
J_\omega(\mathrm{NE}(x,f_\omega)) \equiv \frac{\partial\, \mathrm{NE}(x,f_\omega)}{\partial \omega} = -J_p^{-1}\big(G'(p,\omega)\big)\, J_\omega\big(G'(p,\omega)\big)
$$

If, however, some of these dual slacks are zero, this suggests that some of the constraints are redundant and the Jacobian of $G'$ may not be invertible. This is formally equivalent to:

$$
\exists i \ \text{such that} \ (x'_i = 1 \wedge \lambda^*_i = 0) \vee (x'_i = 0 \wedge \mu^*_i = 0)
$$

Indeed, this is a degenerate situation known as strict complementarity failure (Nocedal & Wright, 1999), and it represents the threshold or boundary of the $\omega(I)$ region, where one set of constraints is becoming active and another inactive.

D SECTION 5 PROOFS

Proof of Proposition 1

Proof. We express the cumulative impact of manipulation under the stated conditions as follows:

$$
\sum_{i=1}^{k}\sum_{\ell=1}^{d}\alpha\,(x'_{i\ell}-x_{i\ell})^2 \;+\; \frac{\beta}{k-1}\sum_{\ell=1}^{d}\sum_{j>i}(x'_{i\ell}-x_{i\ell})^2(x'_{j\ell}-x_{j\ell})^2
$$

Since the sum of strictly convex functions is strictly convex, it suffices to show that each inner summand is strictly convex. For a fixed $\ell$, we observe that since there are exactly $\frac{k(k-1)}{2}$ pairs satisfying $j > i$, and each feature $x_i$ appears in exactly $(k-1)$ of these pairs, the inner summand can be written as a sum over pairs of:

$$
\frac{\alpha}{k-1}(x'_{i\ell}-x_{i\ell})^2 + \frac{\beta}{k-1}(x'_{i\ell}-x_{i\ell})^2(x'_{j\ell}-x_{j\ell})^2 + \frac{\alpha}{k-1}(x'_{j\ell}-x_{j\ell})^2 \tag{14}
$$

Again, since convexity is preserved under summation, we only need to show strong convexity of the summands, each of which is a function of two variables ($x'_{i\ell}$ and $x'_{j\ell}$, since $x_i$ and $x_j$ are constants). For an arbitrary summand indexed by $(i, j)$, let $u_i = (x'_{i\ell} - x_{i\ell})$ and $u_j = (x'_{j\ell} - x_{j\ell})$. Then the Hessian (upon multiplying Equation 14 by $k-1$, which does not affect convexity) is:

$$
\nabla^2_{x'_{i\ell},x'_{j\ell}} = \begin{pmatrix} 2\alpha + 2\beta u_j^2 & 4\beta u_i u_j\\ 4\beta u_i u_j & 2\alpha + 2\beta u_i^2 \end{pmatrix}.
$$

The determinant of this Hessian, when simplified, is given by:

$$
\det\big(\nabla^2_{x'_{i\ell},x'_{j\ell}}\big) = 4\alpha^2 + 4\alpha\beta u_i^2 + 4\alpha\beta u_j^2 - 12\beta^2 u_i^2 u_j^2
$$

Since we wish to show the determinant is strictly positive, it suffices to show $\alpha^2 + \alpha\beta u_i^2 + \alpha\beta u_j^2 > 3\beta^2 u_i^2 u_j^2$.
As feature vectors are bounded, $u_i, u_j \in [-1, 1]$, and under our condition $\alpha > \beta$, the following holds:

$$
\alpha^2 + \alpha\beta u_i^2 + \alpha\beta u_j^2 > \beta^2 + \beta^2 u_i^2 + \beta^2 u_j^2 \ge \beta^2 u_i^2 u_j^2 + \beta^2 u_i^2 + \beta^2 u_j^2 \ge \beta^2 u_i^2 u_j^2 + 2\beta^2 u_i^2 u_j^2 = 3\beta^2 u_i^2 u_j^2,
$$

where the second-to-last transition uses the fact that $2u_i^2 u_j^2 \le u_i^2 + u_j^2$ in the feature-vector range.

Proof of Proposition 2

Proof. We express the cumulative impact of manipulation under the stated conditions as follows:

$$
\sum_{i=1}^{k}\sum_{\ell=1}^{d}\alpha\,(x'_{i\ell}-x_{i\ell})^2 \;+\; \frac{\beta}{k-1}\sum_{\ell=1}^{d}\sum_{j>i}\exp\!\big(-(x'_{i\ell}-x'_{j\ell})^2\big)
$$

Since the sum of strictly convex functions is strictly convex, it suffices to show that each inner summand is strictly convex. For a fixed $\ell$, we observe that since there are exactly $\frac{k(k-1)}{2}$ pairs satisfying $j > i$, and each feature $x_i$ appears in exactly $(k-1)$ of these pairs, the inner summand can be written as a sum over pairs of:

$$
\frac{\alpha}{k-1}(x'_{i\ell}-x_{i\ell})^2 + \frac{\beta}{k-1}\exp\!\big(-(x'_{i\ell}-x'_{j\ell})^2\big) + \frac{\alpha}{k-1}(x'_{j\ell}-x_{j\ell})^2. \tag{15}
$$

Again, since convexity is preserved under summation, we only need to show strong convexity of the summands, each of which is a function of two variables ($x'_{i\ell}$ and $x'_{j\ell}$). For an arbitrary summand indexed by $(i, j)$, let $u = (x'_{i\ell} - x'_{j\ell})$. Then the Hessian (upon multiplying Equation 15 by $k-1$, which does not affect convexity) is given by:

$$
\nabla^2_{x'_{i\ell},x'_{j\ell}} = \begin{pmatrix} 2\alpha + 2\beta e^{-u^2}[2u^2-1] & 2\beta e^{-u^2}[1-2u^2]\\ 2\beta e^{-u^2}[1-2u^2] & 2\alpha + 2\beta e^{-u^2}[2u^2+1] \end{pmatrix}.
$$

A positive definite matrix is equivalent to a matrix with all positive leading principal minors. When $\beta < \frac{\alpha}{2} < \alpha$, we note that the $(0,0)$ entry (and thus the first principal minor) is positive. The second principal minor is the determinant, which for our $2\times 2$ matrix above is given by:

$$
\det\big(\nabla^2_{x'_{i\ell},x'_{j\ell}}\big) = 4\alpha^2 + 16\alpha\beta e^{-u^2}u^2 + 4\beta^2 e^{-2u^2}[4u^2-2].
$$

Note that the middle term is always nonnegative. The last term is most negative when $u = 0$, where it equals $-8\beta^2$.
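Before completing the argument, both determinant conditions can be spot-checked numerically. The sketch below uses illustrative constants in the regimes the propositions require ($\alpha > \beta$ for Proposition 1, $\beta < \alpha/2$ for Proposition 2):

```python
import numpy as np

# Illustrative constants in the two regimes (not from the paper).
a = 1.0
b1 = 0.9    # Proposition 1 regime: alpha > beta
b2 = 0.45   # Proposition 2 regime: beta < alpha / 2
grid = np.linspace(-1.0, 1.0, 41)   # feature differences lie in [-1, 1]

# Prop 1: det = 4a^2 + 4ab(ui^2 + uj^2) - 12 b^2 ui^2 uj^2
for ui in grid:
    for uj in grid:
        det1 = 4*a**2 + 4*a*b1*(ui**2 + uj**2) - 12*b1**2 * ui**2 * uj**2
        assert det1 > 0

# Prop 2: det = 4a^2 + 16ab e^{-u^2} u^2 + 4b^2 e^{-2u^2} (4u^2 - 2)
for u in grid:
    det2 = 4*a**2 + 16*a*b2*np.exp(-u**2)*u**2 \
           + 4*b2**2*np.exp(-2*u**2)*(4*u**2 - 2)
    assert det2 > 0
```

Both determinants remain strictly positive over the whole grid, consistent with the strict convexity claims.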
It is evident that this can be well compensated for by the first term, since using our relation between $\beta$ and $\alpha$, we have $8\beta^2 < 8\big(\frac{\alpha}{2}\big)^2 = 2\alpha^2 < 4\alpha^2$.

E ADDITIONAL EXPERIMENTS

Figure 2: Strategic training (red), corresponding strategic validation losses (blue), and strategic validation loss of a classifier trained only considering cost (gray). We plot the mean losses over 15 random datasets (with 90% confidence intervals) against training epochs.

We also compare our experimental results against a cost-only baseline. During the training of this classifier, the agents manipulate their features considering only the cost (and not the externality). In this setting, agents have a dominant strategy, since the cost does not depend on the reports of the others. The gray curve plots the strategic validation loss of this classifier, wherein agents manipulate considering both cost and externality and reach a PNE. This baseline essentially captures the performance of a classifier trained in the classical model of Hardt et al. (2016) (which considers only cost) when deployed in settings with externality. We first note that the cost-only baseline performs worse than the classifier trained with both cost and externality considerations. This is naturally expected, since at validation time manipulation occurs considering both. That said, this baseline performs relatively better when the intensities of the cost and externality are low. This is intuitive: at low intensities, the agents are relatively free to misreport to increase gain, a dynamic captured by both classifiers. As the intensities of both cost and externality increase, the cost-only classifier captures only part of the story, ignoring the relatively large negative impact of manipulation arising from externality.
This likely leads the cost-only trained model to anticipate more extreme misreporting than what happens in practice, leading to relatively worse validation performance at these higher intensities. Lastly, we note that k does not seem to have any meaningful effect on the performance of this cost-only baseline.