# Bayesian Strategic Classification

Lee Cohen¹, Saeed Sharifi-Malvajerdi², Kevin Stangl², Ali Vakilian², Juba Ziani³

¹Stanford University. Email: leeco@stanford.edu
²Toyota Technological Institute at Chicago (TTIC). Email: saeed@ttic.edu, kevin@ttic.edu, vakilian@ttic.edu
³Georgia Institute of Technology. Email: jziani3@gatech.edu

38th Conference on Neural Information Processing Systems (NeurIPS 2024).

In strategic classification, agents modify their features, at a cost, to obtain a positive classification outcome from the learner's classifier, typically under the assumption that agents have full knowledge of the deployed classifier. In contrast, we consider a Bayesian setting where agents have a common distributional prior on the classifier being used and manipulate their features to maximize their expected utility according to this prior. The learner can reveal truthful, yet not necessarily complete, information about the classifier to the agents, aiming to release just enough information to shape the agents' behavior and thus maximize accuracy. We show that partial information release can, counter-intuitively, benefit the learner's accuracy, allowing qualified agents to pass the classifier while preventing unqualified agents from doing so. Despite the intractability of computing the best response of an agent in the general case, we provide oracle-efficient algorithms for scenarios where the learner's hypothesis class consists of low-dimensional linear classifiers or when the agents' cost function satisfies a submodularity condition. Additionally, we address the learner's optimization problem, offering both positive and negative results on determining the optimal information release to maximize expected accuracy, particularly in settings where an agent's qualification can be represented by a real-valued number.

1 Introduction

Machine learning critically relies on the assumption that the training data is representative of the unseen instances a learner faces at test time. Yet, in many real-life situations, this assumption fails when individuals (agents) manipulate decision-making algorithms for personal advantage by modifying their features at a cost. A typical example of such manipulations, or strategic behavior, is seen in loan applications or credit scoring: for example, an individual may open new credit card accounts to lower their credit utilization and artificially increase their credit score. In the context of job interviews, a candidate can spend time and effort memorizing solutions to common interview questions to look more qualified than they are at the time of the interview. A student might cram to pass an exam this way without actually understanding or improving their knowledge of the subject. The prevalence of such behaviors has led to the rise of an area of research known as strategic classification. Strategic classification, introduced by Hardt et al. [2016], aims to understand how a learner can optimally modify decision-making algorithms to be robust to such strategic manipulations by agents, if and when possible. Most of the strategic classification literature assumes that the model deployed by the learner is fully observable by the agents, granting them the ability to optimally best respond to the learner using resources such as effort, time, and money. Yet, this full-information assumption can be unrealistic in practice. There are several reasons for this: some machine learning models are
proprietary and hide the details of the model to avoid leaking trade secrets : e.g., this is the case for the credit scoring algorithms used by FICO, Experian, and Equifax.1 Some classifiers are simply too complex in the first place to be understood and interpreted completely by a human being with limited computational power, such as deep learning models. Other classifiers and models may be obfuscated for data privacy reasons, which are becoming an increasingly major concern with new European consumer protection laws such as GDPR [Regulation, 2018] and with the October 2023 Executive Order on responsible AI [Biden, 2023]. In turn, there is a need to study strategic classification when agents only have partial knowledge of the learner s model. There has been a relatively short line of work trying to understand the impact of incomplete information on strategic classification. Jagadeesan et al. [2021] and Bechavod et al. [2021] study the optimal classifiers in settings where agents can only gain partial or noisy information about the deployed model. Haghtalab et al. [2023] study calibrated Stackelberg games, a more general form of strategic classification; in their framework, the learner engages in repeated interactions with agents who base their actions on calibrated forecasts about the learner s classifier. They characterize the optimal utility of the learner for such games under some regularity assumptions. While we also model agents with prior knowledge of the learner s actions, Haghtalab et al. [2023] focus on an online learning setting and the selection of a strategy for the learner without incorporating any form of voluntary information release by the learner. In contrast, we focus on this additional critical aspect of voluntary information release by the learner that these works do not study. Namely, we ask: How to release partial and truthful information about the classifier to maximize accuracy? This should give the reader pause: why should a learner release information about their deployed classifier since presumably such information only makes it easier for agents to manipulate their features and trick the learner. In fact, Ghalme et al. [2021] showed that information revelation can help a learner may prefer to fully reveal their classifier as opposed to hiding it. While they consider either fully revealing the classifier or completely hiding it, our model considers a wider spectrum of information revelation that includes both full-information-release and no-information-release . We show that there exist instances where it is optimal to reveal only partial information about the classifier, in a model where a learner is allowed to reveal a subset of the classifiers containing the true deployed classifier. For example, a tech firm might reveal to candidates that they will ask them about a new type of data structure during their job interviews. Lenders might reveal to clients that they do not consider factors like credit score [Lake, 2024]. This selective disclosure can discourage unfit individuals, ultimately saving time and energy for both sides. In the following, we summarize our contributions. Summary of contributions: In Section 2, we propose a new model of interactions between strategic agents and a learner, under partial information. 
The two novel modeling elements compared to the standard strategic classification literature are: i) agents have partial knowledge about the learner s classifier in the form of a distributional prior over the hypothesis class, and ii) the learner can release partial information about their deployed classifier. Specifically, our model allows the learner to release a subset of the hypothesis class to narrow down the agents priors. Given our model, we consider a (Stackelberg) game between agents with partial knowledge and a learner that can release partial information about its deployed model. On the one hand, the agents aim to manipulate their features, at a cost, to increase their likelihood of receiving a positive classification outcome. On the other hand, the learner can release partial information to maximize the expected accuracy of its model, after agent manipulations. In Section 3, we study the agent s best response in our game. We show that while in general, it is intractable to compute the best response of the agents in our model, there exist oracle-efficient algorithms2 that can exactly solve the best response when the hypothesis class is the class of low-dimensional linear classifiers. We then move away from the linearity assumption and consider 1 The exact algorithm used to condense your credit report into a FICO score is a closely guarded secret, but we have a general layout of how your credit score is calculated. Source: Business Insider, October 2023. [Link: https://www.businessinsider.com/personal-finance/what-is-fico-score]. 2An oracle-efficient algorithm is one that calls a given oracle only polynomially many times. a natural condition on the agents cost function for which we give an oracle-efficient approximation algorithm for the best response of the agents for any hypothesis class. In Section 4, we study the learner s optimal information release problem. We consider screening/classification settings where agents are represented to the learner by a real-valued number that measures their qualification level for a certain task. Prior work has focused on similar onedimensional settings in the context of strategic classification; see, e.g., [Beyhaghi et al., 2023, Braverman and Garg, 2020]. We first show that the learner s optimal information release problem is NP-hard when the agents prior can be arbitrary. In light of this hardness result, we focus on uniform prior distributions and provide closed-form solutions for the case of continuous uniform priors, and an efficient algorithm to compute the optimal information release for discrete uniform priors. We finally consider alternative utility functions that are based on false positive (or negative) rates for the learner and provide insights as to what optimal information release should look under these utility functions, without restricting ourselves to uniform priors. Related Work. Strategic classification was first formalized by Brückner and Scheffer [2011], Hardt et al. [2016]. Hardt et al. [2016] is perhaps the most seminal work in the area of strategic classification: they provide the first computationally efficient algorithms (under assumptions on the agents cost function) to efficiently learn a near-optimal classifier in strategic settings. Importantly, this work makes the assumption that the agents fully know the exact parameters of the classifier due to existing information leakage , even when the firm is obscuring their model. Hardt et al. 
[2016] also do not consider a learner that can release partial information about their model. Closest to our work, Jagadeesan et al. [2021], Ghalme et al. [2021], Bechavod et al. [2022], and Haghtalab et al. [2023] relax the full information assumption and characterize the impact of opacity on the utility of the learner and agents. Jagadeesan et al. [2021] are the first to introduce a model of biased information about the learner s classifier: instead of observing the learner s deployed classifier exactly, agents observe and best respond to a noisy version of this classifier; one that is randomly shifted (by an additive amount) from the true deployed classifier. In contrast, Ghalme et al. [2021] and Bechavod et al. [2022] consider models of partial information on the classifiers, where agents can access samples in the form of historical (feature vector, learner s prediction) pairs. More precisely, Ghalme et al. [2021] study what they coin the price of opacity in strategic classification, defined as the difference in prediction error when not releasing vs fully releasing the classifier. They are the first to show that this price can be positive (in the context of strategic classification), meaning that a learner can reduce their prediction error by fully releasing their classifier in strategic settings. Our work considers more general, intermediate forms of information release, instead of the all-or-nothing, binary approach of Ghalme et al. [2021]. Bechavod et al. [2022] consider a strategic regression setting in which the learner does not release their regression rule, but agents have access to (feature, score) samples as described above. They study how disparity in sample access (e.g., agents may only access samples from people similar to them) about the classifier across different groups induce unfairness in classification outcomes across these groups. Haghtalab et al. [2023] consider agents with (calibrated) forecasts over the actions of the learner, but do not consider the learner s information release which is our focus. Additionally, in our model, we do not constrain the agent s prior distribution to be calibrated. Beyond strategic classification, there are a few related lines of work where such partial information is considered. One is Bayesian Persuasion [Kamenica and Gentzkow, 2011]: in Bayesian persuasion, the state of the world is randomly drawn from the prior, and there is a mapping from the state of the world to signal distributions. This mapping, i.e. the signaling scheme , must be revealed to the agents in addition to the signal. In our setting, there is a fixed state of the world (the learner s classifier), and there is no need for the signaling scheme to be known, since the signal itself (the subset) reveals all the information needed for the agents. The agents only need to know that the learner is truthful, which is an assumption made in Bayesian persuasion too. Relatedly, algorithmic recourse studies an intermediate information release problem where the learner publishes a recommended action or recourse for each agent to take, rather than a set of potential classifiers used by the learner; e.g., Harris et al. [2022]. In our model, we release the same signal or information to all agents based on the underlying distribution over these agents features. Our model consists of a population of agents and a learner. Each agent in our model is represented by a pair (x, y) where x X is a feature vector, and y {0, 1} is a binary label. 
Throughout, we call an agent with $y = 0$ a "negative", and an agent with $y = 1$ a "positive". We assume there exists a mapping $f : \mathcal{X} \to \{0, 1\}$ that governs the relationship between $x$ and $y$; i.e., $y = f(x)$ for every agent $(x, y)$. We will therefore use $x$ to denote agents from now on. We denote by $\mathcal{D}$ the distribution over the space of agents $\mathcal{X}$. Agent manipulations are characterized by a cost function $c : \mathcal{X} \times \mathcal{X} \to [0, \infty)$, where $c(x, x')$ denotes the cost that an agent incurs when changing their features from $x$ to $x'$. We assume, as in standard strategic classification settings, that manipulation does not change one's true label: manipulation is seen purely as "gaming"; it does not change the qualification of an agent. Let $\mathcal{H} \subseteq \{0, 1\}^{\mathcal{X}}$ denote our hypothesis class, and let $h \in \mathcal{H}$ be the fixed classifier that the learner is using for classification.

A Partial Knowledge Model for the Agents. We move away from the standard assumption that agents fully know $h$ and model agents as having a common (shared by all agents) prior distribution $\pi$ over $\mathcal{H}$. This distribution captures their initial belief about which classifier is deployed by the learner. Formally, for every $h' \in \mathcal{H}$, $\pi(h')$ is the probability, from the agents' perspective, that the learner is going to deploy $h'$ for classification. We emphasize that the learner is committed to using a fixed classifier $h$. The prior $\pi$ captures the agents' belief about the deployed classifier and is known to the learner. For example, job seekers may use Glassdoor to prepare for interviews. They may not know the exact hiring algorithm ($h$) of a specific company but can observe patterns from other companies for similar roles. This forms their initial belief, represented by $\pi$, about the classifier a company might use. Thus, $\pi$ captures the agents' probabilistic beliefs rather than assuming full knowledge of $h$.³

A Partial Information Release Model for the Learner. The learner has the ability to influence the agents' prior belief $\pi$ about the deployed classifier $h$ by releasing partial information about $h$. We model information release as releasing a subset $H \subseteq \mathcal{H}$ such that $h \in H$. We note that information is revealed truthfully, meaning that the deployed classifier is required to be in $H$. Note that this is a general form of information release because it allows the learner to release any subset of the hypothesis class, so long as it includes the deployed classifier $h$. Below, we provide natural examples of information release that can be captured by our model.

Example 2.1 (Examples of Information Release via Subsets). Consider the class of linear halfspaces in $d$ dimensions: $\mathcal{H} = \{h_{w,b} : w = [w_1, w_2, \ldots, w_d] \in \mathbb{R}^d_+,\, b \in \mathbb{R}\}$, where $h_{w,b}(x) \triangleq \mathbb{1}[w \cdot x + b \ge 0]$ and $x \in \mathcal{X} = \mathbb{R}^d$ is the feature vector. Let $h = h_{w^0, b^0}$ be the classifier deployed by the learner for some $w^0, b^0$. Under this setting, revealing the parameter of a particular feature, say $x_j$, in $h$ corresponds to releasing $H_1 = \{h_{w,b} \in \mathcal{H} : w_j = w_j^0\}$ (e.g., "minimal GPA of 3.8 for grad school"). Revealing the top $k$ features of $h$ (e.g., "the most significant class grades are algorithms and calculus") corresponds to releasing $H_2 = \{h_{w,b} \in \mathcal{H} : w_{i_1}, w_{i_2}, \ldots, w_{i_k} \text{ are the } k \text{ largest coordinates of } w\}$. Let $I_0$ be such that $w_i^0 = 0$ iff $i \in I_0$. Revealing the relevant features of $h$, i.e., the features with nonzero coefficients (e.g., "sensitive attributes like race or gender will not be used in the decision"), corresponds to releasing $H_3 = \{h_{w,b} \in \mathcal{H} : w_i = 0, \ \forall i \in I_0\}$. This is a common form of information release in the real world.⁴

The Strategic Game with Partial Information Release.
Once the partial information $H$ is released by the learner, agents best respond as follows: each agent first computes their posterior belief about the deployed classifier by projecting their prior $\pi$ onto $H$; we denote this posterior by $\pi|_H$, and it is formally defined by
$$\pi|_H(h') \triangleq \frac{\pi(h')}{\pi(H)} \, \mathbb{1}[h' \in H] \quad \text{for every } h' \in \mathcal{H}.$$
Given this posterior distribution, the agent then moves to a new point that maximizes their utility. The utility is quasi-linear and is measured by the probability (according to $\pi|_H$) of receiving a positive outcome minus the manipulation cost. Formally, the utility of agent $x$ that manipulates to $x'$, under the partial information $H$ released by the learner, is given by
$$u_x(x', H) \triangleq \Pr_{h' \sim \pi|_H}\left[h'(x') = 1\right] - c(x, x'). \tag{1}$$
We let $\mathrm{BR}(x, H)$ denote the best response of agent $x$, i.e., a point $x'$ that maximizes $u_x(x', H)$.⁵ The goal of the learner is to release $H$ that includes its deployed classifier $h$ so as to maximize its utility, which is measured by its expected strategic accuracy:
$$U(H) \triangleq \Pr_{x \sim \mathcal{D}}\left[h(\mathrm{BR}(x, H)) = f(x)\right]. \tag{2}$$

Definition 2.2 (Strategic Game with Partial Information Release). The game, between the learner who is using $h \in \mathcal{H}$ for classification and the agents who have a prior $\pi$ over $\mathcal{H}$, proceeds as follows:
1. The learner (knowing $f$, $\mathcal{D}$, $c$, $\pi$) publishes a subset of hypotheses $H \subseteq \mathcal{H}$ such that $h \in H$.
2. Every agent $x$ best responds by moving to a point $\mathrm{BR}(x, H)$ that maximizes their utility: $\mathrm{BR}(x, H) \in \mathrm{argmax}_{x' \in \mathcal{X}} \, u_x(x', H)$.

The learner's goal is to find a subset $H^* \subseteq \mathcal{H}$ with $h \in H^*$ that maximizes its utility⁶:
$$H^* \in \underset{H \subseteq \mathcal{H},\; h \in H}{\mathrm{argmax}} \; U(H)$$
We note that, similar to standard strategic classification, the game defined in Definition 2.2 can be seen as a Stackelberg game in which the learner, as the "leader", commits to her strategy first and then the agents, as the "followers", respond. The optimal strategy of the learner, $H^*$, corresponds to the Stackelberg equilibrium of the game, assuming best response of the agents.

Contrasting with the Standard Setting of Strategic Classification. The game defined in Definition 2.2 not only captures both the partial knowledge of the agents and the learner's partial information release, but can also be viewed as a generalization of the standard strategic classification game in which the agents fully observe the classifier $h$, which we refer to as the full information release game (e.g., see [Hardt et al., 2016]). This is because the learner can always choose $H = \{h\}$. Next, we ask: Can partial information release increase the learner's utility compared to full information release? Observe that by definition, $U(H^*) \ge U(\{h\})$, i.e., the learner can only gain utility when they optimally release partial information instead of fully revealing the classifier.

³Agent priors may also arise from observing previous decisions made by this classifier, for example as is studied in Bechavod et al. [2022]: the learner (e.g., a hiring company) has been using a classifier $h$ (the hiring algorithm) to screen agents (applicants) for some time. Agents going up for a decision today may observe some of the previous decisions made by the current classifier $h$ and use this information to form their prior.
⁴For example, recently, many U.S. universities have announced that they will not use race and ethnicity anymore in admissions, in line with a recent Supreme Court ruling.

|       | $h_1 (= f)$ | $h_2$ | $h_3$ |
|-------|-------------|-------|-------|
| $x_1$ | 1           | 0     | 0     |
| $x_2$ | 0           | 1     | 0     |

Table 1: Hypothesis class $\mathcal{H}$ in Example 2.3.
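To make these definitions concrete, the following is a minimal sketch (ours, not from the paper), assuming a finite feature space and a finite hypothesis class: the posterior projection $\pi|_H$, the agent utility of Equation (1), the tie-broken best response, and the learner's strategic accuracy of Equation (2). All identifiers are illustrative choices of ours.

```python
def posterior(prior, H):
    """Project the prior over classifiers onto the released subset H (Bayes update)."""
    mass = sum(prior[name] for name in H)
    return {name: prior[name] / mass for name in H}

def agent_utility(x, x_new, post, classifiers, cost):
    """Equation (1): posterior probability of a positive outcome at x_new, minus the cost."""
    p_pos = sum(q for name, q in post.items() if classifiers[name](x_new) == 1)
    return p_pos - cost(x, x_new)

def best_response(x, X, post, classifiers, cost):
    """Utility-maximizing target point; ties are broken in favor of the cheapest move."""
    return max(X, key=lambda z: (agent_utility(x, z, post, classifiers, cost), -cost(x, z)))

def learner_utility(H, h_deployed, prior, classifiers, f, D, X, cost):
    """Equation (2): expected accuracy of the deployed classifier after agent manipulation."""
    post = posterior(prior, H)
    return sum(D[x] for x in X
               if classifiers[h_deployed](best_response(x, X, post, classifiers, cost)) == f(x))
```

Here `classifiers` maps hypothesis names to functions from features to $\{0, 1\}$, `D` is a probability table over a finite feature space, and `cost` is any cost function; with these in hand, the learner can score any candidate release $H$ that contains the deployed classifier.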
In the following examples, we show that there exist instantiations of the problem where $U(H^*) > U(\{h\})$, even when $h$ is picked to be the optimal classifier in the full information release game, i.e., one that maximizes $U(\{h\})$. In other words, we show that the learner can gain nonzero utility by releasing a subset other than $\{h\}$, even if the choice of $h$ is optimized for the full information release game.

Example 2.3 (Partial vs. Full Information Release). Suppose $\mathcal{X} = \{x_1, x_2\}$, that their probability weights under the distribution⁷ are given by $\mathcal{D}(x_1) = 2/3$, $\mathcal{D}(x_2) = 1/3$, and that their true labels are given by $f(x_1) = 1$, $f(x_2) = 0$. Suppose the cost function is given as follows: $c(x_1, x_2) = 2$, $c(x_2, x_1) = 3/4$. Let $\mathcal{H} = \{h_1, h_2, h_3\}$ be given by Table 1. One can show that under this setting, $h = h_1$ is the optimal classifier under full information release, i.e., it maximizes $U(\{h\})$, and that for such $h$, $U(\{h\}) = 2/3$. However, suppose the prior distribution over $\mathcal{H}$ is uniform. One can show that under this setting, and when $h = h_1$ is the deployed classifier, releasing $H' = \{h_1, h_2\}$ yields $U(H') = 1 > U(\{h\}) = 2/3$. In other words, the learner can exploit the agents' prior by releasing information in a way that increases its own utility by a significant amount. (These numbers are also verified by a short script below.)

⁵Ties are broken in favor of an arbitrary lowest-cost solution.
⁶We emphasize that here $h$ is fixed; namely, $H$ is the only variable in the optimization problem of the learner, which is constrained to include $h$.
⁷The claim holds for any distribution $\mathcal{D}$ in which both $x_1$ and $x_2$ are in the support.

Algorithm 1: Oracle(c, H)
Input: agent $x$; region $R = R^+ \cap R^-$ specified as $R^+ = \bigcap_{i \in I^+} \{z : h_i(z) = 1\}$ and $R^- = \bigcap_{i \in I^-} \{z : h_i(z) = 0\}$ for some $I^+$ and $I^-$.
Output: $\mathrm{argmin}_{z \in R} \, c(x, z)$

In the next example, we consider the more natural setting of single-sided threshold functions in one dimension and show that the same phenomenon occurs: the optimal utility achieved by partial information release is strictly larger than the utility achieved by the full information release of $h$, even after the choice of $h$ is optimized for full information release.

Example 2.4 (Partial vs. Full Information Release). Suppose $\mathcal{X} = [0, 2]$, $\mathcal{D}$ is the uniform distribution over $[0, 2]$, $f(x) = \mathbb{1}[x \ge 1.9]$, and $\mathcal{H} = \{h_t : t \in [0, 2]\}$ where $h_t(x) \triangleq \mathbb{1}[x \ge t]$. Suppose the cost function is given by the distance $c(x, x') = |x - x'|$. Under this setting, the optimal classifier in $\mathcal{H}$ under full information release is $h = h_2$, and its corresponding utility is $U(\{h\}) = 1 - \Pr_{x \sim \mathrm{Unif}[0,2]}[1 \le x < 1.9] = 0.55$. Now suppose the agents have the following prior over $\mathcal{H}$: $\pi(h') = 0.1 \cdot \mathbb{1}[h' = h_2] + 0.9 \cdot \mathbb{1}[h' = h_{1.8}]$. Under this setting, and when $h = h_2$ is deployed for classification, one can see that releasing $H' = \{h_2, h_{1.8}\}$ leads to perfect utility for the learner. We therefore have $U(H') = 1 > U(\{h\}) = 0.55$.

3 The Agents' Best Response Problem

In this section we focus on the best response problem faced by the agents in our model, as described in Definition 2.2. We consider a natural optimization oracle for the cost function of the agents that can solve simple projections; we will formally define this oracle later on. Given access to such an oracle, we then study the oracle complexity⁸ of the agent's best response problem. First, we show that the best response problem is computationally hard even for a common family of $\ell_p$-norm cost functions. Next, we provide an oracle-efficient algorithm⁹ for solving the best response problem when the hypothesis class is the class of low-dimensional linear classifiers.
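Before turning to these algorithmic results, the following short script (our own illustration; all variable names are ours) verifies the claims of Example 2.3 above by enumerating every released subset that contains $h_1$ and computing the learner's utility $U(H)$ exactly.

```python
from itertools import combinations

# Example 2.3: two agents, three classifiers (Table 1), uniform prior over the class.
X = ["x1", "x2"]
D = {"x1": 2/3, "x2": 1/3}
f = {"x1": 1, "x2": 0}
cost = {("x1", "x2"): 2.0, ("x2", "x1"): 3/4, ("x1", "x1"): 0.0, ("x2", "x2"): 0.0}
classifiers = {
    "h1": {"x1": 1, "x2": 0},   # h1 agrees with f
    "h2": {"x1": 0, "x2": 1},
    "h3": {"x1": 0, "x2": 0},
}
prior = {name: 1/3 for name in classifiers}   # uniform prior over {h1, h2, h3}
h_deployed = "h1"

def utility_of_release(H):
    mass = sum(prior[name] for name in H)
    post = {name: prior[name] / mass for name in H}      # posterior pi|_H
    acc = 0.0
    for x in X:
        # agent x best responds: maximize Pr[positive] - cost, ties -> lowest cost
        br = max(X, key=lambda z: (sum(q for name, q in post.items()
                                       if classifiers[name][z] == 1) - cost[(x, z)],
                                   -cost[(x, z)]))
        acc += D[x] * (classifiers[h_deployed][br] == f[x])
    return acc

for r in range(1, 4):
    for H in combinations(classifiers, r):
        if h_deployed in H:
            print(sorted(H), round(utility_of_release(H), 3))
# Releasing {h1} gives 2/3, while releasing {h1, h2} gives 1, matching Example 2.3.
```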
In Appendix B, we consider submodular cost functions and show that for any hypothesis class, there exists an oracle-efficient algorithm that approximately solves the best response problem in this setting. Recall that the agents best response problem can be cast as the following: given an agent x X, and a distribution P (e.g., P = π|H where π is the prior and H is the released information) over a set {h1, . . . , hn} H, we want to solve argmaxz X {Prh P [h (z) = 1] c(x, z)}. We consider an oracle that given any region R X, specified by the intersection of positive (or negative) regions of hi s, returns the projection of the agent x onto R according to the cost function c: argminz R c(x, z). For example, when H is the class of linear classifiers and c(x, z) = x z 2, the oracle can compute the ℓ2-projection of the agent x onto the intersection of any subset of the linear halfspaces in {h1, . . . , hn}. We denote this oracle by Oracle(c, H) and formally define it in Algorithm 1. Having access to such an oracle, and without further assumptions, the best response problem can be solved by exhaustively searching over all subsets of {h1, . . . , hn} because: Pr h P [h (z) = 1] c(x, z) = max S {h1,...,hn} h S P(h ) min z:h (z)=1, h S c(x, z) This algorithm is inefficient because it makes exponentially many oracle calls. In what follows, we consider natural instantiations of our model and examine if we can get algorithms that make only poly(n) oracle calls. All missing proof of this sections are provided in Appendix C. p-Norm Cost Functions. First, we consider Euclidean spaces and the common family of p-norm functions for p 1 and show that even under the assumption that the cost function of the agent belongs to this family, the problem of finding the best response is computationally hard. Formally, a p-norm cost function is defined by: for every x, x Rd, cp(x, x ) = x x p where p 1. Theorem 3.1. Ω(2n/ n) calls to the oracle (Algorithm 1) are required to compute the best response of an agent with 2/3 probability of success, even when X = R2 and the cost function is cp for p 1. 8The number of times the oracle is called by an algorithm. 9An algorithm that calls the oracle only polynomially many times. Algorithm 2: Best Response of Agents in the Linear Case Input: agent x, cost function c, arbitrary distribution P over linear classifiers {h1, . . . , hn} Step 1. Compute the partitioning (Rn) of the space given by {h1, . . . , hn}; Initialize R1 {{z : h1(z) = 1}, {z : h1(z) = 0}}; for i = 2, . . . , n do Ri Ri 1; for R Ri 1 do if {z : hi(z) = 0} R = then Ri Ri \ R ; // Remove R Ri Ri {{z : hi(z) = 1} R, {z : hi(z) = 0} R} ; // Split R Step 2. Given Rn, compute the best response; for R Rn do Let R = R+ R where R+ = i I+ {z : hi(z) = 1} and R = i I {z : hi(z) = 0}; Call the oracle (Algorithm 1) to solve z R argminz R c(x, z); Compute the utility of z R: utility(z R) = P i I+ P(hi) c(x, z R); Output: argmaxz Z utility(z) where Z = {z R : R Rn} Low-Dimensional Linear Classifiers. Next, we show that when X = Rd for some d, and when H contains only linear classifiers, i.e., every h H can be written as h(x) = 1 w x + b 0 for some w Rd and b R, then the best response of the agents can be computed with O(nd) oracle calls when d n. The algorithm, which is described in Algorithm 2, first computes the partitioning (Rn) of the space X given by the n linear classifiers. 
For any element of the partition in Rn, it then solves the best response when we restrict the agent to select its best response from that particular element. This gives us a set of points, one for each element of the partition. The algorithm finally outputs the point that has maximum utility for the agent. This point, by construction, is the best response of the agent. The oracle-efficiency of the algorithm follows from the observation that n linear halfspaces in d dimensions partition the space into at most O(nd) elements when d n. Formally, Theorem 3.2. Suppose X = Rd for some d n, and H contains only linear classifiers. Then for any agent x, any cost function c, and any distribution P over {h1, . . . , hn} H, Algorithm 2 returns the best response of the agent in time O(nd+1), while making O(nd) calls to the oracle (Algorithm 1). 3.1 Generalizing to Arbitrary P In Theorem 3.2 we require the distribution P be over {h1, . . . hn} H, e.g. to have finite support. When this does not hold, ie. P has infinite support size, we can ignore classifiers with sufficiently small probabilities (i.e., poly(ϵ)), as they do not affect the manipulation strategy when searching for an (1 + ϵ)-approximate solution. The number of classifiers in the support with probability at poly(ϵ) for a fixed ϵ > 0 is at most 1/poly(ϵ) which is a finite number. Therefore, to obtain a nearly optimal solution, it suffices to only consider probability distributions with finite support size. 4 The Learner s Optimal Information Release Problem In this section we focus on the learner s optimization problem as described in Definition 2.2. The learner is facing a population of agents with prior π and wants to release partial information H H so as to maximize its utility U(H). We note that the learner s strategy space can be restricted to the support of the agents prior π because including anything in H that is outside of π s support does not impact U(H). Therefore, one naive algorithm to compute the utility maximizing solution for the learner is to evaluate the utility on all subsets H support(π) and output the one that maximizes the utility. However, this solution is inefficient; instead, can we have computationally efficient algorithms? We provide both positive and negative results for a natural instantiation of our model which is introduced below. The Setup: Classification Based on Test Scores. We adopt the following setup for the learner s information release problem. We are motivated by screening problems such as school admissions and hiring where an individual s qualification level can be captured via a real-valued number, say, a test score. We therefore consider agents that live in the one dimensional Euclidean space: X = [0, B] R for some B. One can think of each x as the corresponding qualification level or test score of an agent where larger values of x correspond to higher qualification levels or higher test scores. As we are in a strategic setting, agents can modify their true feature x and game the learner by appearing more qualified than they actually are. We let f(x) = 1 [x t] for some t: there exists some threshold t that separates qualified and unqualified agents. We take the hypothesis class H to be the class of all single-sided threshold classifiers: every h H can be written as h (x) 1 [x t ] for some t . We further take the cost function of the agents to be the standard distance metric in R: c(x, x ) = |x x|.10 Remark 4.1. 
We emphasize that considering agents in the one-dimensional Euclidean space is only for simplicity of exposition. We essentially assume that, for an arbitrary space of agents $\mathcal{X}$, there exists a function $g : \mathcal{X} \to [0, B]$ such that $f(x) = \mathbb{1}[g(x) \ge t]$ for some $t$, and that the cost function is given by $c(x, z) = |g(z) - g(x)|$. Here, $g(x)$ captures the qualification level or test score of an agent $x$. Now observe that we can reduce this setting to the setup introduced in this section: take $\mathcal{X}' = \{g(x) : x \in \mathcal{X}\} \subseteq [0, B]$, let $f' : \mathcal{X}' \to \{0, 1\}$ be given by $f'(x') = \mathbb{1}[x' \ge t]$, and let the cost function $c' : \mathcal{X}' \times \mathcal{X}' \to [0, \infty)$ be given by $c'(x', z') = |z' - x'|$.

Remark 4.2. Note that because every classifier $h' \in \mathcal{H}$ is uniquely specified by a real-valued threshold, for simplicity of notation we abuse notation and use $h'$ interchangeably as both a mapping (the classifier) and a real-valued number (the corresponding threshold) throughout this section. The same abuse of notation applies to $f$ as well.

The classifier deployed by the learner is some $h \ge f$. We note that it is natural to assume $h \ge f$ because in our setup, higher values of $x$ are considered "better". So, given the strategic behavior of the agents, the learner only wants to make the classification task harder compared to the ground truth $f$; choosing $h < f$ would only hurt the learner's utility. Because we will extensively make use of the fact that $h \ge f$, we state it as a remark below.

Remark 4.3. The learner's adopted classifier is some $h \in \mathcal{H}$ such that $h \ge f$.

First, we show that under the introduced setup, the learner's optimization problem is NP-hard if the prior can be chosen arbitrarily. This is shown by a reduction from the NP-hard subset sum problem. The formal NP-hardness statement and its proof, as well as further useful facts about the agents' best response under this setup, are in Appendix D.

4.1 An Efficient Algorithm for Discrete Uniform Priors

Given the hardness of the learner's problem for arbitrary prior distributions, we focus on a specific family of priors, namely uniform priors over a given set, and examine the existence of efficient algorithms for such priors. In Appendix D.2, we consider continuous uniform priors and provide closed-form solutions for the learner's optimal partial information release problem. In this section, we provide an efficient algorithm for computing the learner's optimal information release when the prior $\pi$ is a discrete uniform distribution over a set $\{h_1, h_2, \ldots, h_n\} \subseteq \mathcal{H}$ that includes the adopted classifier $h$. Here, the objective of the learner is to release a subset $H \subseteq \{h_1, h_2, \ldots, h_n\}$ such that $h \in H$. Throughout, we take $h = h_k$ where $1 \le k \le n$, and assume $h_1 \le h_2 \le \cdots \le h_n$. The complete exposition of this section, including all details, proofs, necessary lemmas, and the description of the proposed algorithm, can be found in Appendix E.

We first characterize the utility of any subset $H$ released by the learner using a real-valued function of $H$. Define, for any $H \subseteq \{h_1, \ldots, h_n\}$ such that $h \in H$,
$$R_H \triangleq \inf \{x : \mathrm{BR}(x, H) \ge h\} \tag{4}$$
Note that $\mathrm{BR}(x = h, H) \ge h$ for any $H$ such that $h \in H$. Therefore, $\{x : \mathrm{BR}(x, H) \ge h\}$ is nonempty and $R_H \le h$ for any $H$ such that $h \in H$. In the following lemma, we show that $R_H$ characterizes the utility of $H$ for the learner, for any prior $\pi$ over $\{h_1, \ldots, h_n\}$.

Lemma 4.4. Fix any prior $\pi$ over $\{h_1, \ldots, h_n\}$. For any $H \subseteq \{h_1, \ldots, h_n\}$ such that $h \in H$, the utility of the learner, given by Equation 2, can be written as
$$U(H) = \begin{cases} 1 - \Pr_{x \sim \mathcal{D}}\left[R_H < x < f\right] & R_H < f \\ 1 - \Pr_{x \sim \mathcal{D}}\left[f \le x \le R_H\right] & R_H \ge f \end{cases} \tag{5}$$

¹⁰Our results can be extended to the case where $c(x, x') = k|x' - x|$ for some constant $k$.

Given such a characterization of the learner's utility by $R_H$, we show that when the agents' prior is uniform over $\{h_1, \ldots, h_n\}$, there are only polynomially many possible values that $R_H$ can take, even though there are exponentially many $H$'s. We characterize the set of possible values for $R_H$ as well. For any possible value $R$ of $R_H$, our algorithm efficiently finds a subset $H$ such that $R_H = R$, if such an $H$ exists, and finally outputs the one with maximal utility. More formally, we consider the following partitioning of the space of subsets of $\{h_1, \ldots, h_n\}$. For any $\ell \in \{1, 2, \ldots, n\}$ and any $i \in \{k, k+1, \ldots, n\}$, define
$$S_{i,\ell} \triangleq \left\{H \subseteq \{h_1, \ldots, h_n\} : h \in H,\ |H| = \ell,\ \mathrm{BR}(h, H) = h_i\right\}$$
Note that $\mathrm{BR}(h = h_k, H) \in \{h_i : i \ge k\}$ for any $H$. Therefore, $\{S_{i,\ell}\}_{i,\ell}$ gives us a proper partitioning of the space of subsets, which implies
$$\max_{H \subseteq \{h_1, \ldots, h_n\},\, h \in H} U(H) = \max_{i, \ell} \; \max_{H \in S_{i,\ell}} U(H)$$
Given this partitioning of the space, we show that $R_H$ can be characterized as follows.

Lemma 4.5. If the prior $\pi$ is uniform over $\{h_1, \ldots, h_n\}$, then for any $H \in S_{i,\ell}$, $R_H = h_i - j/\ell$ where $j = |\{h' \in H : h' \in (R_H, h_i]\}|$.

Given this characterization, our proposed algorithm (Algorithm 3), for every $i, \ell$, enumerates over all possible $j \in \{1, \ldots, \ell\}$ and returns an $H$ such that $H \in S_{i,\ell}$ and $R_H = h_i - j/\ell$, if such an $H$ exists. The algorithm then outputs the subset $H$ with maximal utility according to Equation 5.

Theorem 4.6. There exists an algorithm (Algorithm 3) that for any $n$, any uniform prior over $\{h_1, \ldots, h_n\}$ that includes $h$, and any data distribution $\mathcal{D}$, returns $H^* \subseteq \{h_1, \ldots, h_n\}$ in time $O(n^3)$ such that $h \in H^*$ and $U(H^*) = \max_{H \subseteq \{h_1, \ldots, h_n\},\, h \in H} U(H)$.

4.2 Minimizing False Positive (Negative) Rates for Arbitrary Priors

While we have so far worked with accuracy as the utility function of the learner, in this section we consider other natural performance metrics and provide insights on the optimal information release for the proposed utility functions, without restricting ourselves to uniform priors. In particular, we consider utility functions that are based on the False Negative Rate (FNR) and the False Positive Rate (FPR), which are formally defined below. For any $H \subseteq \mathcal{H}$ such that $h \in H$,
$$U_{\mathrm{FPR}}(H) \triangleq 1 - \mathrm{FPR}(H) \triangleq 1 - \Pr_{x \sim \mathcal{D}}\left[h(\mathrm{BR}(x, H)) = 1 \mid f(x) = 0\right] \tag{6}$$
$$U_{\mathrm{FNR}}(H) \triangleq 1 - \mathrm{FNR}(H) \triangleq 1 - \Pr_{x \sim \mathcal{D}}\left[h(\mathrm{BR}(x, H)) = 0 \mid f(x) = 1\right] \tag{7}$$
In the following theorem, we establish that for any given prior $\pi$ over a set $\{h_1, h_2, \ldots, h_n\} \subseteq \mathcal{H}$, if the learner aims to minimize the FPR, no-information-release is preferable to full-information-release.¹¹ Additionally, we show that for minimizing the FNR, an optimal strategy for the learner is full-information-release. By "no-information-release", we mean releasing any subset $\bar{H}$ that includes the support of the prior $\pi$, i.e., $\bar{H} \supseteq \{h_1, \ldots, h_n\}$, which results in $\pi|_{\bar{H}} = \pi$. By "full-information-release", we mean revealing the classifier: $H = \{h\}$.

Theorem 4.7. Fix any $h \ge f$. For any prior $\pi$ over $\{h_1, \ldots, h_n\}$ that includes $h$, we have 1) $\mathrm{FPR}(\bar{H}) \le \mathrm{FPR}(\{h\})$ for any no-information-release subset $\bar{H}$, and 2) $\mathrm{FNR}(\{h\}) \le \mathrm{FNR}(H)$ for every $H \subseteq \mathcal{H}$ such that $h \in H$.

The proof is provided in Appendix F. In Appendix G, we show that minimizing FPR, unlike minimizing FNR, does not always have a clear optimal solution. We provide three instances such that full-information-release is optimal for the first, no-information-release is optimal for the second, and neither is optimal for the third.
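To tie the pieces of this section together, the following brute-force baseline (ours; exponential in $n$, unlike the $O(n^3)$ Algorithm 3 of Theorem 4.6) evaluates the accuracy of Equation (2) together with the FPR and FNR of Equations (6)-(7) on a discretized $\mathrm{Unif}[0, B]$ population, for full release, no release, and the accuracy-maximizing subset. The thresholds, the deployed classifier, and the ground-truth threshold below are hypothetical choices of ours.

```python
import itertools

def best_response_1d(x, H, post):
    """Agent at score x either stays or moves up to a released threshold.
    Utility: posterior probability of passing minus the distance moved; ties -> cheapest."""
    candidates = [x] + [t for t in sorted(H) if t > x]
    util = lambda z: sum(q for t, q in post.items() if z >= t) - (z - x)
    return max(candidates, key=lambda z: (util(z), -(z - x)))

def metrics(H, h, f_thr, grid, prior):
    """Accuracy (Eq. 2), FPR and FNR (Eqs. 6-7) of deployed threshold h under release H."""
    mass = sum(prior[t] for t in H)
    post = {t: prior[t] / mass for t in H}
    acc = fp = fn = pos = neg = 0
    for x in grid:
        passed = best_response_1d(x, H, post) >= h     # h(BR(x, H)) = 1
        qualified = x >= f_thr                          # f(x) = 1
        acc += int(passed == qualified)
        pos += int(qualified); neg += int(not qualified)
        fp += int(passed and not qualified); fn += int(qualified and not passed)
    return acc / len(grid), fp / neg, fn / pos

# Hypothetical instance: uniform prior over thresholds, deployed h = largest threshold >= f.
thresholds = [0.5, 0.8, 1.0, 1.2]
prior = {t: 1 / len(thresholds) for t in thresholds}
h, f_thr, B = 1.2, 0.9, 2.0
grid = [i * B / 2000 for i in range(2001)]             # discretized Unif[0, B]

subsets = [H for r in range(1, len(thresholds) + 1)
           for H in itertools.combinations(thresholds, r) if h in H]
for name, H in [("full release", (h,)), ("no release", tuple(thresholds)),
                ("best subset", max(subsets, key=lambda H: metrics(H, h, f_thr, grid, prior)[0]))]:
    acc, fpr, fnr = metrics(H, h, f_thr, grid, prior)
    print(f"{name:12s} H={H}  U={acc:.3f}  FPR={fpr:.3f}  FNR={fnr:.3f}")
```

For this instance, the printed table should be consistent with Theorem 4.7 (no release has FPR no larger than full release, and full release has the smallest FNR), while the accuracy-maximizing subset is a strict intermediate release, echoing Examples 2.3 and 2.4.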
5 Conclusion We introduce Bayesian Strategic Classification, meaning strategic classification with partial knowledge (of the agents) and partial information release (of the learner). Our model relaxes the often 11We note that in this section, while we work with discrete priors over some {h1, . . . , hn} H, our results can be easily extended to any prior. unrealistic assumption that agents fully know the learner s deployed classifier. Instead, we model agents as having a distributional prior on which classifier the learner is using. Our results show the existence of previously unknown intriguing informational middle grounds; we also demonstrate the necessity of revisiting the fundamental modeling assumptions of strategic classification in order to provide effective recommendations to practitioners in high-stakes, real-world prediction tasks. Acknowledgements The authors thank Avrim Blum for helpful discussions in the early stages of this work. Special thanks to Roy Long for suggesting that our model might leak more information than intended, and to Odelia Lorch for identifying an instance where this occurs and providing a proof (located in Appendix H). Lee Cohen is supported by the Simons Foundation Collaboration on the Theory of Algorithmic Fairness, the Sloan Foundation Grant 2020-13941, and the Simons Foundation investigators award 689988. Kevin Stangl was supported in part by the National Science Foundation under grants CCF2212968 and ECCS-2216899, by the Simons Foundation under the Simons Collaboration on the Theory of Algorithmic Fairness, and by the Defense Advanced Research Projects Agency under cooperative agreement HR00112020003. The views expressed in this work do not necessarily reflect the position or the policy of the Government and no official endorsement should be inferred. Y. Bechavod, K. Ligett, S. Wu, and J. Ziani. Gaming helps! learning from strategic interactions in natural dynamics. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1234 1242, 2021. Y. Bechavod, C. Podimata, S. Wu, and J. Ziani. Information discrepancy in strategic learning. In International Conference on Machine Learning (ICML), pages 1691 1715, 2022. H. Beyhaghi, M. K. Camara, J. Hartline, A. Johnsen, and S. Long. Screening with Disadvantaged Agents. In 4th Symposium on Foundations of Responsible Computing (FORC), volume 256 of Leibniz International Proceedings in Informatics (LIPIcs), pages 6:1 6:20, 2023. J. R. Biden. Executive order on the safe, secure, and trustworthy development and use of artificial intelligence. 2023. M. Braverman and S. Garg. The role of randomness and noise in strategic classification. In Foundations of Responsible Computing (FORC), volume 156 of LIPIcs, pages 9:1 9:20, 2020. M. Brückner and T. Scheffer. Stackelberg games for adversarial prediction problems. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 547 555, 2011. M. El Halabi and S. Jegelka. Optimal approximation for unconstrained non-submodular minimization. In International Conference on Machine Learning, pages 3961 3972. PMLR, 2020. G. Ghalme, V. Nair, I. Eilat, I. Talgam-Cohen, and N. Rosenfeld. Strategic classification in the dark. In International Conference on Machine Learning, pages 3672 3681. PMLR, 2021. N. Haghtalab, C. Podimata, and K. Yang. Calibrated stackelberg games: Learning optimal commitments against calibrated agents. Advances in Neural Information Processing Systems, 36, 2023. M. Hardt, N. Megiddo, C. 
Papadimitriou, and M. Wootters. Strategic classification. In Proceedings of the 2016 ACM conference on innovations in theoretical computer science, pages 111 122, 2016. K. Harris, V. Chen, J. Kim, A. Talwalkar, H. Heidari, and S. Z. Wu. Bayesian persuasion for algorithmic recourse. Advances in Neural Information Processing Systems, 35:11131 11144, 2022. L. Hu, N. Immorlica, and J. W. Vaughan. The disparate effects of strategic manipulation. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 259 268, 2019. M. Jagadeesan, C. Mendler-Dünner, and M. Hardt. Alternative microfoundations for strategic classification. In International Conference on Machine Learning, pages 4687 4697. PMLR, 2021. E. Kamenica and M. Gentzkow. Bayesian persuasion. American Economic Review, 101(6):2590 2615, 2011. R. Lake. How to get a personal loan without a credit check. Time, 2024. URL https://time.com/personal-finance/article/ how-to-get-a-personal-loan-without-a-credit-check/. S. Milli, J. Miller, A. D. Dragan, and M. Hardt. The social cost of strategic classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 230 239, 2019. G. D. P. Regulation. General data protection regulation (gdpr). Intersoft Consulting, Accessed in October, 24(1), 2018. A. C.-C. Yao. Probabilistic computations: Toward a unified measure of complexity. In 18th Annual Symposium on Foundations of Computer Science (sfcs 1977), pages 222 227. IEEE Computer Society, 1977. A Limitations, and Broader Impacts, and Future Work Limitations Potential limitations of our model are the following: We assume the agents prior is realizable, in the sense that the classifier deployed by the learner is in the support of the prior. This is a standard assumption in machine learning works, and it will be interesting to relax it in future works. The learner, in order to decide on the optimal level of information release, must know the agents prior. This assumption, while common in related settings like Bayesian persuasion, may be unrealistic in practice; although, in real-life, the learner may have an imperfect idea of or be able to partially recover the agents priors from previous interactions with them. Beyond this, we note that in practice, different agents may also have different beliefs and priors about the learner s model; this can affect the way the learner should release information, given that this information release may affect different users differently. A limitation of this model is that it assumes the learner must commit to a fixed classifier in advance. In real-life, classifiers are dynamically updated over time, using the additional information obtained from each decision. However, we note that changing the screening algorithm requires significant resources, and the rate at which the classifier is updated is generally slower than the rate at which decisions are made. In practice, this means that strategic agents effectively face a fixed model in each batch between updates. If agents know the prior distribution D and the mapping f from feature vectors to labels, they might infer information regarding h when |H| > 1. We show an example of such a case in Appendix H. Broader Impacts On the plus side, our approach provides a deeper understanding of strategic behavior in machine learning settings, when strategic agents may not fully understand the deployed model. 
By doing so, we are taking the understanding of strategic classification one step closer to real life, providing useful insights on how much information a learner should provide about their model to prevent model manipulation and gaming. One potential negative impact is that our approach takes the point of view of the learner who is solely interested in maximizing his own accuracy (or utility). It is well-known that this focus on accuracy can lead to unfairness and disparate harms across different populations; further, prior work studying fairness in the standard strategic classification setting [Hu et al., 2019, Milli et al., 2019] and in a related partial information setting [Bechavod et al., 2022] have shown that strategic classification can amplify these disparities. Future Work [Hu et al., 2019, Milli et al., 2019] consider disparities across different groups due to differing cost functions. In our model of strategic classification, different population groups may not only have differing cost functions but also differing prior distributions: network homophily, social disparities, and stratification can cause population groups to have distinct priors, leading to further disparities across groups. In turn, it will be critical in future work to design fairness-aware information release strategies by a learner faced with strategic behavior. B Oracle-Efficient Approximate Best Response for V -Submodular Costs In this section we give a sufficient condition on the cost function under which we can give an approximation algorithm for the best response of the agents. In particular, for a given collection of classifiers V H, we introduce the notion of V -submodular cost functions which is a natural condition that can arise in many applications. Borrowing results from the literature on submodular optimization, we then show that for any distribution P over a set V = {h1, . . . , hn}, if the cost function is V -submodular, there exists an oracle-efficient approximation algorithm for the best response problem. Recall, from Equation (3), that the best response problem faced by agent x can be written as max S {h1,...,hn} gx(S) P h S P(h ) c(x, S) where, with slight abuse of notation, we define c(x, S) min z:h (z)=1, h S c(x, z) (8) For any S {h1, . . . , hn}, c(x, S) is simply the minimum cost that the agent x has to incur in order to pass all classifiers in S, and can be computed via the oracle (Algorithm 1). We now state our main assumption on the cost function: Definition B.1 (V -Submodularity). Let V = {h1, . . . , hn} be any collection of classifiers. We say a cost function c is V -submodular, if for all x, the set function c(x, ) : 2V R defined in Equation 8 is submodular: for every S, S V such that S S and every h / S , c (x, S {h }) c (x, S) c (x, S {h }) c (x, S ) This condition asks that the marginal cost of passing the new classifier h is smaller when the new classifier is added to S versus S, for any such h , S, S . Fix a collection of classifiers V . Informally speaking, a cost function is V -submodular if the agent s manipulation to pass a classifier only helps her (i.e., reduces her cost) to pass other classifiers: the more classifiers the agent passes, it becomes only easier for her to pass an additional classifier. This can happen in settings where some of the knowledge to pass a certain number of tests is transferable across tests. Some real-life examples include: 1) a student that is preparing for a series of math tests on topics like probability, statistics, and combinatorics. 
2) a job applicant who is applying for multiple jobs within the same field and preparing for their interviews. We give a formal example of a V -submodular cost function below. In particular, we show that when X = R, the cost function c(x, x ) = |x x | is V -submodular where V can be any set of single-sided threshold classifiers. Claim B.2. Let X = R, and V = {h1, . . . , hn} where every hi can be written as hi(x) = 1[x ti] for some ti R. We have that the cost function c(x, x ) = |x x | is V -submodular. Proof of Claim B.2. We will abuse notation and use hi for the threshold ti (hi ti R). Fix any x. Consider S S {h1, . . . , hn} and h R such that h / S . Note that c(x, S) = max (max(S) x, 0) , c (x, S {h }) = max (max(S {h }) x, 0) c(x, S ) = max (max(S ) x, 0) , c (x, S {h }) = max (max(S {h }) x, 0) where max(F) is simply the largest threshold in F, for any set F. Note that max(S) max(S ) because S S . Suppose x max(S). We have three cases 1. If h max(S ), then c (x, S {h }) c (x, S) = h max(S) h max(S ) = c (x, S {h }) c (x, S ) 2. If max(S) h max(S ), then c (x, S {h }) c (x, S) = h max(S) 0 = c (x, S {h }) c (x, S ) 3. If h max(S), then c (x, S {h }) c (x, S) = c (x, S {h }) c (x, S ) = 0 So we have shown that the cost function is submodular if x max(S). We can similarly, using a case analysis, show that the cost function is submodular when x > max(S). We now state the main result of this section. Theorem B.3. Fix any H and any distribution P over some V = {h1, . . . , hn} H. If the cost function c is V -submodular, then there exists an algorithm that for every agent x and every ϵ > 0, makes O(n/ϵ2) calls to the oracle (Algorithm 1) and outputs a set ˆS V such that gx( ˆS) max S V gx(S) ϵ. Proof of Theorem B.3. Note that when the cost function is V -submodular, the objective function gx can be written as the difference of a monotone non-negative modular function12 and a monotone non-negative submodular function: gx : 2V R, gx(S) = P h S P(h ) c(x, S). The result then follows from [El Halabi and Jegelka, 2020] where they provide an efficient algorithm for approximately maximizing set functions with such structure. 12A set function r is modular if r(S) = P s S r(s) for any S. C Missing Proofs of Section 3 Theorem 3.1. Ω(2n/ n) calls to the oracle (Algorithm 1) are required to compute the best response of an agent with 2/3 probability of success, even when X = R2 and the cost function is cp for p 1. Proof of Theorem 3.1. To prove the claim, we reduce the following hidden-set detection problem with EQUALTO( ) oracle to our best response problem. In hidden-set detection problem, given two players, Alice and Bob, with Bob possessing a hidden subset S [n] of size n/2, Alice aims to identify Bob s set S using the minimum number of queries to Bob. She has query access to EQUALTO(T ) oracle that checks whether her set T [n] matches Bob s set (S ). It is trivial that any randomized algorithm for the hidden-set detection problem with success probability of at least 1 O(1) requires n n/2 queries in the worst-case scenario. This is via a straightforward application of Yao s Min-Max principle Yao [1977]: consider a uniform distribution over all subsets of size n/2 from [n], as the Bob s set. Then, after querying half of the subsets of size n/2, the failure probability of Alice in detecting Bob s set is at least (1 1/n)(1 1/(n 1)) (1 1/(n/2)) > (1 2/n)n/2 > e 1(1 2/n) > 1/3 for sufficiently large values of n. 
Next, corresponding to an instance of the hidden-set detection problem with S , we create an instance of the agents best response problem and show that any algorithm that computes the best response with success probability at least 2/3 using N oracle calls (Algorithm 1), detects the hidden set of the given instance of the hidden-set detection problem using at most N calls of EQUALTO( ) with probability at least 2/3. Hence, computing the best response problem with success probability at least 2/3 requires n n/2 = Ω(2n/ n) oracle calls. Let n = 2k and ϵ < 1/n. Corresponding to every subset S [n] of size n/2 1, there is a distinct point x S at distance 1/2 ϵ from the origin, i.e., x S p = 1/2 ϵ. Corresponding to every subset S [n] of size n/2, there are two distinct points x S,n and x S,f at distances respectively 1/2 ϵ (near) and 1/2 + ϵ (far) from the origin, i.e., x S,n p = 1/2 ϵ and x S,f p = 1/2 + ϵ. Now, we are ready to describe the instance IS of our best response problem corresponding to the given hidden-set detection problem with S . We define H = {h1, , hn} and distribution P over H such that, P is a uniform distribution over all classifiers H, i.e., P(hi) = 1/n for every i [n]. For every subset T [n] of size n/2 1, we define hi(x T ) = 1[i T]. For every subset T [n] of size n/2, we define hi(x T,f) = 1[i T]. Moreover, if T = S , then hi(x T,n) = 0 for all i [n]. Otherwise, if T = S , we define hi(x T,n) = 1[i T]. Finally, for the remaining points in X, i.e., x R2 \ ({x T : T [n] and |T| = n/2 1} {x T,n, x T,f : T [n] and |T| = n/2}), we define hi(x ) = 0 for all i [n]. In other words, points that do not correspond to subsets of size n/2 1 or n/2 are classified as negative examples by every classifier in H. In the constructed instance IS , no point is classified as positive by more than n/2 classifiers in H, and the p-norm distance from the origin for all points classified as positive by a subset of classifiers is at least 1/2 ϵ. Therefore, the best response for an agent located at the origin of the space is x S ,n, yielding a utility of 1/2 (1/2 ϵ) = ϵ > 0. Hence, the computational task in computing the best response involves identifying the (hidden) subset S . Refer to Figure 1 for a description of IS . Although we described the construction of IS , what we need to show to get the exponential lower bound on the oracle complexity of the best response problem is constructing an oracle (i.e., an implementation of Algorithm 1), using the EQUALTO( ) oracle, consistent with IS . To do so, given a subset of classifiers specified by T [n], the oracle returns as follows: if |T| > n/2: It returns an empty set. if |T| < n/2: It returns x T for an arbitrary set T T of size n/2 1. Note that x T p = 1/2 ϵ. if |T| = n/2 and EQUALTO(T) = FALSE: It returns x T,f. Note that x T,f p = 1/2 + ϵ. Figure 1: In this example, we consider p = 2, i.e., c(x, x ) = x, x 2. The agent is located at the origin. Blue nodes correspond to a point in the intersection of the positive regions of subsets of classifiers of size n 2 1, each located at a Euclidean distance of 1/2 ϵ from the origin, where ϵ is a small positive value. Moreover, points in the intersection of the positive regions of subsets classifiers of size n 2 are indicated by red points, all except the one corresponding to S are located at a Euclidean distance of 1/2 + ϵ from the origin. The red point corresponding to S is uniquely placed at a distance of 1/2 ϵ from the origin, similar to the blue nodes. 
Furthermore, all points, corresponding to different subsets, are located at distinct locations in the space. if |T| = n/2 and EQUALTO(T) = TRUE: It returns x T,n. Note that x T,n p = 1/2 ϵ. Remark C.1. As in each instance IS the only point with strictly positive utility is x S ,n, our proof for Theorem 3.1 essentially rules out the existence of any approximation algorithm for the best response problem with success probability at least 2/3 using o(2n/ n). Theorem 3.2. Suppose X = Rd for some d n, and H contains only linear classifiers. Then for any agent x, any cost function c, and any distribution P over {h1, . . . , hn} H, Algorithm 2 returns the best response of the agent in time O(nd+1), while making O(nd) calls to the oracle (Algorithm 1). Proof of Theorem 3.2. The fact that Algorithm 2 returns the best response follows from the construction of the algorithm. We first prove the oracle and the runtime complexity for d = 2 and then generalize it to any d. The oracle complexity of Algorithm 2 is |Rn|. Note that |R1| = 2, and for any n 2, |Rn| |Rn 1| + n. This is because the line {z : hn(z) = 0} intersects the lines formed by {h1, . . . , hn 1} in at most n 1 points, which will then partition {z : hn(z) = 0} into at most n segments. Each segment of the new line then splits a region in Rn 1 into two regions. So, there are at most n new regions when hn is introduced. The recursive relationship implies that |Rn| 1 + n(n+1) 2 = O(n2). The runtime complexity of the algorithm is then given by O (Pn i=1 |Ri|) = O(n3). Now consider any dimension d and let R(n, d) denote the number of partitions induced by the classifiers {h1, . . . , hn}. Note that in this case we have R(n, d) R(n 1, d) + R(n 1, d 1). The first term on the right hand side is the number of regions induced by {h1, . . . , hn 1} and the second term is the number of splits (dividing a region into two) when hn is introduced. Note that {z : hn(z) = 0} is a d 1-dimensional hyperplane and the number of splits induced by hn is simply the number of regions induced by {h1, . . . , hn 1} on {z : hn(z) = 0}, which is R(n 1, d 1). The recursive relationship implies that |Rn| = R(n, d) Pd j=0 n j = O(nd). D Missing Details of Section 4 We first state a remark about the tie-breaking of agents best response problem: Remark D.1. As mentioned in the model section, when there are several utility-maximizing solutions for the agents, we always break ties in favor of the lowest cost solution. Furthermore, each agent x in our setup manipulate only to larger values of x (x x); this is formally stated in the first part of Lemma D.2. Therefore, the tie-breaking of agents is in favor of smaller values of x in our setup. In other words, given some released information H, an agent x chooses BR(x, H) = min argmax x x ux(x , H) (9) In the rest of this section, when we write argmaxx x ux(x , H), we implicitly are taking the smallest x x that maximizes the utility of the agent x. Next, we state some useful facts about the agents best response in our setup. Lemma D.2. Fix any prior π and any points x2 x1. We have that, for any H H, 1. BR(x1, H) x1. 2. BR(x2, H) BR(x1, H). 3. If BR(x1, H) x2, then BR(x1, H) = BR(x2, H). Proof of Lemma D.2. Fix any x, and any H. We have BR(x, H) = argmax x Pr h π|H [x h ] |x x| = argmax x x Pr h π|H [x h ] (x x) = argmax x x Pr h π|H [x h ] x = argmax x x g H(x ) where we take g H(x ) Prh P |H [x h ] x . 
D Missing Details of Section 4

We first state a remark about tie-breaking in the agents' best response problem:

Remark D.1. As mentioned in the model section, when there are several utility-maximizing solutions for the agents, we always break ties in favor of the lowest-cost solution. Furthermore, each agent x in our setup manipulates only to larger values of x (x′ ≥ x); this is formally stated in the first part of Lemma D.2. Therefore, the tie-breaking of agents is in favor of smaller values of x′ in our setup. In other words, given some released information H, an agent x chooses
BR(x, H) = min argmax_{x′ ≥ x} u_x(x′, H).  (9)
In the rest of this section, when we write argmax_{x′ ≥ x} u_x(x′, H), we implicitly take the smallest x′ ≥ x that maximizes the utility of the agent x.

Next, we state some useful facts about the agents' best response in our setup.

Lemma D.2. Fix any prior π and any points x_2 ≥ x_1. We have that, for any H ⊆ 𝓗,
1. BR(x_1, H) ≥ x_1.
2. BR(x_2, H) ≥ BR(x_1, H).
3. If BR(x_1, H) ≥ x_2, then BR(x_1, H) = BR(x_2, H).

Proof of Lemma D.2. Fix any x and any H. We have
BR(x, H) = argmax_{x′} Pr_{h∼π|H}[x′ ≥ h] − |x′ − x| = argmax_{x′ ≥ x} Pr_{h∼π|H}[x′ ≥ h] − (x′ − x) = argmax_{x′ ≥ x} Pr_{h∼π|H}[x′ ≥ h] − x′ = argmax_{x′ ≥ x} g_H(x′),
where we take g_H(x′) ≡ Pr_{h∼π|H}[x′ ≥ h] − x′. The first equality follows because agents do not gain any utility by moving to a point x′ < x, and because tie-breaking is in favor of the lowest-cost solution. The first and second parts of the lemma follow from this derivation. For the third part, we have
BR(x_1, H) = argmax_{x′ ≥ x_1} g_H(x′) = argmax_{x′ ≥ x_2} g_H(x′) = BR(x_2, H),
where the second equality follows because BR(x_1, H) ≥ x_2.

D.1 NP-Hardness of the Learner's Optimization Problem with Arbitrary Prior Distributions

In this section, we formally state the NP-hardness of the learner's optimization problem in the general setting.

Theorem D.3. Consider an arbitrary prior π over a set of threshold classifiers {h_1, h_2, . . . , h_n} ⊆ 𝓗 that includes h*. The problem of finding H ⊆ {h_1, h_2, . . . , h_n} such that h* ∈ H and the learner's utility U(H) is maximized is NP-hard.

Proof of Theorem D.3. The proof is via a reduction from the subset sum problem. In particular, we consider a variant of the subset sum problem in which we are given a set of n positive numbers a_1, . . . , a_n, and the goal is to decide whether there exists a subset S ⊆ [n] such that Σ_{i∈S} a_i = T := (1/2) Σ_{i∈[n]} a_i. Given an instance of the subset sum problem with input ({a_1, . . . , a_n}, T := (1/2) Σ_{i∈[n]} a_i), we construct the following instance of our problem with one-dimensional threshold classifiers. Define f(x) = 1[x ≥ 0], h*(x) = 1[x ≥ 2/3], and h_i(x) = 1[x ≥ 100 + i] for every i ∈ [n]. Moreover, suppose that the prior distribution π of the agents is given by: π(h*) = 1/2 and, for every i ∈ [n], π(h_i) = a_i/(4T). Note that π(h*) + Σ_i π(h_i) = 1. Let the data distribution D be the uniform distribution over [−1000, 1000]. Intuitively speaking, the inclusion of the h_i's in H has no direct effect on the accuracy of the released subset H, as they can only lead a subset of the agents located at x ≥ 100 to manipulate, and such agents are classified as positive by h* regardless. However, their presence in H will impact the probability mass of h* under the posterior π|H, which is given by π|H(h*) = π(h*)/π(H) ≡ ρ_H. We will show that the learner can achieve perfect accuracy if and only if the given instance of the subset sum problem has a subset that sums up to T. To see this, consider the following cases for the released information H.

Case 1: ρ_H > 2/3. For any such H, all agents within distance ρ_H below 2/3 gain positive utility by manipulating to x = 2/3. The agents in [2/3 − ρ_H, 0) are therefore negatives who manipulate to 2/3 and pass h*, and the utility of such solutions for the learner is given by 1 − Pr_{x∼D}[x ∈ [2/3 − ρ_H, 0)] < 1.

Case 2: ρ_H < 2/3. For any such H, as all classifiers in H \ {h*} have thresholds larger than 100, no agent in [0, 2/3 − ρ_H) gains positive utility by manipulating to x = 2/3. Hence, these points (which are positives) will be misclassified by h*, and consequently, the utility of such solutions for the learner is given by 1 − Pr_{x∼D}[x ∈ [0, 2/3 − ρ_H)] < 1.

Case 3: ρ_H = 2/3. By similar arguments to the previous cases, all agents in [0, 2/3) manipulate to x = 2/3 and all points with negative labels (x < 0) stay at their location. Therefore, no agent is misclassified, and the utility of such solutions is 1.

Note that because ρ_H = π(h*)/π(H) = (1/2) / (1/2 + π(H ∩ {h_1, . . . , h_n})), we have that ρ_H = 2/3 if and only if π(H ∩ {h_1, . . . , h_n}) = 1/4. But π(H ∩ {h_1, . . . , h_n}) = (1/(4T)) Σ_{h_i∈H} a_i. We therefore have that ρ_H = 2/3 if and only if Σ_{h_i∈H} a_i = T. Hence, deciding whether the learner's optimization problem has a solution with perfect utility is equivalent to deciding whether in the given subset sum problem there exists a subset S ⊆ [n] such that Σ_{i∈S} a_i = T := (1/2) Σ_{i∈[n]} a_i.
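A small numerical sketch (ours, with made-up numbers) illustrates the final equivalence: among all released sets H containing h*, exactly those whose extra classifiers sum to T achieve ρ_H = 2/3.

```python
from fractions import Fraction
from itertools import combinations

def rho(a, S):
    """rho_H = pi(h*) / pi(H) when H = {h*} union {h_i : i in S},
    with pi(h*) = 1/2 and pi(h_i) = a_i / (4T)."""
    T = Fraction(sum(a), 2)
    mass_S = sum(Fraction(a[i]) / (4 * T) for i in S)
    return Fraction(1, 2) / (Fraction(1, 2) + mass_S)

a = [3, 1, 4, 2, 2]                      # toy subset-sum instance, target T = 6
T = sum(a) // 2
hits = [S for r in range(len(a) + 1)
        for S in combinations(range(len(a)), r)
        if rho(a, S) == Fraction(2, 3)]
assert hits and all(sum(a[i] for i in S) == T for S in hits)
print(f"{len(hits)} released sets achieve rho_H = 2/3; each sums to T = {T}")
```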
D.2 A Closed-form Solution for Continuous Uniform Priors

In this section, we provide closed-form solutions for continuous uniform priors. More concretely, we assume in this section that π is the uniform distribution over an interval [a, b] ⊆ R that includes h*. The learner's information release is then an interval H = [c, d] ⊆ [a, b] such that h* ∈ [c, d]. For example, a student may know that a GPA of 3.5 or higher will guarantee admission to a certain college, but not the exact threshold. Similarly, a loan applicant might know that a credit score above 650 will likely suffice for securing a loan, but not the precise cutoff. These uncertainties are sometimes due to factors unknown to the agents, such as the financial situation of the lender. Therefore, agents treat the threshold as uniformly distributed within a known and reasonable range.

Theorem D.4. Fix any data distribution D over X. Suppose the prior π is uniform over an interval [a, b] for some a, b such that h* ∈ [a, b]. Define H_c ≡ [c, d] where d ≡ min(b, max(h*, f + 1)). If b − a < 1, we have that H* = H_c is optimal for any c ∈ [a, h*], with corresponding utility
U(H_c) = 1 − Pr_{x∼D}[d − 1 < x < f] if d − 1 < f,  and  U(H_c) = 1 − Pr_{x∼D}[f ≤ x ≤ d − 1] if d − 1 ≥ f.
If b − a ≥ 1, we have that for any c ∈ (b − 1, h*], the optimal solution H* is given by
H* = H_c if U(H_c) > U([a, b]),  and  H* = [a, b] if U(H_c) ≤ U([a, b]),
where U(H_c) is given above and U([a, b]) = 1 − Pr_{x∼D}[f ≤ x < h*] is the utility of releasing [a, b].

Proof of Theorem D.4. Suppose H = [c, d] ⊆ [a, b] is the information released by the learner. The agents then project their uniform prior π over [a, b] onto H, which makes π|H the uniform distribution over [c, d], and they best respond according to π|H. Therefore, for any agent x,
BR(x, H) = argmax_{x′ ≥ x} Pr_{h∼Unif[c,d]}[x′ ≥ h] − (x′ − x).
One can then show that if d − c ≥ 1, BR(x, H) = x for all x, because for any manipulation (x′ > x), the marginal gain in the probability of receiving a positive classification is at most the marginal cost. Furthermore, if d − c < 1, then we have
BR(x, H) = d if d − 1 < x < d,  and  BR(x, H) = x otherwise.
Therefore, for any H = [c, d] ⊆ [a, b]: if d − c ≥ 1, we have U(H) = 1 − Pr_{x∼D}[f ≤ x < h*], and if d − c < 1, we have
U(H) = 1 − Pr_{x∼D}[d − 1 < x < f] if d − 1 < f,  and  U(H) = 1 − Pr_{x∼D}[f ≤ x ≤ d − 1] if d − 1 ≥ f.
This is because under d − c ≥ 1, no one manipulates, and thus the error corresponds to the probability mass between f and h*: the positives who do not pass h*. Under d − c < 1, because every agent x > d − 1 can receive a positive classification by manipulating, the error of H corresponds to the probability mass between d − 1 and f: if d − 1 < f, this corresponds to the negatives who can manipulate and receive a positive classification, and if d − 1 ≥ f, this corresponds to the positives who cannot manipulate to receive a positive classification.

Now assume b − a < 1, which implies that d − c < 1. At a high level, to maximize U(H) in this case, we want to pick d such that d − 1 is as close as possible to f. More formally, our goal is to pick [c, d] ⊆ [a, b] such that h* ∈ [c, d] and the probability mass between d − 1 and f is minimized. In this case, one can see, via a case analysis, that d = min(b, max(h*, f + 1)) is the optimal value, and that c can be any point in [a, h*]. If b − a ≥ 1, then both d − c < 1 and d − c ≥ 1 are possible. If d − c < 1, then the optimality of [c, d], where c is any point in (b − 1, h*] and d = min(b, max(h*, f + 1)), can be established as above. Note that the choice of c ∈ (b − 1, h*] guarantees that d − c < 1. If d − c ≥ 1, then the utility of the learner is the same as when [c, d] = [a, b], simply because the agents do not manipulate for any [c, d] such that d − c ≥ 1. Finally, the optimal interval is chosen based on which case (d − c < 1 vs. d − c ≥ 1) leads to higher utility.
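The closed form above is easy to exercise numerically. The following sketch (ours; the thresholds, the interval, and the data distribution are made up) implements the best response under a uniform prior on [c, d] and Monte-Carlo-estimates U(H_c) for the d prescribed by Theorem D.4 when b − a < 1.

```python
import random

def best_response(x, c, d):
    """Best response of an agent at x when the released interval is [c, d] and
    the prior is uniform over it; the cost is |x' - x| and ties favor the
    cheaper move, as in the proof of Theorem D.4."""
    if d - c >= 1:                       # marginal gain 1/(d - c) <= marginal cost 1
        return x
    return d if d - 1 < x < d else x

def utility(c, d, h_star, f, samples):
    """Monte-Carlo estimate of U([c, d]) = Pr_x[ h*(BR(x)) = f(x) ] for
    threshold classifiers h*(x) = 1[x >= h_star] and f(x) = 1[x >= f]."""
    correct = sum((best_response(x, c, d) >= h_star) == (x >= f) for x in samples)
    return correct / len(samples)

if __name__ == "__main__":
    random.seed(0)
    xs = [random.uniform(-2, 2) for _ in range(200_000)]
    f, h_star, a, b = 0.0, 0.5, 0.2, 0.9           # toy instance with b - a < 1
    d = min(b, max(h_star, f + 1))                 # optimal right endpoint from Theorem D.4
    print(round(utility(a, d, h_star, f, xs), 3))  # ~0.975 = 1 - Pr[d - 1 < x < f] under Unif[-2, 2]
```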
E The Complete Exposition of Section 4.1

In this section, we provide an efficient algorithm for computing the learner's optimal information release when the prior π is a discrete uniform distribution over a set {h_1, h_2, . . . , h_n} ⊆ 𝓗 that includes the adopted classifier h*. The objective of the learner is to release a subset H ⊆ {h_1, h_2, . . . , h_n} such that h* ∈ H. Throughout, we take h* = h_k where 1 ≤ k ≤ n, and assume h_1 ≤ h_2 ≤ . . . ≤ h_n. We first state some facts about the agents' best response for any prior π over {h_1, h_2, . . . , h_n} ⊆ 𝓗, starting with the following characterization of the best response of any agent:

Lemma E.1. For any agent x, and any prior π over {h_1, h_2, . . . , h_n} ⊆ 𝓗, we have BR(x, H) ∈ {x} ∪ {h_i ∈ H : h_i > x}.

Proof of Lemma E.1. Recall from Lemma D.2 that BR(x, H) ≥ x. Note that the utility of the agent x ∈ X from manipulating to a point x′ ≥ x can be expressed as
u_x(x′, H) = Σ_{i : h_i ≤ x′} π|H(h_i) − (x′ − x).
For any x′ ≥ x such that x′ ∉ {x} ∪ {h_i ∈ H : h_i > x}, it is easy to see that the agent can (weakly) increase her utility, at a strictly lower cost, by moving instead to a point in {x} ∪ {h_i ∈ H : h_i > x}, which proves the lemma.

This lemma tells us that the best response of any agent x is either to stay at its location, or to manipulate to some h_i ∈ H such that h_i > x. Given this characterization of the agents' best response in our setup, we now characterize, for any classifier h_i in the support of π, the set of agents that will manipulate to h_i.

Lemma E.2. Fix any prior π over {h_1, . . . , h_n} and any H. For any i, if {x : BR(x, H) = h_i} ≠ ∅, then {x : BR(x, H) = h_i} = (α, h_i] for some α that satisfies u_α(α, H) = u_α(h_i, H).

Proof of Lemma E.2. Let α = inf{x : BR(x, H) = h_i}. Take any x ∈ (α, h_i]. We have, by the definition of α, that there exists x′ ∈ (α, x) such that BR(x′, H) = h_i. Therefore, BR(x′, H) ≥ x. The third part of Lemma D.2 implies that h_i = BR(x′, H) = BR(x, H). This proves that (α, h_i] ⊆ {x : BR(x, H) = h_i}. If x > h_i, then BR(x, H) > h_i by the first part of Lemma D.2, and therefore x ∉ {x : BR(x, H) = h_i}. If x < α, then BR(x, H) ≠ h_i by the definition of α, implying x ∉ {x : BR(x, H) = h_i}. If x = α, we will show that BR(x, H) = α. Note that α ≤ BR(α, H) ≤ h_i by Lemma D.2. But if BR(α, H) > α, then by the third part of Lemma D.2, BR(α, H) = h_i. So BR(α, H) ∈ {α, h_i}. Suppose BR(α, H) = h_i. Then there exists ϵ > 0 such that u_α(α, H) + ϵ < u_α(h_i, H), by the definition of the agents' best response and the fact that tie-breaking is in favor of smaller values (Remark D.1). Let ϵ′ ∈ (0, ϵ/2] be such that {h ∈ H : α − ϵ′ ≤ h < α} = ∅. Consider x′ = α − ϵ′. We have, by Lemmas D.2 and E.1, that BR(x′, H) ∈ {x′} ∪ [α, h_i]. But because BR(α, H) = h_i, Lemma D.2 implies that BR(x′, H) ∈ {x′, h_i}. Note that
u_{x′}(x′, H) ≤ u_α(α, H) < u_α(h_i, H) − ϵ = u_{x′}(h_i, H) − (ϵ − ϵ′),
implying that BR(x′, H) = h_i. But x′ = α − ϵ′ < α, and this contradicts the definition of α. Therefore BR(α, H) = α, and this completes the proof of the first part of the lemma.

We now focus on the second part of the lemma. Note that u_α(α, H) ≥ u_α(h_i, H): if u_α(α, H) < u_α(h_i, H), then α < BR(α, H) ≤ h_i, which, together with the first part of this lemma and Lemma D.2, implies that BR(α, H) = h_i — a contradiction with the first part of the proof, which established BR(α, H) = α. Next, we show that u_α(α, H) ≤ u_α(h_i, H). Suppose u_α(α, H) > u_α(h_i, H) + ϵ for some ϵ > 0. Consider x = α + ϵ/2. We have that
u_x(x, H) ≥ u_α(α, H) > u_α(h_i, H) + ϵ = u_x(h_i, H) + ϵ/2,
implying that BR(x, H) ≠ h_i. This is in contradiction with the first part of the lemma. Therefore, u_α(α, H) = u_α(h_i, H).
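Since the support is finite, Lemma E.1 reduces the best-response computation to a finite maximization over {x} ∪ {h_i ∈ H : h_i > x}. A minimal sketch (ours; the prior values are made up) follows.

```python
def best_response(x, H, prior):
    """Best response of agent x to a released set H of threshold classifiers,
    using Lemma E.1: the optimum is either x itself or some h_i in H with
    h_i > x. `prior` maps thresholds to prior mass and is re-normalized to the
    posterior pi|H; ties go to the cheapest move (Remark D.1)."""
    total = sum(prior[h] for h in H)
    post = {h: prior[h] / total for h in H}
    def u(z):                                   # u_x(z, H) = Pr[z >= h] - (z - x)
        return sum(p for h, p in post.items() if z >= h) - (z - x)
    candidates = [x] + sorted(h for h in H if h > x)
    return max(candidates, key=lambda z: (u(z), -z))

prior = {0.2: 1/3, 0.6: 1/3, 0.7: 1/3}          # toy uniform prior over three thresholds
print(best_response(0.5, list(prior), prior))   # 0.7: utility 0.8 beats 0.567 at 0.6 and 0.333 for staying
```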
Next, we characterize the utility of any subset H released by the learner using a real-valued function of H. Define, for any H ⊆ {h_1, . . . , h_n} such that h* ∈ H,
R_H ≡ inf{x : BR(x, H) ≥ h*}.  (10)
Note that BR(x = h*, H) ≥ h* for any H such that h* ∈ H. Therefore, {x : BR(x, H) ≥ h*} is nonempty, and R_H ≤ h* for any H such that h* ∈ H. Our next lemma shows that R_H characterizes the utility of H for the learner, for any prior π over {h_1, . . . , h_n}.

Lemma 4.4. Fix any prior π over {h_1, . . . , h_n}. We have that for any H ⊆ {h_1, . . . , h_n} such that h* ∈ H, the utility of the learner, given by Equation 2, can be written as
U(H) = 1 − Pr_{x∼D}[R_H < x < f] if R_H < f,  and  U(H) = 1 − Pr_{x∼D}[f ≤ x ≤ R_H] if R_H ≥ f.  (5)

Proof of Lemma 4.4. Recall that U(H) = Pr_{x∼D}[h*(BR(x, H)) = f(x)]. The claim follows from the fact that h*(BR(x, H)) = 1 if and only if x > R_H. Note that if x > R_H, then BR(x, H) ≥ h* (equivalently, h*(BR(x, H)) = 1) by the definition of R_H and Lemma D.2. Further, if BR(x, H) ≥ h*, then x ≥ R_H by the definition of R_H.

Given this characterization of the learner's utility, we will show that when the agents' prior is uniform over {h_1, . . . , h_n}, there are only polynomially many possible values that R_H can take, even though there are exponentially many H's. Our algorithm then, for any possible value R of R_H, finds a subset H such that R_H = R, if such an H exists, and outputs the H with maximal utility according to Equation 5. More formally, we consider the following partitioning of the space of subsets of {h_1, . . . , h_n}. For any ℓ ∈ {1, 2, . . . , n}, and for any i ∈ {k, k + 1, . . . , n} (recall that k is the index of h* in {h_1, . . . , h_n}, i.e., h* = h_k), define
S_{i,ℓ} = {H ⊆ {h_1, . . . , h_n} : h* ∈ H, |H| = ℓ, BR(h*, H) = h_i}.
Note that by Lemma E.1, BR(h* = h_k, H) ∈ {h_i : i ≥ k} for any H. Therefore, {S_{i,ℓ}}_{i,ℓ} gives us a proper partitioning of the space of subsets, which implies
max_{H ⊆ {h_1,...,h_n}, h* ∈ H} U(H) = max_{i,ℓ} max_{H ∈ S_{i,ℓ}} U(H).
We will show that when the prior is uniform, solving max_{H ∈ S_{i,ℓ}} U(H) can be done efficiently, by giving a construction of the optimal H ∈ S_{i,ℓ} in our algorithm. To do so, we first show that R_H (defined in Equation 10) can be characterized in terms of h_i when we restrict ourselves to H ∈ S_{i,ℓ}.

Lemma E.3. Fix any prior π over {h_1, . . . , h_n}. If H ∈ S_{i,ℓ}, then {x : BR(x, H) = h_i} = (R_H, h_i].

Proof of Lemma E.3. We first show that R_H = inf{x : BR(x, H) = h_i}. Fix any H ∈ S_{i,ℓ} and let Q_H = inf{x : BR(x, H) = h_i}. First note that because H ∈ S_{i,ℓ}, we have BR(h*, H) = h_i, and therefore {x : BR(x, H) = h_i} ≠ ∅ and Q_H ≤ h*. Additionally, because {x : BR(x, H) = h_i} ⊆ {x : BR(x, H) ≥ h*}, we have that Q_H ≥ R_H. So we have R_H ≤ Q_H ≤ h*. If Q_H ≠ R_H, then there exists x with R_H < x < Q_H such that h* ≤ BR(x, H) < h_i. But for x′ = BR(x, H), we have BR(x′, H) = h_i > x′ = BR(x, H). This is in contradiction with the third part of Lemma D.2 (taking x_1 = x and x_2 = x′). Therefore, Q_H = R_H, and this proves the first part. The second part of the lemma follows from the first part and Lemma E.2. In particular, this lemma implies that for H ∈ S_{i,ℓ}, we have R_H = inf{x : BR(x, H) = h_i}.
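As a sanity check of Lemma 4.4 on a toy instance (ours; the thresholds and the data distribution are made up), one can approximate R_H on a grid and compare the utility given by Equation 5 with a direct Monte-Carlo estimate of Pr_{x∼D}[h*(BR(x, H)) = f(x)]:

```python
import random

def br(x, H):                                    # best response under a uniform prior over H
    l = len(H)
    def u(z):
        return sum(1 for h in H if z >= h) / l - (z - x)
    cands = [x] + sorted(h for h in H if h > x)
    return max(cands, key=lambda z: (u(z), -z))

H, h_star, f = [0.3, 0.55, 0.8], 0.55, 0.2       # toy instance with h* = 0.55 in H
grid = [i / 5000 for i in range(-5000, 5001)]
R_H = min(x for x in grid if br(x, H) >= h_star) # numerical stand-in for Equation 10

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200_000)]
direct = sum((br(x, H) >= h_star) == (x >= f) for x in xs) / len(xs)
if R_H < f:
    eq5 = 1 - sum(R_H < x < f for x in xs) / len(xs)
else:
    eq5 = 1 - sum(f <= x <= R_H for x in xs) / len(xs)
print(round(R_H, 3), round(direct, 4), round(eq5, 4))   # the two utility estimates agree
```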
Next, we demonstrate the possible values that R_H can take for uniform priors. In particular, the following lemma establishes that R_H can take only polynomially many values.

Lemma E.4. If the prior π is uniform over {h_1, . . . , h_n}, then for any H ∈ S_{i,ℓ}, R_H = h_i − j/ℓ where j = |{h ∈ H : h ∈ (R_H, h_i]}|.

Proof of Lemma E.4. Fix any H ∈ S_{i,ℓ}. Lemma E.3 and Lemma E.2 together imply that u_{R_H}(R_H, H) = u_{R_H}(h_i, H). This implies
Pr_{h∼π|H}[R_H ≥ h] = Pr_{h∼π|H}[h_i ≥ h] − (h_i − R_H).
But Pr_{h∼π|H}[R_H ≥ h] = j_1/ℓ and Pr_{h∼π|H}[h_i ≥ h] = j_2/ℓ, where j_1 and j_2 are the numbers of hypotheses in H that are smaller than (or equal to) R_H and smaller than (or equal to) h_i, respectively. In other words,
j_1 = |{h ∈ H : h ≤ R_H}|,  j_2 = |{h ∈ H : h ≤ h_i}|.
Therefore, R_H = h_i − (j_2 − j_1)/ℓ, which completes the proof.

Given this characterization, Algorithm 3, for every i and ℓ, enumerates over all possible j ∈ {1, . . . , ℓ} and returns an H such that H ∈ S_{i,ℓ} and R_H = h_i − j/ℓ, if such an H exists. To elaborate, for any (i, ℓ, j), such an H ≡ H_j^{i,ℓ} is constructed by first picking the j largest classifiers that lie between h_i − j/ℓ and h_i (including both h_i and h*). If there are not at least j classifiers between h_i − j/ℓ and h_i, then no such H exists for (i, ℓ, j) because of Lemma E.4. After picking the first j elements as described, the remaining ℓ − j classifiers are first chosen from all classifiers that are less than (or equal to) h_i − j/ℓ, and once these classifiers are exhausted, the rest are taken from the classifiers that are larger than h_i, starting from the largest possible classifier and going down, until ℓ classifiers are picked. Note that this construction of H_j^{i,ℓ} guarantees that BR(h_i, H_j^{i,ℓ}) is minimized among all H's with corresponding values of (i, ℓ, j). Therefore, if BR(h_i, H_j^{i,ℓ}) > h_i, it is guaranteed that no H exists for (i, ℓ, j). If BR(h_i, H_j^{i,ℓ}) = h_i, then the construction of H_j^{i,ℓ} guarantees that R_H ≡ inf{x : BR(x, H) = h_i} is as small as possible. Therefore, if inf{x : BR(x, H_j^{i,ℓ}) = h_i} > h_i − j/ℓ, then it is guaranteed that no such H exists for (i, ℓ, j) (note that inf{x : BR(x, H_j^{i,ℓ}) = h_i} ≥ h_i − j/ℓ by construction). The algorithm finally outputs, among all H's found, the subset H* with maximum utility according to Equation 5. This proves the following theorem.

Algorithm 3: The Learner's Optimization Problem: Discrete Uniform Prior
Input: ground truth classifier f, adopted classifier h* ≥ f, prior's support {h_1, . . . , h_n} where h_1 ≤ . . . ≤ h_n and h_k = h*, data distribution D.
for i = k, k + 1, . . . , n do
  for ℓ = 1, 2, . . . , n do
    for j = 1, 2, . . . , ℓ do
      R ← h_i − j/ℓ  // candidate value R for R_H
      S_1 ← {h ∈ {h_1, . . . , h_n} : R < h ≤ h_i}  // all classifiers between R and h_i
      S_2 ← {h ∈ {h_1, . . . , h_n} : h ≤ R}  // all classifiers smaller than (or equal to) R
      S_3 ← {h ∈ {h_1, . . . , h_n} : h > h_i}  // all classifiers larger than h_i
      if R ≥ h* or |S_1| < j then
        H_j^{i,ℓ} ← ∅  // no H exists for (i, ℓ, j)
      else
        H_j^{i,ℓ} ← {h*, h_i}
        H_j^{i,ℓ} ← H_j^{i,ℓ} ∪ MAX_{j − |H_j^{i,ℓ}|}(S_1 \ H_j^{i,ℓ})  // MAX_m(·): the m largest elements
        if |S_2| ≥ ℓ − j then T ← any subset of size ℓ − j from S_2
        else T ← S_2 ∪ MAX_{ℓ − j − |S_2|}(S_3)  // MAX_m(·): the m largest elements
        H_j^{i,ℓ} ← H_j^{i,ℓ} ∪ T
        if BR(h_i, H_j^{i,ℓ}) > h_i then H_j^{i,ℓ} ← ∅  // no H exists for (i, ℓ, j)
        else if inf{x : BR(x, H_j^{i,ℓ}) = h_i} > R then H_j^{i,ℓ} ← ∅  // no H exists for (i, ℓ, j)
      if H_j^{i,ℓ} = ∅ then U_j^{i,ℓ} ← −∞
      else if R < f then U_j^{i,ℓ} ← 1 − Pr_{x∼D}[R < x < f]  // utility according to Equation 5
      else U_j^{i,ℓ} ← 1 − Pr_{x∼D}[f ≤ x ≤ R]  // utility according to Equation 5
Output: H* = H_{j*}^{i*,ℓ*} where (i*, ℓ*, j*) ∈ argmax_{(i,ℓ,j)} U_j^{i,ℓ}.

Theorem 4.6. There exists an algorithm (Algorithm 3) that, for any n, any uniform prior over {h_1, . . . , h_n} that includes h*, and any data distribution D, returns H* ⊆ {h_1, . . . , h_n} in time O(n³) such that h* ∈ H* and U(H*) = max_{H ⊆ {h_1,...,h_n}, h* ∈ H} U(H).
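For small n, an implementation of Algorithm 3 can be sanity-checked against the naive baseline it is designed to avoid: enumerating all 2^{n−1} released sets containing h* and scoring each one. The sketch below (ours; it is the brute-force reference, not an implementation of Algorithm 3, and the instance is made up) does exactly that for threshold classifiers and a uniform prior.

```python
from itertools import combinations
import random

def br(x, H):                                    # best response under a uniform prior over H
    l = len(H)
    def u(z):
        return sum(1 for t in H if z >= t) / l - (z - x)
    cands = [x] + sorted(t for t in H if t > x)
    return max(cands, key=lambda z: (u(z), -z))

def utility(H, h_star, f, samples):
    return sum((br(x, H) >= h_star) == (x >= f) for x in samples) / len(samples)

def brute_force_opt(thresholds, h_star, f, samples):
    """Enumerate every H containing h* and return the best one (exponential in n)."""
    rest = [t for t in thresholds if t != h_star]
    best_H, best_u = None, -1.0
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            H = sorted((h_star,) + extra)
            u = utility(H, h_star, f, samples)
            if u > best_u:
                best_H, best_u = H, u
    return best_H, best_u

if __name__ == "__main__":
    random.seed(1)
    samples = [random.uniform(-1, 2) for _ in range(50_000)]
    thresholds = [0.2, 0.5, 0.9, 1.4]            # toy prior support; adopted classifier h* = 0.5
    print(brute_force_opt(thresholds, h_star=0.5, f=0.1, samples=samples))
```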
F Missing Proof of Section 4.2

Theorem 4.7. Fix any h* ≥ f. For any prior π over {h_1, . . . , h_n} that includes h*, and for every H ⊆ 𝓗 such that h* ∈ H, we have 1) FPR(H) ≤ FPR({h*}), and 2) FNR({h*}) ≤ FNR(H).

Proof of Theorem 4.7. We begin by showing that FPR(H) ≤ FPR({h*}). Let x ∈ X be such that f(x) = 0 and h*(BR(x, H)) = 1; we will show that h*(BR(x, {h*})) = 1. Lemma E.1, together with h* ≥ f, implies the existence of h_j such that BR(x, H) = h_j > x (as f(x) ≠ h*(BR(x, H))). This further indicates that when H is released, the utility of the agent is strictly higher when it manipulates to h_j, compared to not moving:
u_x(h_j, H) = Σ_{i=1}^{j} π|H(h_i) − (h_j − x) > Σ_{i : h_i ≤ x} π|H(h_i) = u_x(x, H).
Note that h*(BR(x, H)) = 1 and BR(x, H) = h_j imply that h_j ≥ h*, and therefore:
u_x(h*, {h*}) = 1 − (h* − x) ≥ Σ_{i=1}^{j} π|H(h_i) − (h_j − x) > Σ_{i : h_i ≤ x} π|H(h_i) ≥ 0 = u_x(x, {h*}).
Since Lemma E.1 implies that BR(x, {h*}) ∈ {x, h*}, we derive from the above inequality that h*(BR(x, {h*})) = 1. This proves the first part of the theorem.

Next, we show that FNR({h*}) ≤ FNR(H) for every H ⊆ 𝓗. Let x ∈ X be such that f(x) = 1 and h*(BR(x, {h*})) = 0, and let H be any subset of 𝓗. We will show that h*(BR(x, H)) = 0. Lemma E.1 implies that BR(x, {h*}) ∈ {x, h*}. Together with h*(BR(x, {h*})) = 0, we derive that BR(x, {h*}) = x, and thus:
u_x(h*, {h*}) = 1 − (h* − x) ≤ u_x(x, {h*}) = 0.
Now, for every h_j such that h_j ≥ h*, we have:
u_x(h_j, H) = Σ_{i=1}^{j} π|H(h_i) − (h_j − x) ≤ 1 − (h* − x) ≤ 0 ≤ Σ_{i : h_i ≤ x} π|H(h_i) = u_x(x, H).
As a result, when the learner releases H, the utility of agent x from remaining at x is greater than (or equal to) that of any manipulation to h_j such that h_j ≥ h*. This implies that h*(BR(x, H)) = 0.

G Optimal Information Release for Minimizing FPR

We show that minimizing FPR, unlike minimizing FNR, does not always have a clear optimal solution for the learner, by providing three examples with very different optimal solutions.

Example G.1 (Full-information-release is optimal for FPR). Fix any B > 1 and any 0 ≤ t < B − 1. Let D be the uniform distribution over X = [0, B], and f(x) = 1[x ≥ t]. Let 𝓗 be the class of single-sided threshold classifiers and suppose the adopted classifier is h*(x) = 1[x ≥ t + 1]. Under any prior over 𝓗, one can show that the full information release H = {h*} achieves perfect FPR for this setting: FPR({h*}) = 0.

Example G.2 (No-information-release is optimal for FPR). Under the same setup as in Example 2.4, one can show that releasing the support of the prior, H = {h_{1.8}, h_2}, achieves FPR(H) = 0, whereas full information release of the adopted classifier h* = h_2 achieves FPR({h*}) = 0.9/1.9 ≈ 0.47. Note that H = {h_{1.8}, h_2} is the support of the prior, so releasing it constitutes no-information-release. In other words, we have FPR(H′) = FPR(H) = 0 for every H′ such that H ⊆ H′ ⊆ 𝓗.

Claim G.3. There exists an instance in which neither full-information-release nor no-information-release is optimal when the utility function of the learner is U_FPR. (We remark that the claim holds when the utility function is U, as defined in Equation 2, as well.)

Proof of Claim G.3. We construct such an instance as follows. Suppose the domain is X = {x_1, x_2} with x_1 = 0, x_2 = 0.4, and the distribution D is given by D(x_1) = D(x_2) = 0.5. In addition, consider f = 0.3, and hypothesis class 𝓗 = {h_1, h_2, h_3} where h_1 = 0.1, h_2 = 0.5, h_3 = 0.7, and a prior distribution π such that π(h_1) = 0.2, π(h_2) = 0.1, π(h_3) = 0.7. Observe that under full-information-release, FPR({h}) = 1 for every h ∈ 𝓗. Now suppose h* = h_2 is the adopted classifier. We have that BR(x_1, {h*}) = 0.5, implying h*(BR(x_1, {h*})) = 1 ≠ f(x_1) = 0; that is, x_1 is a false positive under the {h*} release. Additionally, BR(x_1, 𝓗) = 0.7 implies that h*(BR(x_1, 𝓗)) = 1 ≠ f(x_1) = 0, so x_1 is a false positive under the no-information release 𝓗. Further, it holds that BR(x_1, {h_1, h_2}) = 0.1, and so h*(BR(x_1, {h_1, h_2})) = 0 = f(x_1). Moreover, in this particular instance, releasing {h_1, h_2} achieves perfect utility, as BR(x_2, {h_1, h_2}) = 0.5, which implies h*(BR(x_2, {h_1, h_2})) = 1 = f(x_2).
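The three best responses used in this proof are easy to verify numerically; the sketch below (ours) recomputes them for the stated instance and confirms that only the partial release {h_1, h_2} avoids making x_1 a false positive.

```python
def br(x, H, prior):
    """Best response under the posterior obtained by restricting `prior` to H."""
    total = sum(prior[t] for t in H)
    def u(z):
        return sum(prior[t] for t in H if z >= t) / total - (z - x)
    cands = [x] + sorted(t for t in H if t > x)
    return max(cands, key=lambda z: (u(z), -z))

prior = {0.1: 0.2, 0.5: 0.1, 0.7: 0.7}           # h1, h2, h3 with the stated prior masses
h_star, f = 0.5, 0.3                             # adopted classifier h* = h2, ground truth threshold f
for H in ([0.5], [0.1, 0.5, 0.7], [0.1, 0.5]):   # full release, no release, partial release
    false_positive = (br(0.0, H, prior) >= h_star) and not (0.0 >= f)
    print(H, "x1 = 0 is a false positive:", false_positive)   # True, True, False
```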
H Possible Information Leakage Through the Firm's Choice of H

One limitation of our model is that if the agents have knowledge of the mapping f from feature vectors to labels, they might gain information on h* in cases when |H| > 1. More specifically, knowing the distribution D, the mapping f, and the fact that the firm is optimizing the choice of H for accuracy, agents could deduce h*. In this case, the choice of H leaks more information than intended. We proceed by showing an example with threshold classifiers for such a case.

Example H.1. Suppose the distribution D over agents x ∈ X = [0, 1] is uniform, c(x, x′) = |x − x′|, and f(x) = 1[x ≥ T] with T = 0.15. The set of classifiers available to the firm is 𝓗 = {h_1, h_2}, where α_1 = 0.1 and α_2 = 0.9 are the classifiers' respective thresholds. Following our Bayesian model, the firm chooses to release a subset H ⊆ 𝓗 over which the agents have a uniform prior π|H. The agents know that the firm is choosing H to optimize accuracy, i.e., the function U(H) = Pr_{x∼D}[h*(BR(x, H)) = f(x)]. We will show that the firm can release a subset H which is in argmax_{H⊆𝓗}[U(H) | h* = h_1] but not in argmax_{H⊆𝓗}[U(H) | h* = h_2], allowing the agent to reason that h* must be h_1.

Proposition H.2. Consider Example H.1. If the agents know that f(x) = 1[x ≥ 0.15] and the distribution D, they can infer that h* = h_1.

Proof. We first solve for argmax_{H⊆𝓗}[U(H) | h* = h_1]. Suppose H = {h_1}. Then all agents know h* = h_1. If x ≤ α_1, the agent will manipulate to α_1 to get a positive outcome if 1 − c(x, α_1) > 0, which is always true. If x ≥ α_1, agents stay put and get a positive outcome. So the set of misclassified agents is those with x ∈ [0, T) and U({h_1}) = 1 − T = 0.85.

Now suppose H = {h_1, h_2}. Agents believe each classifier is h* with probability 0.5. If x ≥ α_2, agents are guaranteed a positive outcome and stay put. If α_1 ≤ x ≤ α_2, agents will manipulate to α_2 to get a guaranteed positive outcome if 1 − c(x, α_2) > 0.5, so all agents with x ∈ (0.4, 0.9] will be classified correctly. The rest of the agents, with x ∈ [α_1, 0.4], will stay put to get a positive outcome with probability 0.5, and those with x ∈ [α_1, T) will be misclassified. Lastly, if x < α_1, agents will manipulate to α_2 to get a guaranteed positive outcome if 1 − c(x, α_2) > 0.5 − c(x, α_1) (i.e., 1 − (0.9 − x) > 0.5 − (0.1 − x)), which is never true, and otherwise manipulate to α_1 to get a positive outcome with probability 0.5 if 0.5 − c(x, α_1) > 0, which is always true. So the set of misclassified agents is those with x ∈ [0, T) and U({h_1, h_2}) = 1 − T = 0.85. Therefore argmax_{H⊆𝓗}[U(H) | h* = h_1] = {{h_1}, {h_1, h_2}}.

Now we consider argmax_{H⊆𝓗}[U(H) | h* = h_2]. Suppose H = {h_2}. As before, all agents know h* = h_2. If x ≤ α_2, the agent will manipulate to α_2 to get a positive outcome if 1 − c(x, α_2) > 0, which is always true. If x ≥ α_2, agents stay put and get a positive outcome. So the set of misclassified agents is those with x ∈ [0, T) and U({h_2}) = 1 − T = 0.85.

Now suppose H = {h_1, h_2}. As before, agents believe each classifier is h* with probability 0.5. If x ≥ α_2, agents are guaranteed a positive outcome and stay put. If α_1 ≤ x ≤ α_2, agents will manipulate to α_2 to get a guaranteed positive outcome if 1 − c(x, α_2) > 0.5, so all agents with x ∈ (0.4, 0.9] will be classified correctly. The rest of the agents, with x ∈ [α_1, 0.4], will stay put to get a positive outcome with probability 0.5; they fail h_2, so those with x ∈ [T, 0.4] (whose true label is positive) are misclassified, while those with x ∈ [α_1, T) are correctly rejected. Lastly, if x < α_1, agents will manipulate to α_1 to get a positive outcome with probability 0.5; they fail h_2 and are correctly rejected. So the set of misclassified agents is those with x ∈ [T, 0.4] and U({h_1, h_2}) = 1 − (0.4 − 0.15) = 0.75, which is strictly less than U({h_2}). Therefore argmax_{H⊆𝓗}[U(H) | h* = h_2] = {{h_2}}.

We have shown that if the firm chooses H = {h_1, h_2}, the agent can reason that H ∈ argmax_{H⊆𝓗}[U(H) | h* = h_1] but H ∉ argmax_{H⊆𝓗}[U(H) | h* = h_2], so h* must be h_1.
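A quick Monte-Carlo recomputation of the utilities in this proof (ours) reproduces the asymmetry that leaks h*: under h* = h_1 both {h_1} and {h_1, h_2} score about 0.85, while under h* = h_2 the release {h_1, h_2} scores only about 0.75, strictly below {h_2}.

```python
import random

def br(x, H):                                    # uniform posterior over the released set H
    l = len(H)
    def u(z):
        return sum(1 for t in H if z >= t) / l - (z - x)
    cands = [x] + sorted(t for t in H if t > x)
    return max(cands, key=lambda z: (u(z), -z))

def U(H, h_star, f, xs):
    return sum((br(x, H) >= h_star) == (x >= f) for x in xs) / len(xs)

random.seed(0)
xs = [random.random() for _ in range(200_000)]   # D = Uniform[0, 1]
h1, h2, T = 0.1, 0.9, 0.15
for h_star in (h1, h2):
    allowed = [H for H in ([h1], [h2], [h1, h2]) if h_star in H]
    print(h_star, {tuple(H): round(U(H, h_star, T, xs), 2) for H in allowed})
# h* = h1: both {h1} and {h1, h2} give ~0.85; h* = h2: {h2} gives ~0.85 but {h1, h2} only ~0.75.
```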
NeurIPS Paper Checklist

1. Claims
Question: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? Answer: [Yes] Justification: Our introduction provides a summary of contributions that accurately reflects the new model and the results of this paper, with pointers to specific sections. Our model is introduced in Section 2, and Sections 3, 4 contain all formal statements and their proofs. Guidelines: The answer NA means that the abstract and introduction do not include the claims made in the paper. The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers. The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings. It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper.
2. Limitations
Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: See Section A. Guidelines: The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. The authors are encouraged to create a separate "Limitations" section in their paper. The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be. The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated. The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon. The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size. If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness.
While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren t acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations. 3. Theory Assumptions and Proofs Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [Yes] Justification: We describe our model in Section 2, and specific assumptions appear in relevant sections. For every theoretical result, we provide a complete (and correct) proof. Guidelines: The answer NA means that the paper does not include theoretical results. All the theorems, formulas, and proofs in the paper should be numbered and crossreferenced. All assumptions should be clearly stated or referenced in the statement of any theorems. The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition. Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material. Theorems and Lemmas that the proof relies upon should be properly referenced. 4. Experimental Result Reproducibility Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [NA] . Justification: The paper does not include experiments. All results are carefully proven, and do not require experiments to establish correctness. Guidelines: The answer NA means that the paper does not include experiments. If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not. If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable. Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general. releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed. While Neur IPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm. 
(b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully. (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset). (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results. 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] . Justification: This is a theoretical paper. All results are carefully proven, and do not require experiments to establish correctness. Guidelines: The answer NA means that paper does not include experiments requiring code. Please see the Neur IPS code and data submission guidelines (https://nips.cc/ public/guides/Code Submission Policy) for more details. While we encourage the release of code and data, we understand that this might not be possible, so No is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark). The instructions should contain the exact command and environment needed to run to reproduce the results. See the Neur IPS code and data submission guidelines (https: //nips.cc/public/guides/Code Submission Policy) for more details. The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc. The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why. At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable). Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted. 6. Experimental Setting/Details Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [NA] . Justification: This is a theoretical paper. All results are carefully proven, and do not require experiments to establish correctness. Guidelines: The answer NA means that the paper does not include experiments. The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them. The full details can be provided either with the code, in appendix, or as supplemental material. 7. Experiment Statistical Significance Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [NA] . Justification: This is a theoretical paper. 
All results are carefully proven, and do not require experiments to establish correctness. Guidelines: The answer NA means that the paper does not include experiments. The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper. The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions). The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.) The assumptions made should be given (e.g., Normally distributed errors). It should be clear whether the error bar is the standard deviation or the standard error of the mean. It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified. For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates). If error bars are reported in tables or plots, The authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text. 8. Experiments Compute Resources Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [NA] . Justification: This is a theoretical paper. All results are carefully proven, and do not require experiments to establish correctness. Guidelines: The answer NA means that the paper does not include experiments. The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage. The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute. The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn t make it into the paper). 9. Code Of Ethics Question: Does the research conducted in the paper conform, in every respect, with the Neur IPS Code of Ethics https://neurips.cc/public/Ethics Guidelines? Answer: [Yes] Justification: We have carefully read and comply with the Neur IPS code of ethics. This work is theoretical and we do not foresee harmful consequences. Guidelines: The answer NA means that the authors have not reviewed the Neur IPS Code of Ethics. If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics. The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction). 10. Broader Impacts Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed? Answer: [Yes] Justification: We note that our work is in nature theoretical and we are not directly deploying new algorithms or technologies. 
However, our theory provides algorithms and insights that are motivated by and can be translated to practical applications such as loan decisions. As such, we discuss potential broader impacts in Section A, both positive and negative. Guidelines: The answer NA means that there is no societal impact of the work performed. If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact. Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations. The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster. The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology. If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML). 11. Safeguards Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)? Answer: [NA] . Justification: The work is entirely theoretical; we do not release new data or models. Guidelines: The answer NA means that the paper poses no such risks. Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters. Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images. We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort. 12. Licenses for existing assets Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected? Answer: [NA] . Justification: We do not use existing assets. Guidelines: The answer NA means that the paper does not use existing assets. The authors should cite the original paper that produced the code package or dataset. 
The authors should state which version of the asset is used and, if possible, include a URL. The name of the license (e.g., CC-BY 4.0) should be included for each asset. For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided. If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset. For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided. If this information is not available online, the authors are encouraged to reach out to the asset s creators. 13. New Assets Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets? Answer: [NA] . Justification: The paper does not release new assets. Guidelines: The answer NA means that the paper does not release new assets. Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc. The paper should discuss whether and how consent was obtained from people whose asset is used. At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file. 14. Crowdsourcing and Research with Human Subjects Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)? Answer: [NA] . Justification: The paper does not involve crowdsourcing nor research with human subjects. Guidelines: The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper. According to the Neur IPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector. 15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained? Answer: [NA] . Justification: The paper does not involve crowdsourcing nor research with human subjects. Guidelines: The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper. We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the Neur IPS Code of Ethics and the guidelines for their institution. 
For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.