# Bias Detection via Signaling

Yiling Chen (Harvard University, yiling@seas.harvard.edu), Tao Lin (Harvard University, tlin@g.harvard.edu), Ariel D. Procaccia (Harvard University, arielpro@g.harvard.edu), Aaditya Ramdas (Carnegie Mellon University, aramdas@cmu.edu), Itai Shapira (Harvard University, itaishapira@g.harvard.edu)

38th Conference on Neural Information Processing Systems (NeurIPS 2024).

Abstract. We introduce and study the problem of detecting whether an agent is updating their prior beliefs given new evidence in an optimal, Bayesian way, or whether they are biased towards their own prior. In our model, biased agents form posterior beliefs that are a convex combination of their prior and the Bayesian posterior, where the more biased an agent is, the closer their posterior is to the prior. Since we often cannot observe the agent's beliefs directly, we take an approach inspired by information design. Specifically, we measure an agent's bias by designing a signaling scheme and observing the actions the agent takes in response to different signals, assuming that the agent maximizes their own expected utility. Our goal is to detect bias with a minimum number of signals. Our main results include a characterization of scenarios where a single signal suffices and a computationally efficient algorithm to compute optimal signaling schemes.

1 Introduction

A bag contains two coins that look and feel identical, but one is a fair coin that, on a flip, comes up heads with probability 0.5, and the other is an unfair coin with probability 0.9 of heads. You reach into the bag, grab one of the coins and flip it once; it lands on heads. Since you are (hopefully) familiar with Bayes' rule, you conclude that the probability you are holding the fair coin is 0.36. Now suppose you are offered the following deal: if you pay $1, you get to flip the same coin again, and if it comes up heads, you will receive $1.4. Since you now believe that the probability of heads is 0.76, you take the deal (assuming you are risk neutral) and earn 6 cents in expectation. If, by contrast, another risk-neutral person in the same situation decides to decline the same deal, they must believe that the probability they are holding the fair coin is greater than 0.47. That is, their belief is still very close to the prior of 0.5. We think of such a person as being biased, in the sense that they are unwilling to significantly update their beliefs, despite evidence to the contrary.

Of course, failing to update one's beliefs about coin flips is not the end of the world. But this example serves to illustrate a broader phenomenon that, in our view, is both important and ubiquitous. In particular, the stickiness of prior beliefs in the face of evidence plays a role in politics: think of the controversy over Russian collusion in the 2016 US presidential election or the existence of weapons of mass destruction in Iraq in 2003. It is also prevalent in science, as exemplified by the polarized debate over the origins of the COVID-19 pandemic [3].

Our goal in this paper is to develop algorithms that are able to detect bias in the form of non-Bayesian updating of beliefs. To our knowledge, we are the first to formalize and analytically address this problem, and we aim to build an initial framework that future work can build on. In the long term, we believe such algorithms could have many applications, including understanding to what
degree the foregoing type of bias contributes to disagreement and polarization, and discounting the opinions of biased agents to improve collective decision making.

Our approach. The first question we need to answer is how to quantify bias. In this first investigation, we adopt a linear model of bias that was proposed and used as a general belief-updating model in economics [8, 10, 5, 18]. If the prior is $\mu_0$ and the correct Bayesian posterior upon receiving a signal (or evidence) $s$ is denoted $\mu_s$, we posit that an agent with bias $w \in [0, 1]$ adopts the belief $w\mu_0 + (1-w)\mu_s$. At the extremes, an agent with bias $w = 0$ performs perfect Bayesian updating and an agent with bias $w = 1$ cannot be convinced to budge from the prior.

The bigger conceptual question is how we can infer an agent's bias. To address it, we take an approach that is inspired by the literature on information design [13]. In our context, suppose that we (the principal) and the agent have asymmetric information: while both share a common (say public) prior about the state of the world, the principal knows the true (realized) state of the world, but the agent does not. The principal publicly commits to a (randomized) signaling scheme that specifies the probability of sending each possible signal given each possible realized state of the world. Given their knowledge of the latter, the principal draws a signal from the specified distribution and sends it to the agent. Upon receiving such a signal, the agent updates their beliefs about the state of the world (from the common prior) and then takes an action that maximizes their expected utility according to a given utility function. Similarly to the example we started with, it is the action taken by the agent that can (indirectly) reveal their degree of bias.

Note that the problem of estimating the exact level of bias reduces to the problem of detecting whether the agent's bias is above or below some threshold. Indeed, to estimate the level of bias to an accuracy of $\epsilon$, $\log(1/\epsilon)$ such threshold queries suffice by using binary search. The challenge, then, is to design signaling schemes that test whether bias is above or below a given threshold in the most efficient way, that is, using a minimum number of signals in expectation.

Our results. We design a polynomial-time algorithm that computes optimal signaling schemes, in Section 4. We first show that constant algorithms, which repeatedly use the same signaling scheme, are as powerful as adaptive algorithms, which can vary the scheme over time based on historical data (Lemma 4.1); we can therefore restrict our attention to constant algorithms. In Lemma 4.5, we establish a version of the revelation principle for the bias detection problem, which asserts that optimal signaling schemes need only use signals that can be interpreted as action recommendations. Finally, building on these insights, we show that the optimal solution to our problem is obtained by solving a small linear program (Algorithm 1 and Theorem 4.6). In Section 5, we present a geometric characterization of optimal signaling schemes (Theorem 5.2), which sheds additional light on the performance of the algorithm. In particular, the characterization provides sufficient and necessary conditions for the testability of bias, and also identifies cases where only a single sample is needed for this task.

Related work.
There is a significant body of experimental work in the social sciences aiming to explain the failure of partisans to reach similar beliefs on factual questions where there is a large amount of publicly available evidence. The fact that biased belief updating occurs is undisputed (to our knowledge), and the focus is on understanding the factors that play a role. In particular, a prominent line of work supports the (perhaps counterintuitive) hypothesis that the more cognitively sophisticated a partisan is, the more politically biased their belief-update process is [16, 17, 12, 11]. These results are challenged by more recent work by Tappin et al. [19], who found that greater analytical thinking is associated with belief updates that are less biased, using an experimental design that explicitly measures the proximity of belief updates to a correct Bayesian posterior. While these studies provide empirical underpinnings for our theoretical model, their research questions are orthogonal to ours: we aim to measure the magnitude of bias regardless of its source.

Classical work in information design [4, 13] studies how a principal can strategically provide information to induce an agent to take actions that are beneficial for the principal, assuming a perfectly Bayesian agent. Various relaxations of the perfectly Bayesian assumption have been investigated [1, 10, 7, 5, 9, 21, 14]. The work of de Clippel and Zhang [5] is closest to ours; it studies biased belief-update models, including the linear model. However, their goal is to maximize the principal's utility with the agent's bias fully known. In our problem the agent's bias level is unknown, and the principal's goal is to infer the agent's bias level rather than to maximize their own utility. Tang and Ho [18] present real-world experiments showing that human belief updates are close to a linear bias model, which supports our theoretical assumption.

Biased agent. Consider a standard Bayesian setting: the relevant state of the world is $\theta \in \Theta$, distributed according to some known prior distribution $\mu_0$. If an agent were perfectly Bayesian, upon receiving some new information ("signal") $s$, and with knowledge of the conditional distributions $P(s|\theta)$ for all $\theta$, they would update their belief about the state of the world according to Bayes' rule: $\mu_s(\theta) = P(\theta|s) = \frac{\mu_0(\theta)\,P(s|\theta)}{P(s)}$. We refer to $\mu_s$ as the true posterior belief induced by $s$. Being biased, the agent's belief after seeing $s$, denoted $\nu_s$, is a convex combination of $\mu_s$ and $\mu_0$: $\nu_s = w\mu_0 + (1-w)\mu_s$, where $w \in [0, 1]$ is the unknown bias level, capturing the agent's inclination to retain their prior belief in the presence of new information. This linear model was proposed and adopted in economics for non-Bayesian belief updating [8, 10], in order to capture people's conservatism in processing new information and their tendency to protect their beliefs [20].

The agent can choose an action from a finite set $A$ and has a state-dependent utility function $U: A \times \Theta \to \mathbb{R}$. They receive utility $U(a, \theta)$ when taking action $a$ in state $\theta$. The agent acts according to their (biased) belief $\nu_s$ and chooses an action $a^*$ that maximizes their expected utility: $a^* \in \arg\max_{a \in A} \mathbb{E}_{\theta \sim \nu_s}[U(a, \theta)] = \arg\max_{a \in A} \sum_{\theta \in \Theta} \nu_s(\theta)\,U(a, \theta)$. In the absence of any additional information, the agent operates based on the prior belief $\mu_0$ and will select an action deemed optimal with respect to $\mu_0$.
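The following sketch makes the update-and-act rule above concrete. It is only an illustration: the two-state instance, the numbers, and the helper names are ours and are not taken from the paper.

```python
import numpy as np

# Hypothetical two-state, two-signal, two-action instance (illustration only).
mu0 = np.array([0.5, 0.5])            # prior over states
likelihood = np.array([[0.9, 0.2],    # likelihood[s, theta] = P(signal s | state theta)
                       [0.1, 0.8]])
U = np.array([[1.0, -1.0],            # U[a, theta] = utility of action a in state theta
              [0.0,  0.0]])

def biased_belief(mu0, lik_s, w):
    """Bayesian posterior mixed with the prior: nu_s = w*mu0 + (1-w)*mu_s."""
    mu_s = mu0 * lik_s
    mu_s = mu_s / mu_s.sum()          # Bayes' rule
    return w * mu0 + (1 - w) * mu_s

def best_response(nu, U):
    """An action maximizing expected utility under belief nu."""
    return int(np.argmax(U @ nu))

s = 0                                 # suppose signal 0 is observed
for w in (0.0, 0.5, 1.0):
    nu = biased_belief(mu0, likelihood[s], w)
    print(f"w={w}: belief={nu.round(3)}, action={best_response(nu, U)}")
```

As the bias w grows, the belief slides from the Bayesian posterior back toward the prior, and the chosen action may change accordingly.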
We introduce the following mild assumption to ensure the uniqueness of this action:

Assumption 2.1. There is a unique action that maximizes the expected utility based on the prior belief $\mu_0$: $|\arg\max_{a \in A} \sum_{\theta \in \Theta} \mu_0(\theta)\,U(a, \theta)| = 1$.

This assumption will be made throughout the paper. We denote the unique optimal action under the prior belief by $a_0 = \arg\max_{a \in A} \sum_{\theta \in \Theta} \mu_0(\theta)\,U(a, \theta)$, and call it the default action.

Bias detection. The principal, who knows the prior $\mu_0$ and the agent's utility function $U$, seeks to infer the agent's bias level from their actions as efficiently as possible. The principal has an informational advantage: they observe the independent realizations of the state of the world at each time step. In other words, the principal knows $\theta_t$, an independent sample drawn according to $\mu_0$ at time $t$. The principal wants to design signaling schemes to strategically reveal information about $\theta_t$ to the agent, hoping to influence the agent's biased belief in a way that the agent's chosen actions reveal information about their bias level. Specifically, with a finite signal space $S$, the principal can commit to a signaling scheme $\pi_t: \Theta \to \Delta(S)$ at time $t$, where $\pi_t(s|\theta)$ specifies the probability of sending signal $s$ in state $\theta$ at time $t$. After seeing a signal $s_t$, drawn according to $\pi_t(\cdot|\theta_t)$ at time $t$, the agent takes an action $a_t$ that is optimal for their biased belief $\nu_{s_t}$. The principal infers information about the bias $w$ from the history of signaling schemes, realized states, realized signals, and agent actions $H_t = \{(\pi_1, \theta_1, s_1, a_1), \ldots, (\pi_t, \theta_t, s_t, a_t)\}$. We denote by $\Pi$ an adaptive algorithm that the principal uses to decide on the signaling scheme at time $t+1$ based on the history $H_t$.

Given a threshold $\tau \in (0, 1)$, the principal wants to design $\Pi$ to answer the following question: Is the agent's bias level $w$ greater than or equal to $\tau$, or less than or equal to $\tau$?¹ As noted earlier, by answering the above threshold question, one can also estimate the bias level $w$ within accuracy $\epsilon$ by performing $\log(1/\epsilon)$ iterations of binary search. This effectively reduces the broader task of estimating $w$ to a sequence of targeted threshold checks. By employing an adaptive signaling scheme, this approach lets us approximate $w$ to any desired precision, providing an efficient solution to the bias estimation problem.

¹One may want to test $w \geq \tau$ or $w < \tau$ instead. But this requires assumptions on tie-breaking when the agent has multiple optimal actions. Indifference at $w = \tau$ allows us to avoid such assumptions.

An algorithm $\Pi$ for the above question terminates as soon as it can output a deterministic answer. The number of time steps for $\Pi$ to terminate, denoted by $T_\tau(\Pi, w)$, is a random variable. The sample complexity of $\Pi$ is defined to be the expected termination time in the worst case over $w \in [0, 1]$:

Definition 2.1 (sample complexity). The (worst-case) sample complexity of $\Pi$ is defined as² $T_\tau(\Pi) = \max_{w \in [0,1]} \mathbb{E}[T_\tau(\Pi, w)]$.

²Taking the worst case over $w \in [0, 1]$ is not overly pessimistic. As we will show in the proofs, the worst case in fact happens at $w \in [\tau - \varepsilon, \tau + \varepsilon]$ for some $\varepsilon > 0$, which makes intuitive sense. Therefore, the sample complexity can be equivalently defined as $T_\tau(\Pi) = \max_{w \in [\tau - \varepsilon, \tau + \varepsilon]} \mathbb{E}[T_\tau(\Pi, w)]$.

Our objective is to develop an algorithm $\Pi$ that can determine whether $w \leq \tau$ or $w \geq \tau$ with minimal sample complexity. Specifically, we want to solve the following minimax problem: $\min_\Pi \max_{w \in [0,1]} \mathbb{E}[T_\tau(\Pi, w)]$. We say that an algorithm $\Pi$ is constant if it keeps using the same signaling scheme repeatedly until termination. Constant algorithms are a special case of non-adaptive algorithms, which may vary the signaling schemes over time but remain independent of historical data.
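To make the reduction from bias estimation to threshold testing concrete, here is a bisection sketch. The oracle interface test_threshold is hypothetical; it stands in for running a threshold test of the kind developed in the rest of the paper.

```python
def estimate_bias(test_threshold, eps=0.01):
    """Bisection sketch: reduce estimating w to O(log(1/eps)) threshold tests.

    test_threshold(tau) is a hypothetical oracle that runs a signaling scheme
    for threshold tau and returns True if it concludes w <= tau, and False if
    it concludes w >= tau (at w == tau either answer is consistent).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > eps:
        tau = (lo + hi) / 2
        if test_threshold(tau):
            hi = tau              # w <= tau: shrink the interval from above
        else:
            lo = tau              # w >= tau: shrink the interval from below
    return (lo + hi) / 2          # estimate of w within eps

# Example with a simulated agent whose true bias is 0.37 (for illustration only):
print(estimate_bias(lambda tau: 0.37 <= tau, eps=0.01))
```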
Preliminaries. We now introduce the well-known splitting lemma from the information design literature [2, 13, 15]. It relates a signaling scheme to a set of induced true posteriors for a Bayesian agent and a distribution over that set of true posteriors.

Lemma 2.1 (Splitting Lemma, e.g., [13]). Let $\pi$ be a signaling scheme where each signal $s \in S$ is sent with unconditional probability $\pi(s) = \sum_{\theta \in \Theta} \mu_0(\theta)\,\pi(s|\theta)$ and induces true posterior $\mu_s$. Then, the prior $\mu_0$ equals the convex combination of $\{\mu_s\}_{s \in S}$ with weights $\{\pi(s)\}_{s \in S}$: $\mu_0 = \sum_{s \in S} \pi(s)\,\mu_s$. Conversely, if the prior can be expressed as a convex combination of distributions $\mu'_s \in \Delta(\Theta)$, $\mu_0 = \sum_{s \in S} p_s\,\mu'_s$, where $p_s \geq 0$ and $\sum_{s \in S} p_s = 1$, then there exists a signaling scheme $\pi$ where each signal $s$ is sent with unconditional probability $\pi(s) = p_s$ and induces posterior $\mu'_s$.

The splitting lemma is also referred to as the Bayesian consistency condition. It allows one to think about choosing a signaling scheme as choosing a set of true posteriors, $\{\mu_s\}_{s \in S}$, and a distribution over that set, $\{\pi(s)\}_{s \in S}$, in a Bayesian consistent way.

3 Warm-Up: A Two-State, Two-Action Example

How can the principal design a signaling scheme to learn the agent's bias level? We use a simple two-state, two-action example to demonstrate how inducing a specific true posterior belief allows the principal to determine whether $w \leq \tau$ or $w \geq \tau$. The two states of the world are represented as {Good, Bad}. The agent has two possible actions: Active and Passive. Taking the Passive action always yields a utility of 0, independently of the state. For the Active action, the utility is $a$ if the state is Good and $-b$ otherwise, where $a, b > 0$. We use the probability of the Good state to represent a belief, so the prior is a number $\mu_0 \in [0, 1]$, which is only a slight abuse of notation. With belief $\mu \in [0, 1]$ for the Good state (and $1-\mu$ for the Bad state), the agent's expected utility for choosing the Active action is $a\mu - b(1-\mu) = (a+b)\mu - b$. Thus, the Active action is better than the Passive action (so the agent will take Active) if

$(a+b)\mu - b > 0 \iff \mu > \frac{b}{a+b} =: \mu^*$.   (1)

Conversely, the Passive action is better if $\mu < \mu^*$. Here, $\mu^* = \frac{b}{a+b}$ is an indifference belief at which the agent is indifferent between the two actions. We assume that the prior $\mu_0$ satisfies $0 < \mu_0 < \mu^*$, so the agent chooses the Passive action by default.

Consider the following constant signaling scheme $\pi_\tau$ with two signals {G, B}: If the state is Good, send signal G with probability one. If the state is Bad, send signal B with probability $\frac{\mu^* - \mu_0}{(\mu^* - \tau\mu_0)(1-\mu_0)}$ and signal G with the complement probability.

We will show that, by repeatedly using $\pi_\tau$, we can test whether the agent's bias $w$ is $\leq \tau$ or $\geq \tau$. By Bayes' rule, the true posterior beliefs (for the Good state) associated with the two signals are $\mu_B = 0$ (i.e., on receiving B, the agent knows the state is Bad for sure) and

$\mu_G = P(\text{Good}\,|\,G) = \frac{\mu_0\,\pi_\tau(G|\text{Good})}{\mu_0\,\pi_\tau(G|\text{Good}) + (1-\mu_0)\,\pi_\tau(G|\text{Bad})} = \frac{\mu^* - \tau\mu_0}{1-\tau}$.

Notably, the posterior $\mu_G$ satisfies the following property: if the agent's bias level $w$ is exactly equal to $\tau$, then the agent's biased belief equals the indifference belief: when $w = \tau$, $\nu_G = \tau\mu_0 + (1-\tau)\mu_G = \mu^*$. We also note the inequality $\mu_0 < \mu^* < \mu_G$.
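Before drawing the conclusion below, here is a quick numerical sanity check of these formulas; the parameter values (a = b = 1, so µ* = 0.5, with prior 0.3 and threshold 0.4) are hypothetical choices of ours, not values from the paper.

```python
# Quick check of the warm-up scheme with hypothetical parameters.
a, b, mu0, tau = 1.0, 1.0, 0.3, 0.4
mu_star = b / (a + b)                                  # indifference belief

pi_B_given_Bad = (mu_star - mu0) / ((mu_star - tau * mu0) * (1 - mu0))
pi_G_given_Bad = 1 - pi_B_given_Bad

# True posterior for the Good state after signal G (Bayes' rule; pi(G|Good) = 1).
mu_G = mu0 / (mu0 + (1 - mu0) * pi_G_given_Bad)

assert mu0 < mu_star < mu_G                            # the inequality noted above
nu_G = tau * mu0 + (1 - tau) * mu_G                    # biased belief at w = tau
assert abs(nu_G - mu_star) < 1e-12                     # indifference exactly at w = tau
print(pi_B_given_Bad, mu_G, nu_G)
```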
As a result, if the agent's bias level $w$ is greater than $\tau$, then the biased belief will be smaller than $\mu^*$, and otherwise the opposite is true: for $w > \tau$, $w\mu_0 + (1-w)\mu_G < \mu^*$; for $w < \tau$, $w\mu_0 + (1-w)\mu_G > \mu^*$. By Equation (1), this means that the agent will take the Passive action if $w > \tau$, and the Active action if $w < \tau$ (on receiving G). Therefore, by observing which action is taken by the agent when signal G is sent, we can immediately tell whether $w \leq \tau$ or $w \geq \tau$. This leads to the following:

Theorem 3.1. In the two-state, two-action example, for any threshold $\tau \in [0, \frac{1-\mu^*}{1-\mu_0}]$, the above constant signaling scheme $\pi_\tau$ can test whether the agent's bias $w$ satisfies $w \leq \tau$ or $w \geq \tau$: specifically, whenever the signal G is sent, if the agent takes action Active, then $w \leq \tau$; if the agent takes action Passive, then $w \geq \tau$. The sample complexity of this scheme is $\frac{\mu^* - \mu_0}{\mu_0(1-\tau)} + 1$, which increases with $\tau$.

Proof. The range $\tau \in [0, \frac{1-\mu^*}{1-\mu_0}]$ ensures that the probability $\pi_\tau(B|\text{Bad}) = \frac{\mu^* - \mu_0}{(\mu^* - \tau\mu_0)(1-\mu_0)}$ is in $[0, 1]$. The two items in the theorem follow from the argument before the theorem statement. The sample complexity is equal to the expected number of time steps until a G signal is sent, which is a geometric random variable with success probability $P(G) = \mu_0\,\pi_\tau(G|\text{Good}) + (1-\mu_0)\,\pi_\tau(G|\text{Bad}) = \frac{\mu_0(1-\tau)}{\mu^* - \tau\mu_0}$. So the sample complexity is equal to the mean $\frac{1}{P(G)} = \frac{\mu^* - \mu_0}{\mu_0(1-\tau)} + 1$.

The main intuition behind this result is that, in order to test whether $w \leq \tau$ or $w \geq \tau$, we design a signaling scheme where certain signals induce posteriors that make the agent indifferent between two actions if the agent's bias level is exactly $\tau$. Then, the action actually taken by the agent directly reveals whether $w \leq \tau$ or $w \geq \tau$. Such signals are useful signals, but not all signals are necessarily useful. The sample complexity is then determined by the total probability of useful signals. This intuition will carry over to computing the optimal signaling scheme for the general case in Section 4.

Finally, we remark that using the constant signaling scheme $\pi_\tau$ constructed above to test $w \leq \tau$ or $w \geq \tau$ is in fact the optimal adaptive algorithm, according to the results we will present in Section 4. So, the minimal sample complexity to test whether $w \leq \tau$ or $w \geq \tau$ in this two-state, two-action example is exactly $\frac{\mu^* - \mu_0}{\mu_0(1-\tau)} + 1$, as shown in Theorem 3.1.

4 Computing the Optimal Signaling Scheme in the General Case

In this section, we generalize the initial observations from the previous section to the case with any number of actions and states and a general utility function $U$. We will show how to compute the optimal algorithm (signaling scheme) to test the agent's bias level. There are three key ingredients. First, we prove that we can use a constant signaling scheme. Second, we develop a revelation principle to further simplify the space of signaling schemes. Building on these two steps, we show that the optimal signaling scheme can be computed by a linear program.

4.1 Optimality of Constant Signaling Schemes

In this subsection, we show that adaptive algorithms are no better than constant algorithms for the problem of testing whether $w \leq \tau$ or $w \geq \tau$. Therefore, to find the algorithm with minimal sample complexity, we only need to consider constant algorithms/signaling schemes.

Lemma 4.1. Fix $\tau \in (0, 1)$. For the problem of testing whether $w \leq \tau$ or $w \geq \tau$, the sample complexity of any adaptive algorithm is at least that of the optimal constant algorithm (i.e., using a fixed signaling scheme repeatedly).

To prove this lemma, we introduce some notation.
For any action $a \in A \setminus \{a_0\}$, define the vector

$c_a = (c_{a,\theta})_{\theta \in \Theta} = \big(U(a_0, \theta) - U(a, \theta)\big)_{\theta \in \Theta} \in \mathbb{R}^{|\Theta|}$,   (2)

whose components are the utility differences between the default action $a_0$ and any other action $a$ at different states $\theta \in \Theta$. Let $R_{a_0} \subseteq \Delta(\Theta)$ be the region of beliefs under which the agent strictly prefers $a_0$ over any other action:

$R_{a_0} = \{\mu \in \Delta(\Theta) \mid \forall a \in A \setminus \{a_0\},\ c_a^\top \mu > 0\}$.

It is the intersection of $|A| - 1$ open halfspaces with the probability simplex $\Delta(\Theta)$. As the agent strictly prefers $a_0$ at the prior $\mu_0$, we have $\mu_0 \in R_{a_0}$. The boundary of this region, $\partial R_{a_0}$, is the set of beliefs where the agent is indifferent between $a_0$ and at least one other action $a \in A \setminus \{a_0\}$, and $a_0$ and $a$ are both (weakly) better than any other action:

$\partial R_{a_0} = \{\mu \in \Delta(\Theta) \mid \exists a \in A \setminus \{a_0\}: c_a^\top \mu = 0,\ \text{and}\ \forall a' \in A \setminus \{a_0\}: c_{a'}^\top \mu \geq 0\}$.   (3)

Lastly, the exterior of $R_{a_0}$, denoted $\mathrm{ext}\,R_{a_0}$, comprises the set of beliefs where the agent strictly prefers not to choose $a_0$:

$\mathrm{ext}\,R_{a_0} = \Delta(\Theta) \setminus (R_{a_0} \cup \partial R_{a_0}) = \{\mu \in \Delta(\Theta) \mid \exists a \in A \setminus \{a_0\}: c_a^\top \mu < 0\}$.

Given a signaling scheme $\pi$, we classify its signals into three types based on where the $\tau$-biased belief associated with the signal lies relative to the region $R_{a_0}$.

Definition 4.1. Let $\tau \in (0, 1)$ be a parameter. Let $s \in S$ be a signal from a signaling scheme $\pi$, with associated true posterior $\mu_s$ and $\tau$-biased posterior $\mu_s^\tau = \tau\mu_0 + (1-\tau)\mu_s$. We say $s$ is an internal signal if $\mu_s^\tau \in R_{a_0}$; a boundary signal if $\mu_s^\tau \in \partial R_{a_0}$; an external signal if $\mu_s^\tau \in \mathrm{ext}\,R_{a_0}$.

The above classification helps to formalize the idea of whether a signal is useful for bias detection. A boundary signal is useful because the action taken by the agent after receiving a boundary signal immediately tells whether $w \leq \tau$ or $w \geq \tau$:

Lemma 4.2. When a boundary signal is realized, the agent's action immediately reveals whether $w \leq \tau$ or $w \geq \tau$. Specifically, if the agent chooses action $a_0$, then $w \geq \tau$; otherwise, $w \leq \tau$.

Proof. If the agent's bias level satisfies $w > \tau$, then the biased belief $\nu_s = w\mu_0 + (1-w)\mu_s$ must be inside $R_{a_0}$ (because $\mu_s^\tau = \tau\mu_0 + (1-\tau)\mu_s$ is on the boundary of $R_{a_0}$ and $\mu_0 \in R_{a_0}$), so the agent strictly prefers the default action $a_0$. If $w < \tau$, then the biased belief $\nu_s$ lies in $\mathrm{ext}\,R_{a_0}$, so the agent will not take action $a_0$.

An external signal might also be useful in revealing whether $w \leq \tau$ or $w \geq \tau$ if the agent is indifferent between some actions $a_1, a_2$ other than $a_0$ at the $\tau$-biased belief $\mu_s^\tau$. However, the following lemma shows that, in such cases, we can always modify the signaling scheme to turn the external signal into a boundary signal. This modification increases the total probability of useful signals and hence reduces the sample complexity. The proof of this lemma is in Appendix A.1.

Lemma 4.3. Suppose $\Pi$ is an adaptive algorithm that uses signaling schemes with internal, boundary, and external signals. Then, there exists another adaptive algorithm $\Pi'$ with equal or lower sample complexity that employs only signaling schemes with internal and boundary signals.

An internal signal, on the other hand, is not useful for testing $w \leq \tau$ or $w \geq \tau$, for the following reason. For an internal signal, the biased belief with bias level $\tau$, $\mu_s^\tau$, lies inside $R_{a_0}$. Since $R_{a_0}$ is an open region, there must exist a small number $\varepsilon > 0$ such that when the agent has bias level $w = \tau + \varepsilon$ or $w = \tau - \varepsilon$, the biased belief with bias level $w$, $w\mu_0 + (1-w)\mu_s$, is also inside the region $R_{a_0}$, so the agent will take action $a_0$. As the agent takes $a_0$ under both $w = \tau + \varepsilon$ and $w = \tau - \varepsilon$, we cannot distinguish these two cases, so this signal is not helpful in determining $w \leq \tau$ or $w \geq \tau$.
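The classification in Definition 4.1 is straightforward to operationalize. The helper below and the small instance it runs on are our own illustration (hypothetical numbers, not from the paper).

```python
import numpy as np

def classify_signal(mu_s, mu0, U, a0, tau, tol=1e-9):
    """Classify a signal as 'internal', 'boundary', or 'external' (Definition 4.1).

    R_{a0} is the set of beliefs mu with c_a . mu > 0 for every a != a0,
    where c_a = U[a0] - U[a]; we evaluate these margins at the tau-biased posterior.
    """
    mu_tau = tau * mu0 + (1 - tau) * mu_s                  # tau-biased posterior
    margins = [U[a0] @ mu_tau - U[a] @ mu_tau              # c_a . mu_tau for a != a0
               for a in range(U.shape[0]) if a != a0]
    if all(m > tol for m in margins):
        return "internal"                                  # a0 still strictly preferred
    if all(m > -tol for m in margins):
        return "boundary"                                  # indifferent with some action
    return "external"                                      # a0 no longer optimal

U = np.array([[0.0, 0.0],        # hypothetical utilities U[a, theta]; action 0 is a0
              [1.0, -1.0]])
mu0 = np.array([0.4, 0.6])       # prior under which action 0 is the strict default
for mu_s in ([0.9, 0.1], [0.6, 0.4], [0.1, 0.9]):
    print(mu_s, classify_signal(np.array(mu_s), mu0, U, a0=0, tau=0.5))
```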
The following lemma formalizes the idea that internal signals are not useful:

Lemma 4.4. To test whether $w \leq \tau$ or $w \geq \tau$, any adaptive algorithm that uses signaling schemes with boundary and internal signals cannot terminate until a boundary signal is sent.

Proof of Lemma 4.1. By Lemma 4.3, the optimal adaptive algorithm only uses signaling schemes with boundary and internal signals. By Lemma 4.4, the algorithm cannot terminate until a boundary signal is sent. By Lemma 4.2, the algorithm terminates when a boundary signal is sent. We conclude that the termination time of any adaptive algorithm cannot be better than that of the constant algorithm that keeps using the signaling scheme that maximizes the total probability of boundary signals.

4.2 Revelation Principle

To compute the optimal constant signaling scheme, we need another technique that is similar to the revelation principle in the information design literature [13, 6]. The revelation principle says that, in some information design problems, it is without loss of generality to consider only direct signaling schemes where signals are recommendations of actions for the agent; that is, the signal space is $S = A$, and when the principal sends signal $a$, it should be optimal for the agent to take action $a$ given the posterior belief induced by signal $a$. Unlike classical information design problems where the agent is unbiased, our problem involves a biased agent, so we need a different revelation principle: the signals are still action recommendations, but when the principal sends signal $a$, action $a$ is optimal for an agent with bias level exactly $\tau$; moreover, if $a \neq a_0$, then an agent with bias level $\tau$ will be indifferent between $a$ and $a_0$. This insight is formalized in the following lemma:

Lemma 4.5 (revelation principle for bias detection). Let $\pi$ be an arbitrary signaling scheme that can test $w \leq \tau$ or $w \geq \tau$. Then, there exists another signaling scheme $\pi'$ that can do so with signal space $S = A$ such that: (1) Given signal $a \in A$, action $a$ is an optimal action for any agent with bias level $w = \tau$. (2) Given signal $a \in A \setminus \{a_0\}$, actions $a$ and $a_0$ are both optimal for any agent with bias level $w = \tau$. As a corollary, if the agent's bias level $w < \tau$, then the agent strictly prefers $a$ over $a_0$; and if $w > \tau$, then the agent strictly prefers $a_0$ over any other action. (3) The sample complexity satisfies $T_\tau(\pi') \leq T_\tau(\pi)$.

In the above signaling scheme $\pi'$, every $a \in A \setminus \{a_0\}$ is a boundary signal (Definition 4.1), which is useful for testing bias: given signal $a \in A \setminus \{a_0\}$, if the agent takes action $a_0$, then it must be that $w \geq \tau$; otherwise $w \leq \tau$. The signal $a_0$ is internal and not useful for determining $w \leq \tau$ or $w \geq \tau$. So, the sample complexity of $\pi'$ is equal to the expected number of time steps until a signal in $A \setminus \{a_0\}$ is sent.

The idea behind Lemma 4.5 is the combination of signals. Suppose there is a signaling scheme that can determine whether $w \leq \tau$ or $w \geq \tau$ with a signal space larger than $A$. There must exist two signals $s$ and $s'$ under which the agent is indifferent between $a_0$ and some action $a \neq a_0$ if the agent's bias level is exactly $\tau$. We can then combine the two signals into a single signal under which the agent remains indifferent between $a_0$ and $a$, yielding a new signaling scheme with a smaller signal space. Repeating this reduces the signal space to size $|A|$. See Appendix A.3 for the full proof.

4.3 Algorithm for Computing the Optimal Signaling Scheme

Finally, we present an algorithm to compute the optimal (minimal sample complexity) signaling scheme to test whether $w \leq \tau$ or $w \geq \tau$.
The revelation principle in the previous subsection ensures that we only need a direct signaling scheme where signals are action recommendations. The optimal direct signaling scheme turns out to be solvable by a linear program, detailed in Algorithm 1. In the linear program, the constraint in Equation (5) ensures that whenever the principal recommends action $a \in A$, it is optimal for an agent with bias level $\tau$ to take action $a$; this satisfies condition (1) in the revelation principle (Lemma 4.5). The indifference constraint (Equation (6)) ensures that when the recommended action $a$ is not $a_0$, an agent with bias level $\tau$ is indifferent between $a$ and $a_0$; this satisfies condition (2) in the revelation principle. The objective (Equation (4)) is to maximize the probability of useful signals (those in $A \setminus \{a_0\}$), and hence to minimize the sample complexity.

Theorem 4.6. Algorithm 1 finds a constant signaling scheme for testing $w \leq \tau$ or $w \geq \tau$ that is optimal among all adaptive signaling schemes. The sample complexity of the optimal signaling scheme is $1/p^*$, where $p^*$ is the optimal objective value in Equation (4).

Algorithm 1: Linear program to compute the optimal signaling scheme.
Input: prior $\mu_0$, utility function $U$, and the parameter $\tau \in (0, 1)$.
Variable: signaling scheme $\pi$, consisting of conditional probabilities $\pi(a|\theta)$ for $a \in A$, $\theta \in \Theta$.
Denote $\Delta U(a, a', \theta) = U(a, \theta) - U(a', \theta)$. Solve the following linear program:

maximize $\sum_{a \in A \setminus \{a_0\}} \sum_{\theta \in \Theta} \pi(a|\theta)\,\mu_0(\theta)$   (4)

subject to:

Optimality of $a$ over other actions: $\forall a \in A,\ \forall a' \in A \setminus \{a\}$,
$\sum_{\theta \in \Theta} \pi(a|\theta)\,\mu_0(\theta)\Big[(1-\tau)\,\Delta U(a, a', \theta) + \tau \sum_{\theta' \in \Theta} \mu_0(\theta')\,\Delta U(a, a', \theta')\Big] \geq 0$;   (5)

Indifference between $a$ and $a_0$: $\forall a \in A \setminus \{a_0\}$,
$\sum_{\theta \in \Theta} \pi(a|\theta)\,\mu_0(\theta)\Big[(1-\tau)\,\Delta U(a, a_0, \theta) + \tau \sum_{\theta' \in \Theta} \mu_0(\theta')\,\Delta U(a, a_0, \theta')\Big] = 0$;   (6)

Probability distribution constraints: $\forall \theta \in \Theta$, $\sum_{a \in A} \pi(a|\theta) = 1$, and $\forall a \in A,\ \theta \in \Theta$, $\pi(a|\theta) \geq 0$.

Using the above optimal signaling scheme, whenever the principal recommends an action $a$ other than $a_0$, the agent's action immediately reveals whether $w \leq \tau$ or $w \geq \tau$: if the agent indeed follows the recommendation or takes any action other than $a_0$, then the bias must be small ($w \leq \tau$); if the agent takes $a_0$ instead, the bias must be large ($w \geq \tau$). Thus, the expected sample complexity is equal to the expected number of iterations until a signal in $A \setminus \{a_0\}$ is sent, which is $1/p^*$.³

³While our focus is the expected sample complexity, we can also derive a high-probability guarantee: with $t \geq \frac{1}{p^*}\log\frac{1}{\delta}$ iterations, we can determine whether $w \leq \tau$ or $w \geq \tau$ with probability at least $1-\delta$. This is because the probability that no useful signal is sent after $t$ iterations is at most $(1-p^*)^t \leq \delta$ when $t \geq \frac{1}{p^*}\log\frac{1}{\delta}$.

The linear program in Algorithm 1 has polynomial size in $|A|$ (the number of actions) and $|\Theta|$ (the number of states), so it is a polynomial-time algorithm. The solution $p^*$ depends on the geometry of the problem instance and does not seem to have a closed-form expression.

The remainder of this section proves Theorem 4.6. The proof requires an additional lemma:

Lemma 4.7. Given a signaling scheme $\pi = (\pi(a|\theta))_{a \in A, \theta \in \Theta}$ and an agent's bias level $w$, after signal $a$ is sent, the agent strictly prefers action $a_1$ over $a_2$ under the biased belief if and only if
$\sum_{\theta \in \Theta} \pi(a|\theta)\,\mu_0(\theta)\Big[(1-w)\,\Delta U(a_1, a_2, \theta) + w \sum_{\theta' \in \Theta} \mu_0(\theta')\,\Delta U(a_1, a_2, \theta')\Big] > 0$.

Proof. The agent's biased belief under signal $a$ and bias level $w$ is given by
$(1-w)\,\frac{\mu_0(\theta)\,\pi(a|\theta)}{\sum_{\theta' \in \Theta} \mu_0(\theta')\,\pi(a|\theta')} + w\,\mu_0(\theta), \quad \theta \in \Theta.$
The condition for the agent to strictly prefer $a_1$ over $a_2$ is that the expected utility under the biased belief when choosing $a_1$ is greater than that of $a_2$:
$\sum_{\theta \in \Theta}\Big[(1-w)\,\frac{\mu_0(\theta)\,\pi(a|\theta)}{\sum_{\theta' \in \Theta} \mu_0(\theta')\,\pi(a|\theta')} + w\,\mu_0(\theta)\Big]\,\Delta U(a_1, a_2, \theta) > 0,$
where $\Delta U(a_1, a_2, \theta) = U(a_1, \theta) - U(a_2, \theta)$. Multiplying by $\sum_{\theta' \in \Theta} \mu_0(\theta')\,\pi(a|\theta')$, we obtain:
$(1-w)\sum_{\theta \in \Theta} \mu_0(\theta)\,\pi(a|\theta)\,\Delta U(a_1, a_2, \theta) + w \sum_{\theta \in \Theta} \mu_0(\theta)\,\Delta U(a_1, a_2, \theta) \sum_{\theta' \in \Theta} \mu_0(\theta')\,\pi(a|\theta') > 0.$
Factoring out the terms, this can be rewritten as:
$\sum_{\theta \in \Theta} \pi(a|\theta)\,\mu_0(\theta)\Big[(1-w)\,\Delta U(a_1, a_2, \theta) + w \sum_{\theta' \in \Theta} \mu_0(\theta')\,\Delta U(a_1, a_2, \theta')\Big] > 0.$
This final expression is positive if and only if the agent strictly prefers $a_1$ over $a_2$.

Proof of Theorem 4.6. According to Lemma 4.1 (constant algorithms are optimal) and Lemma 4.5 (revelation principle), to find an optimal adaptive algorithm we only need to find the optimal constant signaling scheme that satisfies the conditions in Lemma 4.5. We verify that the signaling scheme computed from the linear program in Algorithm 1 satisfies the conditions in Lemma 4.5. The optimality constraint (Equation (5)) in the linear program, together with Lemma 4.7, ensures that whenever signal $a \in A$ is sent, action $a$ is weakly better than any other action for an agent with bias level $w = \tau$. This satisfies the first condition in Lemma 4.5. The indifference constraint (Equation (6)), together with Lemma 4.7, ensures that whenever $a \in A \setminus \{a_0\}$ is sent, the agent is indifferent between action $a$ and $a_0$ if the bias level is $w = \tau$. Then, by the optimality constraint (Equation (5)), both $a$ and $a_0$ are optimal actions. This satisfies the second condition in Lemma 4.5.

We then argue that the solution of the linear program is the optimal signaling scheme that satisfies the conditions of Lemma 4.5. According to our argument after Lemma 4.5, only the signals in $A \setminus \{a_0\}$ are useful signals, so the sample complexity is equal to the expected number of time steps until a signal in $A \setminus \{a_0\}$ is sent. The probability that a signal in $A \setminus \{a_0\}$ is sent at each time step is $\sum_{a \in A \setminus \{a_0\}} \pi(a) = \sum_{a \in A \setminus \{a_0\}} \sum_{\theta \in \Theta} \mu_0(\theta)\,\pi(a|\theta)$. The expected number of time steps is the inverse, $\big(\sum_{a \in A \setminus \{a_0\}} \sum_{\theta \in \Theta} \mu_0(\theta)\,\pi(a|\theta)\big)^{-1}$ (because the number of time steps is a geometric random variable). The linear program maximizes the probability $\sum_{a \in A \setminus \{a_0\}} \sum_{\theta \in \Theta} \mu_0(\theta)\,\pi(a|\theta)$, so it minimizes the sample complexity.
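Before moving on, here is a sketch of one possible encoding of the linear program in Algorithm 1 using scipy. The encoding, the helper names, and the test instance (the two-state warm-up of Section 3 with a = b = 1) are our own and are meant only as an illustration, not as the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_scheme(mu0, U, a0, tau):
    """Sketch of Algorithm 1's LP. Variables are pi(a|theta), flattened action-major."""
    nA, nS = U.shape
    idx = lambda a, th: a * nS + th                       # position of pi(a|theta)
    nvar = nA * nS
    dU = lambda a, ap: U[a] - U[ap]                       # Delta U(a, a', .) over states
    avg = lambda a, ap: float(mu0 @ dU(a, ap))            # E_{mu0}[Delta U(a, a', .)]

    # Objective (4): maximize total probability of recommendations other than a0.
    c = np.zeros(nvar)
    for a in range(nA):
        if a != a0:
            for th in range(nS):
                c[idx(a, th)] = -mu0[th]                  # minus sign: linprog minimizes

    A_ub, b_ub, A_eq, b_eq = [], [], [], []
    for a in range(nA):
        for ap in range(nA):
            if ap == a:
                continue
            row = np.zeros(nvar)
            for th in range(nS):
                row[idx(a, th)] = mu0[th] * ((1 - tau) * dU(a, ap)[th] + tau * avg(a, ap))
            if ap == a0 and a != a0:
                A_eq.append(row); b_eq.append(0.0)        # indifference constraint (6)
            else:
                A_ub.append(-row); b_ub.append(0.0)       # optimality constraint (5), as <= 0
    for th in range(nS):                                  # each state's column is a distribution
        row = np.zeros(nvar)
        for a in range(nA):
            row[idx(a, th)] = 1.0
        A_eq.append(row); b_eq.append(1.0)

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, 1)] * nvar, method="highs")
    if (not res.success) or (-res.fun) <= 1e-12:
        return None, None                                 # bias not testable at this tau
    p_star = -res.fun
    return res.x.reshape(nA, nS), 1.0 / p_star            # scheme and sample complexity

# Two-state, two-action instance from Section 3 with a = b = 1 (hypothetical numbers):
U = np.array([[0.0, 0.0],          # Passive (the default action a0)
              [1.0, -1.0]])        # Active: +1 in Good, -1 in Bad
mu0 = np.array([0.3, 0.7])         # P(Good) = 0.3 < mu* = 0.5
scheme, samples = optimal_scheme(mu0, U, a0=0, tau=0.4)
print(scheme, samples)
```

On this instance the optimal value should recover the warm-up scheme of Section 3, whose sample complexity is $\frac{\mu^* - \mu_0}{\mu_0(1-\tau)} + 1 \approx 2.11$ for these parameter values; a vanishing optimal value (or an infeasible program) corresponds to the untestable case discussed next.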
5 Geometric Characterization of the Testability of Bias

To complement the algorithmic solution presented in the previous section, this section provides a geometric characterization of the bias detection problem. We identify the conditions under which testing whether $w \leq \tau$ or $w \geq \tau$ can be done with only one sample, can be done with a finite number of samples, or cannot be done at all (which is the scenario where the linear program in Algorithm 1 is infeasible).

By Assumption 2.1 ($a_0$ is strictly better than other actions at the prior $\mu_0$), we have
$c_a^\top \mu_0 = \sum_{\theta \in \Theta} \mu_0(\theta)\big(U(a_0, \theta) - U(a, \theta)\big) > 0, \quad \forall a \in A \setminus \{a_0\},$
where $c_a$ is as defined in Equation (2). Define $I_a$ as the set of indifference beliefs between action $a$ and $a_0$, which is the intersection of the hyperplane $\{x \mid c_a^\top x = 0\}$ and the probability simplex $\Delta(\Theta)$: $I_a := \{\mu \in \Delta(\Theta) \mid c_a^\top \mu = 0\}$. Given a parameter $\tau \in (0, 1)$, for which we want to test whether $w \leq \tau$ or $w \geq \tau$, let $I_{a,\tau} = \{\mu \in \Delta(\Theta) \mid (1-\tau)\mu + \tau\mu_0 \in I_a\}$ be the set of posterior beliefs for which, if the agent's bias level is exactly $\tau$, then the agent's biased belief falls within the indifference set $I_a$.

Lemma 5.1. $I_{a,\tau}$ is equal to the intersection of the probability simplex $\Delta(\Theta)$ and a translation of the hyperplane $\{x \mid c_a^\top x = 0\}$: $I_{a,\tau} = \{\mu \in \Delta(\Theta) \mid c_a^\top \mu = -\frac{\tau}{1-\tau}\,c_a^\top \mu_0\}$.

The proof of this lemma is in Appendix B.1. With this representation of $I_{a,\tau}$ in hand, we can now present a geometric characterization of the testability of bias.

Theorem 5.2 (geometric characterization). Fix $\tau \in (0, 1)$. The problem of testing $w \leq \tau$ or $w \geq \tau$:
can be solved with a single sample (the sample complexity is 1) if and only if the prior $\mu_0$ is in the convex hull formed by the translated sets $I_{a,\tau}$ for all non-default actions $a \in A \setminus \{a_0\}$, i.e., $\mu_0 \in \mathrm{ConvexHull}\big(\bigcup_{a \in A \setminus \{a_0\}} I_{a,\tau}\big)$;
can be solved (with finite sample complexity) if and only if $I_{a,\tau} \neq \emptyset$ for at least one $a \in A \setminus \{a_0\}$;
cannot be solved if $I_{a,\tau} = \emptyset$ for all $a \in A \setminus \{a_0\}$.

Figure 1: The three qualitatively different cases for detecting the level of bias, (a) a single sample, (b) finite sample complexity, and (c) cannot be solved, each illustrated within a simplex over three states where $\mu_0$ is the prior belief. Each point in the simplex corresponds to an optimal action for the agent. Green curves indicate indifference between the default action $a_0$ and another action under an unbiased belief. Orange curves are translated versions of these indifference curves; a posterior on these curves means the agent's biased belief (at bias level $\tau$) aligns with the green curves. From (a) to (c), $\tau$ increases, translating the orange curves further. In Figure 1a, $\mu_0$ can be represented as a convex combination of points on the translated curves, allowing bias level detection with a single sample. In Figure 1b, only some signals are useful, requiring more than one sample in the worst case. In Figure 1c, the bias level cannot be tested against $\tau$.

Figure 1 illustrates the three cases of Theorem 5.2. In the first case, the solution of the linear program in Algorithm 1 satisfies $\sum_{a \in A \setminus \{a_0\}} \sum_{\theta \in \Theta} \pi(a|\theta)\,\mu_0(\theta) = 1$, meaning that useful signals are sent with probability 1, which allows us to tell whether $w \leq \tau$ or $w \geq \tau$ immediately. In the second case, the total probability of useful signals satisfies $\sum_{a \in A \setminus \{a_0\}} \sum_{\theta \in \Theta} \pi(a|\theta)\,\mu_0(\theta) < 1$, so the sample complexity is more than 1. In the third case, the linear program in Algorithm 1 is not feasible, so $w \leq \tau$ or $w \geq \tau$ cannot be determined; importantly, this is not a limitation of our particular algorithm, but a general impossibility in our model. The proof of Theorem 5.2 is in Appendix B.2.
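The second and third cases of Theorem 5.2 suggest a simple feasibility check. The sketch below, with our own helper name and a hypothetical instance (not from the paper), tests whether $I_{a,\tau}$ is nonempty for some non-default action by solving a trivial feasibility LP over the simplex.

```python
import numpy as np
from scipy.optimize import linprog

def testable(mu0, U, tau, a0):
    """Check the finite-sample-complexity condition of Theorem 5.2 (illustrative helper).

    For each non-default action a, we ask whether the translated indifference set
    I_{a,tau} = { mu in simplex : c_a . mu = -tau/(1-tau) * c_a . mu0 } is nonempty,
    by solving an LP with that equality plus the simplex constraints.
    """
    n_actions, n_states = U.shape
    for a in range(n_actions):
        if a == a0:
            continue
        c_a = U[a0] - U[a]                                   # utility-difference vector
        A_eq = np.vstack([c_a, np.ones(n_states)])
        b_eq = np.array([-tau / (1 - tau) * (c_a @ mu0), 1.0])
        res = linprog(c=np.zeros(n_states), A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, 1)] * n_states, method="highs")
        if res.success:                                      # I_{a,tau} is nonempty
            return True
    return False

# Hypothetical instance: three states, two actions; action 1 is the default a0.
U = np.array([[2.0, -1.0, -1.0],    # U[a, theta]
              [0.0,  0.0,  0.0]])
mu0 = np.array([0.2, 0.4, 0.4])
print(testable(mu0, U, tau=0.3, a0=1))
```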
6 Discussion

Our approach has some limitations; here we discuss the two that we view as most significant. First, we have assumed a linear model of bias. While the linear model is common in the literature [8, 10, 5, 18], we also consider a more general model of bias (in Appendix C): as the bias level $w$ increases from 0 to 1, the agent's belief changes from the true posterior $\mu_s$ to the prior $\mu_0$ according to some general continuous function $\phi(\mu_0, \mu_s, w)$. We show that, as long as the function $\phi$ satisfies a certain single-crossing property (as $w$ increases, once the agent starts to prefer the default action $a_0$, they do not change their preferred action anymore), our results regarding the optimality of constant signaling schemes and the geometric characterization still hold, while the revelation principle and the linear program algorithm no longer work because $\phi$ is not linear. We consider it an interesting challenge to come up with more general models of bias that are still tractable, in the sense that one can efficiently design good signaling schemes with reasonable sample complexity bounds.

Second, we have assumed that the agent's prior is the same as the real prior from which states of the world are drawn. But what if the agent's prior is different? Our results directly extend to the case where the agent has a wrong, known prior. If the agent's prior is unknown, then our problem becomes significantly more challenging. More generally, the agent may have a private type that determines both their prior and utility and is unknown to the principal. We conjecture that testing the agent's bias in this case becomes impossible, because if different types consistently take opposite actions, then the actions provide no information about the agent's bias.

Despite these limitations, we view our paper as making significant progress on a novel problem that seems fundamental. Our results suggest that practical algorithms for detecting bias in belief updating are within reach and, in the long term, may lead to new insights on issues of societal importance. In particular, we anticipate future research in more complex situations, such as combining the decisions of many experts (human or AI) after measuring and accounting for their individual biases.

Acknowledgments

This research was partially supported by the National Science Foundation under grants IIS-2147187, IIS-2229881 and CCF-2007080; and by the Office of Naval Research under grants N00014-20-1-2488 and N00014-24-1-2704.

References

[1] R. Alonso and O. Câmara. Bayesian persuasion with heterogeneous priors. Journal of Economic Theory, 165:672–706, 2016.
[2] R. J. Aumann, M. B. Maschler, and R. E. Stearns. Repeated Games with Incomplete Information. MIT Press, 1995.
[3] J. D. Bloom, Y. A. Chan, R. S. Baric, P. J. Bjorkman, S. Cobery, B. E. Deverman, D. N. Fisman, R. Gupta, A. Iwasaki, M. Lipsitch, R. Medzhitov, R. A. Neher, R. Nielsen, N. Patterson, T. Stearns, E. van Nimwegen, M. Worobey, and D. A. Relman. Investigate the origins of COVID-19. Science, 372(6543):694, 2021.
[4] V. P. Crawford and J. Sobel. Strategic information transmission. Econometrica, 50(6):1431–1451, 1982.
[5] G. de Clippel and X. Zhang. Non-Bayesian persuasion. Journal of Political Economy, 130(10):2594–2642, 2022.
[6] S. Dughmi and H. Xu. Algorithmic Bayesian persuasion. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC), pages 412–425, 2016.
[7] P. Dworczak and A. Pavan. Preparing for the worst but hoping for the best: Robust (Bayesian) persuasion. Econometrica, 90(5):2017–2051, 2022.
[8] L. G. Epstein, J. Noor, and A. Sandroni. Non-Bayesian learning. The B.E. Journal of Theoretical Economics, 10(1), 2010.
[9] Y. Feng, C.-J. Ho, and W. Tang. Rationality-robust information design: Bayesian persuasion under quantal response. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 501–546. SIAM, 2024.
[10] D. Hagmann and G. Loewenstein. Persuasion with motivated beliefs. Carnegie Mellon University technical report, 2017.
[11] D. M. Kahan. Ideology, motivated reasoning, and cognitive reflection. Judgment and Decision Making, 8(4), 2013.
[12] D. M. Kahan, E. Peters, M. Wittlin, P. Slovic, L. L. Ouellette, D. Braman, and G. Mandel. The polarizing impact of science literacy and numeracy on perceived climate change risks. Nature Climate Change, 2(10):732–735, 2012.
[13] E. Kamenica and M. Gentzkow. Bayesian persuasion. American Economic Review, 101(6):2590–2615, 2011.
[14] T. Lin and Y. Chen. Persuading a learning agent. arXiv preprint arXiv:2402.09721, 2024.
[15] J. Renault. Repeated games with incomplete information. In R. A. Meyers, editor, Encyclopedia of Complexity and Systems Science. Springer, 2012.
[16] C. S. Taber and M. Lodge. Motivated skepticism in the evaluation of political beliefs. American Journal of Political Science, 50(3):755–769, 2006.
[17] C. S. Taber, D. Cann, and S. Kucsova.
The motivated processing of political arguments. Political Behavior, 31(2):137–155, 2009.
[18] W. Tang and C.-J. Ho. On the Bayesian rational assumption in information design. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 9:120–130, 2021. doi: 10.1609/hcomp.v9i1.18945. URL https://ojs.aaai.org/index.php/HCOMP/article/view/18945.
[19] B. M. Tappin, G. Pennycook, and D. G. Rand. Bayesian or biased? Analytic thinking and political belief updating. Cognition, 204:104375, 2020.
[20] E. Ward. Conservatism in human information processing. In D. Kahneman, P. Slovic, and A. Tversky, editors, Judgment Under Uncertainty: Heuristics and Biases. Cambridge University Press, 2007.
[21] K. Yang and H. Zhang. Computational aspects of Bayesian persuasion under approximate best response. arXiv preprint arXiv:2402.07426, 2024.
14. Crowdsourcing and Research with Human Subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?
Answer: [NA]
Justification:
Guidelines: The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper. According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector.

15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects
Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?
Answer: [NA]
Justification:
Guidelines: The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper. We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution. For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.

A Missing Proofs from Section 4

A.1 Proof of Lemma 4.3

Proof. Suppose that, during its operation, Π selects a signaling scheme π that includes an external signal s ∈ S. By definition, for an external signal, the τ-biased belief µ^τ_s = τµ0 + (1 − τ)µ_s lies in ext R_a0. This implies that the true posterior µ_s, derived from the signaling scheme π and the prior µ0, also lies in ext R_a0. Consequently, the line segment connecting µ_s and µ0, represented as {(1 − t)µ_s + tµ0 | t ∈ [0, 1]}, must intersect the boundary ∂R_a0 at some point. Denote this intersection by µ* = (1 − t*)µ_s + t*µ0 ∈ ∂R_a0.

We will adjust the original signaling scheme π. To do so, define µ̃_s as the belief whose τ-biased version equals µ*:
τµ0 + (1 − τ)µ̃_s = µ*, that is, µ̃_s = ((t* − τ)µ0 + (1 − t*)µ_s) / (1 − τ).
Note that µ̃_s lies on the line segment from µ_s to µ0: µ̃_s = (1 − t̃)µ_s + t̃µ0 with t̃ = (t* − τ)/(1 − τ) ∈ [0, 1].

Under the original signaling scheme π, according to the splitting lemma (Lemma 2.1), the prior µ0 can be represented as a convex combination of µ_s and the posteriors associated with the other signals s′ ∈ S \ {s}:
µ0 = p_s µ_s + ∑_{s′ ∈ S\{s}} p_{s′} µ_{s′}.
If we change µ_s to µ̃_s, then we obtain a new convex combination (this is valid because µ̃_s is on the line segment from µ_s to µ0):
µ0 = p̃_s µ̃_s + ∑_{s′ ∈ S\{s}} p̃_{s′} µ_{s′},
where p̃_s = p_s / (1 − t̃ + t̃ p_s) and, for every s′ ∈ S \ {s}, p̃_{s′} = ((1 − t̃) / (1 − t̃ + t̃ p_s)) p_{s′}.
Then, by the splitting lemma (Lemma 2.1), there exists a signaling scheme π̃ with |S| signals where signal s induces posterior µ̃_s and every other signal s′ induces µ_{s′}. Note that the τ-biased version of µ̃_s satisfies τµ0 + (1 − τ)µ̃_s = µ* ∈ ∂R_a0, so s is a boundary signal under the signaling scheme π̃. Since s is a boundary signal, we can immediately tell whether w ≤ τ or w ≥ τ according to Lemma 4.2 when s is sent, and end the algorithm. If any signal s′ other than s is sent, the induced posterior µ_{s′} is the same as under the original signaling scheme π, so the agent takes the same action, and we can simply follow the rest of the original algorithm Π. We also note that the probability of signal s being sent under the new signaling scheme π̃ is larger than or equal to its probability under the original signaling scheme π:
p̃_s = p_s / (1 − t̃ + t̃ p_s) ≥ p_s, since 1 − t̃ + t̃ p_s ≤ 1.
So, in expectation, the algorithm ends at least as quickly using π̃ as using π. Hence, by repeating the above procedure to replace all the signaling schemes in the original algorithm Π that use external signals, we obtain a new algorithm Π̃ that only uses boundary and internal signals and has smaller or equal sample complexity.
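As a sanity check on the adjustment step above, the following is a minimal numerical sketch (not code from the paper) for a made-up two-state instance with a single non-default action a, where R_a0 = {µ : c_a^⊤µ > 0}; the prior, posteriors, signal probabilities, and threshold τ are all invented for illustration.

```python
import numpy as np

# Made-up instance (not from the paper): two states, one non-default action a,
# with R_a0 = {mu : c_a . mu > 0} and a two-signal scheme that splits the prior.
c_a   = np.array([1.0, -1.0])
mu0   = np.array([0.6, 0.4])       # prior, lies in R_a0
mu_s  = np.array([0.1, 0.9])       # posterior of an external signal s
mu_s2 = np.array([0.725, 0.275])   # posterior of the other (internal) signal
p_s, p_s2 = 0.2, 0.8               # p_s * mu_s + p_s2 * mu_s2 = mu0
tau = 0.5                          # bias threshold being tested

# Intersection of the segment (1 - t) * mu_s + t * mu0 with the hyperplane c_a . mu = 0.
t_star = (c_a @ mu_s) / (c_a @ mu_s - c_a @ mu0)

# Adjusted posterior whose tau-biased version lands exactly on the boundary.
t_tilde  = (t_star - tau) / (1 - tau)
mu_s_adj = (1 - t_tilde) * mu_s + t_tilde * mu0

# Reweighted scheme from the splitting lemma.
denom    = 1 - t_tilde + t_tilde * p_s
p_s_adj  = p_s / denom
p_s2_adj = (1 - t_tilde) * p_s2 / denom

assert np.allclose(p_s_adj * mu_s_adj + p_s2_adj * mu_s2, mu0)     # still splits the prior
assert np.isclose(c_a @ (tau * mu0 + (1 - tau) * mu_s_adj), 0.0)   # s is now a boundary signal
assert p_s_adj >= p_s                                              # s is sent at least as often
```

The final assertions check exactly the three facts used in the proof: the adjusted posteriors still split the prior, the adjusted signal is a boundary signal, and it is sent at least as often as before.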
A.2 Proof of Lemma 4.4

Proof. Let Π be any adaptive algorithm using signaling schemes with boundary and internal signals. Let H_t = {(π_1, θ_1, s_1, a_1), . . . , (π_t, θ_t, s_t, a_t)} be any history that can occur during the execution of Π. If no boundary signal has been sent, then every realized signal s_k is an internal signal of the respective signaling scheme π_k, with the τ-biased posterior satisfying µ^τ_{s_k} = τµ0 + (1 − τ)µ_{s_k} ∈ R_a0. Because R_a0 = {µ ∈ ∆(Θ) | ∀a ∈ A \ {a0}, c_a^⊤µ > 0} is an open region, there must exist some ε_k > 0 such that the ℓ1-norm ball B_{ε_k}(µ^τ_{s_k}) = {µ ∈ ∆(Θ) : ‖µ − µ^τ_{s_k}‖_1 ≤ ε_k} is a subset of R_a0. Let ε = min_{k=1,...,t} ε_k > 0. Then B_ε(µ^τ_{s_k}) ⊆ R_a0 for every k = 1, . . . , t.

Suppose the agent's bias level w is in the range [τ − ε/2, τ + ε/2]. Then, for every signal s_k, the agent's biased belief ν_{s_k} = wµ0 + (1 − w)µ_{s_k} satisfies
‖ν_{s_k} − µ^τ_{s_k}‖_1 = ‖(w − τ)(µ0 − µ_{s_k})‖_1 ≤ |w − τ| · ‖µ0 − µ_{s_k}‖_1 ≤ (ε/2) · 2 = ε,
where the last step uses the fact that the ℓ1 distance between two distributions is at most 2. This means ν_{s_k} ∈ B_ε(µ^τ_{s_k}) ⊆ R_a0. So the agent takes action a0 given signal s_k. Note that this holds for every k = 1, . . . , t and every w ∈ [τ − ε/2, τ + ε/2], so we cannot determine whether w ≤ τ or w ≥ τ so far. We have to run the algorithm until a boundary signal is sent.
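The role of the ε-window is easy to see numerically. The snippet below (a made-up instance, same conventions as before) picks an internal signal and checks that every bias level within ε/2 of τ leaves the biased belief inside R_a0, so the observed action a0 cannot distinguish the two sides of the threshold.

```python
import numpy as np

# Made-up two-state instance: R_a0 = {mu : c_a . mu > 0}; mu_s is the posterior of an
# internal signal, i.e., its tau-biased version lies strictly inside R_a0.
c_a  = np.array([1.0, -1.0])
mu0  = np.array([0.6, 0.4])
mu_s = np.array([0.725, 0.275])
tau  = 0.5

slack = c_a @ (tau * mu0 + (1 - tau) * mu_s)   # > 0 because the signal is internal
eps   = 0.9 * slack                            # for this c_a, any eps below the slack keeps the l1-ball in R_a0

# Every bias level within eps/2 of tau leads to the same observed action a0.
for w in np.linspace(tau - eps / 2, tau + eps / 2, 11):
    biased = w * mu0 + (1 - w) * mu_s
    assert c_a @ biased > 0
```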
A.3 Proof of Lemma 4.5

Proof. Let π be a signaling scheme that can test whether w ≤ τ or w ≥ τ. According to Lemma 4.3, π can be assumed to only use boundary and internal signals. Recall that a signal s is boundary if the τ-biased belief µ^τ_s = τµ0 + (1 − τ)µ_s lies on the boundary set ∂R_a0. For a ∈ A \ {a0}, let B_a be the set of beliefs under which the agent is indifferent between a and a0, and a and a0 are both weakly better than the other actions:
B_a = {µ ∈ ∆(Θ) | c_a^⊤µ = 0 and ∀a′ ∈ A \ {a0}, c_{a′}^⊤µ ≥ 0}.
The boundary set ∂R_a0 can be written as the union of the B_a's: ∂R_a0 = ∪_{a ∈ A\{a0}} B_a. Then, we classify the boundary signals into |A| − 1 sets {S_a}_{a ∈ A\{a0}} according to which set B_a their τ-biased beliefs belong to: namely, S_a contains the boundary signals s for which τµ0 + (1 − τ)µ_s ∈ B_a.

We then combine the signals in S_a. Specifically, consider the normalized weighted average of the true posterior beliefs associated with the signals in S_a, denoted by µ_a:
µ_a = ∑_{s ∈ S_a} π(s)µ_s / ∑_{s′ ∈ S_a} π(s′).
Note that the τ-biased version of µ_a is also in the set B_a because B_a is a convex set:
τµ0 + (1 − τ)µ_a = ∑_{s ∈ S_a} (π(s) / ∑_{s′ ∈ S_a} π(s′)) (τµ0 + (1 − τ)µ_s) ∈ B_a.
This means that if a signal induces true posterior µ_a, then this signal is a boundary signal.

After defining µ_a as above for every a ∈ A \ {a0}, let's consider the set of internal signals of the signaling scheme π, which we denote by S_I. For each internal signal s ∈ S_I, the τ-biased belief satisfies τµ0 + (1 − τ)µ_s ∈ R_a0. Similarly to the above, we combine all the signals in S_I: define µ_{a0} to be the normalized weighted average of the posteriors associated with all internal signals:
µ_{a0} = ∑_{s ∈ S_I} π(s)µ_s / ∑_{s′ ∈ S_I} π(s′).
Then, the τ-biased version of µ_{a0} must be in R_a0 because R_a0 is a convex set:
τµ0 + (1 − τ)µ_{a0} = ∑_{s ∈ S_I} (π(s) / ∑_{s′ ∈ S_I} π(s′)) (τµ0 + (1 − τ)µ_s) ∈ R_a0.
This means that, if a signal induces posterior µ_{a0}, then this signal is internal.

From the splitting lemma (Lemma 2.1), we know that the convex combination of the original posteriors, ∑_{s ∈ S} π(s)µ_s, is equal to the prior µ0. This means that the following convex combination of the new posteriors {µ_a}_{a ∈ A\{a0}} and µ_{a0} is also equal to the prior:
∑_{a ∈ A\{a0}} (∑_{s ∈ S_a} π(s)) µ_a + (∑_{s ∈ S_I} π(s)) µ_{a0} = ∑_{a ∈ A\{a0}} ∑_{s ∈ S_a} π(s)µ_s + ∑_{s ∈ S_I} π(s)µ_s = ∑_{s ∈ S} π(s)µ_s = µ0,
where the convex combination weight of µ_a is ∑_{s ∈ S_a} π(s) for every a ∈ A \ {a0} and the convex combination weight of µ_{a0} is ∑_{s ∈ S_I} π(s). One can easily verify that the weights sum to 1. Then, by the splitting lemma (Lemma 2.1), there exists a signaling scheme π′ with signal space of size |A| (so we simply denote the signal space by A) where each signal a ∈ A induces posterior µ_a. We show that this new signaling scheme π′ satisfies the properties in Lemma 4.5:
- Signal a0 induces posterior µ_{a0}, whose τ-biased version satisfies τµ0 + (1 − τ)µ_{a0} ∈ R_a0. So, given signal a0, action a0 is the optimal action for an agent with bias level τ.
- For each signal a ∈ A \ {a0}, the induced posterior µ_a satisfies τµ0 + (1 − τ)µ_a ∈ B_a ⊆ ∂R_a0. So, by the definition of B_a, an agent with bias level τ is indifferent between actions a and a0, and these two actions are weakly better than the other actions. Also, this signal is a boundary signal by Definition 4.1, which satisfies the following according to Lemma 4.2: if the agent's bias level w < τ, then the agent strictly prefers a over a0; if w > τ, then the agent strictly prefers a0 over a.
- The sample complexity of π′ is the same as that of π because: (1) the sample complexity is equal to the inverse of the total probability of boundary signals (as a corollary of Lemma 4.4), and (2) the total probability of boundary signals is the same under the two signaling schemes:
∑_{a ∈ A\{a0}} π′(a) = ∑_{a ∈ A\{a0}} ∑_{s ∈ S_a} π(s) = ∑_{s ∈ ∪_{a ∈ A\{a0}} S_a} π(s).
So, T_τ(π′) = T_τ(π).
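The collapsing step above has a simple computational reading: group the signals of an arbitrary scheme by the action they induce for a τ-biased agent and replace each group by its probability-weighted average posterior. The sketch below illustrates this on a made-up instance (payoff vectors, prior, τ, and the four-signal scheme are all invented); for simplicity it groups signals by the τ-biased best response rather than by the boundary sets B_a used in the proof.

```python
import numpy as np

# Made-up instance: 2 states, actions a0, a1, a2 with payoff vectors u[action][state].
# A tau-biased agent best-responds to tau * mu0 + (1 - tau) * mu_s.
u = {"a0": np.array([0.5, 0.5]), "a1": np.array([1.0, 0.0]), "a2": np.array([0.0, 1.0])}
mu0, tau = np.array([0.5, 0.5]), 0.3

# An arbitrary scheme with 4 signals: probabilities and induced posteriors (they split mu0).
probs = np.array([0.25, 0.25, 0.25, 0.25])
posts = np.array([[0.9, 0.1], [0.7, 0.3], [0.3, 0.7], [0.1, 0.9]])
assert np.allclose(probs @ posts, mu0)

def best_action(post):
    biased = tau * mu0 + (1 - tau) * post
    return max(u, key=lambda a: u[a] @ biased)

# Collapse: one signal per recommended action, with the normalized average posterior.
collapsed = {}
for p, post in zip(probs, posts):
    a = best_action(post)
    q, tot = collapsed.get(a, (0.0, np.zeros_like(mu0)))
    collapsed[a] = (q + p, tot + p * post)

new_probs = {a: q for a, (q, tot) in collapsed.items()}
new_posts = {a: tot / q for a, (q, tot) in collapsed.items()}

# The collapsed scheme still splits the prior, and each recommendation is obeyed.
assert np.allclose(sum(q * new_posts[a] for a, q in new_probs.items()), mu0)
assert all(best_action(new_posts[a]) == a for a in new_posts)
```

Because best-response regions in belief space are convex and the biasing map is affine, the averaged posterior induces the same recommendation (up to ties), which is the property the proof needs.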
B Missing Proofs from Section 5

B.1 Proof of Lemma 5.1

Proof. For µ ∈ ∆(Θ), by convexity of ∆(Θ), we have (1 − τ)µ + τµ0 ∈ ∆(Θ). Then,
µ ∈ I_{a,τ} ⟺ (1 − τ)µ + τµ0 ∈ I_a ⟺ c_a^⊤((1 − τ)µ + τµ0) = 0 ⟺ (1 − τ)c_a^⊤µ + τc_a^⊤µ0 = 0 ⟺ c_a^⊤µ = −(τ/(1 − τ)) c_a^⊤µ0.

B.2 Proof of Theorem 5.2

We first prove the first part of Theorem 5.2, then the second and third parts.

B.2.1 Proof of Part 1 of Theorem 5.2

We want to prove that whether w ≤ τ or w ≥ τ can be tested with a single sample if and only if the prior µ0 is in the convex hull formed by the translated sets I_{a,τ} for all non-default actions a ∈ A \ {a0}: µ0 ∈ ConvexHull(∪_{a ∈ A\{a0}} I_{a,τ}).

The "if" part. Suppose µ0 ∈ ConvexHull(∪_{a ∈ A\{a0}} I_{a,τ}), namely, there exist a set of positive weights {p_s}_{s ∈ S} and a set of posterior beliefs {µ_s}_{s ∈ S} such that µ0 = ∑_{s ∈ S} p_s µ_s, where each µ_s ∈ I_{a,τ} for some a ∈ A \ {a0}. By definition, the τ-biased belief τµ0 + (1 − τ)µ_s is in the indifference set I_a. Recall the definition of the boundary set ∂R_a0 (Equation (3)), which is the set of beliefs under which the agent is indifferent between a0 and some other action, and these two actions are weakly better than any other action. The τ-biased belief τµ0 + (1 − τ)µ_s ∈ I_a may or may not belong to ∂R_a0, depending on whether a and a0 are better than any other action:
- If τµ0 + (1 − τ)µ_s ∈ ∂R_a0, then s is a boundary signal (by Definition 4.1) and hence useful for testing whether w ≤ τ or w ≥ τ (Lemma 4.2). Denote µ′_s = µ_s in this case.
- If τµ0 + (1 − τ)µ_s ∉ ∂R_a0, then there must exist some action a′ that is strictly better than both a and a0 for the agent at the τ-biased belief, hence τµ0 + (1 − τ)µ_s ∈ ext R_a0 (so s is an external signal). Then, according to the argument in Lemma 4.3, we can find another belief µ′_s on the line segment between µ_s and µ0 such that the τ-biased version of µ′_s lies exactly on the boundary set ∂R_a0: τµ0 + (1 − τ)µ′_s ∈ ∂R_a0, with µ′_s = tµ_s + (1 − t)µ0 for some t ∈ [0, 1].
After the above discussion, we have found, for every s ∈ S, a belief µ′_s that is either equal to µ_s or on the line segment between µ_s and µ0. So, µ0 can be written as a convex combination of {µ′_s}_{s ∈ S}: µ0 = ∑_{s ∈ S} p′_s µ′_s. Moreover, the µ′_s defined above satisfies τµ0 + (1 − τ)µ′_s ∈ ∂R_a0. So, a signal inducing true posterior µ′_s will be a boundary signal and useful for testing whether w ≤ τ or w ≥ τ (Lemma 4.2). Finally, by the splitting lemma (Lemma 2.1), we know that there must exist a signaling scheme π with signal space S where each signal s ∈ S indeed induces posterior µ′_s. Such a signaling scheme sends useful (boundary) signals with probability 1. Hence, its sample complexity is 1.

The "only if" part. Suppose whether w ≤ τ or w ≥ τ can be tested with a single sample. This means that the optimal signaling scheme obtained from the linear program in Algorithm 1 must satisfy
∑_{a ∈ A\{a0}} π(a) = ∑_{a ∈ A\{a0}} ∑_{θ ∈ Θ} π(a | θ)µ0(θ) = 1,
namely, the total probability of useful signals (signals in A \ {a0}) is 1. Then, by the splitting lemma, the prior µ0 can be expressed as the convex combination
µ0 = ∑_{a ∈ A\{a0}} π(a)µ_a,
where π(a) = ∑_{θ ∈ Θ} µ0(θ)π(a | θ) is the unconditional probability of signal a and µ_a is the true posterior induced by signal a. Moreover, the indifference constraint (6) in the linear program ensures that the agent is indifferent between a and a0 upon receiving signal a if the agent has bias level τ: mathematically, τµ0 + (1 − τ)µ_a ∈ I_a. This means µ_a ∈ I_{a,τ} by definition. So, we obtain µ0 ∈ ConvexHull(∪_{a ∈ A\{a0}} I_{a,τ}).
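To make the role of the linear program concrete, here is a small sketch in the spirit of Algorithm 1 on a made-up two-state, two-action instance. The paper's Algorithm 1 and its constraint (6) may be organized differently; the encoding below, with variables x(a, θ) = π(a | θ)µ0(θ), is one standard way to linearize the indifference and optimality conditions and is offered only as an illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up instance: two states, actions a0 and a1, R_a0 = {mu : c1 . mu > 0}.
# Decision variables are the joint probabilities x(a, theta) = pi(a | theta) * mu0(theta),
# ordered as [x(a0, 0), x(a0, 1), x(a1, 0), x(a1, 1)].
c1  = np.array([1.0, -1.0])
mu0 = np.array([0.6, 0.4])
tau = 0.5
k   = c1 @ mu0

# Maximize Pr[signal a1], the total probability of the boundary (recommendation) signal.
obj = np.array([0.0, 0.0, -1.0, -1.0])

# Marginal constraints sum_a x(a, theta) = mu0(theta), and the indifference condition for
# signal a1: tau * Pr[a1] * (c1 . mu0) + (1 - tau) * (c1 . x_a1) = 0, which is linear in x.
A_eq = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, tau * k + (1 - tau) * c1[0], tau * k + (1 - tau) * c1[1]],
])
b_eq = np.array([mu0[0], mu0[1], 0.0])

# Signal a0 must keep a0 optimal for the tau-biased agent:
# tau * Pr[a0] * (c1 . mu0) + (1 - tau) * (c1 . x_a0) >= 0, written in <= form.
A_ub = np.array([[-(tau * k + (1 - tau) * c1[0]), -(tau * k + (1 - tau) * c1[1]), 0.0, 0.0]])
b_ub = np.array([0.0])

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 4)
p_boundary = -res.fun
print("boundary-signal probability:", round(p_boundary, 4))      # 2/3 for these numbers
print("expected number of signals :", round(1 / p_boundary, 4))  # 1.5
```

For these made-up numbers the optimal boundary-signal probability is 2/3, so the expected number of signals is 1.5; consistently with Part 1, µ0 is not in the convex hull of I_{a1,τ}, which here is the single belief (0.4, 0.6), so one sample cannot suffice.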
B.2.2 Proof of Parts 2 and 3 of Theorem 5.2

We first prove that, if whether w ≤ τ or w ≥ τ can be tested with finite sample complexity, then I_{a,τ} ≠ ∅ for at least one a ∈ A \ {a0}. According to Lemma 4.1, if we can test whether w ≤ τ or w ≥ τ with finite sample complexity using adaptive algorithms, then we can do so using a constant signaling scheme. Lemma 4.3 further ensures that we can do so using a constant signaling scheme π with only boundary and internal signals. But according to Lemma 4.4, internal signals are not useful for testing whether w ≤ τ or w ≥ τ. So, the signaling scheme π must send some boundary signal s with positive probability. Let µ_s be the true posterior induced by s. By the definition of a boundary signal, τµ0 + (1 − τ)µ_s ∈ ∂R_a0, implying that the agent is indifferent between a0 and some action a ∈ A \ {a0} if their belief is τµ0 + (1 − τ)µ_s (and a0 and a are weakly better than any other action). This means τµ0 + (1 − τ)µ_s ∈ I_a, so µ_s ∈ I_{a,τ} by definition. Hence, I_{a,τ} ≠ ∅.

We then prove the opposite direction: if I_{a,τ} ≠ ∅ for at least one a ∈ A \ {a0}, then whether w ≤ τ or w ≥ τ can be tested with finite sample complexity. Let a1 ∈ A \ {a0} be an action for which I_{a1,τ} ≠ ∅. We claim that:

Claim B.1. There exists a state θ1 ∈ Θ for which the agent weakly prefers action a1 over action a0 if the true posterior puts probability 1 on state θ1 and the agent has bias level τ. In notation, let e_θ1 ∈ ∆(Θ) be the vector whose θ1-th component is 1 and whose other components are 0; the agent weakly prefers action a1 over action a0 under the belief τµ0 + (1 − τ)e_θ1.

Proof. Suppose on the contrary that no such state θ1 exists. Then the agent strictly prefers a0 over a1 under the belief τµ0 + (1 − τ)e_θ for every state θ ∈ Θ. This implies that, for any belief µ ∈ ∆(Θ), the agent also strictly prefers a0 over a1 under the belief τµ0 + (1 − τ)µ, due to the linearity of the agent's utility with respect to the belief. The agent strictly preferring a0 over a1 implies τµ0 + (1 − τ)µ ∉ I_{a1}, so µ cannot be in I_{a1,τ} by definition. This holds for any µ ∈ ∆(Θ), so I_{a1,τ} = ∅, a contradiction.

Let θ1 be the state in the above claim. The prior µ0 can be trivially written as a convex combination of e_θ1 and the e_θ's for the other states θ:
µ0 = µ0(θ1)e_θ1 + ∑_{θ ∈ Θ\{θ1}} µ0(θ)e_θ.
Since the agent does not strictly prefer a0 under the belief τµ0 + (1 − τ)e_θ1, this belief cannot be in the region R_a0. The prior µ0 is in the region R_a0. Consider the line segment connecting e_θ1 and the prior µ0. There must exist a point µ* = te_θ1 + (1 − t)µ0 on this line segment such that the τ-biased belief τµ0 + (1 − τ)µ* lies exactly on the boundary ∂R_a0. Clearly, the prior can also be written as a convex combination of µ* and the e_θ's for θ ∈ Θ \ {θ1}:
µ0 = p* µ* + ∑_{θ ∈ Θ\{θ1}} p_θ e_θ.
Then, by the splitting lemma (Lemma 2.1), there exists a signaling scheme with |Θ| signals where one signal induces posterior µ* and the other signals induce posteriors {e_θ}_{θ ∈ Θ\{θ1}}. In particular, the signal inducing µ* is a boundary signal since τµ0 + (1 − τ)µ* ∈ ∂R_a0 by construction. By Lemma 4.2, that signal is useful for testing whether w ≤ τ or w ≥ τ. When that signal is sent (which happens with positive probability p* > 0 at each time step), we can tell whether w ≤ τ or w ≥ τ. This finishes the proof.

The two directions proved above together prove parts 2 and 3 of Theorem 5.2.
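Combined with Lemma 5.1, the condition in Parts 2 and 3 is easy to check numerically in the linear model: I_{a,τ} is the slice of the simplex where c_a^⊤µ equals −(τ/(1 − τ))c_a^⊤µ0, and that slice is nonempty exactly when this target value lies between the smallest and largest entry of c_a. The snippet below (with made-up vectors) packages this check.

```python
import numpy as np

# Finite-sample testability check in the linear model, following Parts 2 and 3:
# bias can be tested at threshold tau iff I_{a,tau} is nonempty for some a != a0.
# By Lemma 5.1, I_{a,tau} is the simplex slice where c_a . mu equals
# -tau/(1-tau) * (c_a . mu0), and c_a . mu ranges over [min(c_a), max(c_a)] on the simplex.
def testable(c_vectors, mu0, tau):
    target = lambda c: -tau / (1 - tau) * (c @ mu0)
    return any(c.min() <= target(c) <= c.max() for c in c_vectors)

# Made-up example with a single non-default action.
mu0 = np.array([0.6, 0.4])
c_vectors = [np.array([1.0, -1.0])]

print(testable(c_vectors, mu0, tau=0.5))  # True: target -0.2 lies in [-1, 1]
print(testable(c_vectors, mu0, tau=0.9))  # False: target -1.8 falls outside [-1, 1]
```

In the second call, the required indifference point would have to lie outside the simplex, so no boundary signal exists and bias at that threshold cannot be tested.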
C A More General Bias Model

We define a more general model of biased beliefs than the linear model. The agent's bias is captured by some function ϕ : ∆(Θ) × ∆(Θ) × [0, 1] → ∆(Θ). Given prior µ0 ∈ ∆(Θ), true posterior µ_s ∈ ∆(Θ), and bias level w ∈ [0, 1], the agent has biased belief ϕ(µ0, µ_s, w). The linear model is the special case where ϕ(µ0, µ_s, w) = wµ0 + (1 − w)µ_s. We make the following natural assumptions on ϕ:

Assumption C.1. ϕ(µ0, µ_s, 0) = µ_s (no bias) and ϕ(µ0, µ_s, 1) = µ0 (full bias); ϕ(µ0, µ_s, w) is continuous in µ0, µ_s, and w.

We then make some joint assumptions on the bias model ϕ and the agent's utility function U. Recall that R_a0 = {µ ∈ ∆(Θ) | ∀a ∈ A \ {a0}, c_a^⊤µ > 0} is the region of beliefs under which the agent strictly prefers action a0, ∂R_a0 is the boundary of R_a0, and ext R_a0 is the exterior of R_a0, where the agent strictly prefers some other action to a0.

Assumption C.2. When receiving no information, the agent takes the default action: for every µ0 ∈ ∆(Θ) and every w ∈ [0, 1], ϕ(µ0, µ0, w) ∈ R_a0.

Definition C.1. We say that a posterior belief µ ∈ ∆(Θ) satisfies single-crossing if the curve {ϕ(µ0, µ, w) : w ∈ [0, 1]} passes the boundary ∂R_a0 only once: namely, there exists w* ∈ [0, 1] such that for w ∈ [0, w*), ϕ(µ0, µ, w) ∈ ext R_a0; ϕ(µ0, µ, w*) ∈ ∂R_a0; and for w ∈ (w*, 1], ϕ(µ0, µ, w) ∈ R_a0.

We assume that all posteriors outside R_a0 satisfy single-crossing, and that all posteriors inside R_a0 do not cross the boundary as the bias level varies in [0, 1]:

Assumption C.3. Any µ ∉ R_a0 satisfies single-crossing. For any µ ∈ R_a0 and any w ∈ [0, 1], ϕ(µ0, µ, w) ∈ R_a0.

Under the above general bias model with the stated assumptions, our result regarding the optimality of constant signaling schemes (Lemma 4.1) still holds. The geometric characterization of the testability of bias (Theorem 5.2) holds after redefining some notation. Let I_a = {µ ∈ ∆(Θ) | c_a^⊤µ = 0} still be the set of beliefs under which the agent is indifferent between actions a and a0. Let I_{a,τ} still be the set of posterior beliefs for which an agent with bias level τ is indifferent between a and a0, but with a more general expression than in the linear model:
I_{a,τ} := {µ ∈ ∆(Θ) | ϕ(µ0, µ, τ) ∈ I_a}.
With the above definition of I_{a,τ}, Theorem 5.2 still holds. We omit the proofs because they are almost identical to the proofs for the linear model. The revelation principle (Lemma 4.5) and the linear program algorithm for computing the optimal signaling scheme (Algorithm 1 and Theorem 4.6) do not apply to the general bias model because ϕ is not linear. Designing an efficient algorithm to compute a good signaling scheme to test bias under this more general model is an interesting future direction.
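As an illustration of how the generalized definition of I_{a,τ} can be evaluated, the sketch below plugs a made-up nonlinear bias map ϕ (a normalized geometric mixture of prior and posterior, which is not the paper's model and is only an example satisfying Assumption C.1 on the interior of the simplex) into the membership test ϕ(µ0, µ, τ) ∈ I_a, alongside the linear model from the body of the paper.

```python
import numpy as np

# A made-up nonlinear bias map: a normalized geometric mixture of prior and posterior.
# It satisfies Assumption C.1 on the interior of the simplex (w = 0 gives the posterior,
# w = 1 gives the prior, and it is continuous there), but it is NOT the paper's model.
def phi_geometric(mu0, mu, w):
    b = (mu0 ** w) * (mu ** (1 - w))
    return b / b.sum()

# The linear model used in the body of the paper.
def phi_linear(mu0, mu, w):
    return w * mu0 + (1 - w) * mu

# Generalized I_{a,tau} membership: the tau-biased version of mu must lie in I_a,
# i.e., make the agent indifferent between a and a0 (c_a . belief = 0, up to tolerance).
def in_I_a_tau(mu, mu0, c_a, tau, bias_map, tol=1e-9):
    return abs(c_a @ bias_map(mu0, mu, tau)) <= tol

mu0 = np.array([0.7, 0.3])
c_a = np.array([1.0, -1.0])
tau = 0.3

# Under the linear model, I_{a,tau} is the simplex slice where c_a . mu equals
# -tau/(1-tau) * (c_a . mu0) (Lemma 5.1); for two states and c_a = (1, -1) it is
# the single belief constructed below.
target  = -tau / (1 - tau) * (c_a @ mu0)
mu_test = np.array([(1 + target) / 2, (1 - target) / 2])

print(in_I_a_tau(mu_test, mu0, c_a, tau, phi_linear))     # True by construction
print(in_I_a_tau(mu_test, mu0, c_a, tau, phi_geometric))  # False here: the sets differ
```

Whether Assumptions C.2 and C.3 hold for a particular nonlinear ϕ depends on the utility structure and would have to be verified separately; the point here is only that the membership test, and hence the characterization in Theorem 5.2, is stated directly in terms of ϕ.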