Active Fairness Auditing

Tom Yan*¹  Chicheng Zhang*²

Abstract: The fast-spreading adoption of machine learning (ML) by companies across industries poses significant regulatory challenges. One such challenge is scalability: how can regulatory bodies efficiently audit these ML models, ensuring that they are fair? In this paper, we initiate the study of query-based auditing algorithms that can estimate the demographic parity of ML models in a query-efficient manner. We propose an optimal deterministic algorithm, as well as a practical randomized, oracle-efficient algorithm with comparable guarantees. Furthermore, we make inroads into understanding the optimal query complexity of randomized active fairness estimation algorithms. Our first exploration of active fairness estimation aims to put AI governance on firmer theoretical foundations.

1. Introduction

With the growing usage of artificial intelligence across industries, governance efforts are ramping up. A key challenge in regulatory efforts is scalability. Even in well-resourced countries like Norway, a pioneer in AI governance, regulators are only able to monitor and engage with a small fraction of companies (McCarthy, 2021). This growing issue calls for a better understanding of efficient algorithms that can audit machine learning (ML) models, which we now formalize.

Problem Formulation: A regulatory institution is interested in auditing an unknown model $h^\star : \mathcal{X} \to \{-1, 1\}$ held by a company (e.g., a lending company in the finance sector), where $\mathcal{X}$ is the feature space (e.g., of all information supplied by users). We assume that the regulatory institution only has knowledge of the hypothesis class $\mathcal{H}$ from which $h^\star$ comes (e.g., the family of linear classifiers), and it would like to estimate $\mu(h^\star)$ for a function µ that measures the model property of interest. To this end, the institution is allowed to send black-box queries to the model $h^\star$, i.e., send the company a query example x and receive $h^\star(x)$. The regulatory institution's goal is to efficiently estimate $\mu(h^\star)$ to within an error of at most ϵ > 0. We measure an algorithm's efficiency in terms of both its query complexity and its computational complexity. An auditing algorithm with low query and computational complexity naturally helps address the scalability challenge: greater efficiency means that each audit may be processed faster, and more audits may be processed at a time.

(*Equal contribution. ¹Carnegie Mellon University. ²University of Arizona. Correspondence to: Tom Yan, Chicheng Zhang. Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).)

Property of Interest: While which properties µ to assess is still heavily debated by regulators, we initiate the study of algorithms that audit fairness, a mainstay in regulatory efforts. In particular, we take µ to be Demographic Parity (DP)¹: given a distribution $D_{\mathcal{X}}$ over $\mathcal{X} \times \{0, 1\}$ (from which feature x and sensitive attribute $x_A$ are jointly drawn),

$$\mu_{D_{\mathcal{X}}}(h) = \Pr_{(x, x_A) \sim D_{\mathcal{X}}}(h(x) = 1 \mid x_A = 1) - \Pr_{(x, x_A) \sim D_{\mathcal{X}}}(h(x) = 1 \mid x_A = 0).$$

For brevity, when it is clear from context, we abbreviate $\Pr_{D_{\mathcal{X}}}$ and $\mu_{D_{\mathcal{X}}}$ as Pr and µ, respectively. DP measures the degree of disparate treatment by model h of the two subpopulations $\{x \mid x_A = 0\}$ and $\{x \mid x_A = 1\}$, which we assume are non-negligible: $p := \min(\Pr(x_A = 1), \Pr(x_A = 0)) = \Omega(1)$.
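To make the audited quantity concrete, the following is a minimal sketch of the plug-in DP estimate used throughout the paper, under the assumption (made precise below) that $D_{\mathcal{X}}$ is available through a large pool of unlabeled samples. The function name, the batched `model` interface, and the numpy representation are our illustrative choices, not constructs from the paper.

```python
import numpy as np

def estimate_dp(model, X, x_A):
    """Plug-in demographic-parity estimate of a black-box model.

    model: maps an (n, d) array to predictions in {-1, +1}
    X:     (n, d) array of unlabeled examples drawn from D_X
    x_A:   (n,) array of sensitive attributes in {0, 1}
    """
    preds = np.asarray(model(X))
    rate_1 = np.mean(preds[x_A == 1] == 1)  # Pr(h(x) = 1 | x_A = 1)
    rate_0 = np.mean(preds[x_A == 0] == 1)  # Pr(h(x) = 1 | x_A = 0)
    return rate_1 - rate_0
```

With O(1/ϵ²) samples per group, Hoeffding's inequality makes this estimate ϵ-accurate with high probability; this is exactly the i.i.d. sampling baseline discussed below.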
Achieving a small demographic parity may be thought of as a stronger version of the US Equal Employment Opportunity Commission's "four-fifths rule".² To focus on query complexity, we abstract away the difficulty of evaluating µ by assuming that $D_{\mathcal{X}}$ is known, which means that for any h we may evaluate µ(h) to arbitrary precision; for instance, this may be achieved given an arbitrarily large number of (unlabeled) samples randomly drawn from $\{x \mid x_A = 0\}$ and $\{x \mid x_A = 1\}$. Our main challenge is that we do not know $h^\star$, and we only want to query $h^\star$ insofar as to be able to accurately estimate $\mu(h^\star)$.

¹While fairness is the focus of our work, our algorithm may be adapted to any µ that is a function of $\mathcal{X}$ and $h^\star$.
²"The selection rate for any race, sex, or ethnic group [must be at least] four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate."

Guarantees of the Audit: In this paper, we investigate algorithms that can provide two types of guarantees. The first is the natural, direct estimation accuracy: the estimate returned by the algorithm should be within ϵ of µ(h*). The second is that of manipulation-proof (MP) estimation. Audits can be very consequential to companies, as they may be subject to hefty penalties if caught with violations. Not surprisingly, there have been determined attempts in the past to avoid being caught with violations (e.g., Hotten, 2015) by gaming the audit. We formulate our notion of manipulation-proofness in light of one way the audit may be gamed, which we now describe. Note that all the auditor knows about the model used by the company is that it is consistent with the queried labels in the audit. So, while our algorithm may have estimated µ(h*) accurately during audit time, nothing stops the company from changing its model post-audit from h* to a different model $h_{\mathrm{new}} \in \mathcal{H}$ (e.g., to improve profit), so long as $h_{\mathrm{new}}$ is still consistent with the queries seen during the audit. With this, we also look to understand: given this post-hoc possibility of manipulation, can we devise an algorithm that nonetheless ensures its estimate is within ϵ of $\mu(h_{\mathrm{new}})$? Indeed, a robust set of audit queries would serve as a certificate that, no matter which model the company changes to after the audit, the µ-estimate remains accurate. Given a set of classifiers V, a classifier h, and an unlabeled dataset S, define the version space (Mitchell, 1982) induced by S to be $V(h, S) := \{h' \in V : h'(S) = h(S)\}$. An auditing algorithm is ϵ-manipulation-proof if, for any h*, it outputs a set of queries S and an estimate µ̂ that guarantee $\max_{h \in \mathcal{H}(h^\star, S)} |\mu(h) - \hat{\mu}| \le \epsilon$.

Baseline: i.i.d. Sampling: One natural baseline for direct estimation is i.i.d. sampling. We sample $O(1/\epsilon^2)$ examples i.i.d. from the distribution $x \mid x_A = i$ for $i \in \{0, 1\}$, query h* on these examples, and take the average to obtain an estimate of $\Pr(h^\star(x) = +1 \mid x_A = i)$. Finally, we take the difference of these two estimates as our final DP estimate. By Hoeffding's inequality, with high probability this estimate is ϵ-accurate, and the procedure makes $O(1/\epsilon^2)$ queries. However, i.i.d. sampling is not necessarily MP. To see an example, let there be 2n points in group $x_A = 1$ with $n = 1/\epsilon^2$ that are shattered by $\mathcal{H}$, and let $D_{\mathcal{X}}$ be uniform over these points. Suppose that all points in group $x_A = 0$ are labeled the same: $\Pr_{D_{\mathcal{X}}}(h(x) = 1 \mid x_A = 0) = 0$ for all $h \in \mathcal{H}$. Then µ-estimation reduces to estimating the proportion of positives in group $x_A = 1$.
i.i.d. sampling will randomly choose n of these data points to query, and it will produce an ϵ-accurate estimate of µ(h*). However, we do not see the other n points. Since the 2n points are shattered by $\mathcal{H}$, after the queried points are determined, the company can increase or decrease DP by up to 1/2 by switching to a different model. To perform both direct and MP estimation, it therefore seems promising to examine algorithms that make use of non-i.i.d. sampling. Moreover, for MP, we observe that the auditing algorithm should leverage knowledge of the hypothesis class as well, which i.i.d. sampling is agnostic to.

Baseline: Active Learning: PAC active learning algorithms (Hanneke, 2014), where PAC stands for Probably Approximately Correct (Valiant, 1984), can achieve both direct and MP estimation accuracy. PAC active learning algorithms guarantee that, with high probability, every ĥ in the resultant version space satisfies $\Pr(\hat{h}(x) \neq h^\star(x)) \le p\epsilon = O(\epsilon)$. With this, we have $|\mu(\hat{h}) - \mu(h^\star)| \le \epsilon$ (see Lemma C.1 in Appendix C for a formal proof). To mention a setting where learning is favored over i.i.d. sampling: learning homogeneous linear classifiers under certain well-behaved unlabeled data distributions requires only $O(d \log \frac{1}{\epsilon})$ queries (e.g., Dasgupta, 2005b; Balcan & Long, 2013), far more efficient than $O(1/\epsilon^2)$ in low-dimensional settings with high auditing precision requirements. Still, as our goal is only to estimate the µ values of the induced version space, it is unclear whether we always need to go as far as to learn the model itself. In this paper, we investigate whether, and if so when, it is possible to design adaptive approaches that efficiently perform direct and MP estimation of µ(h*) using knowledge of H. To the best of our knowledge, we are the first to theoretically investigate active approaches for direct and MP estimation of µ(h*). Our first exploration of active fairness estimation seeks to provide a more complete picture of the theory of auditing machine learning models. Our hope is that our theoretical results can pave the way for the subsequent development of practical and efficient algorithms.

Our Contributions: Our main contributions are on two fronts, MP and direct estimation of µ(h*):

- For the newly introduced notion of manipulation-proofness, we identify a statistically optimal but computationally intractable deterministic algorithm. We gain insights into its query complexity through comparisons to the two baselines, i.i.d. sampling and PAC active learning.
- In light of the computational intractability of the optimal deterministic algorithm, we design a randomized algorithm that enjoys oracle efficiency (e.g., Dasgupta et al., 2007): it has an efficient implementation given access to a mistake-bounded online learning oracle and a constrained empirical risk minimization oracle for the hypothesis class H. Furthermore, its query performance matches that of the optimal deterministic algorithm up to polylog |H| factors.
- Finally, on the direct estimation front, we obtain bounds on the information-theoretic query complexity. We establish that MP estimation may be more expensive than direct estimation, highlighting the need to develop separate algorithms for the two guarantees. Then, we establish the usefulness of randomization in algorithm design and develop an optimal randomized algorithm for linear classification under Gaussian subpopulations.
Finally, to shed light on auditing in general settings, we develop distribution-free lower bounds for direct estimation under general VC classes. This lower bound charts the query complexity that any optimal randomized auditing algorithm must attain.

1.1. Additional Notations

We now introduce additional notation used throughout the paper. Let [m] denote {1, ..., m}. For an unlabeled dataset S and two classifiers h, h', we say h(S) = h'(S) if for all x ∈ S, h(x) = h'(x). Given a set of classifiers V and a labeled dataset T, define $V[T] := \{h \in V : \forall (x, y) \in T,\ h(x) = y\}$. Furthermore, denote $V_x^y = V[\{(x, y)\}]$ for notational simplicity. Given a set of classifiers V and fairness measure µ, denote by $\mathrm{diam}_\mu(V) := \max_{h, h' \in V} |\mu(h) - \mu(h')|$ the µ-diameter of V. Given a set of labeled examples T, denote by $\Pr_T(\cdot)$ the probability over the uniform distribution on T; given a classifier h, denote by $\mathrm{err}(h, T) = \Pr_T(h(x) \neq y)$ the empirical error of h on T.

Throughout this paper, we consider active fairness auditing under the membership query model, similar to membership query-based active learning (Angluin, 1988). Specifically, a deterministic active auditing algorithm A with label budget N is formally defined as a collection of N + 1 (computable) functions $f_1, f_2, \ldots, f_N, g$ such that:

1. For every $i \in [N]$, $f_i : (\mathcal{X} \times \mathcal{Y})^{i-1} \to \mathcal{X}$ is the label querying function used at step i, which takes as input the first (i − 1) labeled examples $(x_1, y_1), \ldots, (x_{i-1}, y_{i-1})$ obtained so far and chooses the i-th example $x_i$ for label query.

2. $g : (\mathcal{X} \times \mathcal{Y})^N \to \mathbb{R}$ is the estimator function that takes as input all N labeled examples $(x_1, y_1), \ldots, (x_N, y_N)$ obtained throughout the interaction process and outputs µ̂, the estimate of µ(h*).

When A interacts with a target classifier h, let the resultant queried unlabeled dataset be $S_{A,h} = \{x_1, \ldots, x_N\}$ and the final µ estimate be $\hat{\mu}_{A,h}$. Similarly to deterministic algorithms, a randomized active auditing algorithm A with label budget N and B bits of random seed is formally defined as a collection of N + 1 (computable) functions $f_1, \ldots, f_N, g$, where $f_i : (\mathcal{X} \times \mathcal{Y})^{i-1} \times \{0,1\}^B \to \mathcal{X}$ and $g : (\mathcal{X} \times \mathcal{Y})^N \times \{0,1\}^B \to \mathbb{R}$. Note that each function now takes as input a B-bit random seed; as a result, when A interacts with a fixed h*, its output µ̂ is a random variable. Note also that, under the above definition, a randomized active auditing algorithm A that uses a fixed seed b may be viewed as a deterministic active auditing algorithm $A_b$.

We will compare our algorithms' query complexities with those of disagreement-based active learning algorithms (Cohn et al., 1994; Hanneke, 2014). Given a classifier h and r > 0, define $B(h, r) = \{h' \in \mathcal{H} : \Pr_{D_\mathcal{X}}(h'(x) \neq h(x)) \le r\}$ as the disagreement ball centered at h with radius r. Given a set of classifiers V, define its disagreement region $\mathrm{DIS}(V) = \{x \in \mathcal{X} : \exists h, h' \in V,\ h(x) \neq h'(x)\}$. For a hypothesis class H and an unlabeled data distribution $D_\mathcal{X}$, an important quantity that characterizes the query complexity of disagreement-based active learning is the disagreement coefficient θ(r), defined as

$$\theta(r) = \sup_{h \in \mathcal{H},\, r' \ge r} \frac{\Pr_{D_\mathcal{X}}(x \in \mathrm{DIS}(B(h, r')))}{r'}.$$
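Several of the notations above, in particular the µ-diameter and the disagreement region, recur in all of the algorithms that follow; for finite classes they are directly computable. A minimal sketch, assuming classifiers are callables over a finite pool representing $D_\mathcal{X}$ (the helper names are ours, not the paper's):

```python
import numpy as np

def mu(h, X_pool, x_A):
    """Plug-in demographic parity of h over a finite pool representing D_X."""
    preds = np.array([h(x) for x in X_pool])
    return np.mean(preds[x_A == 1] == 1) - np.mean(preds[x_A == 0] == 1)

def mu_diameter(V, X_pool, x_A):
    """diam_mu(V) = max_{h, h' in V} |mu(h) - mu(h')|, i.e. max minus min over V."""
    vals = [mu(h, X_pool, x_A) for h in V]
    return max(vals) - min(vals)

def disagreement_region(V, X_pool):
    """DIS(V): pool points on which some pair of classifiers in V disagrees."""
    return [x for x in X_pool if len({h(x) for h in V}) > 1]
```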
2. Related Work

Our work is most closely related to the following two lines of work, both of which are concerned with estimating some property of a model without having to learn the model itself.

Sample-Efficient Optimal Loss Estimation: Dicker (2014) and Kong & Valiant (2018) propose U-statistics-based estimators of the optimal population mean squared error in d-dimensional linear regression, with a sample complexity of $O(\sqrt{d})$ (much lower than $O(d)$, the sample complexity of learning the optimal linear regressor). Kong & Valiant (2018) also extend the results to a well-specified logistic regression setting, where the goal is to estimate the optimal zero-one loss. Our work is similar in focusing on efficient µ(h*) estimation without having to learn h*. It differs in focusing on a fairness property rather than the optimal MSE or zero-one loss. Moreover, our results apply to arbitrary H, not just to linear models.

Interactive Verification: Goldwasser et al. (2021) study verification of whether a model h*'s loss is near-optimal with respect to a hypothesis class H, and look to understand when verification is cheaper than learning. They prove that verification is cheaper than learning for some hypothesis classes and just as expensive for others. Again, our work differs in focusing on a different property of the model: fairness.

Our algorithm also utilizes tools from active learning and machine teaching, which we review below.

Active Learning and Teaching: The task of learning h* approximately through membership queries has been well studied (e.g., Angluin, 1988; Hegedüs, 1995; Dasgupta, 2005a; Hanneke, 2006; 2007). Our computationally efficient algorithm for active fairness auditing is built upon the connection between active learning and machine teaching (Goldman & Kearns, 1995), as first noted in Hegedüs (1995) and Hanneke (2007). To achieve computational efficiency, our work builds on recent work on black-box teaching (Dasgupta et al., 2019), which implicitly gives an efficient procedure for computing an approximate-minimum specifying set; we adapt Dasgupta et al. (2019)'s algorithm to give a similar procedure for approximating the minimum specifying set that specifies the µ value. In the interest of space, please see the discussion of additional related work in Appendix A.

3. Manipulation-Proof Algorithms

3.1. Optimal Deterministic Algorithm

We begin our study of MP estimation of µ(h*) by identifying an optimal deterministic algorithm based on dynamic programming. Inspired by a minimax analysis of exact active learning with membership queries (Hanneke, 2006), we recursively define the following value function for any version space $V \subseteq \mathcal{H}$:

$$\mathrm{Cost}(V) = \begin{cases} 0, & \mathrm{diam}_\mu(V) \le 2\epsilon, \\ 1 + \min_{x \in \mathcal{X}} \max_{y \in \{-1,+1\}} \mathrm{Cost}(V[\{(x, y)\}]), & \text{otherwise.} \end{cases}$$

Note that Cost(V) is similar to the minimax query complexity of exact active learning (Hanneke, 2006), except that the induction base case is different: here the base case is $\mathrm{diam}_\mu(V) \le 2\epsilon$, which implies that, subject to $h^\star \in V$, we have identified µ(h*) up to error ϵ. In contrast, in exact active learning, Hanneke (2006)'s induction base case is |V| = 1, where we identify h* itself through V.

The value function Cost also has a game-theoretic interpretation. Imagine that a learner plays a multi-round game with an adversary. The learner makes sequential queries of examples to obtain their labels, and the adversary reveals the labels, subject to the constraint that all labeled examples shown agree with some classifier in H. The version space V encodes the state of the game: it is the set of classifiers that agree with all the labeled examples shown so far. The interaction between the learner and the adversary ends when all classifiers in V have µ values 2ϵ-close to each other. The learner would like to minimize its total cost, which is the number of rounds. Cost(V) can thus be viewed as the minimax-optimal future cost, given that the game's current state is represented by version space V.
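The recursion translates directly into a memoized dynamic program over version spaces. The sketch below, for finite H and X, is purely illustrative: it runs in worst-case exponential time, consistent with the hardness result of Section 3.1.2, and the frozenset representation and `mu_diam` callback are our own choices rather than the paper's.

```python
from functools import lru_cache

def make_cost(X, mu_diam, eps):
    """Build the memoized value function Cost(.) over version spaces.

    X:       iterable of query candidates
    mu_diam: maps a frozenset of classifiers to its mu-diameter
    eps:     target accuracy epsilon
    """
    @lru_cache(maxsize=None)
    def cost(V):                      # V: frozenset of classifiers h: x -> {-1, +1}
        if mu_diam(V) <= 2 * eps:     # base case: mu(h*) pinned down to +/- eps
            return 0
        best = float("inf")
        for x in X:
            branches = {y: frozenset(h for h in V if h(x) == y) for y in (-1, 1)}
            if not branches[-1] or not branches[1]:
                continue              # x does not split V: querying it is wasted
            # minimax step: adversary picks the worse of the two labels
            best = min(best, max(cost(b) for b in branches.values()))
        return 1 + best
    return cost
```

Restricting the minimization to points that split V loses nothing: a non-splitting query leaves V, and hence Cost(V), unchanged.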
Algorithm 1: Minimax optimal deterministic auditing
Require: finite hypothesis class H, target error ϵ, fairness measure µ.
Ensure: µ̂, an estimate of µ(h*).
1: Let V ← H
2: while diam_µ(V) > 2ϵ do
3:   Query $x \in \operatorname{argmin}_x \max_y \mathrm{Cost}(V_x^y)$, obtain label h*(x)
4:   V ← V(h*, {x})
5: return $\frac{1}{2}\left(\max_{h \in V} \mu(h) + \min_{h \in V} \mu(h)\right)$

Based on the notion of Cost, we design Algorithm 1, which has worst-case label complexity at most Cost(H). Specifically, it maintains a version space V ⊆ H, initialized to H (line 1). At every iteration, if the µ-diameter of V, $\mathrm{diam}_\mu(V) = \max_{h, h' \in V} |\mu(h) - \mu(h')|$, is at most 2ϵ, then, since $\mu(h^\star) \in I = [\min_{h \in V} \mu(h), \max_{h \in V} \mu(h)]$, returning the midpoint of I gives an ϵ-accurate estimate of µ(h*) (line 5). Otherwise, Algorithm 1 makes a query by choosing the x that minimizes the worst-case future value function (line 3). After receiving h*(x), it updates its version space (line 4). By construction, the interaction between the learner and the labeler lasts at most Cost(H) rounds, which gives the following theorem.

Theorem 3.1. If Algorithm 1 interacts with some h* ∈ H, then it outputs µ̂ such that |µ̂ − µ(h*)| ≤ ϵ, and it queries at most Cost(H) labels.

By the minimax nature of Cost, we also show that, among all deterministic algorithms, Algorithm 1 has the optimal worst-case query complexity:

Theorem 3.2. If A is a deterministic algorithm with query budget N ≤ Cost(H) − 1, then there exists some h* ∈ H such that µ̂, the output of A after querying h*, satisfies |µ̂ − µ(h*)| > ϵ.

The proofs of Theorems 3.1 and 3.2 are deferred to Appendix D.1.
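For completeness, here is how Algorithm 1's loop looks on top of the `cost` function sketched above; `query` stands in for the black-box access to h* and, like the rest of the sketch, is an assumed interface rather than the paper's code.

```python
def audit_minimax(H, X, query, mu, cost, eps):
    """Algorithm 1 (sketch): minimax-optimal deterministic auditing.

    query: x -> h_star(x), the black-box label oracle
    mu:    classifier -> demographic parity value
    cost:  the memoized Cost function from the previous sketch
    """
    V = frozenset(H)
    while True:
        lo, hi = min(mu(h) for h in V), max(mu(h) for h in V)
        if hi - lo <= 2 * eps:                        # diam_mu(V) <= 2*eps
            return (lo + hi) / 2                      # line 5: midpoint estimate

        def worst_case(x):                            # adversary picks the worse label
            return max(cost(frozenset(h for h in V if h(x) == y)) for y in (-1, 1))

        # line 3: among points that split V, pick the minimax-optimal query
        x = min((x for x in X if len({h(x) for h in V}) > 1), key=worst_case)
        y = query(x)                                  # one membership query to h*
        V = frozenset(h for h in V if h(x) == y)      # line 4: shrink version space
```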
3.1.1. COMPARISON TO BASELINES

To gain a better understanding of Cost(H), we relate it to the label complexity of the two baselines, i.i.d. sampling and active learning. To establish the comparison, we prove that existing i.i.d. sampling-based and active learning-based auditing algorithms can be derandomized with a small overhead in label complexity. Our first result is that the label complexity of Algorithm 1 is within a factor of O(ln |H|) of the label complexity of i.i.d. sampling.

Proposition 3.3. $\mathrm{Cost}(\mathcal{H}) \le O\!\left(\frac{1}{\epsilon^2} \ln |\mathcal{H}|\right)$.

Our second result is that the label complexity of Algorithm 1 is never worse than the distribution-dependent label complexity of CAL (Cohn et al., 1994; Hanneke, 2014), a well-known PAC active learning algorithm. We believe that similar bounds comparing Cost(H) to the complexity of generic active learning algorithms, such as the Splitting Algorithm (Dasgupta, 2005b) or the confidence-based algorithm of Zhang & Chaudhuri (2014), can also be shown through suitable derandomization procedures.

Proposition 3.4. $\mathrm{Cost}(\mathcal{H}) \le O\!\left(\theta(\epsilon) \ln |\mathcal{H}| \ln \frac{1}{\epsilon}\right)$, where θ is the disagreement coefficient of H with respect to $D_\mathcal{X}$ (recall Section 1.1 for its definition).

Proof sketch. We present Algorithm 2, a derandomized version of the Phased CAL algorithm (Hsu, 2010, Chapter 2). By Theorem 3.2, it suffices to show that Algorithm 2 has a deterministic label complexity bound of $O(\theta(\epsilon) \ln |\mathcal{H}| \ln \frac{1}{\epsilon})$. We present only the main idea here and defer a precise version of the proof to Appendix D.3.

Algorithm 2: Derandomized Phased CAL for Auditing
Require: hypothesis class H, target error ϵ, minority population proportion p, fairness measure µ.
Ensure: µ̂, an estimate of µ(h*).
1: Let $N = \lceil \log_2 \frac{16 \ln |\mathcal{H}|}{p\epsilon} \rceil$
2: Let V₁ ← H
3: for n = 1, ..., N do
4:   Let $m_n = 2^n$
5:   Find (the lexicographically smallest) $S_n \in \mathcal{X}^{m_n}$ such that:
     $\Pr_{S_n}(x \in \mathrm{DIS}(V_n)) \le 2 \Pr_{D_\mathcal{X}}(x \in \mathrm{DIS}(V_n)) + \frac{\ln 8}{m_n}$;
     $\forall h, h' \in \mathcal{H}:\ \Pr_{S_n}(h(x) \neq h'(x)) = 0 \implies \Pr_{D_\mathcal{X}}(h(x) \neq h'(x)) \le \frac{16 \ln |\mathcal{H}|}{m_n}$
6:   Query h* for the labels of the examples in $T_n := S_n \cap \mathrm{DIS}(V_n)$
7:   $V_{n+1} \leftarrow V_n(h^\star, T_n)$
8: return µ(h) for an arbitrary $h \in V_{N+1}$

We first show that for every n, the optimization problem in line 5 is always feasible. To see this, observe that if $S_n$ is a sample of size $m_n$ drawn i.i.d. from $D_\mathcal{X}$, then: (1) by Bernstein's inequality, with probability at least 3/4, $\Pr_{S_n}(x \in \mathrm{DIS}(V_n)) \le 2 \Pr_{D_\mathcal{X}}(x \in \mathrm{DIS}(V_n)) + \frac{\ln 8}{m_n}$; and (2) by Bernstein's inequality and a union bound over $h, h' \in \mathcal{H}$, with probability at least 3/4, for all $h, h' \in \mathcal{H}$, $\Pr_{S_n}(h(x) \neq h'(x)) = 0$ implies $\Pr_{D_\mathcal{X}}(h(x) \neq h'(x)) \le \frac{16 \ln |\mathcal{H}|}{m_n}$. By a union bound, with nonzero probability both conditions hold simultaneously, establishing feasibility.

We then argue that for all n, $V_{n+1} \subseteq B(h^\star, \frac{16 \ln |\mathcal{H}|}{m_n})$. This is because each $h \in V_{n+1}$ and h* are both in $V_n$ and therefore agree on $S_n \setminus T_n$; on the other hand, h and h* agree on $T_n$ by the definition of $V_{n+1}$. Consequently, $\Pr_{S_n}(h(x) \neq h^\star(x)) = 0$, which implies $\Pr_{D_\mathcal{X}}(h(x) \neq h^\star(x)) \le \frac{16 \ln |\mathcal{H}|}{m_n}$. It follows that for all $h \in V_{N+1}$, $\Pr(h(x) \neq h^\star(x)) \le p\epsilon$, which, combined with Lemma C.1, implies $|\mu(h) - \mu(h^\star)| \le \epsilon$.

Finally, Algorithm 2's label complexity is upper bounded by

$$\sum_{n=1}^{N} |T_n| \le \sum_{n=1}^{N} m_n \left(2 \Pr_{D_\mathcal{X}}(x \in \mathrm{DIS}(V_n)) + \frac{\ln 8}{m_n}\right) \le \sum_{n=1}^{N} m_n \left(2\,\theta(\epsilon)\,\frac{16 \ln |\mathcal{H}|}{m_{n-1}} + \frac{\ln 8}{m_n}\right) \le O\!\left(\theta(\epsilon) \ln |\mathcal{H}| \ln \frac{1}{\epsilon}\right).$$

3.1.2. COMPUTATIONAL HARDNESS OF IMPLEMENTING ALGORITHM 1

Although Algorithm 1 attains the optimal label complexity among deterministic algorithms, we show in the following proposition that, under the standard complexity-theoretic assumption NP ⊄ TIME($n^{O(\log \log n)}$), even approximating Cost(H) is computationally intractable.

Proposition 3.5. If there is an algorithm that can approximate Cost(H) to within a 0.3 ln |H| factor in poly(|H|, |X|, 1/ϵ) time, then NP ⊆ TIME($n^{O(\log \log n)}$).

We remark that the constant 0.3 can be improved to any constant arbitrarily smaller than 1. The main insight behind this proposition is a connection between Cost(H) and optimal-depth decision trees (see Theorem D.4). Using the hardness of computing an approximately-optimal-depth decision tree (Laber & Nogueira, 2004) and taking into account the structure of µ, we establish the intractability of approximating Cost(H). Owing to the intractability of Algorithm 1, in the next section we turn to the design of a computationally efficient algorithm whose label complexity nears that of Algorithm 1 (i.e., Cost(H)).

3.2. Efficient Randomized Algorithm with Competitive Guarantees

We present our efficient algorithm in this section; it also serves as a first upper bound on the statistical complexity achievable by computationally tractable algorithms. Our algorithm, Algorithm 3, is inspired by the exact active learning literature (Hegedüs, 1995; Hanneke, 2007) and is based on a connection between machine teaching (Goldman & Kearns, 1995) and active learning. Algorithm 3 takes as input two oracles, a mistake-bounded online learning oracle O and a constrained empirical risk minimization (ERM) oracle C-ERM, defined below.
Definition 3.6. An online-learning oracle O is said to have a mistake bound of M for hypothesis class H if, for any classifier h* ∈ H and any sequence of examples x₁, x₂, ..., at every round t ∈ ℕ, given the historical examples $(x_s, h^\star(x_s))_{s=1}^{t-1}$, it outputs a classifier $\hat{h}_t$ such that $\sum_{t=1}^{\infty} \mathbb{I}(\hat{h}_t(x_t) \neq h^\star(x_t)) \le M$.

Well-known implementations of mistake-bounded online learning oracles include the halving algorithm and its efficient sampling-based approximations (Bertsimas & Vempala, 2004), as well as the Perceptron / Winnow algorithms (Littlestone, 1988; Ben-David et al., 2009). For instance, if O is the halving algorithm, a mistake bound of M = log₂ |H| may be achieved.

We next define the constrained ERM oracle, which has been used previously in a number of works on oracle-efficient active learning (Dasgupta et al., 2007; Hanneke, 2011; Huang et al., 2015).

Definition 3.7. A constrained ERM oracle for hypothesis class H, C-ERM, is one that takes as input labeled datasets A and B, and outputs a classifier $\hat{h} \in \operatorname{argmin}\{\mathrm{err}(h, A) : h \in \mathcal{H},\ \mathrm{err}(h, B) = 0\}$.

The high-level idea of Algorithm 3 is as follows. At every iteration, it uses the mistake-bounded online learning oracle to generate some classifier ĥ (line 3); then, it aims to construct a dataset T of small size such that, after querying h* for the labels of the examples in T, one of the following happens: (1) ĥ disagrees with h* on some example in T; or (2) for all classifiers in the version space $V = \{h \in \mathcal{H} : \forall x \in T,\ h(x) = h^\star(x)\}$, we have diam_µ(V) ≤ 2ϵ. In case (1), we have found a counterexample for ĥ, which can be fed to the online learning oracle to learn a new model; this can happen at most M times. In case (2), we are done: our queried labeled examples ensure that our auditing estimate is ϵ-accurate and satisfies manipulation-proofness. A dataset T with this property is called a (µ, ϵ)-specifying set for ĥ, as formally defined in Definition D.7 in Appendix D.5. Another view of the (µ, ϵ)-specifying set is as a set T such that, for all h, h' with |µ(h) − µ(h')| > 2ϵ, there exists some x ∈ T such that h(x) ≠ ĥ(x) or h'(x) ≠ ĥ(x).

The requirements on T can be viewed as a set cover problem, where the universe is $U = \{(h, h') \in \mathcal{H}^2 : |\mu(h) - \mu(h')| > 2\epsilon\}$ and the set system is $\mathcal{C} = \{C_x : x \in \mathcal{X}\}$, where (h, h') is in $C_x$ if h(x) ≠ ĥ(x) or h'(x) ≠ ĥ(x). This motivates us to design efficient set cover algorithms in this context. A key challenge in applying standard offline set cover algorithms (such as greedy set cover) to construct an approximate minimum (µ, ϵ)-specifying set is that we cannot afford to enumerate all elements of the universe U, as U can be exponential in size. In the face of this challenge, we draw inspiration from the online set cover literature (Alon et al., 2009; Dasgupta et al., 2019) to design an oracle-efficient algorithm that computes O(log |H| log |X|)-approximate minimum (µ, ϵ)-specifying sets while avoiding enumeration over U. Our key idea is to simulate an online set cover process. We build the cover set³ T iteratively, starting from T = ∅ (line 4). At every inner iteration, we first try to find a pair (h₁, h₂) in U not yet covered by the current T. As we shall see next, this step (line 7) can be implemented efficiently given the constrained ERM oracle C-ERM. If such a pair (h₁, h₂) can be found, we use the online set cover algorithm implicit in Dasgupta et al. (2019) to find a new example that covers this pair, add it to T, and move on to the next iteration (lines 11 to 14).

³When it is clear from context, we slightly abuse notation and say x covers (h, h') if (h, h') ∈ $C_x$.
Otherwise, T has successfully covered all the elements in U, in which case we break the inner loop (line 9). To see how line 7 finds an uncovered pair in U, note that it can also be written as

$$(h_1, h_2) = \operatorname{argmax}_{h, h' \in \mathcal{H}} \left\{ \mu(h) - \mu(h') : h(T) = h'(T) = \hat{h}(T) \right\}.$$

Thus, if µ(h₁) − µ(h₂) > 2ϵ, then the returned pair (h₁, h₂) corresponds to a pair in universe U that is not covered by T. Otherwise, by the optimality of (h₁, h₂), T covers all elements in U. Furthermore, we note that optimization problems (1) and (2) can be implemented with access to C-ERM. We show this for program (1); the reasoning for program (2) is analogous. Observe that maximizing µ(h) over h ∈ H subject to the constraint h(T) = ĥ(T) is equivalent to minimizing a (weighted) empirical error of h ∈ H on the dataset $\{(x, +1) : x \in \mathcal{X}, x_A = 1\} \cup \{(x, -1) : x \in \mathcal{X}, x_A = 0\}$ (weighted so that each group contributes its conditional probability mass), subject to h having zero error on $\{(x, \hat{h}(x)) : x \in T\}$. We are now ready to present the label complexity guarantee of Algorithm 3.

Algorithm 3: Oracle-efficient Active Fairness Auditing
Require: hypothesis class H, online learning oracle O with mistake bound M, constrained ERM oracle C-ERM, target error ϵ, fairness measure µ.
Ensure: µ̂, an estimate of µ(h*).
1: Initialize S ← ∅
2: while true do
3:   ĥ ← O(S)
4:   Let T ← ∅  // compute an approximate minimum (µ, ϵ)-specifying set for ĥ
5:   Initialize weights $w(x) = \frac{1}{|\mathcal{X}|}$ and thresholds $\tau_x \sim \mathrm{Exponential}(\ln(|\mathcal{H}|^2 M / \delta))$  // random initialization of thresholds
6:   while true do
7:     Use C-ERM to solve the two separate programs:
       $h_1 \leftarrow \operatorname{argmax}_{h \in \mathcal{H}} \{\mu(h) : h(T) = \hat{h}(T)\}$  (1)
       $h_2 \leftarrow \operatorname{argmin}_{h \in \mathcal{H}} \{\mu(h) : h(T) = \hat{h}(T)\}$  (2)
8:     if µ(h₁) − µ(h₂) ≤ 2ϵ then
9:       break  // T is a (µ, ϵ)-specifying set for ĥ
10:    else
11:      // add examples to T to cover (h₁, h₂), using the online set cover algorithm implicit in Dasgupta et al. (2019)
         Determine $\Delta(h_1, h_2) = \{x \in \mathcal{X} : h_1(x) \neq \hat{h}(x) \text{ or } h_2(x) \neq \hat{h}(x)\}$
12:      while $\sum_{x \in \Delta(h_1, h_2)} w(x) < 1$ do
13:        Double the weights w(x) for all x ∈ Δ(h₁, h₂)
14:      Update $T \leftarrow \{x \in \mathcal{X} : w(x) \ge \tau_x\}$
15:  Query h* on T
16:  $S \leftarrow S \cup \{(x, h^\star(x)) : x \in T\}$
17:  if ĥ(T) = h*(T) then
18:    return $\frac{1}{2}(\mu(h_1) + \mu(h_2))$

Theorem 3.8. If the online learning oracle O makes a total of M mistakes, then, with probability 1 − δ, Algorithm 3 outputs µ̂ such that |µ̂ − µ(h*)| ≤ ϵ, with its number of label queries bounded by

$$O\!\left(\mathrm{Cost}(\mathcal{H}) \cdot M \cdot \log \frac{|\mathcal{H}| M}{\delta} \cdot \log |\mathcal{X}|\right).$$

The proof of Theorem 3.8 is deferred to Appendix D.5. In a nutshell, it combines the following observations. First, Algorithm 3 has at most M outer iterations, by the mistake bound guarantee of oracle O. Second, for each ĥ in each outer iteration, its minimum (µ, ϵ)-specifying set has size at most Cost(H); this is based on a nontrivial connection between the optimal deterministic query complexity and the (µ, ϵ)-extended teaching dimension (see Definition D.9), which we present in Lemma D.10. Third, by the $O(\log \frac{|\mathcal{H}| M}{\delta} \log |\mathcal{X}|)$-approximation guarantee of the online set cover algorithm implicit in Dasgupta et al. (2019), each outer iteration makes at most $O(\mathrm{Cost}(\mathcal{H}) \log \frac{|\mathcal{H}| M}{\delta} \log |\mathcal{X}|)$ label queries.

Remark 3.9. Via an argument similar to that in Proposition 3.5, we can show that, for computationally efficient algorithms, the approximation factor in constructing an approximately-minimum (µ, ϵ)-specifying set for ĥ cannot be significantly improved to, say, 0.3 ln |H|.
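The reduction of programs (1) and (2) to C-ERM is mechanical. A sketch, assuming a finite pool representation of $D_\mathcal{X}$ and a hypothetical oracle `c_erm(A, B)` that minimizes weighted error on A subject to zero error on B (the signature is our assumption, not the paper's API):

```python
def max_mu_subject_to(c_erm, h_hat, T, X_pool, x_A):
    """Program (1): h1 = argmax_h mu(h) s.t. h(T) = h_hat(T), via one C-ERM call.

    Maximizing mu(h) = Pr(h=+1 | x_A=1) - Pr(h=+1 | x_A=0) is equivalent to
    minimizing weighted error on a dataset that labels group 1 as +1 and
    group 0 as -1, with per-example weight 1 / (group size).
    """
    n1 = sum(1 for a in x_A if a == 1)
    n0 = len(x_A) - n1
    A = [(x, +1, 1.0 / n1) if a == 1 else (x, -1, 1.0 / n0)
         for x, a in zip(X_pool, x_A)]          # weighted objective dataset
    B = [(x, h_hat(x)) for x in T]              # hard consistency constraints
    return c_erm(A, B)
```

Program (2) is the same call with the labels in A flipped. The per-group weighting matters: it makes each group contribute its conditional probability mass, matching the definition of µ.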
Finally, in Appendix F, we empirically explore the performance of Algorithm 3 and active learning, and compare them with i.i.d. sampling. As expected, our experiments confirm that, under a fixed budget, Algorithm 3 is the most effective at inducing a version space with a small µ-diameter, and can thus provide the strongest manipulation-proofness guarantee.

4. Statistical Limits of Estimation

In this section, we turn to direct estimation, the second of the two main guarantees one may wish to have for auditing. In particular, we focus on the statistical limits of direct estimation: designing an efficient auditing algorithm that outputs µ̂ such that |µ̂ − µ(h*)| ≤ ϵ with a small number of queries.

4.1. Separation between Estimation with and without Manipulation-proofness

To start, it is natural to contrast the guarantee of ϵ-manipulation-proofness against that of ϵ-direct estimation accuracy. Indeed, if the two guarantees were one and the same, we could simply use the MP estimation algorithms for direct estimation as well. More specifically, we ask whether achieving MP is strictly harder, and we answer this question in the affirmative. The following simple example shows that MP estimation can sometimes require a much higher label complexity than direct estimation.

Example 4.1. Let ϵ = 1/4 and n ≥ 1. Let X = {0, 1, ..., n}, with $x \mid x_A = 0 \sim \mathrm{Uniform}(\{0\})$ and $x \mid x_A = 1 \sim \mathrm{Uniform}(\{1, \ldots, n\})$. Let $\mathcal{H} = \{h : \mathcal{X} \to \{-1, +1\} \mid h(0) = -1\}$. First, as ϵ = 1/4, the i.i.d. sampling baseline makes O(1) queries and ensures that it estimates µ(h*) with error at most ϵ with probability 0.9. However, for manipulation-proof estimation, at least Ω(n) labels are needed to ensure that the queried dataset S satisfies $\mathrm{diam}_\mu(\mathcal{H}(h^\star, S)) \le \epsilon$. Indeed, let h* ≡ −1. For any unlabeled dataset S of size at most n/2, by the definition of H, there always exist $h, h' \in \mathcal{H}(h^\star, S)$ such that for all $x \in \{1, \ldots, n\} \setminus S$, h(x) = −1 and h'(x) = +1. As a result, µ(h) = 0 and $\mu(h') = \frac{|\{1, \ldots, n\} \setminus S|}{n} \ge \frac{1}{2}$, which implies that $\mathrm{diam}_\mu(\mathcal{H}(h^\star, S)) \ge \frac{1}{2}$.

4.2. Randomized Algorithms for Direct Estimation

The separation result above suggests that different algorithms may be needed if we are only interested in efficient direct estimation. Motivated by our previous exploration, a first question to answer is whether randomization should be a key ingredient in algorithm design. That is, can a randomized algorithm achieve a query complexity smaller than that of the optimal deterministic algorithm? Through the example below, we answer this question in the affirmative.

Example 4.2. Consider the setting of Example 4.1; recall that i.i.d. sampling, a randomized algorithm, estimates µ(h*) with error at most ϵ = 1/4 with probability 0.9, with a query complexity of O(1). In contrast, consider any deterministic algorithm A with label budget N ≤ n/2, and consider its interaction history with the classifier h₀ ≡ −1, which can be summarized by a sequence of unlabeled examples S = {x₁, ..., x_N}. Now, consider an alternative classifier h₁ such that h₁(x) = −1 on S ∪ {0}, but h₁(x) = +1 on {1, ..., n} \ S. By an inductive argument, it can be shown that the interaction history between A and h₁ is also S, which implies that, under both target hypotheses h* = h₀ and h* = h₁, A must output the same estimate µ̂ (see Lemma B.1 in Appendix B for a formal proof); however, $|\mu(h_0) - \mu(h_1)| \ge \frac{1}{2}$, implying that under at least one of the two hypotheses, we must have $|\hat{\mu} - \mu(h^\star)| \ge \frac{1}{4}$. In summary, in this setting, a randomized algorithm has a query complexity of O(1), much smaller than Ω(n), the optimal query complexity of deterministic algorithms.
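Example 4.2's argument can be played out mechanically against any deterministic auditor. The harness below is our own illustration (the names and interfaces are not from the paper): it records the transcript against h₀ ≡ −1 and replays the audit against the adversarial h₁; by Lemma B.1 the two runs are identical, so the two far-apart µ values cannot both be estimated well.

```python
def fool_deterministic_auditor(audit, n):
    """Example 4.2, concretely: a deterministic auditor querying <= n/2 points
    returns the same estimate for h0 = const -1 and for an h1 that flips every
    unqueried point of {1, ..., n} to +1, even though |mu(h0) - mu(h1)| >= 1/2.

    audit: deterministic function taking an oracle x -> {-1, +1}, returning mu_hat.
    """
    transcript = []
    def oracle0(x):                      # h0: always -1; record every query
        transcript.append(x)
        return -1
    est0 = audit(oracle0)
    S = set(transcript)
    assert len(S) <= n // 2, "label budget exceeded"
    def oracle1(x):                      # h1 agrees with h0 on S and on x = 0
        return -1 if (x in S or x == 0) else +1
    est1 = audit(oracle1)
    assert est0 == est1                  # Lemma B.1: identical transcripts
    return est0
```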
4.3. Case Study: Non-homogeneous Linear Classifiers under Gaussian Populations

In this subsection, we identify a practically motivated setting where we can comprehensively characterize the minimax (randomized) active fairness auditing query complexity up to logarithmic factors. Specifically, we present a positive result in the form of an algorithm with query complexity $\tilde{O}(\min(d, \frac{1}{\epsilon^2}))$, as well as a matching lower bound showing that any (possibly randomized) algorithm must have query complexity $\Omega(\min(d, \frac{1}{\epsilon^2}))$.

Example 4.3. Let d ≥ 2 and $\mathcal{X} = \mathbb{R}^d$, with $x \mid x_A = 0 \sim N(m_0, \Sigma_0)$ and $x \mid x_A = 1 \sim N(m_1, \Sigma_1)$. Let the hypothesis class $\mathcal{H}_{\mathrm{lin}} = \{h_{a,b}(x) := \mathrm{sign}(\langle a, x \rangle + b) : a \in \mathbb{R}^d, b \in \mathbb{R}\}$ be the class of non-homogeneous linear classifiers.

Recall that i.i.d. sampling has a label complexity of $O(\frac{1}{\epsilon^2})$. On the other hand, through a membership query-based active learning algorithm (Algorithm 6 in Appendix E.2), we can approximately estimate µ(h*) (up to scaling) by performing d binary searches, using active label queries. This approach incurs a total label complexity of Õ(d). Choosing the better of these two algorithms gives an active fairness auditing strategy of label complexity $\tilde{O}(\min(d, \frac{1}{\epsilon^2}))$.

We only present the main idea of Algorithm 6 here, with its full analysis deferred to Appendix E.2. Its core component is Algorithm 4 below, which label-efficiently estimates $\gamma(h^\star) = \Pr_{x \sim N(0, I_d)}(h^\star(x) = +1)$ with black-box label queries to $h^\star(x) = \mathrm{sign}(\langle a^\star, x \rangle + b^\star)$. Algorithm 4 is based on the following insights. First, observe that $\gamma(h^\star) = \Phi\!\left(\frac{b^\star}{\|a^\star\|_2}\right) =: \Phi(sr)$, where Φ is the standard normal CDF, $s := \mathrm{sign}(b^\star)$, and $r := 1/\sqrt{\sum_{i=1}^{d} m_i^{-2}}$, for $m_i := -b^\star / a_i^\star$ (the sign-flip point of h* along coordinate i; only $m_i^2$ enters r). On the one hand, s can be easily obtained by querying h* at 0 (line 2). On the other hand, estimating r can be reduced to estimating each $m_i$. However, some $m_i$'s can be unbounded, which makes their estimation challenging. To get around this challenge, we prove the following lemma, which shows that it suffices to accurately estimate those $m_i$'s that are not unreasonably large (i.e., the $m_i$'s for i ∈ S, defined below):

Lemma 4.4. Let $\beta := 2 d^{5/2} (\ln \frac{1}{\epsilon})^{3/4} (\frac{1}{\epsilon})^{1/2}$. Suppose r ≤ α. If there is some S ⊆ [d] such that:
1. for all i ∉ S, $|m_i| \ge \beta$;
2. for all i ∈ S, $|\hat{m}_i - m_i| \le \epsilon$;
then $\left| \left( \sum_{i \in S} \hat{m}_i^{-2} \right)^{-1/2} - r \right| \le 2\epsilon$.

Algorithm 4 carefully utilizes this lemma to estimate r. First, it tests whether, for all i, $h^\star(\alpha e_i) = h^\star(-\alpha e_i)$; if yes, then for all i, $|m_i| \ge \alpha$, so $r \ge \alpha / \sqrt{d} = \sqrt{2 \ln \frac{1}{\epsilon}}$, and γ(h*) is ϵ-close to 0 or 1 depending on the value of s (line 5). Otherwise, it must be the case that r ≤ α. In this case, we go over each coordinate i, first testing whether $|m_i| \le \beta$ (line 8); if not, we skip this coordinate (do not add it to S); otherwise, we include i in S and estimate $m_i$ to precision ϵ using binary search (line 11). By the guarantees of Lemma 4.4, we have $|s\hat{r} - sr| \le 2\epsilon$, which, by the $\frac{1}{\sqrt{2\pi}}$-Lipschitzness of Φ, implies that $|\hat{\gamma} - \gamma(h^\star)| \le \epsilon$. The total query complexity of Algorithm 4 is $1 + 2d + 2d + d \log_2 \frac{2\beta}{\epsilon} = O(d \log \frac{\beta}{\epsilon})$.

Algorithm 4: ESTIMATE-POSITIVE, a label-efficient estimation algorithm for γ(h*) for non-homogeneous linear classifiers
Require: query access to $h^\star \in \mathcal{H}_{\mathrm{lin}}$, target error ϵ.
Ensure: γ̂ such that |γ̂ − γ(h*)| ≤ ϵ.
1: Let $\alpha = \sqrt{2 d \ln \frac{1}{\epsilon}}$, $\beta = 2 d^{5/2} (\ln \frac{1}{\epsilon})^{3/4} (\frac{1}{\epsilon})^{1/2}$
2: s ← query h* on 0
3: Query h* on $\{\rho \alpha e_i : \rho \in \{\pm 1\}, i \in [d]\}$
4: if for all i ∈ [d], $h^\star(\alpha e_i) = h^\star(-\alpha e_i)$ then
5:   return 1 if s = +1, 0 if s = −1
   // otherwise, r ≤ α
6: S ← ∅
7: for i = 1, ..., d do
8:   Query h* on $\beta e_i$ and $-\beta e_i$
9:   if $h^\star(\beta e_i) \neq h^\star(-\beta e_i)$ then
10:    S ← S ∪ {i}
       // use binary search to obtain m̂ᵢ, an estimate of mᵢ, with precision ϵ
11:    $\hat{m}_i$ ← BINARY-SEARCH(i, β, ϵ) (Algorithm 5)
12: $\hat{r} \leftarrow \left( \sum_{i \in S} \hat{m}_i^{-2} \right)^{-1/2}$  // r̂ is an estimate of r
13: return Φ(sr̂)
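Algorithm 5, shown next, is the binary-search subroutine invoked in line 11. As a concrete companion, here is a minimal executable sketch of the same bisection, assuming `query(x)` returns h*(x) ∈ {−1, +1} for a vector x; the function name and numpy usage are ours.

```python
import numpy as np

def binary_search_threshold(query, i, beta, eps, d):
    """Algorithm 5 sketch: bisection for the sign-flip point of h* along e_i.

    query: x -> h*(x) in {-1, +1}; assumes h*(beta*e_i) != h*(-beta*e_i).
    Returns m with |m - m_i| <= eps, where m_i is the flip point along e_i.
    """
    e_i = np.zeros(d); e_i[i] = 1.0
    lo, hi = -beta, beta
    y_lo = query(lo * e_i)
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if query(mid * e_i) == y_lo:   # flip point lies in (mid, hi]
            lo = mid
        else:                          # flip point lies in (lo, mid]
            hi = mid
    return (lo + hi) / 2
```

Each iteration makes one query and halves the interval, so a coordinate costs $\log_2 \frac{2\beta}{\epsilon}$ queries, giving the $d \log \frac{\beta}{\epsilon}$ term in Algorithm 4's total.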
Algorithm 5: BINARY-SEARCH
Require: coordinate i, β such that $h^\star(\beta e_i) \neq h^\star(-\beta e_i)$, precision ϵ.
Ensure: m, an ϵ-accurate estimate of $m_i$.
1: u ← β, l ← −β
2: while u − l ≥ ϵ do
3:   m ← (u + l)/2
4:   Query h* on $m e_i$
5:   if $h^\star(m e_i) = h^\star(l e_i)$ then
6:     l ← m
7:   else
8:     u ← m
9: return m

For the lower bound, we formulate a hypothesis testing problem such that, under hypotheses H₀ and H₁, the µ(h*) values are approximately ϵ-separated. This is used to show that any active auditing algorithm with a label query budget of at most $O(\min(d, \frac{1}{\epsilon^2}))$ cannot effectively distinguish H₀ from H₁. Our construction requires a delicate analysis of the KL divergence between the observation distributions under the two hypotheses, and we refer readers to Theorem E.3 for details.

4.4. General Distribution-Free Lower Bounds

Finally, in this subsection, we move beyond the Gaussian population setting and derive general query complexity lower bounds for randomized estimation algorithms that audit general hypothesis classes of finite VC dimension d. This result shows that when $d \ge \frac{1}{\epsilon^2}$, or equivalently $\epsilon \ge \frac{1}{\sqrt{d}}$, there exists some hard data distribution and target classifier in H such that active fairness auditing has a query complexity lower bound of $\Omega(\frac{1}{\epsilon^2})$. Put another way, i.i.d. sampling is near-optimal in this regime.

Theorem 4.5 (Lower bound for randomized auditing). Fix ϵ ∈ (0, 1/40] and a hypothesis class H with VC dimension d ≥ 1600. For any (possibly randomized) algorithm A with label budget $N \le O(\min(d, \frac{1}{\epsilon^2}))$, there exists a distribution $D_\mathcal{X}$ over $\mathcal{X}$ and h* ∈ H such that A's output µ̂, when interacting with h*, satisfies $\Pr\left(|\hat{\mu} - \mu(h^\star)| > \epsilon\right) > \frac{1}{4}$.

The proof of Theorem 4.5 can be found in Appendix E.1. The lower bound construction follows a setting similar to Example 4.1, except that we now choose h* in a randomized fashion.

5. Conclusion

In this paper, we initiate the theoretical study of query-efficient algorithms that audit ML models. We focus on auditing demographic parity, one of the canonical fairness notions. We investigate the natural auditing guarantee of direct estimation accuracy, and introduce a new guarantee based on the possibility of post-audit manipulation: manipulation-proofness. We identify an optimal deterministic algorithm and a matching randomized algorithm, and develop upper and lower bounds that mark the performance that any optimal auditing algorithm must meet. Our first exploration of active fairness estimation seeks to provide a more complete picture of the theory of auditing. A natural next direction is to explore guarantees for other fairness notions such as equalized odds: how does one construct query-efficient algorithms when µ is a function of both h*(x) and y? Another natural question, motivated by the connection to disagreement-based active learning, is to design active fairness auditing algorithms based on some notion of disagreement with respect to µ.

Acknowledgments. We thank Stefanos Poulis for sharing the implementation of the black-box teaching algorithm of Dasgupta et al. (2019), and special thanks to Steve Hanneke and Sanjoy Dasgupta for helpful discussions. We also thank the anonymous ICML reviewers for their feedback.

References
Agarwal, A., Beygelzimer, A., Dudík, M., Langford, J., and Wallach, H. A reductions approach to fair classification. In International Conference on Machine Learning, pp. 60-69. PMLR, 2018.

Alon, N., Awerbuch, B., Azar, Y., Buchbinder, N., and Naor, J. The online set cover problem. SIAM Journal on Computing, 39(2):361-370, 2009.

Angluin, D. Queries and concept learning. Machine Learning, 2(4):319-342, 1988.

Balcan, M.-F. and Long, P. Active and passive learning of linear separators under log-concave distributions. In Conference on Learning Theory, pp. 288-316. PMLR, 2013.

Balcan, M.-F., Blais, E., Blum, A., and Yang, L. Active property testing. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pp. 21-30. IEEE, 2012.

Ben-David, S., Pál, D., and Shalev-Shwartz, S. Agnostic online learning. In COLT, volume 3, pp. 1, 2009.

Bertsimas, D. and Vempala, S. Solving convex programs by random walks. Journal of the ACM (JACM), 51(4):540-556, 2004.

Blais, E., Ferreira Pinto Jr., R., and Harms, N. VC dimension and distribution-free sample-based testing. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pp. 504-517, 2021.

Blanc, G., Gupta, N., Lange, J., and Tan, L.-Y. Estimating decision tree learnability with polylogarithmic sample complexity. Advances in Neural Information Processing Systems, 33, 2020.

Blum, A. and Hu, L. Active tolerant testing. In Conference On Learning Theory, pp. 474-497. PMLR, 2018.

Cohn, D., Atlas, L., and Ladner, R. Improving generalization with active learning. Machine Learning, 15(2):201-221, 1994.

Dasgupta, S. Analysis of a greedy active learning strategy. Advances in Neural Information Processing Systems, 17:337-344, 2005a.

Dasgupta, S. Coarse sample complexity bounds for active learning. In NIPS, volume 18, pp. 235-242, 2005b.

Dasgupta, S., Hsu, D. J., and Monteleoni, C. A general agnostic active learning algorithm. Advances in Neural Information Processing Systems, 20:353-360, 2007.

Dasgupta, S., Hsu, D., Poulis, S., and Zhu, X. Teaching a black-box learner. In International Conference on Machine Learning, pp. 1547-1555. PMLR, 2019.

Dicker, L. H. Variance estimation in high-dimensional linear models. Biometrika, 101(2):269-284, 2014.

Feige, U. A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 45(4):634-652, 1998.

Goldman, S. A. and Kearns, M. J. On the complexity of teaching. Journal of Computer and System Sciences, 50(1):20-31, 1995.

Goldreich, O., Goldwasser, S., and Ron, D. Property testing and its connection to learning and approximation. Journal of the ACM (JACM), 45(4):653-750, 1998.

Goldwasser, S., Rothblum, G. N., Shafer, J., and Yehudayoff, A. Interactive proofs for verifying machine learning. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021.

Hanneke, S. The cost complexity of interactive learning. Unpublished manuscript, 2006.

Hanneke, S. Teaching dimension and the complexity of active learning. In International Conference on Computational Learning Theory, pp. 66-81. Springer, 2007.

Hanneke, S. Rates of convergence in active learning. The Annals of Statistics, pp. 333-361, 2011.

Hanneke, S. Theory of active learning. Foundations and Trends in Machine Learning, 7(2-3), 2014.

Hegedüs, T. Generalized teaching dimensions and the query complexity of learning. In Proceedings of the Eighth Annual Conference on Computational Learning Theory, pp. 108-117, 1995.

Hotten, R. Volkswagen: The scandal explained. BBC News, 2015. URL https://www.bbc.com/news/business-34324772.

Hsu, D. J. Algorithms for active learning. PhD thesis, UC San Diego, 2010.
Huang, T.-K., Agarwal, A., Hsu, D. J., Langford, J., and Schapire, R. E. Efficient and parsimonious agnostic active learning. Advances in Neural Information Processing Systems, 28, 2015.

Kong, W. and Valiant, G. Estimating learnability in the sublinear data regime. Advances in Neural Information Processing Systems, 31:5455-5464, 2018.

Laber, E. S. and Nogueira, L. T. On the hardness of the minimum height decision tree problem. Discrete Applied Mathematics, 144(1-2):209-212, 2004.

Larson, J., Mattu, S., Kirchner, L., and Angwin, J. How we analyzed the COMPAS recidivism algorithm. ProPublica (5 2016), 9(1):3-3, 2016.

Littlestone, N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285-318, 1988.

McCarthy, D. To regulate AI, try playing in a sandbox. Emerging Tech Brew, 2021. URL https://www.morningbrew.com/emerging-tech/stories/2021/05/26/regulate-ai-just-play-sandbox.

Mitchell, T. M. Generalization as search. Artificial Intelligence, 18(2):203-226, 1982.

Rastegarpanah, B., Gummadi, K., and Crovella, M. Auditing black-box prediction models for data minimization compliance. Advances in Neural Information Processing Systems, 34, 2021.

Ron, D. Property testing: A learning theory perspective. Now Publishers Inc, 2008.

Sabato, S., Sarwate, A. D., and Srebro, N. Auditing: active learning with outcome-dependent query costs. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 1, pp. 512-520, 2013.

Valiant, L. G. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.

Xu, Z., Yu, T., and Sra, S. Towards efficient evaluation of risk via herding. Negative Dependence: Theory and Applications in Machine Learning, 2019.

Zhang, C. and Chaudhuri, K. Beyond disagreement-based agnostic active learning. Advances in Neural Information Processing Systems, 27, 2014.

A. Additional Related Works

Property Testing: Our notion of auditing that leverages knowledge of H is similar in theme to property testing (Goldreich et al., 1998; Ron, 2008; Balcan et al., 2012; Blum & Hu, 2018; Blanc et al., 2020; Blais et al., 2021), which tests whether h* is in H, or h* is far from every classifier in H, given query access to h*. These works provide algorithms whose testing query complexity is of lower order than the sample complexity of learning with respect to H, for specific hypothesis classes such as monomials, DNFs, decision trees, linear classifiers, etc. Our problem can be reduced to property testing by testing whether h* is in $\{h \in \mathcal{H} : \mu(h) \in [i\epsilon, (i+1)\epsilon]\}$ for each $i \in \{0, 1, \ldots, \lceil \frac{1}{\epsilon} \rceil\}$; however, to the best of our knowledge, no such result is known in the context of property testing.

Feature Minimization Audits: Rastegarpanah et al. (2021) study another notion of auditing, focusing on assessing whether the model is trained in line with the GDPR's data minimization principle. Specifically, this work evaluates the necessity of each individual feature used in the ML model, by imputing each feature with constant values and checking the extent of variation in the predictions. One commonality with our work, and indeed across all auditing works, is the concern with minimizing the number of queries needed to conduct the audit.

Herding for Sample-Efficient Mean Estimation: Additionally, the estimation of DP may be viewed as estimating the difference of two means.
Viewed in this light, herding (Xu et al., 2019) offers a way to use non-i.i.d. sampling to estimate means more efficiently. However, a key assumption needed in herding is that h*, whose output is in {−1, 1}, is well-approximated by ⟨w, φ(x)⟩ for some mapping φ known a priori.

Comparison with Sabato et al. (2013): Lastly, Sabato et al. (2013) also use the term "auditing", in the context of active learning with outcome-dependent query costs. Although the term is shared, the problem settings are completely different: Sabato et al. (2013) focus on actively learning the model h*, as opposed to just estimating µ(h*).

B. A General Lemma on Deterministic Query Learning

In this section, we present a general lemma inspired by Hanneke (2007), which is used in our proofs for establishing lower bounds on deterministic active fairness auditing algorithms.

Lemma B.1. Suppose a deterministic active auditing algorithm A with label budget N interacts with a labeling oracle that uses classifier h₀ and generates the interaction history $\{(x_1, h_0(x_1)), (x_2, h_0(x_2)), \ldots, (x_N, h_0(x_N))\}$, and suppose there exists a classifier h₁ such that h₁(x) = h₀(x) for all x ∈ {x₁, ..., x_N}. Then A, when interacting with h₁, generates the same interaction history and outputs the same auditing estimate; formally, $S_{A,h_1} = S_{A,h_0}$ and $\hat{\mu}_{A,h_1} = \hat{\mu}_{A,h_0}$.

Proof. Recall from Section 1.1 that a deterministic active auditing algorithm A can be viewed as a sequence of N + 1 functions $f_1, f_2, \ldots, f_N, g$, where $\{f_i\}_{i=1}^N$ are the label query functions used at each iteration and g is the final estimator function. We show by induction that for steps i = 0, 1, ..., N, the interaction histories of A with h₀ and h₁ agree on their first i elements.

Base case. For step i = 0, both interaction histories are empty and agree trivially.

Inductive case. Suppose the statement holds for step i, i.e., A, when interacting with both h₀ and h₁, generates the same set of labeled examples $S_i = \{(x_1, y_1), \ldots, (x_i, y_i)\}$ up to step i. Now, at step i + 1, A applies the query function $f_{i+1}$ and queries the same example $x_{i+1} = f_{i+1}(S_i)$ in both interactions. By the assumption of this lemma, $h_1(x_{i+1}) = h_0(x_{i+1})$, which implies that the (i+1)-st labeled example obtained when A interacts with h₁, namely $(x_{i+1}, h_1(x_{i+1}))$, is identical to $(x_{i+1}, h_0(x_{i+1}))$, the (i+1)-st example when A interacts with h₀. Combined with the inductive hypothesis that the two histories agree on the first i examples, A, when interacting with h₀ and h₁, generates the same set of labeled examples $S_{i+1} = \{(x_1, y_1), \ldots, (x_{i+1}, y_{i+1})\}$ up to step i + 1. This completes the induction.

As the interaction histories of A with h₀ and h₁ are identical, the unlabeled parts of the histories are identical; formally, $S_{A,h_1} = S_{A,h_0}$. In addition, since in both interaction processes A applies the deterministic function g to the same interaction history of length N to obtain the estimate µ̂, we have $\hat{\mu}_{A,h_1} = \hat{\mu}_{A,h_0}$.

C. Deferred Materials from Section 1

The following lemma formalizes the idea that PAC learning with O(ϵ) error is sufficient for fairness auditing, given that $p = \min(\Pr_{D_\mathcal{X}}(x_A = 0), \Pr_{D_\mathcal{X}}(x_A = 1))$ is Ω(1).

Lemma C.1. If h is such that $\Pr(h(x) \neq h^\star(x)) \le \alpha$, then $|\mu(h) - \mu(h^\star)| \le \frac{\alpha}{p}$.

Proof. First observe that

$$|\Pr(h(x) = +1 \mid x_A = 0) - \Pr(h^\star(x) = +1 \mid x_A = 0)| \le \Pr(h(x) \neq h^\star(x) \mid x_A = 0) = \frac{\Pr(h(x) \neq h^\star(x), x_A = 0)}{\Pr(x_A = 0)} \le \frac{\Pr(h(x) \neq h^\star(x), x_A = 0)}{p},$$

where the first inequality is by the triangle inequality and the second inequality is by the definition of p.
Symmetrically, we have $|\Pr(h(x) = +1 \mid x_A = 1) - \Pr(h^\star(x) = +1 \mid x_A = 1)| \le \frac{\Pr(h(x) \neq h^\star(x), x_A = 1)}{p}$. Adding up the two inequalities, we have:

$$|\Pr(h(x) = +1 \mid x_A = 0) - \Pr(h^\star(x) = +1 \mid x_A = 0)| + |\Pr(h(x) = +1 \mid x_A = 1) - \Pr(h^\star(x) = +1 \mid x_A = 1)| \le \frac{\Pr(h(x) \neq h^\star(x))}{p} \le \frac{\alpha}{p}.$$

Since the left-hand side upper bounds $|\mu(h) - \mu(h^\star)|$ by the triangle inequality, the lemma follows.

D. Deferred Materials from Section 3

D.1. Proof of Theorems 3.1 and 3.2

Proof of Theorem 3.1. Suppose Algorithm 1 (denoted A throughout the proof) interacts with some target classifier h* ∈ H. We will show the following claim: at any stage of A, if the set of labeled examples L shown so far induces a version space V = H[L], then A will subsequently query at most Cost(V) more labels before exiting the while loop. Theorem 3.1 follows from this claim by taking L = ∅ and V = H: after at most Cost(H) label queries, A exits the while loop, which implies that the queried unlabeled examples $S_{A,h^\star}$ induce a version space $V = \mathcal{H}(h^\star, S_{A,h^\star})$ with $\max_{h \in V} \mu(h) - \min_{h \in V} \mu(h) = \mathrm{diam}_\mu(V) \le 2\epsilon$. Also, note that h* ∈ V, which implies $\mu(h^\star) \in [\min_{h \in V} \mu(h), \max_{h \in V} \mu(h)]$. Combining these two observations,

$$|\hat{\mu} - \mu(h^\star)| \le \frac{1}{2}\left(\max_{h \in V} \mu(h) - \min_{h \in V} \mu(h)\right) \le \epsilon.$$

We now return to proving the claim by induction on Cost(V).

Base case. If Cost(V) = 0, then A immediately exits the while loop without further label queries.

Inductive case. Suppose the claim holds for all V such that Cost(V) ≤ n. Now consider a version space V with Cost(V) = n + 1. In this case, first recall that

$$\mathrm{Cost}(V) = 1 + \min_{x \in \mathcal{X}} \max_{y \in \{-1, +1\}} \mathrm{Cost}(V_x^y),$$

i.e., $\min_{x \in \mathcal{X}} \max_{y \in \{-1,+1\}} \mathrm{Cost}(V_x^y) = \mathrm{Cost}(V) - 1 = n$. Also recall that, by the definition of Algorithm 1, when facing version space V, the next query example x₀ chosen by A is a solution of the minimax optimization problem $x_0 = \operatorname{argmin}_{x \in \mathcal{X}} \max_{y \in \{-1,+1\}} \mathrm{Cost}(V_x^y)$, which implies that $\max_{y \in \{-1,+1\}} \mathrm{Cost}(V_{x_0}^y) = n$. In particular, the version space at the next iteration, $V(h^\star, \{x_0\}) = V_{x_0}^{h^\star(x_0)}$, satisfies $\mathrm{Cost}(V(h^\star, \{x_0\})) \le n$. Combining with the inductive hypothesis, after a total of $1 + \mathrm{Cost}(V(h^\star, \{x_0\})) \le n + 1 = \mathrm{Cost}(V)$ label queries, A will exit the while loop. This completes the inductive proof of the claim.

Proof of Theorem 3.2. Fix a deterministic active fairness auditing algorithm A. We will show the following claim: if A has already obtained an ordered sequence of labeled examples L and has a remaining label budget N ≤ Cost(H[L]) − 1, then there exists h ∈ H[L] such that A, when interacting with h as the target classifier: (1) obtains the sequence of labeled examples L in the first |L| rounds; and (2) has final version space $\mathcal{H}(h, S_{A,h})$ with µ-diameter > 2ϵ.

The theorem follows from this claim by taking L = ∅. To see why, let h ∈ H[∅] = H be the classifier described in the claim. First, note that there exists some other classifier h' ≠ h in the final version space $\mathcal{H}(h, S_{A,h})$ such that |µ(h') − µ(h)| > 2ϵ. For such h', $h'(S_{A,h}) = h(S_{A,h})$. Therefore, by Lemma B.1, $S_{A,h'} = S_{A,h}$ (which we denote by S subsequently), h and h' have the exact same labeling on S, and $\hat{\mu}_{A,h} = \hat{\mu}_{A,h'}$. This implies that at least one of the following must be true: $|\hat{\mu}_{A,h} - \mu(h)| > \epsilon$ or $|\hat{\mu}_{A,h'} - \mu(h')| > \epsilon$, showing that A does not guarantee an estimation error of at most ϵ under all targets h* ∈ H.

We now turn to proving the claim by induction on A's remaining label budget N. In the following, denote V = H[L].

Base case. If N = 0 and Cost(V) ≥ 1, then A at this point has zero label budget, which means that it is not allowed to make more queries.
In this case, $S_{A,h}$ consists of the unlabeled examples in L, and $\mathcal{H}(h, S_{A,h}) = V$. As Cost(V) ≥ 1, we know that $\max_{h_1, h_2 \in \mathcal{H}(h, S_{A,h})} |\mu(h_1) - \mu(h_2)| = \max_{h_1, h_2 \in V} |\mu(h_1) - \mu(h_2)| > 2\epsilon$. This completes the proof of the base case.

Inductive case. Suppose the claim holds for all N ≤ n. Now suppose that, in the learning process, A has a remaining label budget N = n + 1 and has obtained labeled examples L such that V = H[L] satisfies Cost(V) ≥ n + 2. Let x be the next example A queries. By the definition of Cost, there exists some y ∈ {−1, +1} such that $\mathrm{Cost}(\mathcal{H}[L \cup \{(x, y)\}]) = \mathrm{Cost}(V_x^y) \ge \mathrm{Cost}(V) - 1 \ge n + 1$, and after making this query, the learner has a remaining label budget of N − 1 = n. By the inductive hypothesis, there exists some $h \in \mathcal{H}[L \cup \{(x, y)\}]$ such that, when A interacts with h subsequently (with obtained labeled examples L ∪ {(x, y)} and label budget n), the final unlabeled dataset $S_{A,h}$ satisfies $\mathrm{diam}_\mu(\mathcal{H}(h, S_{A,h})) = \max_{h_1, h_2 \in \mathcal{H}(h, S_{A,h})} |\mu(h_1) - \mu(h_2)| > 2\epsilon$. In addition, when interacting with h, A obtains the example sequence L, (x, y) in its first |L| + 1 rounds of interaction, which implies that it obtains the example sequence L in its first |L| rounds of interaction with h. This completes the induction.

D.2. Proof Sketch of Proposition 3.3

Proof sketch. Let S₁ and S₂ be $O(\frac{1}{\epsilon^2} \ln |\mathcal{H}|)$ i.i.d. samples from $D_\mathcal{X} \mid x_A = 1$ and $D_\mathcal{X} \mid x_A = 0$, respectively. Define $\hat{\mu}(h, S_1, S_2) = \Pr_{x \sim S_1}(h(x) = +1) - \Pr_{x \sim S_2}(h(x) = +1)$. Hoeffding's inequality and a union bound guarantee that, with probability at least 1/2, for all h ∈ H, $|\hat{\mu}(h, S_1, S_2) - \mu(h)| \le \epsilon$. Now consider the following deterministic algorithm A:

- Let $n = O(\frac{1}{\epsilon^2} \ln |\mathcal{H}|)$.
- Find (the lexicographically smallest) S₁ and S₂ in $\mathcal{X}^n$ such that, for all h ∈ H, $|\hat{\mu}(h, S_1, S_2) - \mu(h)| \le \epsilon$ (Equation 3). This optimization problem is feasible because, as we have seen, a random choice of S₁, S₂ satisfies Equation (3) with nonzero probability.
- Return $\hat{\mu}(h^\star, S_1, S_2)$ after 2n label queries on the examples in S₁ ∪ S₂.

By its construction, A queries $2n = O(\frac{1}{\epsilon^2} \ln |\mathcal{H}|)$ labels and returns µ̂ that is ϵ-close to µ(h*).

D.3. Proof of Proposition 3.4

Before we prove Proposition 3.4, we first recall the well-known Bernstein's inequality:

Lemma D.1 (Bernstein's inequality). Let $Z_1, \ldots, Z_n$ be i.i.d. random variables with mean µ and variance σ², such that $|Z_i| \le b$ almost surely. Then, with probability 1 − δ,

$$\left|\frac{1}{n}\sum_{i=1}^n Z_i - \mu\right| \le \sqrt{\frac{2\sigma^2 \ln \frac{2}{\delta}}{n}} + \frac{2 b \ln \frac{2}{\delta}}{3n}.$$

Proof of Proposition 3.4. We analyze Algorithm 2, a derandomized version of the Phased CAL algorithm (Hsu, 2010, Chapter 2). By Theorem 3.2, it suffices to show that Algorithm 2 has a deterministic label complexity bound of $O(\theta(\epsilon) \ln |\mathcal{H}| \ln \frac{1}{\epsilon})$.

We first show that for every n, the optimization problem in line 5 is always feasible. To see this, observe that if we draw $S_n = \{x_1, \ldots, x_{m_n}\}$ as a sample of size $m_n$ drawn i.i.d. from $D_\mathcal{X}$, we have:

1. By Bernstein's inequality with $Z_i = \mathbb{I}(x_i \in \mathrm{DIS}(V_n))$, with probability at least 3/4,

$$\Pr_{S_n}(x \in \mathrm{DIS}(V_n)) \le \Pr_{D_\mathcal{X}}(x \in \mathrm{DIS}(V_n)) + \sqrt{\frac{2 \Pr_{D_\mathcal{X}}(x \in \mathrm{DIS}(V_n)) \ln 8}{m_n}} \le 2 \Pr_{D_\mathcal{X}}(x \in \mathrm{DIS}(V_n)) + \frac{\ln 8}{m_n},$$

where the second inequality uses the arithmetic mean-geometric mean (AM-GM) inequality.

2. By Bernstein's inequality and a union bound over h, h' ∈ H, with probability at least 3/4, for all h, h' ∈ H,

$$\Pr_{D_\mathcal{X}}(h(x) \neq h'(x)) \le \Pr_{S_n}(h(x) \neq h'(x)) + \sqrt{\frac{4 \Pr_{D_\mathcal{X}}(h(x) \neq h'(x)) \ln |\mathcal{H}|}{m_n}} + \frac{4 \ln |\mathcal{H}|}{m_n},$$

hence

$$\forall h, h' \in \mathcal{H}: \Pr_{S_n}(h(x) \neq h'(x)) = 0 \implies \Pr_{D_\mathcal{X}}(h(x) \neq h'(x)) \le \frac{16 \ln |\mathcal{H}|}{m_n}.$$

By a union bound, with nonzero probability the above two conditions hold simultaneously, showing the feasibility of the optimization problem.

We then argue that for all n, $V_{n+1} \subseteq B(h^\star, \frac{16 \ln |\mathcal{H}|}{m_n})$.
D.3. Proof of Proposition 3.4

Before we prove Proposition 3.4, we first recall the well-known Bernstein's inequality:

Lemma D.1 (Bernstein's inequality). Let Z1, . . . , Zn be i.i.d random variables with mean µ and variance σ², such that |Zi| ≤ b almost surely. Then, with probability 1 − δ,

|(1/n) Σ_{i=1}^n Zi − µ| ≤ √(2σ² ln(2/δ) / n) + (2b ln(2/δ)) / (3n).

Proof of Proposition 3.4. We will analyze Algorithm 2, a derandomized version of the Phased CAL algorithm (Hsu, 2010, Chapter 2). To prove this proposition, using Theorem 3.2, it suffices to show that Algorithm 2 has a deterministic label complexity bound of O(θ(ϵ) ln|H| ln(1/ϵ)).

We first show that, for every n, the optimization problem in line 5 is always feasible. To see this, observe that if we draw Sn = {x1, . . . , x_{mn}} as a sample of size mn drawn i.i.d from D_X, we have:

1. By Bernstein's inequality with Zi = 1{xi ∈ DIS(Vn)}, with probability 1 − 1/4,

Pr_{Sn}(x ∈ DIS(Vn)) ≤ Pr_{D_X}(x ∈ DIS(Vn)) + √(2 Pr_{D_X}(x ∈ DIS(Vn)) ln 8 / mn) + (2 ln 8)/(3mn) ≤ 2 Pr_{D_X}(x ∈ DIS(Vn)) + (2 ln 8)/mn,

where the second inequality uses the Arithmetic Mean-Geometric Mean (AM-GM) inequality.

2. By Bernstein's inequality and a union bound over pairs h, h′ ∈ H, with probability 1 − 1/4, for all h, h′ ∈ H:

Pr_{D_X}(h(x) ≠ h′(x)) ≤ Pr_{Sn}(h(x) ≠ h′(x)) + √(4 Pr_{D_X}(h(x) ≠ h′(x)) ln|H| / mn) + (4 ln|H|)/mn,

and consequently,

for all h, h′ ∈ H: Pr_{Sn}(h(x) ≠ h′(x)) = 0 implies Pr_{D_X}(h(x) ≠ h′(x)) ≤ 16 ln|H| / mn.

By a union bound, with nonzero probability the above two conditions hold simultaneously, showing the feasibility of the optimization problem.

We then argue that for all n, V_{n+1} ⊆ B(h⋆, 16 ln|H| / mn). This is because every h ∈ V_{n+1} and h⋆ are both in Vn and therefore agree on Sn \ Tn; on the other hand, h and h⋆ agree on Tn by the definition of V_{n+1}. As a consequence, Pr_{Sn}(h(x) ≠ h⋆(x)) = 0, which implies that Pr_{D_X}(h(x) ≠ h⋆(x)) ≤ 16 ln|H| / mn.

As a consequence, for all h ∈ V_{N+1}, Pr(h(x) ≠ h⋆(x)) ≤ 16 ln|H| / m_N ≤ pϵ, implying that |µ(h) − µ(h⋆)| ≤ ϵ (recall Lemma C.1). We now turn to upper bounding Algorithm 2's label complexity:

Σ_{n=1}^N mn (2 Pr_{D_X}(x ∈ DIS(Vn)) + (2 ln 8)/mn) ≤ Σ_{n=1}^N (mn · θ(ϵ) · 16 ln|H| / m_{n−1} + 2 ln 8) = O(θ(ϵ) ln|H| ln(1/ϵ)),

where the last step uses that mn/m_{n−1} = O(1) across phases and that the number of phases is N = O(ln(1/ϵ)), and the inequality uses the observation that for every n ∈ [N],

Pr_{D_X}(x ∈ DIS(Vn)) ≤ Pr_{D_X}(x ∈ DIS(B(h⋆, 16 ln|H| / m_{n−1}))) ≤ θ(ϵ) · 16 ln|H| / m_{n−1},

where the second inequality is from the definition of the disagreement coefficient (recall Section 1.1), together with a basic property of the disagreement coefficient (Hanneke, 2014, Corollary 7.2).

D.4. Proof of Proposition 3.5

We first prove the following theorem, which gives a decision tree-based characterization of the Cost(·) function. Connections between active learning and optimal decision trees have been observed in prior works (e.g. Laber & Nogueira, 2004; Balcan et al., 2012).

Definition D.2. An example-based decision tree T for an (instance domain, hypothesis set) pair (X, V) is such that:

1. T's internal nodes are examples in X; every internal node has two branches, with the left branch labeled +1 and the right labeled −1.

2. Every leaf l of T corresponds to a set of classifiers V_l ⊆ V such that all h ∈ V_l agree with the examples that appear on the root-to-leaf path to l. Formally, suppose the path from the root to leaf l is an alternating sequence of examples and labels x1, y1, . . . , xn, yn; then for every i ∈ [n], h(xi) = yi.

Definition D.3. Fix D_X. An example-based decision tree T is said to (µ, ϵ)-separate a hypothesis set V if, for every leaf l of T, V_l satisfies diam_µ(V_l) ≤ 2ϵ.

Theorem D.4. Given a version space V, Cost(V) is the minimum depth over all decision trees that (µ, ϵ)-separate V.

Proof. We prove the theorem by induction on Cost(V).

Base case. If Cost(V) = 0, then diam_µ(V) ≤ 2ϵ. Then there exists a trivial decision tree (a single leaf) of depth 0 that (µ, ϵ)-separates V, which is also the smallest depth possible.

Inductive case. Suppose the statement holds for any V such that Cost(V) ≤ n. Now consider V such that Cost(V) = n + 1.

1. We first show that there exists a decision tree of depth n + 1 that (µ, ϵ)-separates V. Indeed, pick x = argmin_{x∈X} max_y Cost(V^y_x). With this choice of x, both Cost(V^{−1}_x) and Cost(V^{+1}_x) are at most n. Therefore, by the inductive hypothesis applied to V^{−1}_x and V^{+1}_x, we can construct decision trees T− and T+ of depth n that (µ, ϵ)-separate the two hypothesis sets respectively. Now define T to have root node x, left subtree T+ and right subtree T−; then T has depth n + 1 and (µ, ϵ)-separates V.

2. We next show that no decision tree of depth n can (µ, ϵ)-separate V. Indeed, assume for the sake of contradiction that such a tree T exists. Consider the example x at the root of the tree; by the definition of Cost, one of Cost(V^{−1}_x) and Cost(V^{+1}_x) must be at least n. Without loss of generality, assume that V′ = V^{+1}_x is such that Cost(V′) ≥ n. Then there must exist some subset V″ ⊆ V′ such that Cost(V″) = n. Applying the inductive hypothesis to V″, no decision tree of depth n − 1 can (µ, ϵ)-separate V″. This contradicts the observation that the left subtree of T, which is of depth at most n − 1, (µ, ϵ)-separates V′ and hence also its subset V″.
We now restate a more precise version of Proposition 3.5. First, we define the computational task of computing a 0.3 ln(|H|)-approximation of Cost(H) via the following problem:

Problem Minimax-Cost (MC):
Input: instance space X, hypothesis class H, data distribution D_X, precision parameter ϵ.
Output: a number L such that Cost(H) ≤ L ≤ 0.3 ln(|H|) · Cost(H).

Proposition D.5 (Proposition 3.5 restated). If there is an algorithm that solves Minimax-Cost in poly(|H|, |X|, 1/ϵ) time, then NP ⊆ TIME(n^{O(log log n)}).

Proof of Proposition D.5. Our proof takes after (Laber & Nogueira, 2004)'s reduction from Set Cover (SC) to the Decision Tree Problem (DTP). Here, we reduce from SC to the Minimax-Cost problem (MC), i.e. computing Cost(H) for a given hypothesis class H, taking into account the unique structure of active fairness auditing. Specifically, the following gap version of SC's decision problem has been shown to be computationally hard⁴:

Problem Gap-Set-Cover (Gap-SC):
Input: a universe U = {u1, ..., un} of size n with n ≥ 10, a family of subsets C = {C1, ..., Cm}, and an integer k, such that one of the following holds:
Case 1: OPT_SC ≤ k;
Case 2: OPT_SC ≥ 0.99 k ln n,
where OPT_SC denotes the minimum set cover size of (U, C).
Output: 1 or 2, indicating which case the instance is in.

Specifically, it is well known that a polynomial time algorithm for the above decision problem⁵ on minimum set cover would imply that NP ⊆ TIME(n^{O(log log n)}) (Feige, 1998), which is believed to be false.

To start, recall that an instance of the Gap-SC problem is I_SC = (U, C, k), and an instance of the MC problem is I_MC = (H, X, D_X, ϵ). With this, we define a coarse reduction β that constructs an MC instance from a Gap-SC instance with universe U = {u1, ..., un} and sets C = {C1, ..., Cm}, which will be refined shortly (a small code sketch of this construction is given below):

1. Let H = {h0, h1, . . . , hn}, where h0(x) ≡ −1 always, and for all j ∈ [n], hj corresponds to uj (the definitions of the hj's will be given shortly).

2. Create an example x0 such that for all h ∈ H, h(x0) = −1.

3. For every i ∈ [m], create a basis example xi corresponding to Ci, such that for every j ∈ [n], hj(xi) = +1 iff uj ∈ Ci.

4. For each set Ci, create |Ci| − 1 auxiliary x's as follows. Given set Ci with |Ci| = si, corresponding to {h_{i1}, .., h_{i s_i}}, create a balanced binary tree Ti with each leaf corresponding to an h_{ij}. Create an auxiliary example associated with each internal node of Ti as follows: for each internal node, define the corresponding auxiliary example x such that its label is +1 under all the classifiers in the leaves of the subtree rooted at its left child, and its label is −1 under all remaining classifiers in H. The total number of auxiliary x's is at most m(n − 1).

5. Define X as the union of the example sets constructed in the above three items, which has at most N ≤ mn + 1 examples. Define D_X to be such that x | x_A = 0 ∼ Uniform(X \ {x0}) and x | x_A = 1 ∼ Uniform({x0}), and set ϵ = 1/(2N). With this setting of ϵ, for every h ∈ H such that h ≠ h0,

µ(h) − µ(h0) = Pr(h(x) = +1 | x_A = 0) − Pr(h0(x) = +1 | x_A = 0) ≥ 1/(N − 1) > 2ϵ.

⁴The definition of Gap-SC requires that n ≥ 10, which is without loss of generality: all Gap-SC instances with n < 10 are solvable in constant time.
⁵The constant 0.99 can be changed to any constant < 1 (Feige, 1998).
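To make the reduction concrete, here is a small sketch (ours, purely illustrative) that materializes the label matrix of β for a given set cover instance; hypotheses are rows, examples are columns.

```python
def reduction_beta(n, sets):
    """Build the label matrix of reduction beta (a sketch, ours).

    n: universe size; sets: list of lists of element indices in [0, n).
    Returns hypothesis indices (0 is h0, j >= 1 corresponds to u_{j-1}) and
    the example columns: x0, one basis example per set, and the auxiliary
    examples from each set's balanced binary tree (|C| - 1 per set C).
    """
    hyps = list(range(n + 1))
    columns = []

    columns.append([-1] * (n + 1))     # x0: labeled -1 by every hypothesis

    for C in sets:                     # basis example x_i for set C_i
        col = [-1] * (n + 1)
        for j in C:
            col[j + 1] = +1            # h_{j+1}(x_i) = +1 iff u_j in C_i
        columns.append(col)

    def add_aux(leaves):
        # one auxiliary example per internal node of a balanced tree over
        # `leaves`; it is +1 exactly on the left subtree's hypotheses
        if len(leaves) <= 1:
            return
        mid = len(leaves) // 2
        left, right = leaves[:mid], leaves[mid:]
        col = [-1] * (n + 1)
        for j in left:
            col[j] = +1
        columns.append(col)
        add_aux(left)
        add_aux(right)

    for C in sets:
        add_aux([j + 1 for j in C])

    return hyps, columns
```

Consistent with item 5, the total number of columns is at most 1 + m + m(n − 1) ≤ mn + 1.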
Recall that OPT_SC is defined as the size of an optimal solution for the SC instance (U, C); we let OPT_MC denote the height of the tree corresponding to the optimal query strategy for the MC instance I_MC obtained through reduction β. We have the following result:

Lemma D.6. OPT_SC ≤ OPT_MC ≤ OPT_SC + max_{C∈C} ⌈log|C|⌉.

Proof. Let k = OPT_SC. We show the two inequalities respectively.

1. By Theorem D.4, it suffices to show that any example-based decision tree T that (µ, ϵ)-separates H must have depth at least k. To see this, first note that, by item 5 in the reduction β and the definition of (µ, ϵ)-separation, the leaf in T that contains h0 must not contain any other hypothesis in H. In addition, as h0 ≡ −1, h0 must lie in the rightmost leaf of T. Now, to prove the statement, observe that the examples along the rightmost path of T correspond to a collection of sets that forms a set cover of U; it suffices to show that this collection has size at least the minimum set cover of I_SC. This is because the examples along the rightmost path are either xi's, which correspond to some set in C, or auxiliary examples, which correspond to some subset of a set in C. A set cover instance with universe U and family C′, where C′ comprises sets from C and subsets of sets from C, does not have a smaller minimum set cover. Therefore, the length of the path from the root to the rightmost leaf is at least k, the size of the smallest set cover of the original SC instance I_SC.

2. Let an optimal solution for I_SC be G = {i1, ..., ik}. Below, we construct an example-based decision tree T of depth k + max_{C∈C} ⌈log|C|⌉ that (µ, ϵ)-separates H. Let the rightmost path of T contain nodes corresponding to x_{i1}, ..., x_{ik} (their order is not important). At level l = 1, ..., k, the left subtree of x_{il} is defined to be T_{il}, as defined in step 4 of reduction β. Note that this may result in T having some empty leaves, in that an h covered by multiple x_{il}'s only appears in the subtree of x_{io}, where o = min{l : h(x_{il}) = +1}.

We will prove that, by the above construction, T (µ, ϵ)-separates H, as every leaf corresponds to a version space V that is a singleton (and thus has diam_µ(V) = 0 ≤ 2ϵ):

(a) For all but the rightmost leaf, this holds by the construction of the T_{il}'s.

(b) For the rightmost leaf, we show that only h0 is in the version space. Since G is a set cover, we have ∪_{l=1}^k C_{il} = U. Therefore, for every j ∈ [n] there exists l ∈ [k] such that uj ∈ C_{il}, i.e. hj(x_{il}) = +1 by construction. This implies that the all-negative labeling of x_{i1}, ..., x_{ik} can only correspond to h0. Therefore, the version space at the rightmost leaf satisfies V = {h0}.

Recall from Theorem D.4 that the depth of T upper bounds OPT_MC, and T's maximum root-to-leaf path has length at most k + max_{C∈C} ⌈log|C|⌉.

Built from β, we now construct an improved, gap-preserving reduction β′, defined as follows. Given any Gap-SC instance I_SC = (U, C, k) with universe U = {u1, ..., un} and sets C = {C1, ..., Cm}:

1. Take z = ⌈log n⌉. Construct a Gap-SC instance I_SC,z = (U^z, C^z, kz) containing z copies of the original set cover instance: U^z = {u¹_1, . . . , u¹_n, . . . , u^z_1, . . . , u^z_n}, C^z = {C1, . . . , C_{zm}}, where C_{(p−1)m+i} = {u^p_{i1}, . . . , u^p_{i s_i}} for p ∈ [z], i ∈ [m]. Note that OPT_SC,z = z · OPT_SC.

2. Apply reduction β to obtain I_MC,z from I_SC,z.

Now, we argue that β′ is a gap-preserving reduction:

1. Suppose the original Gap-SC instance I_SC = (U, C, k) is in case 1, i.e., OPT_SC ≤ k. Then OPT_SC,z ≤ kz. By Lemma D.6,

OPT_MC,z ≤ kz + max_{C∈C^z} ⌈log|C|⌉ ≤ kz + ⌈log n⌉ ≤ z(k + 1) ≤ 2zk.
2. Suppose the original Gap-SC instance I_SC = (U, C, k) is in case 2, i.e., OPT_SC ≥ 0.99 k ln n. Then OPT_SC,z ≥ 0.99 zk ln n, which by Lemma D.6 yields OPT_MC,z ≥ 0.99 zk ln n.

Now suppose that there exists an algorithm A that solves the MC problem in poly(|H|, |X|, 1/ϵ) time. We propose the following algorithm A′ that solves the Gap-SC problem in polynomial time, which, as mentioned above, implies that NP ⊆ TIME(n^{O(log log n)}):

Input: I_SC = (U, C, k).
Apply β′ on I_SC to obtain an instance of MC, I_MC,z.
Let L ← A(I_MC,z).
Output 1 if L ≤ 0.7 zk ln n, and 2 otherwise.

Correctness. As seen above, if I_SC is in case 1, then OPT_MC,z ≤ 2zk. For n ≥ 10, by the guarantee of A,

L ≤ 0.3 ln|H| · OPT_MC,z ≤ 0.6 zk ln(n log n) ≤ 0.7 zk ln n,

and A′ outputs 1. Otherwise, I_SC is in case 2; then OPT_MC,z ≥ 0.99 zk ln n, and by the guarantee of A, L ≥ OPT_MC,z ≥ 0.99 zk ln n > 0.7 zk ln n, and A′ outputs 2.

Time complexity. In I_MC,z, |X| ≤ mz · nz + 1 = O(mn log² n), |H| = nz = n log n, and ϵ = 1/(2N) = 1/(2(mz · nz + 1)) = Ω(1/(mn log² n)). As A runs in time O(poly(|X|, |H|, 1/ϵ)), A′ runs in time O(poly(m, n)).

D.5. Deferred Materials for Section 3.2

D.5.1. (µ, ϵ)-SPECIFYING SET, (µ, ϵ)-TEACHING DIMENSION AND THEIR PROPERTIES

The following definitions are inspired by the teaching and exact active learning literature (Hegedűs, 1995; Hanneke, 2007).

Definition D.7 ((µ, ϵ)-specifying set). Fix hypothesis class H and any function h : X → Y.⁶ A set of unlabeled examples S is said to be a (µ, ϵ)-specifying set for h and H if

for all h1, h2 ∈ H(h, S): |µ(h1) − µ(h2)| ≤ 2ϵ.

Definition D.8 ((µ, ϵ)-extended teaching dimension). Fix hypothesis class H and any function h : X → Y; define t(h, H, µ, ϵ) as the size of the minimum (µ, ϵ)-specifying set for h and H, i.e. the optimal value of the following optimization problem (OP-h):

min |S|, s.t. for all h1, h2 ∈ H(h, S): |µ(h1) − µ(h2)| ≤ 2ϵ.

Definition D.9. We define the µ-extended teaching dimension XTD(H, µ, ϵ) := max_{h:X→Y} t(h, H, µ, ϵ).

This improper teaching dimension is related to Cost(H) in that:

Lemma D.10. XTD(H, µ, ϵ) ≤ Cost(H).

Proof. Let h0 = argmax_{h:X→Y} t(h, H, µ, ϵ), and let k denote t(h0, H, µ, ϵ). It suffices to show that Cost(H) ≥ k. To see this, first note that

Cost(H) = 1 + min_x max_y Cost(H[(x, y)]) ≥ 1 + min_{x1∈X} Cost(H[(x1, h0(x1))]) ≥ 2 + min_{x1∈X} min_{x2∈X} Cost(H[{(x1, h0(x1)), (x2, h0(x2))}]) ≥ · · ·

We can repeatedly unroll the above expression as long as diam_µ(H[{(x1, h0(x1)), . . . , (xi, h0(xi))}]) remains greater than 2ϵ. After unrolling k − 1 times, with U_{k−1} = {x1, . . . , x_{k−1}}, we have

Cost(H) ≥ k − 1 + min_{U_{k−1}} Cost(H(h0, U_{k−1})).

By the definition of t(h0, H, µ, ϵ), for any U with |U| ≤ k − 1 there exist h′, h″ ∈ H(h0, U) such that |µ(h′) − µ(h″)| > 2ϵ, i.e. diam_µ(H(h0, U)) > 2ϵ. Thus, for any unlabeled dataset U_{k−1} of size k − 1, Cost(H(h0, U_{k−1})) ≥ 1. Therefore, Cost(H) ≥ k.

⁶Note that h is allowed to be outside H.

D.5.2. PROOF OF THEOREM 3.8

Proof. We prove the theorem as follows.

Correctness. Observe that, right before Algorithm 3 returns, it must execute lines 9 and 17. Since the condition on line 17 is also satisfied, the dataset T must be such that ˆh(T) = h⋆(T). Combined with the definitions of optimization problems (1) and (2), this implies that the h1 and h2 used in line 9 right before the return satisfy

µ(h1) = min_{h∈H(h⋆,T)} µ(h), µ(h2) = max_{h∈H(h⋆,T)} µ(h).

Therefore, µ(h⋆) ∈ [min_{h∈H(h⋆,T)} µ(h), max_{h∈H(h⋆,T)} µ(h)] = [µ(h1), µ(h2)]. Furthermore, by line 9, µ(h2) − µ(h1) ≤ 2ϵ. Hence ˆµ, the output of Algorithm 3, satisfies

|ˆµ − µ(h⋆)| = |(1/2)(µ(h1) + µ(h2)) − µ(h⋆)| ≤ ϵ.
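The correctness argument revolves around certifying the µ-extremes of the current version space; a minimal sketch of this certification step follows (our own names; `min_mu_oracle` and `max_mu_oracle` stand in for optimization problems (1) and (2)).

```python
def certify_and_estimate(T, min_mu_oracle, max_mu_oracle, mu, eps):
    """Given labeled data T consistent with h*, compute the mu-extremes of the
    induced version space. If its mu-diameter is at most 2*eps, the midpoint
    is an eps-accurate, manipulation-proof estimate; otherwise report failure
    so the caller can query more labels (a sketch, ours)."""
    h1 = min_mu_oracle(T)   # argmin_{h in H consistent with T} mu(h)
    h2 = max_mu_oracle(T)   # argmax_{h in H consistent with T} mu(h)
    lo, hi = mu(h1), mu(h2)
    if hi - lo <= 2 * eps:
        return (lo + hi) / 2   # mu(h*) lies in [lo, hi], so error <= eps
    return None                # version space still too wide in mu-space
```

Note that the returned midpoint is manipulation-proof by construction: any post-audit model consistent with T has its µ-value inside [lo, hi].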
Label complexity. We now bound the label complexity of the algorithm, specifically in terms of XTD(H, µ, ϵ). First, at the end of each outer-loop iteration t (other than the last), the newly collected dataset Tt must contain some x with ˆh(x) ≠ h⋆(x). As O has a mistake bound of M, the total number of outer-loop iterations, denoted by N, is at most M. In addition, by Lemma D.11 given below, with probability 1 − δ/M,

|Tt| ≤ O(XTD(H, µ, ϵ) · log(|H|M/δ) · log|X|).

Therefore, by a union bound, with probability 1 − δ, the total number of label queries made by Algorithm 3 is at most

Σ_{t=1}^N |Tt| ≤ O(M · XTD(H, µ, ϵ) · log(|H|M/δ) · log|X|).

Lemma D.11. For every outer iteration of Algorithm 3, with probability 1 − δ/M, the dataset T at the end of this iteration satisfies |T| ≤ O(XTD(H, µ, ϵ) · log(|H|M/δ) · log|X|).

Proof. The inner loop is similar to the black-box teaching algorithm of (Dasgupta et al., 2019), except that we are teaching µ(ˆh) as opposed to ˆh itself. Although (Dasgupta et al., 2019)'s algorithm was originally designed for exact (interactive) teaching, it implicitly gives an oracle-efficient algorithm for approximately computing a minimum set cover; we use this insight throughout the proof. As the analysis of (Dasgupta et al., 2019) only controls the expected number of teaching examples, we use a different filtration to obtain a high-probability bound on the number of teaching examples.

First, we set up some notation. Let X = {x1, . . . , xm}. Recall that λ = ln(|H|²M/δ). Let Wi(x) denote the weight of point x ∈ X (denoted w(x) in the algorithm) at the end of round i of the inner loop, and let τ_{xj} be the exponentially distributed threshold associated with xj. Define the random variable U_{i,j} = 1{xj is selected in round i} = 1{W_{i−1}(xj) < τ_{xj} ≤ Wi(xj)}. Let Mi denote the number of teaching examples selected in the i-th round of doubling; it can be seen that Mi = Σ_{j∈[m]} U_{i,j}. Also define (i, j) ≺ (i′, j′) iff (i, j) precedes (i′, j′) lexicographically. Define two filtrations:

1. Let F_{i,j} be the sigma-field generated by the indicator events {U_{i′,j′} : (i′, j′) ⪯ (i, j)}. As a convention, F_{i,0} := F_{i−1,m}.

2. Let Fi be the sigma-field generated by the indicator events {U_{i′,j} : j ∈ [m], 1 ≤ i′ ≤ i}; this is the filtration used by (Dasgupta et al., 2019). It can easily be seen that Fi = F_{i,m}.

Define Y_{i,j} = Σ_{(i′,j′)⪯(i,j)} Z_{i′,j′}, where Z_{i,j} = U_{i,j} − E[U_{i,j} | F_{i,j−1}] ∈ [−1, +1]. Then (Y_{i,j}) is a martingale, as E[Y_{i,j} | F_{i,j−1}] = E[Z_{i,j} | F_{i,j−1}] + E[Y_{i,j−1} | F_{i,j−1}] = Y_{i,j−1}.

Let N′ be the total number of inner-loop rounds, which by item 1 of Lemma D.13 is O(XTD(H, µ, ϵ) ln|X|) with probability 1 (cf. Lemma 4 of (Dasgupta et al., 2019)). We may then apply Freedman's inequality (Lemma D.12): since |Y_{i,j} − Y_{i,j−1}| = |Z_{i,j}| ≤ 1 almost surely, for any s ≥ 0 and σ² > 0,

Pr(∃ (n, j) : Y_{n,j} ≥ s and Σ_{(i′,j′)⪯(n,j)} E[Z²_{i′,j′} | F_{i′,j′−1}] ≤ σ²) ≤ exp(−s²/2 / (σ² + s/3)). (4)

Next, we let σ² = λ(1 + XTD(H, µ, ϵ) ln(2|X|)); we have, for any (n, j):

Σ_{(i′,j′)⪯(n,j)} E[Z²_{i′,j′} | F_{i′,j′−1}] ≤ Σ_{(i′,j′)⪯(n,j)} (E[U²_{i′,j′} | F_{i′,j′−1}] − E[U_{i′,j′} | F_{i′,j′−1}]²) ≤ Σ_{(i′,j′)⪯(n,j)} E[U_{i′,j′} | F_{i′,j′−1}] ≤ Σ_{i=1}^{N′} E[Mi | F_{i−1}] ≤ Σ_{x∈X} λ W_{N′}(x) ≤ λ(1 + XTD(H, µ, ϵ) ln(2|X|)) = σ²,

where the second-to-last inequality follows by Lemma D.14 (telescoping over rounds) and the last by Lemma D.13.

Meanwhile, we choose s = √(2σ² ln(M/δ)) + ln(M/δ), which ensures that the right-hand side of Eq. (4) is at most δ/M. Thus, by Equation (4), with probability 1 − δ/M,

|T| = Σ_{(i,j)} U_{i,j} ≤ Σ_{i=1}^{N′} E[Mi | F_{i−1}] + s ≤ λ(1 + XTD(H, µ, ϵ) ln(2|X|)) + √(2λ(1 + XTD(H, µ, ϵ) ln(2|X|)) ln(M/δ)) + ln(M/δ) = O(λ(1 + XTD(H, µ, ϵ) ln(2|X|))) = O(XTD(H, µ, ϵ) · ln|X| · ln(|H|M/δ)).
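The weight-doubling and exponential-threshold mechanism analyzed above can be sketched as follows (our own minimal rendition of the Dasgupta et al.-style inner loop, not a verbatim transcription of Algorithm 3; `find_violating_pair` stands in for the optimization returning h1, h2 consistent with the current teaching set with |µ(h1) − µ(h2)| > 2ϵ, and the initial weights and cap follow our reading of the analysis).

```python
import random

def inner_teaching_loop(X, hhat, find_violating_pair, lam):
    """Sketch (ours) of the threshold-based inner teaching loop.

    find_violating_pair(T): returns (h1, h2) in the version space H(hhat, T)
    with |mu(h1) - mu(h2)| > 2*eps, or None once diam_mu <= 2*eps.
    """
    w = {x: 1.0 / len(X) for x in X}                 # initial weights (assumed)
    tau = {x: random.expovariate(lam) for x in X}    # Exp(lambda) thresholds
    T = set()
    while True:
        pair = find_violating_pair(T)
        if pair is None:
            return T     # T is now a (mu, eps)-specifying set for hhat
        h1, h2 = pair
        for x in X:
            # double the weight of points that witness the violating pair
            if h1(x) != hhat(x) or h2(x) != hhat(x):
                w[x] = min(2.0 * w[x], 2.0)          # cap at 2, as in the analysis
                if x not in T and tau[x] <= w[x]:
                    T.add(x)  # threshold crossed: add x to the teaching set
```

The analysis above then bounds the number of doublings via a minimum specifying set (Lemma D.13) and the number of selected points via the total weight increase (Lemma D.14).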
Lemma D.12 (Freedman's Inequality). Let {Yk}_{k≥0} be a martingale with difference sequence {Xk}_{k≥1} such that Xk ≤ R almost surely for all k, and Y0 = 0. Let Wk = Σ_{j=1}^k E_{j−1}[X²_j]. Then, for all t ≥ 0 and σ² > 0:

Pr(∃ k ≥ 0 : Yk ≥ t and Wk ≤ σ²) ≤ exp(−t²/2 / (σ² + Rt/3)).

Lemma D.13. For any outer iteration of Algorithm 3:

1. The number of inner-loop iterations is at most XTD(H, µ, ϵ) log(2|X|).
2. At any point in the inner loop, Σ_{x∈X} w(x) ≤ 1 + XTD(H, µ, ϵ) log(2|X|).

Proof. The proof is very similar to Dasgupta et al. (2019, Lemma 4), with some differences; for completeness, we include it here. We first prove the second item. Note that, at any point of the algorithm, w(x) ≤ 2 for all x. Let S⋆(ˆh) be an optimal solution of optimization problem (OP-ˆh); we have |S⋆(ˆh)| = t(ˆh, H, µ, ϵ) ≤ XTD(H, µ, ϵ). Every time line 13 is executed, by the feasibility of S⋆(ˆh) with respect to (OP-ˆh), the set of points whose weights get doubled intersects S⋆(ˆh); therefore, the weight of some element x ∈ S⋆(ˆh) gets doubled. This implies that line 13 is executed at most |S⋆(ˆh)| log(2|X|) times: otherwise, by the pigeonhole principle, some element of S⋆(ˆh) would have its weight doubled more than log(2|X|) times, making its weight exceed the cap of 2, which is a contradiction. Finally, since each weight doubling increases the total weight by at most 1, the final total weight is at most 1 + |S⋆(ˆh)| log(2|X|) ≤ 1 + XTD(H, µ, ϵ) log(2|X|). The first item follows since the number of inner iterations is at most the number of weight doublings.

Lemma D.14. For every inner iteration i, E[Mi | F_{i−1}] ≤ Σ_{x∈X} λ(Wi(x) − W_{i−1}(x)).

Proof. The proof is almost a verbatim copy of Dasgupta et al. (2019, Lemma 6), which we include here:

E[Mi | F_{i−1}] = Σ_{x∈X} Pr(x chosen in round i | x not chosen before round i, F_{i−1}) = Σ_{x∈X} (1 − Pr(τx > Wi(x) | τx > W_{i−1}(x))) = Σ_{x∈X} (1 − exp(−λ(Wi(x) − W_{i−1}(x)))) ≤ Σ_{x∈X} λ(Wi(x) − W_{i−1}(x)).

E. Deferred Materials from Section 4

E.1. Distribution-free Query Complexity Lower Bounds for Auditing with VC Classes

Theorem E.1 (Lower bound for randomized auditing). If hypothesis class H has VC dimension d ≥ 1600 and ϵ ∈ (0, 1/40], then for any (possibly randomized) algorithm A, there exists a distribution D realizable by some h⋆ ∈ H such that, when A is given a querying budget N ≤ c · min(d, 1/ϵ²) for a small universal constant c > 0, its output ˆµ satisfies P(|ˆµ − µ(h⋆)| > ϵ) > 1/8.

Proof. We will use Le Cam's method, with several subtle modifications. First, we reduce the estimation problem to a hypothesis testing problem in which, under the two hypotheses, µ(h⋆) concentrates around two Ω(ϵ)-separated values with high probability. Second, we upper bound the divergence between the distributions of the interaction history under the two hypotheses; this requires some delicate handling, as the label of a queried example depends not only on the identity of the example, but also on the previously observed labeled examples.

Step 1: the construction. As VC(H) = d, there exists a set of examples Z = {z0, z1, . . . , z_{d−1}} ⊆ X shattered by H. Let Z+ = {z1, . . . , z_{d−1}}. Let D_X be as follows: x | x_A = 0 is uniform over Z+, whereas x | x_A = 1 is the delta mass on z0. Let ϵ̄ = 10 max(ϵ, 1/√d); by the conditions d ≥ 1600 and ϵ ≤ 1/40, we have ϵ̄ ≤ 1/4.
Let the label budget be N = ⌊1/(24ϵ̄²)⌋ = Ω(min(d, 1/ϵ²)). Consider two hypotheses that choose h⋆ randomly from {−1, +1}^Z, subject to h⋆(z0) = −1:

H0: for every i ∈ [d − 1], independently, h⋆(zi) = +1 with probability 1/2 − ϵ̄, and −1 with probability 1/2 + ϵ̄;
H1: for every i ∈ [d − 1], independently, h⋆(zi) = +1 with probability 1/2 + ϵ̄, and −1 with probability 1/2 − ϵ̄.

We have the following simple claim showing the separation of µ(h⋆) under the two hypotheses. Its proof is deferred to the end of the main proof.

Claim E.2. P_{h⋆∼H0}(µ(h⋆) ≤ 1/2 − ϵ̄/2) ≥ 15/16, and P_{h⋆∼H1}(µ(h⋆) ≥ 1/2 + ϵ̄/2) ≥ 15/16.

Step 2: upper bounding the statistical distance. Next, we show that H0 and H1 are hard to distinguish for A with a label budget of N. To this end, we upper bound the KL divergence between the joint distributions of (x1, y1), . . . , (xn, yn) =: (x, y)^n under H0 and H1, denoted P0 and P1 respectively. Applying Lemma E.14, we have:

KL(P0, P1) = Σ_{i=1}^n E[KL(P0(yi = · | (x, y)^{i−1}, xi), P1(yi = · | (x, y)^{i−1}, xi))]. (5)

We claim that for every i and ((x, y)^{i−1}, xi) ∈ (X × Y)^{i−1} × X in the support of P0,

KL(P0(yi = · | (x, y)^{i−1}, xi), P1(yi = · | (x, y)^{i−1}, xi)) ≤ 12ϵ̄². (6)

First, observe that if ((x, y)^{i−1}, xi) is in the support of P0, there must exist some h : Z → {−1, +1} such that h(xj) = yj for all j ∈ [i − 1]; in particular, there must not exist j1 ≠ j2 in [i − 1] such that x_{j1} = x_{j2} but y_{j1} ≠ y_{j2}. Next, note that under H0, conditioned on (x, y)^{i−1}, the posterior distribution of h⋆ is supported on {h : Z → {−1, +1} | ∀ j ∈ [i − 1], h(xj) = yj}; specifically, for all x ∈ Z \ {xj : j ∈ [i − 1]}, the values h⋆(x) are independent conditioned on (x, y)^{i−1}, and P0(h⋆(x) = +1 | (x, y)^{i−1}) = 1/2 − ϵ̄. The same statement holds for H1, except that for all x ∈ Z \ {xj : j ∈ [i − 1]}, P1(h⋆(x) = +1 | (x, y)^{i−1}) = 1/2 + ϵ̄. In addition, the conditional distribution of yi | (x, y)^{i−1}, xi equals the conditional distribution of h⋆(xi) | (x, y)^{i−1}, under both H0 and H1. We now perform a case analysis:

1. If xi ∈ {xj : j ∈ [i − 1]}, then under both H0 and H1, the distributions of h⋆(xi) | (x, y)^{i−1} are equal: both are the delta mass on the unique element of {yj : j ∈ [i − 1], xj = xi}. In this case,

KL(P0(yi = · | (x, y)^{i−1}, xi), P1(yi = · | (x, y)^{i−1}, xi)) = 0 ≤ 12ϵ̄².

2. Otherwise, xi ∉ {xj : j ∈ [i − 1]}. Under H0, h⋆(xi) | (x, y)^{i−1} takes value +1 with probability 1/2 − ϵ̄ and −1 with probability 1/2 + ϵ̄; under H1, it takes value +1 with probability 1/2 + ϵ̄ and −1 with probability 1/2 − ϵ̄. In this case, by Fact E.13 and ϵ̄ ≤ 1/4,

KL(P0(yi = · | (x, y)^{i−1}, xi), P1(yi = · | (x, y)^{i−1}, xi)) = kl(1/2 − ϵ̄, 1/2 + ϵ̄) ≤ 3(2ϵ̄)² = 12ϵ̄².

In summary, in both cases Equation (6) holds; plugging this back into Equation (5) with n ≤ 1/(24ϵ̄²), we have KL(P0, P1) ≤ 12nϵ̄² ≤ 1/2. By Pinsker's inequality (Lemma E.11), d_TV(P0, P1) ≤ √(KL(P0, P1)/2) ≤ 1/2. By Le Cam's Lemma (Lemma E.10), for any hypothesis tester ˆb, we have

(1/2)P0(ˆb = 1) + (1/2)P1(ˆb = 0) ≥ (1/2)(1 − d_TV(P0, P1)) ≥ 1/4. (7)

Step 3: concluding the proof. Given A's output auditing estimate ˆµ, consider the following hypothesis tester: ˆb = 0 if ˆµ < 1/2, and ˆb = 1 otherwise. Plugging into Equation (7), we have

(1/2)P0(ˆµ ≥ 1/2) + (1/2)P1(ˆµ < 1/2) ≥ 1/4. (8)

Now, recall Claim E.2; using the fact that P(A ∩ B) ≥ P(A) − P(B^c) = P(A) + P(B) − 1, we have

P0(|ˆµ − µ(h⋆)| > ϵ) ≥ P0(ˆµ ≥ 1/2, µ(h⋆) ≤ 1/2 − ϵ̄/2) ≥ P0(ˆµ ≥ 1/2) − 1/16. (9)

Symmetrically, we also have

P1(|ˆµ − µ(h⋆)| > ϵ) ≥ P1(ˆµ < 1/2) − 1/16. (10)

Combining Equations (8), (9), and (10), and noting that ϵ̄/2 ≥ 5ϵ > ϵ (so the events above indeed entail an estimation error greater than ϵ), we have

(1/2)P0(|ˆµ − µ(h⋆)| > ϵ) + (1/2)P1(|ˆµ − µ(h⋆)| > ϵ) ≥ 1/4 − 1/16 = 3/16 > 1/8.

The left-hand side can be viewed as the total probability of |ˆµ − µ(h⋆)| > ϵ when h⋆ is drawn from the uniform mixture of the h⋆ distributions under H0 and H1. By the probabilistic method, there exists some h⋆ such that P_{h⋆,A}(|ˆµ − µ(h⋆)| > ϵ) > 1/8.
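The per-query divergence bound driving this argument is easy to sanity-check numerically; a small sketch (ours), comparing kl(1/2 − ϵ̄, 1/2 + ϵ̄) against the quadratic bound used above (the grid and printout are illustrative, not the paper's):

```python
import math

def kl_bernoulli(a, b):
    """Binary relative entropy kl(a, b) for Bernoulli means a, b in (0, 1)."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

# The divergence per query is quadratic in eps_bar, which is exactly what
# forces N = Omega(1/eps^2) queries to distinguish the two hypotheses.
for eps_bar in (0.01, 0.05, 0.1, 0.25):
    div = kl_bernoulli(0.5 - eps_bar, 0.5 + eps_bar)
    print(f"eps_bar={eps_bar:.2f}  kl={div:.5f}  kl/eps_bar^2={div / eps_bar ** 2:.2f}")
```

On this grid the ratio kl/ϵ̄² stays below 12, consistent with the Fact E.13-based bound of 3(2ϵ̄)² used in case 2.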
Proof of Claim E.2. Without loss of generality, we show the first inequality; the second follows symmetrically. Note that under H0, the random h⋆'s DP value satisfies

µ(h⋆) = Pr(h⋆(x) = +1 | x_A = 0) − Pr(h⋆(x) = +1 | x_A = 1) = (1/(d−1)) Σ_{i=1}^{d−1} 1{h⋆(zi) = +1},

where the second equality follows from Pr(h⋆(x) = +1 | x_A = 1) = 0, as h⋆(z0) = −1 always holds. Under H0, (d − 1)µ(h⋆) is a sum of d − 1 i.i.d Bernoulli random variables with mean 1/2 − ϵ̄. Therefore, by Hoeffding's inequality,

P_{h⋆∼H0}(µ(h⋆) > 1/2 − ϵ̄/2) ≤ exp(−(d − 1)ϵ̄²/2) ≤ 1/16,

where the second inequality uses ϵ̄ = 10 max(ϵ, 1/√d).

E.2. Query Complexity for Auditing Non-homogeneous Halfspaces under Gaussian Subpopulations

Theorem E.3 (Lower bound). Let d ≥ 6400 and ϵ ∈ (0, 1/80]. Suppose D_X is such that x | x_A = 0 ∼ N(0_d, I_d), whereas x | x_A = 1 ∼ N(0_d, 0_{d×d}) (i.e. the delta mass supported at 0_d). For any (possibly randomized) algorithm A, there exists h⋆ in Hlin, the class of nonhomogeneous linear classifiers, such that when A is given a query budget N ≤ c · min(d, 1/ϵ²) for a small universal constant c > 0, its output ˆµ satisfies P_{A,h⋆}(|ˆµ − µ(h⋆)| > ϵ) > 1/8.

Proof. Similar to the proof of Theorem E.1, we will use Le Cam's method. In addition to the challenges present in the proof of Theorem E.1, in the active fairness auditing setting for halfspaces we face the extra difficulty that the posterior distribution of h⋆(xi) | (x, y)^{i−1} deviates significantly from the prior distribution of h⋆(xi) and cannot easily be calculated in closed form. To get around this difficulty, using the chain rule of KL divergence along with the posterior formula for noiseless Bayesian linear regression with a Gaussian prior, we calculate a tight upper bound on the KL divergence between two carefully constructed, well-separated hypotheses.

Step 1: the construction. Let ϵ̄ = 40 max(ϵ, 1/√d); by the assumptions ϵ ≤ 1/80 and d ≥ 6400, we have ϵ̄ ≤ 1/2. Let the label budget be N = ⌊1/(64ϵ̄²)⌋ = Ω(min(d, 1/ϵ²)).

Consider two hypotheses that choose h⋆ = h_{a⋆,b⋆}, with b⋆ = −1 and a⋆ drawn randomly from different distributions:

H0: a⋆ ∼ N(0, (1/d)(1 + ϵ̄) I_d);
H1: a⋆ ∼ N(0, (1/d)(1 − ϵ̄) I_d).

We have the following claim showing the separation of µ(h⋆) under the two hypotheses. Its proof is deferred to the end of the main proof.

Claim E.4. P_{h⋆∼H0}(µ(h⋆) > Φ(−1) + ϵ̄/36) ≥ 15/16, and P_{h⋆∼H1}(µ(h⋆) < Φ(−1) − ϵ̄/36) ≥ 15/16, where Φ(z) = ∫_{−∞}^z (1/√(2π)) e^{−t²/2} dt is the standard normal CDF.

Step 2: upper bounding the statistical distance. Next, we show that H0 and H1 are hard to distinguish for A making n ≤ N label queries. To this end, we upper bound the KL divergence between the joint distributions of (x, y)^n under H0 and H1, denoted P0 and P1 respectively. Define ỹi = ⟨a⋆, xi⟩ − 1 for i ∈ [n], and yi = sign(ỹi). Define P̃0 and P̃1 (resp. Q0 and Q1) as the joint distributions of (x, ỹ)^n (resp. (x, y, ỹ)^n) under H0 and H1 respectively. By the chain rule of KL divergence (Lemma E.12, applied with Z = (x, y)^n, W = ỹ^n and with Z = (x, ỹ)^n, W = y^n respectively), we get:

KL(Q0((x, y, ỹ)^n), Q1((x, y, ỹ)^n)) = KL(P0, P1) + E[KL(Q0(ỹ^n | (x, y)^n), Q1(ỹ^n | (x, y)^n))] ≥ KL(P0, P1),

KL(Q0((x, y, ỹ)^n), Q1((x, y, ỹ)^n)) = KL(P̃0, P̃1) + E[KL(Q0(y^n | (x, ỹ)^n), Q1(y^n | (x, ỹ)^n))] = KL(P̃0, P̃1),

where the last term is 0 because, under both Q0 and Q1, y^n | (x, ỹ)^n is the delta mass supported on (sign(ỹi))_{i∈[n]}.
As a consequence, KL(P0, P1) ≤ KL(P̃0, P̃1). Also, note that A can be viewed as a query learning algorithm that at round i receives (x, ỹ)^{i−1} as input and chooses the next example to query (i.e., it simply elects to use only the thresholded values yj = sign(ỹj) rather than the ỹj's). Applying Lemma E.14, we have:

KL(P̃0, P̃1) = Σ_{i=1}^n E[KL(P̃0(ỹi = · | (x, ỹ)^{i−1}, xi), P̃1(ỹi = · | (x, ỹ)^{i−1}, xi))]. (11)

We claim that for every i and ((x, ỹ)^{i−1}, xi) in the support of P̃0,

KL(P̃0(ỹi = · | (x, ỹ)^{i−1}, xi), P̃1(ỹi = · | (x, ỹ)^{i−1}, xi)) ≤ 8ϵ̄². (12)

First, by Lemma E.5 (deferred to the end of the proof), under H0, conditioned on (x, ỹ)^{i−1} in the support of P̃0, the posterior distribution of a⋆ is the same as a ∼ N(0, (1/d)(1 + ϵ̄)I_d) conditioned on the affine set S = {a ∈ R^d : ⟨a, xl⟩ − 1 = ỹl, ∀ l ∈ [i − 1]}. Denote X_{i−1} = [x1⊤; x2⊤; . . . ; x_{i−1}⊤] ∈ R^{(i−1)×d} and Ỹ_{i−1} = (ỹ1, . . . , ỹ_{i−1}); for (x, ỹ)^{i−1} in the support of P̃0, it must be the case that S ≠ ∅, and as a result, â = X_{i−1}†(Ỹ_{i−1} + 1_{i−1}) ∈ S. Also, denote by X⊥_{i−1} a matrix whose columns form an orthonormal basis of span(x1, . . . , x_{i−1})⊥; such an X⊥_{i−1} is always well-defined, as i − 1 ≤ n − 1 ≤ d − 1. Applying Lemma E.17, we have

a⋆ | (x, ỹ)^{i−1} ∼ N(â, (1/d)(1 + ϵ̄) X⊥_{i−1}(X⊥_{i−1})⊤),

with the covariance matrix (1/d)(1 + ϵ̄) X⊥_{i−1}(X⊥_{i−1})⊤ being rank-deficient. Now, observe that ỹi | (x, ỹ)^{i−1}, xi has the same distribution as ⟨a⋆, xi⟩ − 1 | (x, ỹ)^{i−1}, which is N(⟨â, xi⟩ − 1, (1/d)(1 + ϵ̄) xi⊤ X⊥_{i−1}(X⊥_{i−1})⊤ xi). Similarly, under H1, ỹi | (x, ỹ)^{i−1}, xi has distribution N(⟨â, xi⟩ − 1, (1/d)(1 − ϵ̄) xi⊤ X⊥_{i−1}(X⊥_{i−1})⊤ xi). We now prove (12) by a case analysis:

1. If xi ∈ span(x1, . . . , x_{i−1}), then (X⊥_{i−1})⊤ xi = 0, and under both H0 and H1, the posterior distributions of ỹi | (x, ỹ)^{i−1}, xi are the delta mass on ⟨â, xi⟩ − 1; therefore,

KL(P̃0(ỹi = · | (x, ỹ)^{i−1}, xi), P̃1(ỹi = · | (x, ỹ)^{i−1}, xi)) = 0 ≤ 8ϵ̄².

2. If xi ∉ span(x1, . . . , x_{i−1}), then (X⊥_{i−1})⊤ xi ≠ 0, and under H0 and H1, the posterior distributions of ỹi | (x, ỹ)^{i−1}, xi are N(ˆµi, (1 + ϵ̄)σᵢ²) and N(ˆµi, (1 − ϵ̄)σᵢ²) respectively, where ˆµi = ⟨â, xi⟩ − 1 and σᵢ² = (1/d) xi⊤ X⊥_{i−1}(X⊥_{i−1})⊤ xi. In this case, by Fact E.15,

KL(P̃0(ỹi = · | ·), P̃1(ỹi = · | ·)) = KL(N(ˆµi, (1 + ϵ̄)σᵢ²), N(ˆµi, (1 − ϵ̄)σᵢ²)) = (1/2)((1 + ϵ̄)/(1 − ϵ̄) − 1 + ln((1 − ϵ̄)/(1 + ϵ̄))) ≤ (1/2)(2ϵ̄/(1 − ϵ̄))² ≤ 8ϵ̄²,

where the first inequality follows from ln(1 + x) ≥ x − x² for x ≥ 0, taking x = 2ϵ̄/(1 − ϵ̄), and the second inequality uses ϵ̄ ≤ 1/2 and algebra.

In summary, in both cases Equation (12) holds; plugging this back into Equation (11) with n ≤ 1/(64ϵ̄²), we have KL(P0, P1) ≤ 8nϵ̄² ≤ 1/8. By Pinsker's inequality (Lemma E.11), d_TV(P0, P1) ≤ √(KL(P0, P1)/2) ≤ 1/4. Le Cam's lemma (Lemma E.10) implies that, for any hypothesis tester ˆb,

(1/2)P0(ˆb = 1) + (1/2)P1(ˆb = 0) ≥ (1/2)(1 − d_TV(P0, P1)) ≥ 3/8.

Step 3: concluding the proof. Given A's output auditing estimate ˆµ, consider the following hypothesis tester: ˆb = 0 if ˆµ > Φ(−1), and ˆb = 1 if ˆµ ≤ Φ(−1). Plugging into the inequality above (cf. Equation (7)), we have

(1/2)P0(ˆµ ≤ Φ(−1)) + (1/2)P1(ˆµ > Φ(−1)) ≥ 3/8. (14)

Now, recall Claim E.4; using the fact that P(A ∩ B) ≥ P(A) − P(B^c) = P(A) + P(B) − 1, we have

P0(|ˆµ − µ(h⋆)| > ϵ̄/36) ≥ P0(ˆµ ≤ Φ(−1), µ(h⋆) > Φ(−1) + ϵ̄/36) ≥ P0(ˆµ ≤ Φ(−1)) + 15/16 − 1 = P0(ˆµ ≤ Φ(−1)) − 1/16. (15)

Symmetrically, we also have

P1(|ˆµ − µ(h⋆)| > ϵ̄/36) ≥ P1(ˆµ > Φ(−1)) − 1/16. (16)

Combining Equations (14), (15), and (16), we have

(1/2)P0(|ˆµ − µ(h⋆)| > ϵ̄/36) + (1/2)P1(|ˆµ − µ(h⋆)| > ϵ̄/36) ≥ 3/8 − 1/16 = 5/16 > 1/8.

As ϵ̄/36 ≥ (40/36)ϵ > ϵ, the left-hand side can be viewed as the total probability of |ˆµ − µ(h⋆)| > ϵ when h⋆ is drawn from the uniform mixture of the h⋆ distributions under H0 and H1. By the probabilistic method, there exists some h⋆ ∈ Hlin such that P_{h⋆}(|ˆµ − µ(h⋆)| > ϵ) > 1/8.
Lemma E.5. In the setting above, for any fixed i ∈ N and (x, ỹ)^i, the posterior distribution of a⋆ | (x, ỹ)^i is the same as a⋆ | {a⋆ ∈ U}, where U = {a : ∀ j ∈ [i], ⟨xj, a⟩ − 1 = ỹj}.

Proof. We use the Bayes formula to expand the posterior; below, ∝ denotes equality up to a multiplicative factor independent of a⋆:

P(a⋆ | (x, ỹ)^i) ∝ P(a⋆, (x, ỹ)^i) = P(a⋆) Π_{j=1}^i P(xj | a⋆, (x, ỹ)^{j−1}) P(ỹj | xj, a⋆, (x, ỹ)^{j−1}) = P(a⋆) Π_{j=1}^i P(xj | (x, ỹ)^{j−1}) 1{ỹj = ⟨xj, a⋆⟩ − 1} ∝ P(a⋆) Π_{j=1}^i 1{ỹj = ⟨xj, a⋆⟩ − 1},

where the second equality uses the definition of conditional probability; the third equality uses the fact that, for any fixed query learning algorithm A, xj is independent of a⋆ conditioned on (x, ỹ)^{j−1}, together with the observation that given xj and a⋆, ỹj = ⟨xj, a⋆⟩ − 1 deterministically. This concludes the proof.

Proof of Claim E.4. For h⋆(x) = sign(⟨a⋆, x⟩ + b⋆) with b⋆ = −1, it can be seen that P0(h⋆(x) = +1 | x_A = 1) = 0. On the other hand,

P0(h⋆(x) = +1 | x_A = 0) = P_{z∼N(0,I_d)}(⟨a⋆, z⟩ ≥ 1) = Φ(−1/‖a⋆‖₂).

Also, note that under H0, d‖a⋆‖²₂/(1 + ϵ̄) ∼ χ²(d); therefore, by Fact E.16, with probability at least 15/16,

‖a⋆‖²₂ ≥ (1 + ϵ̄)(1 − 10√(1/d)) ≥ (1 + ϵ̄)(1 − ϵ̄/4),

using 10/√d ≤ ϵ̄/4. Together with ϵ̄ ≤ 1/2, this implies 1/‖a⋆‖₂ ≤ 1 − ϵ̄/4. Therefore, since for every a, b ∈ [3/4, 1], |Φ(−a) − Φ(−b)| ≥ min_{ξ∈[3/4,1]} Φ′(ξ) |a − b| ≥ (1/9)|a − b|, we have:

µ(h⋆) = Φ(−1/‖a⋆‖₂) ≥ Φ(−(1 − ϵ̄/4)) ≥ Φ(−1) + ϵ̄/36.

This concludes the proof of the first inequality; the second is proved symmetrically.

We now present our (deterministic) active fairness auditing algorithm for this setting, Algorithm 6, and its guarantees. Algorithm 6 works in the setting where the two subpopulations are Gaussian with known mean and covariance parameters (m0, Σ0), (m1, Σ1). It assumes black-box query access to h⋆ ∈ Hlin = {h_{a,b}(x) := sign(⟨a, x⟩ + b) : a ∈ R^d, b ∈ R}, and aims to estimate µ(h⋆) within precision ϵ. Recalling that µ(h⋆) = Pr_{x∼D_X}(h⋆(x) = 1 | x_A = 0) − Pr_{x∼D_X}(h⋆(x) = 1 | x_A = 1), it suffices to estimate γ_b := Pr_{x∼D_X}(h⋆(x) = 1 | x_A = b) within precision ϵ/2 for each b ∈ {0, 1}. To this end, we note that

γ_b = Pr_{x∼N(mb,Σb)}(h⋆(x) = 1) = Pr_{x̃∼N(0,I_d)}(h⋆(mb + Σb^{1/2} x̃) = 1);

if we define h̃b : R^d → {−1, +1} by

h̃b(x̃) = h⋆(mb + Σb^{1/2} x̃), (17)

then γ_b equals γ(h̃b), where γ(h) = P_{x̃∼N(0,I_d)}(h(x̃) = 1) is the probability of positive prediction of h under the standard Gaussian distribution. Importantly, as h⋆ is a linear classifier, h̃b is also a linear classifier and lies in Hlin.

Recall that the procedure ESTIMATE-POSITIVE (Algorithm 4) label-efficiently estimates γ(h) for any h ∈ Hlin, using query access to h. Algorithm 6 uses it as a subprocedure to estimate γ_b = γ(h̃b) (line 3). To simulate label queries to h̃b using query access to h⋆, according to Equation (17), it suffices to apply an affine transformation to the input x̃, obtaining the transformed input mb + Σb^{1/2} x̃, and query h⋆ on the transformed input. Finally, once ϵ/2-accurate estimators ˆγ0, ˆγ1 of γ0, γ1 are obtained, Algorithm 6 returns their difference as the estimator ˆµ of µ(h⋆) (line 4).

Algorithm 6 Active fairness auditing for nonhomogeneous linear classifiers under Gaussian subpopulations
Require: Subpopulation parameters (m0, Σ0), (m1, Σ1), query access to h⋆ ∈ Hlin, target error ϵ.
Ensure: ˆµ such that |ˆµ − µ(h⋆)| ≤ ϵ.
1: for b ∈ {0, 1} do
2:   Define h̃b : R^d → {−1, +1} by h̃b(x̃) = h⋆(mb + Σb^{1/2} x̃); // h̃b ∈ Hlin, and each query to h̃b can be simulated by one query to h⋆
3:   ˆγb ← ESTIMATE-POSITIVE(h̃b, ϵ/2)
4: return ˆγ0 − ˆγ1

Theorem E.6 (Upper bound). Suppose h⋆ ∈ Hlin and D_X is such that x | x_A = 0 ∼ N(m0, Σ0) and x | x_A = 1 ∼ N(m1, Σ1). Then Algorithm 6 outputs ˆµ such that, with probability 1, |ˆµ − µ(h⋆)| ≤ ϵ; moreover, Algorithm 6 makes at most O(d ln(d/ϵ)) label queries to h⋆.
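Before the proof, here is a minimal sketch of Algorithm 6's reduction (our own rendition; `estimate_positive` stands in for Algorithm 4, whose internals are analyzed below), assuming NumPy and a black-box `query(x) -> ±1` for h⋆:

```python
import numpy as np

def audit_gaussian_subpops(query, m0, S0, m1, S1, eps, estimate_positive):
    """Sketch (ours) of Algorithm 6: reduce auditing under Gaussian
    subpopulations to two calls of ESTIMATE-POSITIVE.

    query: black-box h*, mapping x to +/-1; estimate_positive(h, eps/2)
    estimates P_{z ~ N(0, I)}(h(z) = +1) to within eps/2."""
    def sqrt_psd(S):
        # symmetric PSD square root via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

    gamma = []
    for m, S in ((m0, S0), (m1, S1)):
        R = sqrt_psd(S)
        # Equation (17): each query to h~_b costs exactly one query to h*
        h_tilde = (lambda z, m=m, R=R: query(m + R @ z))
        gamma.append(estimate_positive(h_tilde, eps / 2))
    return gamma[0] - gamma[1]  # each gamma is eps/2-accurate, so error <= eps
```

The design point is that the affine reparameterization preserves linearity, so a single standard-Gaussian estimation routine serves both subpopulations.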
Proof. As we will see from Lemma E.7, for each b ∈ {0, 1}, the respective call to ESTIMATE-POSITIVE ensures that |ˆγb − γb| ≤ ϵ/2. Therefore, |ˆµ − µ(h⋆)| ≤ |ˆγ0 − γ0| + |ˆγ1 − γ1| ≤ ϵ. Moreover, for every b, Lemma E.7 ensures that each call to ESTIMATE-POSITIVE makes at most O(d ln(d/ϵ)) label queries to h̃b; as simulating each query to h̃b takes one query to h⋆, it also makes at most O(d ln(d/ϵ)) label queries to h⋆. Summing the number of label queries over b ∈ {0, 1}, the total number of label queries made by Algorithm 6 is O(d ln(d/ϵ)).

We now turn to presenting the guarantee of the key subprocedure ESTIMATE-POSITIVE and its proof. This expands the analysis sketch in Section 4.3.

Lemma E.7 (Guarantees of ESTIMATE-POSITIVE). Recall that γ(h) = Pr_{x∼N(0,I_d)}(h(x) = +1). ESTIMATE-POSITIVE (Algorithm 4) receives as input query access to h⋆ ∈ Hlin and a target error ϵ, and outputs ˆγ such that

|ˆγ − γ(h⋆)| ≤ ϵ. (18)

Furthermore, it makes at most O(d ln(d/ϵ)) queries to h⋆.

Proof. Let h⋆(x) = sign(⟨a⋆, x⟩ + b⋆) be the target classifier. First, observe that γ(h⋆) = Φ(b⋆/‖a⋆‖₂) =: Φ(sr), where Φ is the standard normal CDF, s := sign(b⋆), and r := √(1 / Σ_{i=1}^d m_i^{−2}), for mi := b⋆/a⋆_i. Note that line 2 of ESTIMATE-POSITIVE correctly obtains s, as h⋆(0_d) = sign(⟨a⋆, 0_d⟩ + b⋆) = sign(b⋆). Recall that α = √(2d ln(1/ϵ)) and β = 2d^{5/2}(ln(1/ϵ))^{3/4}(1/ϵ)^{1/2}. We consider two cases, depending on the line at which ESTIMATE-POSITIVE returns:

1. If ESTIMATE-POSITIVE returns in line 5, then it must be the case that for all i ∈ [d], h⋆(αei) = h⋆(−αei). In this case, by Lemma E.9, |mi| ≥ α for every i. This implies that

r = √(1 / Σ_{i=1}^d m_i^{−2}) ≥ √(α²/d) = √(2 ln(1/ϵ)).

For the case s = −1, we have γ(h⋆) = Φ(sr) ≤ ϵ, where we use the standard fact that Φ(−x) ≤ exp(−x²/2) for x ≥ 0; in this case, ˆγ = 0 ensures that Equation (18) holds. For the symmetric case s = +1, γ(h⋆) = Φ(sr) ≥ 1 − ϵ and ˆγ = 1, which also ensures Equation (18).

2. On the other hand, if ESTIMATE-POSITIVE returns in line 13, it must be the case that there exists some i0 ∈ [d] such that |m_{i0}| ≤ α. This implies that

r = √(1 / Σ_{i=1}^d m_i^{−2}) ≤ √(1 / m_{i0}^{−2}) = |m_{i0}| ≤ α.

Now, ESTIMATE-POSITIVE must execute lines 6 to 11. The final S it computes has the following properties: for every i ∈ S, by the guarantee of procedure BINARY-SEARCH (Algorithm 5), |ˆmi − mi| ≤ ϵ; otherwise, for i ∉ S, it must be the case that h⋆(βei) = h⋆(−βei), which, by Lemma E.9, implies that |mi| ≥ β. Therefore, all the conditions of Lemma 4.4 are satisfied, and thus |ˆr − r| ≤ 2ϵ. This also yields |sˆr − sr| ≤ 2ϵ. Finally, noting that Φ is 1/√(2π)-Lipschitz, we have

|ˆγ − γ(h⋆)| = |Φ(sˆr) − Φ(sr)| ≤ (1/√(2π)) |sˆr − sr| ≤ ϵ.

In summary, in both cases ESTIMATE-POSITIVE outputs ˆγ such that Equation (18) is satisfied.

We now calculate the total query complexity of ESTIMATE-POSITIVE. Line 2 makes 1 label query; line 3 makes 2d label queries; for each i ∈ [d], line 8 makes 2 label queries, and BINARY-SEARCH makes ⌈log(2β/ϵ)⌉ label queries. In summary, the total label query complexity of ESTIMATE-POSITIVE is

1 + 2d + d(2 + ⌈log(2β/ϵ)⌉) = O(d ln(d/ϵ)).
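A minimal sketch of ESTIMATE-POSITIVE's probe-and-binary-search structure follows (ours; the probe radii α, β and the bisection mirror the analysis above, but the exact constants, Algorithm 5's interface, and the assumption b ≠ 0 are ours):

```python
import math

def estimate_positive(query, d, eps):
    """Sketch (ours) of ESTIMATE-POSITIVE for h(x) = sign(<a, x> + b) under
    N(0, I_d). query: black-box h, returning +/-1. Assumes b != 0."""
    Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    e = lambda i, c: [c if j == i else 0.0 for j in range(d)]  # c * e_i

    s = query([0.0] * d)                                 # s = sign(b)
    alpha = math.sqrt(2.0 * d * math.log(1.0 / eps))
    beta = 2.0 * d ** 2.5 * math.log(1.0 / eps) ** 0.75 / math.sqrt(eps)

    if all(query(e(i, alpha)) == query(e(i, -alpha)) for i in range(d)):
        return 0.0 if s < 0 else 1.0   # |b|/||a|| is large: gamma is ~0 or ~1

    inv_sq = 0.0                       # accumulates sum of m_i^{-2} over i in S
    for i in range(d):
        if query(e(i, beta)) == query(e(i, -beta)):
            continue                   # |m_i| >= beta: negligible contribution
        # BINARY-SEARCH: t -> query(t * e_i) flips sign at t = -m_i, m_i = b/a_i
        hi = beta if query(e(i, beta)) != s else -beta
        lo = 0.0                       # query(lo * e_i) == s at lo = 0
        while abs(hi - lo) > eps:
            mid = (lo + hi) / 2.0
            lo, hi = (mid, hi) if query(e(i, mid)) == s else (lo, mid)
        inv_sq += 1.0 / ((lo + hi) / 2.0) ** 2   # m_i_hat^2 = flip point squared
    r = math.sqrt(1.0 / inv_sq)
    return Phi(s * r)
```

Note that whenever the α-probe detects a disagreeing pair at some coordinate, the same coordinate also disagrees at radius β, so `inv_sq` is guaranteed to be positive in the second branch.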
We now present the proof of Lemma 4.4, which is key to the proof of Lemma E.7.

Proof of Lemma 4.4. First, by Lemma E.8 and the assumption that |ˆmi − mi| ≤ ϵ for all i ∈ S, we have

|ˆr − √(1 / Σ_{i∈S} m_i^{−2})| ≤ ϵ.

It remains to prove that

|√(1 / Σ_{i∈S} m_i^{−2}) − √(1 / Σ_{i=1}^d m_i^{−2})| ≤ ϵ,

which, combined with the above inequality and the triangle inequality, concludes the proof. To see this, let z = Σ_{i=1}^d m_i^{−2} and z_S = Σ_{i∈S} m_i^{−2}; since |mi| ≥ β for all i ∉ S, this implies that

z_S ≥ z − d/β² = z − ϵ / (4d⁴ (ln(1/ϵ))^{3/2}).

Also, note that √(1 / Σ_{i=1}^d m_i^{−2}) = r ≤ α implies that z ≥ 1/α² = 1/(2d ln(1/ϵ)); therefore,

z_S ≥ 1/(2d ln(1/ϵ)) − ϵ / (4d⁴ (ln(1/ϵ))^{3/2}) ≥ 1/(4d ln(1/ϵ)).

Now, by the Lagrange mean value theorem,

|1/√z_S − 1/√z| ≤ max_{z′∈(z_S,z)} (1/(2(z′)^{3/2})) |z_S − z| ≤ (1/2)(4d ln(1/ϵ))^{3/2} · ϵ / (4d⁴ (ln(1/ϵ))^{3/2}) = ϵ/d^{5/2} ≤ ϵ.

This concludes the proof.

Lemma E.8. Let l ∈ N+ and f(m1, . . . , ml) := √(1 / Σ_{i=1}^l m_i^{−2}); then f is 1-Lipschitz with respect to ‖·‖∞.

Proof. First, we show that f is 1-Lipschitz with respect to ‖·‖∞ within each orthant of R^l. Without loss of generality, we focus on the positive orthant R := {m̄ ∈ R^l : mi ≥ 0, ∀i}. We check that for any two points m̄ and n̄ in R, |f(m̄) − f(n̄)| ≤ ‖m̄ − n̄‖∞. By the Lagrange mean value theorem, there exists some θ ∈ {t m̄ + (1 − t) n̄ : t ∈ (0, 1)} such that

|f(m̄) − f(n̄)| = |⟨∇f(θ), m̄ − n̄⟩| ≤ ‖∇f(θ)‖₁ ‖m̄ − n̄‖∞,

where the inequality is from Hölder's inequality. Therefore, it suffices to check that for all m̄ in R₀ := {m̄ ∈ R^l : mi > 0, ∀i} (the interior of R), ‖∇f(m1, . . . , ml)‖₁ ≤ 1. To see this, note that

∇f(m1, . . . , ml) = (m1^{−3} / (Σ_{i=1}^l m_i^{−2})^{3/2}, . . . , ml^{−3} / (Σ_{i=1}^l m_i^{−2})^{3/2}).

Writing gi for the i-th coordinate of ∇f, observe that Σ_{i=1}^l |gi|^{2/3} = 1; this implies that for every i ∈ [l], |gi| ≤ 1, and therefore Σ_{i=1}^l |gi| = Σ_{i=1}^l (|gi|^{2/3})^{3/2} ≤ Σ_{i=1}^l |gi|^{2/3} = 1.

Now consider m̄, n̄ ∈ R^l that do not necessarily lie in the same orthant. Suppose the line segment {t m̄ + (1 − t) n̄ : t ∈ [0, 1]} consists of k pieces, where piece i is {t m̄ + (1 − t) n̄ : t ∈ [ti, t_{i−1}]}, with 1 = t0 > t1 > . . . > tk = 0 and each piece contained in a single orthant. Then we have:

|f(m̄) − f(n̄)| ≤ Σ_{i=1}^k |f(t_{i−1} m̄ + (1 − t_{i−1}) n̄) − f(ti m̄ + (1 − ti) n̄)| ≤ Σ_{i=1}^k (t_{i−1} − ti) ‖m̄ − n̄‖∞ = ‖m̄ − n̄‖∞,

where the second inequality uses the Lipschitzness of f within the orthant containing piece i, for each i ∈ [k].

Lemma E.9. Given i ∈ [d] and ξ > 0, if h⋆(ξei) = h⋆(−ξei), then |mi| ≥ ξ.

Proof. Suppose h⋆(ξei) = h⋆(−ξei) = +1; in this case, ξ a⋆_i + b⋆ ≥ 0 and −ξ a⋆_i + b⋆ ≥ 0, and therefore ξ|a⋆_i| ≤ b⋆, which implies |mi| = |b⋆/a⋆_i| ≥ ξ. The case h⋆(ξei) = h⋆(−ξei) = −1 is proved symmetrically.

E.3. Auxiliary Lemmas for Query Learning Lower Bounds

In this subsection we collect a few standard and useful lemmas for establishing lower bounds for general adaptive sampling and query learning algorithms, including active fairness auditing algorithms. Throughout, denote by P the distribution of the interaction transcript (the sequence of N labeled examples (x1, y1), . . . , (x_N, y_N)) obtained by the query learning algorithm interacting with the environment, and use the shorthand (x, y)^i for (x1, y1), . . . , (xi, yi).

Lemma E.10 (Le Cam's Lemma). Given two distributions P0, P1 over observation space Z, let ˆb : Z → {0, 1} be any hypothesis tester. Then,

(1/2)P0(ˆb(Z) = 1) + (1/2)P1(ˆb(Z) = 0) ≥ (1/2)(1 − d_TV(P0, P1)),

where d_TV(P0, P1) denotes the total variation distance between P0 and P1.

Lemma E.11 (Pinsker's Inequality). For two distributions P and Q, d_TV(P, Q) ≤ √(KL(P, Q)/2).

Lemma E.12 (Chain rule of KL divergence). For two distributions Q0(Z, W) and Q1(Z, W) over Z × W, we have

KL(Q0, Q1) = KL(Q0_Z, Q1_Z) + E_{z∼Q0_Z}[KL(Q0_{W|Z}(· | z), Q1_{W|Z}(· | z))].

Fact E.13. Let kl(·, ·) denote the binary relative entropy function. For a, b ∈ [1/4, 3/4], kl(a, b) ≤ 3(b − a)².

The following lemma is well known.

Lemma E.14 (Divergence decomposition). For a (possibly randomized) query learning algorithm A with label budget N, under two hypotheses H0, H1 (represented by distributions over the target concept h⋆), we have:

KL(P0, P1) = Σ_{i=1}^N E[KL(P0(yi = · | (x, y)^{i−1}, xi), P1(yi = · | (x, y)^{i−1}, xi))].

Proof.
We simplify KL(P0, P1) as follows:

KL(P0, P1) = Σ_{(x,y)^N} P0((x, y)^N) ln(P0((x, y)^N) / P1((x, y)^N))
= Σ_{(x,y)^N} P0((x, y)^N) Σ_{i=1}^N [ln(P_A(xi | (x, y)^{i−1}) / P_A(xi | (x, y)^{i−1})) + ln(P0(yi | (x, y)^{i−1}, xi) / P1(yi | (x, y)^{i−1}, xi))]
= Σ_{i=1}^N Σ_{(x,y)^i} P0((x, y)^i) ln(P0(yi | (x, y)^{i−1}, xi) / P1(yi | (x, y)^{i−1}, xi))
= Σ_{i=1}^N Σ_{(x,y)^{i−1},xi} P0((x, y)^{i−1}, xi) Σ_{yi} P0(yi | (x, y)^{i−1}, xi) ln(P0(yi | (x, y)^{i−1}, xi) / P1(yi | (x, y)^{i−1}, xi))
= Σ_{i=1}^N E[KL(P0(yi = · | (x, y)^{i−1}, xi), P1(yi = · | (x, y)^{i−1}, xi))],

where the first equality is by the definition of KL divergence; the second is by the chain rule of conditional probability (the first log-ratio vanishes because we run the same algorithm A under both environments, so the conditional probabilities of the queried examples given the history cancel); the third is by the law of total probability; and the last is again by the definition of KL divergence.

Fact E.15 (KL divergence between Gaussians of the same mean). If µ ∈ R and σ1, σ2 > 0, then

KL(N(µ, σ1²), N(µ, σ2²)) = (1/2)(σ1²/σ2² − 1 + ln(σ2²/σ1²)).

Fact E.16 (Concentration of χ² random variables). For d ≥ 1, Z ∼ χ²(d), and δ ∈ (0, 1), with probability 1 − δ,

|Z/d − 1| ≤ 2√(ln(2/δ)/d) + 2 ln(2/δ)/d.

Specifically, for d ≥ 2, with probability at least 15/16, |Z/d − 1| ≤ 10√(1/d).

The lemma below is a standard fact about normal distributions conditioned on affine subspaces; we include a proof here as we could not find a reference.

Lemma E.17. Suppose U = {θ ∈ R^d : Xθ = y} is a nonempty affine subspace of R^d, where X ∈ R^{m×d} has rows x1, . . . , xm ∈ R^d. Let l = dim(span(x1, . . . , xm)), and let W ∈ R^{d×(d−l)} be a matrix whose columns form an orthonormal basis of span(x1, . . . , xm)⊥. Consider Z ∼ N(0, I_d); then

Z | {Z ∈ U} ∼ N(X†y, WW⊤).

Proof. Denote by ˆθ = X†y the least-norm solution of the equation Xθ = y. It is well known that ˆθ ∈ span(x1, . . . , xm). As U ≠ ∅, X ˆθ = y. We first claim that U can be equivalently written as {ˆθ + Wα : α ∈ R^{d−l}}:

1. On one hand, for all θ = ˆθ + Wα, Xθ = X ˆθ + XWα = y + 0 = y.
2. On the other hand, for every θ ∈ U, as Xθ = y, we have X(θ − ˆθ) = 0, which implies θ − ˆθ ∈ span(x1, . . . , xm)⊥. Therefore, there exists some α ∈ R^{d−l} such that θ = ˆθ + Wα.

Define V ∈ R^{d×l} to be a matrix whose columns form an orthonormal basis of span(x1, . . . , xm). We also claim that, for a vector z ∈ R^d, z ∈ U iff V⊤z = V⊤ˆθ:

1. If z ∈ U, by the previous claim z = ˆθ + Wα, and therefore V⊤z = V⊤ˆθ + V⊤Wα = V⊤ˆθ.
2. If V⊤z = V⊤ˆθ, then note that z = V V⊤z + W W⊤z = V V⊤ˆθ + W(W⊤z) = ˆθ + W(W⊤z), where the last equality follows from ˆθ ∈ span(x1, . . . , xm). Taking α_z = W⊤z ∈ R^{d−l}, we have z = ˆθ + Wα_z, implying z ∈ U.

For the rest of the proof, let =_d denote equality in distribution. Consider the random variable Z ∼ N(0, I_d), and let ϵ_V = V⊤Z and ϵ_W = W⊤Z. The matrix T = [V⊤; W⊤] ∈ R^{d×d} is orthonormal, so TZ =_d N(0, I_d); therefore, ϵ_V and ϵ_W are independent standard normal random variables with distributions N(0, I_l) and N(0, I_{d−l}), respectively. Note from the second claim that the event {Z ∈ U} is equivalent to {ϵ_V = V⊤ˆθ}; therefore, ϵ_W | {Z ∈ U} =_d N(0, I_{d−l}). As a result,

Z | {Z ∈ U} =_d (V ϵ_V + W ϵ_W) | {Z ∈ U} =_d ˆθ + W ϵ_W =_d N(X†y, WW⊤).
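Lemma E.17 is easy to check numerically; a small sketch (ours), sampling from the conditional law via the explicit parameterization θ̂ + Wα and verifying the claimed mean, covariance, and membership in U (it assumes the rows of X are linearly independent, so l = m):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 2
X = rng.normal(size=(m, d))
y = rng.normal(size=m)

theta_hat = np.linalg.pinv(X) @ y           # least-norm solution of X theta = y
# orthonormal basis W of span(rows of X)^perp from the full SVD of X
_, _, Vt = np.linalg.svd(X, full_matrices=True)
W = Vt[m:].T                                 # d x (d - m), assumes rank(X) = m

# Samples of Z | {Z in U}: by Lemma E.17, theta_hat + W @ alpha, alpha ~ N(0, I)
alpha = rng.normal(size=(W.shape[1], 10000))
Z = theta_hat[:, None] + W @ alpha

print(np.allclose(X @ Z - y[:, None], 0, atol=1e-8))      # every sample lies in U
print(np.allclose(Z.mean(axis=1), theta_hat, atol=0.05))  # empirical mean ~ X^+ y
print(np.allclose(np.cov(Z), W @ W.T, atol=0.05))         # empirical cov ~ W W^T
```

This is exactly the conditioning step used to compute the Bayesian posterior of a⋆ in the proof of Theorem E.3.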
F. Experiments

In this section, we empirically explore the shrinkage of the version space under two baseline methods and Algorithm 3. The two baseline sampling methods we consider are: 1) i.i.d sampling (without replacement), and 2) active learning (CAL).

Procedure: We train a logistic regression model to obtain h⋆ on two datasets commonly used in the fairness literature. The first is COMPAS (Larson et al., 2016), where the two groups are defined to be Caucasian and non-Caucasian. The second is the Student Performance dataset, where the two groups are defined to be Female and Male. We then run the three methods with allotted label budgets of 20, 50, 80, 100, and 120. These budgets are a small fraction of the total dataset size (much smaller for COMPAS than for Student Performance).

Evaluation: Our evaluation is based on the version space induced by the labels requested by the three methods. We evaluate the version space in two ways:

1. Given H[S], we compute its µ-diameter max_{h,h′∈H[S]} |µ(h) − µ(h′)|. The µ-diameter of the version space captures the largest extent to which the algorithm's µ-estimate may be changed by post-hoc manipulation: the smaller it is, the higher the degree of manipulation-proofness. To compute max_{h,h′∈H[S]} |µ(h) − µ(h′)|, we evaluate max_{h∈H[S]} µ(h) and min_{h∈H[S]} µ(h). Let G1 = {x ∈ X : x_A = 1} and G0 = {x ∈ X : x_A = 0}. To implement the maximization program, we may move the constraint into the objective as a Lagrangian:

max_h (1/|G1|) Σ_{x∈G1} 1{h(x) = 1} − (1/|G0|) Σ_{x∈G0} 1{h(x) = 1} + λ Σ_{x∈S} 1{h(x) = h⋆(x)},

or, for the minimization, equivalently:

max_h −(1/|G1|) Σ_{x∈G1} 1{h(x) = 1} + (1/|G0|) Σ_{x∈G0} 1{h(x) = 1} + λ Σ_{x∈S} 1{h(x) = h⋆(x)}.

As mentioned earlier, this objective may be framed as a cost-sensitive classification problem, a formulation commonly used in the fairness literature (Agarwal et al., 2018). In particular, for the maximization program, the cost of predicting −1 is 1/|G1| for x ∈ G1 and 0 otherwise; the cost of predicting 1 is 1/|G0| for x ∈ G0 and 0 otherwise; and the cost of predicting anything other than h⋆(x) is λ for x ∈ S and 0 otherwise. Using iterative doubling and grid search, we look for the smallest λ that enforces h(x) = h⋆(x) for all x ∈ S (since these are hard constraints), and find the maximizing h in the version space for this λ. The same procedure is applied for the minimizing h in the version space. (A code sketch of this cost construction appears after the results below.)

2. Since we may return µ(h) for any h ∈ H[S] as an estimate of µ(h⋆), we also evaluate E_{h∼unif(H[S])}[|µ(h) − µ(h⋆)|]; this corresponds to the average error and is a proxy for estimation accuracy. To sample from the version space, we use the classic hit-and-run algorithm, drawing 500 models from the version space at each budget and averaging the error.

Results: In terms of the µ-diameter of the version space, which may be interpreted as the maximum possible degree of post-audit manipulation of µ, we see in Figure 1 that Algorithm 3 is the best of the three methods at all budgets. This is expected, since Algorithm 3 is designed to use the max_{h∈H[S]} µ(h) and min_{h∈H[S]} µ(h) estimates in its query selection to shrink the version space in µ-space. Behind Algorithm 3, CAL is generally better than or on par with i.i.d sampling.

In terms of estimation error, going by the average µ-estimation error over the version space, we see in Figure 2 that, in general, one of the active approaches outperforms i.i.d sampling. Between the two active approaches, there are budget settings where one is better than the other and vice versa.
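A small sketch of the cost-sensitive encoding described in evaluation item 1 (our own illustrative code; `csc_oracle` is a hypothetical cost-sensitive classification solver, and the doubling search over λ follows the description above):

```python
def version_space_mu_extreme(X_pool, groups, S, hstar_labels, csc_oracle,
                             maximize=True):
    """Approximate max (or min) of mu over {h : h agrees with h* on S} via a
    Lagrangian cost-sensitive classification problem (a sketch, ours).

    groups[i] in {0, 1}; S: set of audited indices; hstar_labels[i] in {-1, +1};
    csc_oracle(X_pool, costs): returns h minimizing sum_i costs[i]*1{h(x_i)=+1}.
    """
    n1 = sum(1 for g in groups if g == 1)
    n0 = len(groups) - n1
    sign = 1.0 if maximize else -1.0

    lam = 1.0
    while True:
        costs = []  # relative cost of predicting +1 versus -1, per example
        for i in range(len(X_pool)):
            c = (-sign / n1) if groups[i] == 1 else (sign / n0)
            if i in S:  # penalize disagreeing with h* on audited points
                c += lam * (-1.0 if hstar_labels[i] == +1 else +1.0)
            costs.append(c)
        h = csc_oracle(X_pool, costs)
        if all(h(X_pool[i]) == hstar_labels[i] for i in S):
            return h            # audit constraints hold as hard constraints
        lam *= 2.0              # constraints violated: double lambda and retry
```

Running this twice (with `maximize=True` and `False`) yields the two endpoints whose difference is the reported µ-diameter.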
Figure 1. Left: comparison of the three methods on the Student Performance dataset, in terms of the µ-diameter of the final version space as a function of label query budget. Right: the same comparison on the COMPAS dataset. For the error bars, a 95 percent confidence interval is constructed using the 50 repeats at each budget.

Figure 2. Left: comparison of the three methods on the Student Performance dataset, in terms of the average µ-estimation error over the final version space as a function of label query budget. Right: the same comparison on the COMPAS dataset. For the error bars, a 95 percent confidence interval is constructed using the 50 repeats at each budget.