# Certifying Some Distributional Fairness with Subpopulation Decomposition

Mintong Kang (UIUC) mintong2@illinois.edu · Linyi Li (UIUC) linyi2@illinois.edu · Maurice Weber (ETH Zurich) maurice.weber@inf.ethz.ch · Yang Liu (UC Santa Cruz) yangliu@ucsc.edu · Ce Zhang (ETH Zurich) ce.zhang@inf.ethz.ch · Bo Li (UIUC) lbo@illinois.edu

The first two authors contributed equally. 36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Abstract

Extensive efforts have been made to understand and improve the fairness of machine learning models based on different fairness measurement metrics, especially in high-stakes domains such as medical insurance, education, and hiring decisions. However, there is a lack of certified fairness on the end-to-end performance of an ML model. In this paper, we first formulate the certified fairness of an ML model trained on a given data distribution as an optimization problem: we bound the model's performance loss on a fairness constrained distribution that lies within a bounded distributional distance of the training distribution. We then propose a general fairness certification framework and instantiate it for both sensitive shifting and general shifting scenarios. In particular, we propose to solve the optimization problem by decomposing the original data distribution into analytical subpopulations and proving the convexity of the sub-problems in order to solve them. We evaluate our certified fairness on six real-world datasets and show that our certification is tight in the sensitive shifting scenario and provides non-trivial certification under general shifting. Our framework is flexible enough to integrate additional non-skewness constraints, and we show that it provides even tighter certification under different real-world scenarios. We also compare our certified fairness bound with adapted existing distributional robustness bounds on Gaussian data and demonstrate that our method is significantly tighter.

1 Introduction

As machine learning (ML) has become ubiquitous [24, 18, 5, 11, 8, 13], the fairness of ML has attracted a lot of attention from different perspectives. For instance, some automated hiring systems are biased towards males due to gender-imbalanced training data [3]. Different approaches have been proposed to improve ML fairness, such as regularized training [16, 22, 26, 30], disentanglement [12, 28, 40], duality [44], low-rank matrix factorization [34], and distribution alignment [4, 29, 53]. In addition to existing approaches that evaluate fairness, it is important and challenging to provide certification for ML fairness. Recent studies have explored certified fair representations [39, 4, 36]. However, there is no certified fairness guarantee for the predictions of an end-to-end ML model trained on an arbitrary data distribution. In addition, the current fairness literature mainly focuses on training an ML model on a potentially (im)balanced distribution and evaluating its performance in a target domain measured by existing statistical fairness definitions [17, 20]. Since in practice these selected target domains can encode certain forms of unfairness of their own (e.g., sampling bias), the evaluation would be more informative if we could evaluate and certify the fairness of an ML model on an objective distribution.
Taking these factors into account, in this work we aim to provide the first definition of certified fairness for an ML model and a training distribution, by bounding the model's end-to-end performance on an objective, fairness constrained distribution. In particular, we define certified fairness as the worst-case upper bound of the ML prediction loss on a fairness constrained test distribution Q which is within a bounded distance of the training distribution P. For example, for an ML model for crime rate prediction, we can define the model performance as the expected loss within a specific age group. Suppose the model is deployed in a fair environment that does not deviate too much from training; our fairness certificate then guarantees that the loss of crime rate prediction for a particular age group is upper bounded, which is an indicator of the model's fairness. We mainly focus on the base rate condition as the fairness constraint for Q. We prove that our certified fairness based on a base rate constrained distribution implies other fairness metrics, such as demographic parity (DP) and equalized odds (EO). Moreover, our framework is flexible enough to integrate other fairness constraints into Q. We consider two scenarios: (1) sensitive shifting, where only the joint distribution of the sensitive attribute and label can change when optimizing Q; and (2) general shifting, where everything, including the conditional distribution of the non-sensitive attributes, can change. We then propose an effective fairness certification framework to compute the certificate.

In our fairness certification framework, we first formulate the problem as constrained optimization, where the fairness constrained distribution is encoded by base rate constraints. Our key technique is to decompose both the training and the fairness constrained test distributions into several subpopulations based on sensitive attributes and target labels, which can be used to encode the base rate constraints. With such a decomposition, under sensitive shifting we can decompose the distance constraint into subpopulation ratio constraints and prove that the transformed low-dimensional optimization problem is convex and thus efficiently solvable. In the general shifting case, we propose to solve it by divide and conquer: we first partition the feasible space into different subpopulations, then optimize the density (ratio) of each subpopulation, apply a relaxation on each subpopulation as a sub-problem, and finally prove the convexity of the sub-problems with respect to the other low-dimensional variables. Our framework is applicable to any black-box ML model and any distributional shift bounded by the Hellinger distance, which is a type of f-divergence studied in the literature [47, 14, 7, 25, 15].

To demonstrate the effectiveness and tightness of our framework, we evaluate our fairness bounds on six real-world fairness-related datasets [3, 2, 19, 48]. We show that our certificate is tight under different scenarios. In addition, we verify that our framework can flexibly integrate additional constraints on Q, and we evaluate the certified fairness with additional non-skewness constraints, with which our fairness certificate becomes tighter. Finally, as this is the first work on certifying the fairness of an end-to-end ML model, we adapt an existing distributional robustness bound [43] for comparison to provide more intuition.
Note that directly integrating the fairness constraint into the existing distributional robustness bound is challenging, and handling it is one of the main contributions of our framework. We show that, with the fairness constraints and our effective solution, our bound is strictly tighter.

Technical Contributions. In this work, we take the first attempt towards formulating and computing certified fairness for an end-to-end ML model trained on a given distribution. We make contributions on both theoretical and empirical fronts.
1. We formulate the certified fairness of an end-to-end ML model trained on a given distribution P as the worst-case upper bound of its prediction loss on a fairness constrained distribution Q, which is within a bounded distributional distance of P.
2. We propose an effective fairness certification framework that formulates the problem as constrained optimization and solves it by decomposing the training and fairness constrained test distributions into subpopulations and proving the convexity of each sub-problem.
3. We evaluate our certified fairness on six real-world datasets to show its tightness and scalability. We also show that with additional distribution constraints on Q, our certification becomes tighter.
4. We show that our bound is strictly tighter than the adapted distributional robustness bound on a Gaussian dataset, owing to the added fairness constraints and our effective optimization approach.

Related Work. Fairness in ML can be generally categorized into individual fairness and group fairness. Individual fairness guarantees that similar inputs lead to similar outputs of a model; it has been analyzed with optimization approaches [49, 33] and different types of relaxations [21]. Group fairness includes independence, which requires the model prediction to be statistically independent of the sensitive features; separation, which requires the sensitive features to be statistically independent of the model prediction given the target label; and sufficiency, which requires the sensitive features to be statistically independent of the target label given the model prediction [27]. Different approaches have been proposed to analyze group fairness via static analysis [46], interactive computation [41], and probabilistic approaches [1, 10, 6]. In addition, there is a line of work trying to certify fair representations [39, 4, 36]. In [9], the authors provide bounds on how group fairness transfers subject to bounded distribution shift. Our certified fairness differs from existing work in three respects: 1) we provide fairness certification for the end-to-end model performance instead of at the representation level, 2) we define and certify fairness based on a fairness constrained distribution, which implies other fairness notions, and 3) our certified fairness can be computed for any black-box model trained on an arbitrary given data distribution.

2 Certified Fairness Based on Fairness Constrained Distribution

In this section, we first introduce preliminaries, and then propose the definition of certified fairness based on a bounded fairness constrained distribution, which to the best of our knowledge is the first formal fairness certification on end-to-end model prediction. We also show that our proposed certified fairness relates to established fairness definitions in the literature.

Notations. We consider the general classification setting: we denote by $\mathcal{X}$ the feature space and by $\mathcal{Y} = [C]$ the label set, where $[C] := \{1, 2, \dots, C\}$. $h_\theta : \mathcal{X} \to \Delta^{|\mathcal{Y}|}$ represents a mapping function parameterized by $\theta \in \Theta$, and $\ell : \Delta^{|\mathcal{Y}|} \times \mathcal{Y} \to \mathbb{R}_+$ is a non-negative loss function such as the cross-entropy loss. Within the feature space $\mathcal{X}$, we identify a sensitive or protected attribute $X_s$ that takes a finite number of values: $\mathcal{X}_s := [S]$, i.e., for any $X \in \mathcal{X}$, $X_s \in [S]$.
Definition 1 (Base Rate). Given a distribution $P$ supported over $\mathcal{X} \times \mathcal{Y}$, the base rate for sensitive attribute value $s \in [S]$ with respect to label $y \in [C]$ is $b^P_{s,y} = \Pr_{(X,Y)\sim P}[Y = y \mid X_s = s]$.

Given the definition of base rate, we define a fair base rate distribution (in short, a fair distribution).

Definition 2 (Fair Base Rate Distribution). A distribution $P$ supported over $\mathcal{X} \times \mathcal{Y}$ is a fair base rate distribution if and only if for any label $y \in [C]$, the base rate $b^P_{s,y}$ is equal across all $s \in [S]$, i.e., $\forall i \in [S], \forall j \in [S],\ b^P_{i,y} = b^P_{j,y}$.

Remark. In the literature, the concepts of fairness are usually defined directly at the model prediction level, where the criterion is whether the model prediction is fair against individual attribute changes [39, 36, 50] or fair at the population level [54]. In this work, to certify the fairness of model prediction, we define a fairness constrained distribution on which we will certify the model prediction (e.g., bound the prediction error), rather than relying on empirical fairness evaluation. In particular, we first define the fairness constrained distribution through the lens of base rate parity, i.e., the probability of belonging to any class should be independent of the sensitive attribute value, and then define the certified fairness of a given model based on its performance on the fairness constrained distribution, as we will show next. The choice of focusing on fair base rate may look restrictive, but it aligns very well with the celebrated fairness definition Demographic Parity [51], which promotes $\Pr[h_\theta(X) = 1 \mid X_s = i] = \Pr[h_\theta(X) = 1 \mid X_s = j]$. In this case, the prediction performance of $h_\theta$ on $Q$ with fair base rate relates directly to $\Pr[h_\theta(X) = 1 \mid X_s = i]$. Secondly, under certain popular data generation processes, the base rate sufficiently encodes the differences in distributions, and a fair base rate implies a homogeneous (therefore equal or "fair") distribution over $X, Y$: consider the case when $\Pr(X \mid Y = y, X_s = i)$ is the same across different groups $X_s$. Then $\Pr(X, Y \mid X_s = i)$ is simply a linear combination of the basis distributions $\Pr(X \mid Y = y, X_s = i)$, and the difference between different groups' joint distributions of $X, Y$ is fully characterized by the difference in base rates $\Pr(Y = y \mid X_s)$. This assumption greatly enables tractable analysis and is not an uncommon modeling choice in the recent discussion of fairness under distribution shifts [52, 37].
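The base rates in Definitions 1 and 2 are easy to estimate from samples. Below is a minimal Python sketch, not part of the paper's implementation; the DataFrame and the column names `sex` and `income` are hypothetical placeholders. It estimates $b^P_{s,y}$ from a labeled dataset and checks base rate parity up to a tolerance.

```python
import pandas as pd

def base_rates(df: pd.DataFrame, sensitive: str = "sex", label: str = "income") -> pd.DataFrame:
    """Estimate b^P_{s,y} = Pr[Y = y | X_s = s] from samples drawn from P."""
    # Count occurrences of each (sensitive value s, label y) pair, then normalize per group s.
    counts = df.groupby([sensitive, label]).size().unstack(fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0)

def is_fair_base_rate(df: pd.DataFrame, sensitive: str = "sex", label: str = "income", tol: float = 1e-3) -> bool:
    """Check Definition 2: for every label y, the base rate must be equal across all groups s."""
    b = base_rates(df, sensitive, label)
    return bool((b.max(axis=0) - b.min(axis=0)).max() <= tol)

# Toy example: base rates differ across the two groups, so this sample is not fair (Def. 2).
toy = pd.DataFrame({"sex": [0, 0, 0, 1, 1, 1], "income": [1, 1, 0, 0, 0, 1]})
print(base_rates(toy))          # group 0: Pr[Y=1] = 2/3, group 1: Pr[Y=1] = 1/3
print(is_fair_base_rate(toy))   # False
```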
2.1 Certified Fairness

Now we are ready to define the fairness certification based on the optimized fairness constrained distribution. We define the certification under two data generation scenarios: general shifting and sensitive shifting. In particular, consider the data generative model $\Pr(X_o, X_s, Y) = \Pr(Y)\,\Pr(X_s \mid Y)\,\Pr(X_o \mid Y, X_s)$, where $X_o$ and $X_s$ represent the non-sensitive and sensitive features, respectively. If all three factors on the RHS are allowed to change, we call it general shifting; if both $\Pr(Y)$ and $\Pr(X_s \mid Y)$ are allowed to change to ensure the fair base rate (Def. 2) while $\Pr(X_o \mid Y, X_s)$ stays the same across different groups, we call it sensitive shifting. In Section 3 we will introduce our certification framework for both scenarios.

Problem 1 (Certified Fairness with General Shifting). Given a training distribution $P$ supported on $\mathcal{X} \times \mathcal{Y}$, a model $h_\theta(\cdot)$ trained on $P$, and a distribution distance bound $\rho > 0$, we call $\tau \in \mathbb{R}$ a fairness certificate with general shifting if $\tau$ upper bounds
$$\max_{Q}\ \mathbb{E}_{(X,Y)\sim Q}[\ell(h_\theta(X), Y)] \quad \text{s.t.} \quad \mathrm{dist}(P, Q) \le \rho,\ Q \text{ is a fair distribution},$$
where $\mathrm{dist}(\cdot,\cdot)$ is a predetermined distribution distance metric.

In the above definition, we define the fairness certificate as an upper bound of the model's loss among all fair base rate distributions $Q$ within a bounded distance from $P$. Besides the bounded distance constraint $\mathrm{dist}(P, Q) \le \rho$, there is no other constraint between $P$ and $Q$, so this corresponds to "general shifting". The bounded distance constraint, parameterized by a tunable parameter $\rho$, ensures that the test distribution is not too far away from the training distribution. In practice, the model $h_\theta$ may be a DNN whose complex analytical form poses challenges for solving Problem 1. As a result, as we will show in Equation (2), we can query some statistics of $h_\theta$ trained on $P$ as constraints to characterize $h_\theta$, and thus compute the upper bound certificate. The feasible region of optimization Problem 1 might be empty if the distance bound $\rho$ is too small, in which case we cannot provide fairness certification; this indicates that there is no nearby fair distribution and thus the fairness of the model trained on the highly "unfair" distribution is generally low. In other words, if the training distribution $P$ is unfair (the typical case) and there is no feasible fairness constrained distribution $Q$ within a small distance of $P$, fairness cannot be certified. This definition follows the intuition of typical real-world scenarios: the real-world training dataset is usually biased due to limitations in data curation and collection processes, which causes the model to be unfair. Thus, when the trained model is evaluated on a real-world fairness constrained test distribution or an ideal fair distribution, we hope that the model does not encode the training bias, which would lead to low test performance. That is to say, the model performance on the fairness constrained distribution is indeed a witness of the model's intrinsic fairness. We can further constrain that the subpopulations of $P$ and $Q$ partitioned by $X_s$ and $Y$ do not change, which results in the following sensitive shifting fairness certification.

Problem 2 (Certified Fairness with Sensitive Shifting). Under the same setting as Problem 1, we call $\tau$ a fairness certificate against sensitive shifting if $\tau$ upper bounds
$$\max_{Q}\ \mathbb{E}_{(X,Y)\sim Q}[\ell(h_\theta(X), Y)] \quad \text{s.t.} \quad \mathrm{dist}(P, Q) \le \rho,\ P_{s,y} = Q_{s,y}\ \forall s \in [S], y \in [C],\ Q \text{ is a fair distribution},$$
where $P_{s,y}$ and $Q_{s,y}$ are the subpopulations of $P$ and $Q$ on the support $\{(X, Y) : X \in \mathcal{X}, X_s = s, Y = y\}$ respectively, and $\mathrm{dist}(\cdot,\cdot)$ is a predetermined distribution distance metric.

This definition adds the additional constraint between $P$ and $Q$ that each subpopulation, partitioned by the sensitive attribute $X_s$ and label $Y$, does not change. The constraint corresponds to the scenario where the distribution shift between training and test distributions only happens in the proportions of different sensitive attributes and labels, and within each subpopulation the shift is negligible. In addition, to model the real-world test distribution, we may further request that the test distribution $Q$ is not too skewed regarding the sensitive attribute $X_s$ by adding constraint (1). We will show in Section 4.3 that this constraint can also be flexibly integrated into our fairness certification framework.
$$\forall i \in [S], \forall j \in [S], \quad \Big|\Pr_{(X,Y)\sim Q}[X_s = i] - \Pr_{(X,Y)\sim Q}[X_s = j]\Big| \le \Delta_s. \quad (1)$$
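To make the feasibility discussion above concrete, the following sketch numerically estimates the smallest Hellinger distance from a given $P$ to any fair base rate distribution when each subpopulation is kept fixed. It is an illustration only: the densities, the binary $X_s$ and $Y$, and the grid resolution are made-up assumptions, and it relies on the subpopulation decomposition formalized later in Equation (3). The certification problems only become feasible once the distance bound $\rho$ reaches roughly this value.

```python
import numpy as np

# Hypothetical subpopulation densities p_{s,y} = Pr_P[Xs = s, Y = y] of an unfair training
# distribution P (binary sensitive attribute, binary label).
p = np.array([[0.35, 0.05],
              [0.20, 0.40]])

# When every subpopulation P_{s,y} is kept intact, H(P, Q)^2 = 1 - sum_{s,y} sqrt(p_{s,y} q_{s,y}),
# and Q is fair iff its (Xs, Y) marginal factorizes as q_{s,y} = k_s * r_y.  The distance to the
# nearest fair distribution is therefore found by maximizing the Bhattacharyya coefficient over
# product-form marginals; here we simply grid-search over k_1 and r_1.
grid = np.linspace(0.0, 1.0, 1001)
k1, r1 = np.meshgrid(grid, grid, indexing="ij")
q = np.stack([k1 * r1, k1 * (1 - r1), (1 - k1) * r1, (1 - k1) * (1 - r1)], axis=-1)
bc = np.sqrt(p.reshape(-1) * q).sum(axis=-1)
print("approx. Hellinger distance to the nearest fair distribution:",
      np.sqrt(1.0 - bc.max()))   # fairness cannot be certified for smaller distance bounds
```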
Connections to Other Fairness Measurements. Though not explicitly stated, our goal of certifying the performance on a fair distribution $Q$ relates to certifying established fairness definitions in the literature. Consider the following example: suppose Problem 2 is feasible and the classifier $h_\theta$ achieves certified fairness $\epsilon$ per group and per label class on $Q$, i.e., $\Pr_{(X,Y)\sim Q}[h_\theta(X) \neq Y \mid Y = y, X_s = i] \le \epsilon$ for every $y$ and $i$. We then have the following proposition:

Proposition 1. $h_\theta$ achieves $\epsilon$-Demographic Parity (DP) [51] and $\epsilon$-Equalized Odds (EO) [18]:
$\epsilon$-DP: $\big|\Pr_Q[h_\theta(X) = 1 \mid X_s = i] - \Pr_Q[h_\theta(X) = 1 \mid X_s = j]\big| \le \epsilon,\ \forall i, j$.
$\epsilon$-EO: $\big|\Pr_Q[h_\theta(X) = 1 \mid Y = y, X_s = i] - \Pr_Q[h_\theta(X) = 1 \mid Y = y, X_s = j]\big| \le \epsilon,\ \forall y, i, j$.

Remark. The detailed proof is deferred to Appendix C.1. (1) When $\epsilon = 0$, Proposition 1 guarantees perfect DP and EO simultaneously. We achieve this because we evaluate with a fair distribution $Q$, where "fair" stands for equalized base rates, and according to [23, Theorem 1.1, page 5] both DP and EO are achievable on such a fair distribution. This observation in fact motivated us to identify the fair distribution $Q$ for the evaluation, since it is this fair distribution that allows the fairness measures to hold at the same time. Therefore, another way to interpret our framework is: given a model, we provide a framework that certifies a worst-case unfairness bound in a context where perfect fairness is achievable. Such a worst-case bound serves as the gap to a perfectly fair model and can be a good indicator of the model's fairness level. (2) In practice, $\epsilon$ is not necessarily zero. Therefore, Proposition 1 only provides upper bounds on the DP and EO gaps, namely $\epsilon$-DP and $\epsilon$-EO, instead of exact DP and EO. The approximate fairness guarantee renders our results more general. Meanwhile, there is higher flexibility in simultaneously satisfying approximate fairness metrics (for example, DP $= 0$ but EO $= \epsilon$, which is plausible for a proper range of $\epsilon$, regardless of whether the distribution $Q$ is fair or not). But again, similar to (1), $\epsilon$-DP and $\epsilon$-EO can easily be achieved at the same time since the test distribution satisfies base rate parity.

The bounds in Proposition 1 are tight. Consider a distribution $Q$ with binary classes and a binary sensitive attribute (i.e., $Y, X_s \in \{0, 1\}$). When the distribution $Q$ and classifier $h_\theta$ satisfy $\Pr_Q[h_\theta(X) \neq Y \mid Y = 0, X_s = 0] = \epsilon$, $\Pr_Q[h_\theta(X) \neq Y \mid Y = 0, X_s = 1] = 0$ and $\Pr_Q[Y = 0] = 1$, $\Pr_Q[Y = 1] = 0$, the bounds in Proposition 1 are tight. From $\Pr_Q[Y = 0] = 1$, $\Pr_Q[Y = 1] = 0$, we observe that $\epsilon$-DP is equivalent to $\epsilon$-EO. From $\Pr_Q[h_\theta(X) \neq Y \mid Y = 0, X_s = 0] = \epsilon$, $\Pr_Q[h_\theta(X) \neq Y \mid Y = 0, X_s = 1] = 0$ and $\Pr_Q[h_\theta(X) \neq Y \mid Y = 0, X_s = i] = \Pr_Q[h_\theta(X) = 1 \mid Y = 0, X_s = i]$ for $i \in \{0, 1\}$, we know that $\epsilon$-EO holds with tightness since $\big|\Pr_Q[h_\theta(X) = 1 \mid Y = 0, X_s = 0] - \Pr_Q[h_\theta(X) = 1 \mid Y = 0, X_s = 1]\big| = \epsilon$. To this point, we have shown that both bounds in Proposition 1 are tight.
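The DP and EO gaps bounded by Proposition 1 are straightforward to measure empirically. The sketch below is a toy illustration, not part of the paper; the synthetic data, the error rate, and the function names are assumptions. It computes both gaps for binary predictions: on a base-rate-fair $Q$, a classifier whose per-group, per-class error is at most $\epsilon$ stays within the $\epsilon$-DP and $\epsilon$-EO bounds, and concentrating the errors on group 0 makes the EO gap approach $\epsilon$, mirroring the tightness example.

```python
import numpy as np

def dp_gap(y_pred, s):
    """Demographic parity gap: max_{i,j} |Pr[h(X)=1 | Xs=i] - Pr[h(X)=1 | Xs=j]|."""
    rates = [y_pred[s == v].mean() for v in np.unique(s)]
    return max(rates) - min(rates)

def eo_gap(y_pred, y_true, s):
    """Equalized odds gap: max over labels y of the spread of Pr[h(X)=1 | Y=y, Xs=s] across groups."""
    gaps = []
    for y in np.unique(y_true):
        mask = y_true == y
        rates = [y_pred[mask & (s == v)].mean() for v in np.unique(s)]
        gaps.append(max(rates) - min(rates))
    return max(gaps)

# Synthetic base-rate-fair Q: labels are independent of the group, so base rate parity holds.
rng = np.random.default_rng(0)
n, eps = 100_000, 0.05
s = rng.integers(0, 2, n)
y_true = rng.integers(0, 2, n)
flip = (s == 0) & (rng.random(n) < eps)        # only group 0 suffers errors, at rate <= eps
y_pred = np.where(flip, 1 - y_true, y_true)
print(dp_gap(y_pred, s), eo_gap(y_pred, y_true, s))   # DP gap ~ 0, EO gap ~ eps: both within eps
```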
3 Fairness Certification Framework

We now introduce our fairness certification framework, which efficiently computes the fairness certificate defined in Section 2.1. We first introduce our framework for sensitive shifting (Problem 2), which is less complex and shows our core methodology, and then for the general shifting case (Problem 1). Our framework focuses on using the Hellinger distance to bound the distributional distance in Problems 1 and 2. The Hellinger distance $H(P, Q)$ is defined in Def. 3 (in Appendix B.1). The Hellinger distance has some nice properties, e.g., $H(P, Q) \in [0, 1]$, $H(P, Q) = 0$ if and only if $P = Q$, and the maximum value of 1 is attained when $P$ and $Q$ have disjoint supports. The Hellinger distance is a type of f-divergence, which are widely studied in the ML distributional robustness literature [47, 14] and in the context of distributionally robust optimization [7, 25, 15]. Also, using the Hellinger distance enables our certification framework to generalize directly to the total variation distance (or statistical distance) $\delta(P, Q)$, where $\delta(P, Q) = \sup_{A\in\mathcal{F}} |P(A) - Q(A)|$ with $\mathcal{F}$ a $\sigma$-algebra of subsets of the sample space, via the connection $H^2(P, Q) \le \delta(P, Q) \le \sqrt{2}\, H(P, Q)$ ([45], Equation 1). We leave the extension of our framework to other distance metrics as future work.

3.1 Core Idea: Subpopulation Decomposition

The core idea of our framework is (finite) subpopulation decomposition. Consider a generic optimization problem for computing the loss upper bound on a constrained test distribution $Q$, given the training distribution $P$ and the trained model $h_\theta(\cdot)$. We first characterize the model $h_\theta(\cdot)$ by some statistics, e.g., the mean and variance of the model's loss: $h_\theta(\cdot)$ satisfies $e_j(P, h_\theta) \le v_j$, $1 \le j \le L$. Then we characterize the properties (e.g., fair base rate) of the test distribution $Q$: $g_j(Q) \le u_j$, $1 \le j \le M$. As a result, we can upper bound the loss of $h_\theta(\cdot)$ on $Q$ via the following optimization:
$$\max_{Q}\ \mathbb{E}_{(X,Y)\sim Q}[\ell(h_\theta(X), Y)] \quad \text{s.t.} \quad H(P, Q) \le \rho,\quad e_j(P, h_\theta) \le v_j\ \forall j \in [L],\quad g_j(Q) \le u_j\ \forall j \in [M]. \quad (2)$$
Now we decompose the space $\mathcal{Z} := \mathcal{X} \times \mathcal{Y}$ into $N$ partitions: $\mathcal{Z} := \bigsqcup_{i\in[N]} \mathcal{Z}_i$, where $\mathcal{Z}$ is the support of both $P$ and $Q$. Then, we denote by $P_i$ the distribution $P$ conditioned on $\mathcal{Z}_i$ and similarly by $Q_i$ the distribution $Q$ conditioned on $\mathcal{Z}_i$. As a result, we can write $P = \sum_{i\in[N]} p_i P_i$ and $Q = \sum_{i\in[N]} q_i Q_i$. Since $P$ is known, the $p_i$'s are known. In contrast, both the $Q_i$'s and the $q_i$'s are optimizable. Our key observation is that
$$H(P, Q) \le \rho \iff 1 - \rho^2 - \sum_{i\in[N]} \sqrt{p_i q_i}\,\big(1 - H(P_i, Q_i)^2\big) \le 0, \quad (3)$$
which leads to the following theorem.

Theorem 1. The following constrained optimization upper bounds Equation (2):
$$\max_{Q_i, q_i, \gamma_i}\ \sum_{i\in[N]} q_i\, \mathbb{E}_{(X,Y)\sim Q_i}[\ell(h_\theta(X), Y)] \quad (4a)$$
$$\text{s.t.}\quad 1 - \rho^2 - \sum_{i\in[N]} \sqrt{p_i q_i}\,(1 - \gamma_i^2) \le 0, \quad (4b)$$
$$H(P_i, Q_i) \le \gamma_i\ \forall i \in [N],\quad \sum_{i\in[N]} q_i = 1,\quad q_i \ge 0\ \forall i \in [N],\quad \gamma_i \ge 0\ \forall i \in [N], \quad (4c)$$
$$e'_j(\{P_i\}_{i\in[N]}, \{p_i\}_{i\in[N]}, h_\theta) \le v'_j\ \forall j \in [L],\quad g'_j(\{Q_i\}_{i\in[N]}, \{q_i\}_{i\in[N]}) \le u'_j\ \forall j \in [M], \quad (4d)$$
if $e_j(P, h_\theta) \le v_j$ implies $e'_j(\{P_i\}_{i\in[N]}, \{p_i\}_{i\in[N]}, h_\theta) \le v'_j$ for any $j \in [L]$, and $g_j(Q) \le u_j$ implies $g'_j(\{Q_i\}_{i\in[N]}, \{q_i\}_{i\in[N]}) \le u'_j$ for any $j \in [M]$.

In Problem 2, the challenge is to deal with the fair base rate constraint. Our core technique in Thm. 1 is subpopulation decomposition. At a high level, thanks to the disjoint supports of different subpopulations, we get Equation (3). This equation gives us an equivalence between the distribution-level (namely, $P$ and $Q$) distance constraint and the subpopulation-level (namely, the $P_i$'s and $Q_i$'s) distance constraints. As a result, we can rewrite the original problem (2) using the subpopulations as decision variables as in Equation (4b), and then impose the unity constraint (Equation (4c)) to get Thm. 1. We provide a detailed proof in Appendix C.2. Although the optimization problem (Equation (4)) may look more complicated than the original Equation (2), it simplifies the challenging fair base rate constraint, allows us to upper bound each subpopulation loss $\mathbb{E}_{(X,Y)\sim Q_i}[\ell(h_\theta(X), Y)]$ individually, and hence makes the whole optimization tractable.
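The decomposition identity behind Equation (3) is easy to verify numerically for discrete distributions. The sketch below is an illustration with made-up probability vectors and a two-block partition; it checks that the squared Hellinger distance between two mixtures over disjoint blocks equals $1 - \sum_i \sqrt{p_i q_i}\,(1 - H(P_i, Q_i)^2)$.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as probability vectors."""
    return np.sqrt(1.0 - np.sum(np.sqrt(p * q)))

# Two distributions over a toy space Z partitioned into N = 2 disjoint blocks of 3 atoms each.
P = np.array([0.10, 0.20, 0.10, 0.25, 0.25, 0.10])
Q = np.array([0.05, 0.15, 0.20, 0.30, 0.20, 0.10])
blocks = [slice(0, 3), slice(3, 6)]

p_w = np.array([P[b].sum() for b in blocks])                                        # weights p_i
q_w = np.array([Q[b].sum() for b in blocks])                                        # weights q_i
H_i = np.array([hellinger(P[b] / P[b].sum(), Q[b] / Q[b].sum()) for b in blocks])   # H(P_i, Q_i)

lhs = hellinger(P, Q) ** 2
rhs = 1.0 - np.sum(np.sqrt(p_w * q_w) * (1.0 - H_i ** 2))                           # Equation (3)
print(lhs, rhs)   # identical up to floating-point error
```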
3.2 Certified Fairness with Sensitive Shifting

For the sensitive shifting case, we instantiate Thm. 1 and obtain the following fairness certificate.

Theorem 2. Given a distance bound $\rho > 0$, the following constrained optimization, which is convex, provides (when feasible) a tight fairness certificate for Problem 2:
$$\max_{k_s, r_y}\ \sum_{s\in[S]}\sum_{y\in[C]} k_s r_y E_{s,y} \quad \text{s.t.} \quad \sum_{s\in[S]} k_s = 1,\ \sum_{y\in[C]} r_y = 1,\ k_s \ge 0\ \forall s \in [S],\ r_y \ge 0\ \forall y \in [C],\ 1 - \rho^2 - \sum_{s\in[S]}\sum_{y\in[C]} \sqrt{p_{s,y} k_s r_y} \le 0,$$
where $E_{s,y} := \mathbb{E}_{(X,Y)\sim P_{s,y}}[\ell(h_\theta(X), Y)]$ and $p_{s,y} := \Pr_{(X,Y)\sim P}[X_s = s, Y = y]$ are constants.

Proof sketch. We decompose the distributions $P$ and $Q$ into the $P_{s,y}$'s and $Q_{s,y}$'s according to their sensitive attribute and label values. In sensitive shifting, $\Pr(X_o \mid Y, X_s)$ is fixed, i.e., $P_{s,y} = Q_{s,y}$, which means $\mathbb{E}_{(X,Y)\sim Q_{s,y}}[\ell(h_\theta(X), Y)] = E_{s,y}$ and $\gamma_{s,y} = H(P_{s,y}, Q_{s,y}) = 0$. We plug these properties into Thm. 1. Then, denoting by $q_{s,y}$ the probability $\Pr_{(X,Y)\sim Q}[X_s = s, Y = y]$, we can represent the fairness constraint in Def. 2 as $q_{s',y'} = \big(\sum_{y\in[C]} q_{s',y}\big)\big(\sum_{s\in[S]} q_{s,y'}\big)$ for any $s' \in [S]$ and $y' \in [C]$. Next, we parameterize $q_{s,y}$ as $k_s r_y$. Such a parameterization simplifies the fairness constraint and allows us to prove the convexity of the resulting optimization. Since all the constraints are encoded equivalently, the formulation provides a tight certification. The detailed proof is in Appendix C.3.

As Thm. 2 suggests, we can exploit the expectation information $E_{s,y} = \mathbb{E}_{(X,Y)\sim P_{s,y}}[\ell(h_\theta(X), Y)]$ and the density information $p_{s,y} = \Pr_{(X,Y)\sim P}[X_s = s, Y = y]$ of each subpopulation of $P$ to provide a tight fairness certificate under sensitive shifting. The convex optimization problem with $(S + C)$ variables can be efficiently solved by off-the-shelf packages.
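To make the structure of Theorem 2 concrete, the following sketch instantiates it for the common $S = C = 2$ case and approximates the optimum by a dense grid search over $(k_1, r_1)$, using $k_2 = 1 - k_1$ and $r_2 = 1 - r_1$. All numbers are made up, and this is only an illustration of the problem's structure: the paper solves the convex program exactly with an off-the-shelf solver, whereas a finite grid can only under-approximate the maximum and hence is not a certified solver.

```python
import numpy as np

# Hypothetical inputs: subpopulation densities p[s, y] and expected losses E[s, y] of the trained
# model on P, for a binary sensitive attribute and a binary label.
p = np.array([[0.30, 0.10],
              [0.20, 0.40]])      # p[s, y] = Pr_P[Xs = s, Y = y], sums to 1
E = np.array([[0.05, 0.30],
              [0.10, 0.15]])      # E[s, y] = expected loss of h on subpopulation P_{s,y}
rho = 0.2                         # Hellinger distance bound

# q_{s,y} = k_s * r_y is fair by construction; enumerate (k_1, r_1) on a fine grid.
grid = np.linspace(0.0, 1.0, 801)
k1, r1 = np.meshgrid(grid, grid, indexing="ij")
q = np.stack([k1 * r1, k1 * (1 - r1), (1 - k1) * r1, (1 - k1) * (1 - r1)], axis=-1)
p_flat, E_flat = p.reshape(-1), E.reshape(-1)

feasible = np.sqrt(p_flat * q).sum(axis=-1) >= 1.0 - rho**2    # distance constraint of Thm. 2
objective = (q * E_flat).sum(axis=-1)                          # worst-case expected loss
print("approximate certificate:",
      objective[feasible].max() if feasible.any() else "infeasible (no nearby fair distribution)")
```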
3.3 Certified Fairness with General Shifting

For the general shifting case, we leverage Thm. 1 and the parameterization trick $q_{s,y} := k_s r_y$ used in Thm. 2 to reduce Problem 1 to the following constrained optimization.

Lemma 3.1. Given a distance bound $\rho > 0$, the following constrained optimization, when feasible, provides a tight fairness certificate for Problem 1:
$$\max_{k_s, r_y, Q_{s,y}, \gamma_{s,y}}\ \sum_{s\in[S]}\sum_{y\in[C]} k_s r_y\, \mathbb{E}_{(X,Y)\sim Q_{s,y}}[\ell(h_\theta(X), Y)] \quad (6a)$$
$$\text{s.t.}\quad \sum_{s\in[S]} k_s = 1,\ \sum_{y\in[C]} r_y = 1,\ k_s \ge 0\ \forall s \in [S],\ r_y \ge 0\ \forall y \in [C], \quad (6b)$$
$$\sum_{s\in[S]}\sum_{y\in[C]} \sqrt{p_{s,y} k_s r_y}\,(1 - \gamma_{s,y}^2) \ \ge\ 1 - \rho^2, \quad (6c)$$
$$H(P_{s,y}, Q_{s,y}) \le \gamma_{s,y}\ \forall s \in [S], y \in [C], \quad (6d)$$
where $p_{s,y} := \Pr_{(X,Y)\sim P}[X_s = s, Y = y]$ is a fixed constant, and $P_{s,y}$ and $Q_{s,y}$ are the subpopulations of $P$ and $Q$ on the support $\{(X, Y) : X \in \mathcal{X}, X_s = s, Y = y\}$, respectively.

Proof sketch. We show that Equation (6b) ensures a parameterization of $q_{s,y} = \Pr_{(X,Y)\sim Q}[X_s = s, Y = y]$ that satisfies the fairness constraints on $Q$. Then, leveraging Thm. 1, we prove that the constrained optimization provides a fairness certificate. Since all the constraints are either kept or equivalently encoded, the resulting certification is tight. The detailed proof is in Appendix C.4.

Now the main obstacle is to solve the non-convex optimization in problem (6). Here, as the first step, we upper bound the loss of $h_\theta(\cdot)$ within each shifted subpopulation $Q_{s,y}$, i.e., we upper bound $\mathbb{E}_{(X,Y)\sim Q_{s,y}}[\ell(h_\theta(X), Y)]$ in Equation (6a), by Thm. 4 in Appendix B.2 [47]. Then, we apply variable transformations so that the problem becomes convex in some of the decision variables. The remaining decision variables are non-convex but bounded. Hence, we propose the technique of grid-based sub-problem construction: we divide the feasible region of the non-convex variables into small grid cells and consider the optimization problem in each cell individually. For each sub-problem, we relax the objective by pushing the values of the non-convex variables to the boundary of the current cell and then solve the resulting convex optimization sub-problems. Concretely, the following theorem states our computable certificate for Problem 1, with a detailed proof in Appendix C.5.

Theorem 3. Suppose that $H(P_{s,y}, Q_{s,y}) \le \bar\gamma_{s,y}$ for any $s \in [S]$ and $y \in [C]$, and that $0 \le \sup_{(X,Y)\in\mathcal{X}\times\mathcal{Y}} \ell(h_\theta(X), Y) \le M$. Given a distance bound $\rho > 0$, for any region granularity $T \in \mathbb{N}_+$, the following expression provides a fairness certificate for Problem 1:
$$\tau = \max_{\{i_s\in[T]:\, s\in[S]\},\ \{j_y\in[T]:\, y\in[C]\}} C, \quad (7)$$
where each $C$ is the optimal value of the following convex optimization over $\{x_{s,y}\}$, defined on the grid cell $\{[\underline{k}_s, \overline{k}_s]\}_{s=1}^S$, $\{[\underline{r}_y, \overline{r}_y]\}_{y=1}^C$ indexed by $\{i_s\}$ and $\{j_y\}$:
$$C = \max_{\{x_{s,y}\}}\ \sum_{s\in[S]}\sum_{y\in[C]} \Big[\overline{k}_s\overline{r}_y\,(E_{s,y} + C_{s,y})_+ + \underline{k}_s\underline{r}_y\,(E_{s,y} + C_{s,y})_- + 2\,\overline{k}_s\overline{r}_y\sqrt{x_{s,y}(1 - x_{s,y})\,V_{s,y}} - \underline{k}_s\underline{r}_y\, x_{s,y}\,(C_{s,y})_+ - \overline{k}_s\overline{r}_y\, x_{s,y}\,(C_{s,y})_-\Big] \quad (8a)$$
$$\text{s.t.}\quad \sum_{s\in[S]}\sum_{y\in[C]} \sqrt{p_{s,y}\,\overline{k}_s\,\overline{r}_y\, x_{s,y}} \ \ge\ 1 - \rho^2, \quad (8b)$$
$$(1 - \bar\gamma_{s,y}^2)^2 \ \le\ x_{s,y} \ \le\ 1 \quad \forall s \in [S], y \in [C], \quad (8c)$$
where $(\cdot)_+ = \max\{\cdot, 0\}$, $(\cdot)_- = \min\{\cdot, 0\}$; $E_{s,y} = \mathbb{E}_{(X,Y)\sim P_{s,y}}[\ell(h_\theta(X), Y)]$, $V_{s,y} = \mathbb{V}_{(X,Y)\sim P_{s,y}}[\ell(h_\theta(X), Y)]$, $p_{s,y} = \Pr_{(X,Y)\sim P}[X_s = s, Y = y]$, $C_{s,y} = M - E_{s,y} - \frac{V_{s,y}}{M - E_{s,y}}$, and $\bar\gamma_{s,y}^2 = 1 - \big(1 + (M - E_{s,y})^2 / V_{s,y}\big)^{-\frac{1}{2}}$. Equation (7) only takes $C$'s value when the corresponding sub-problem is feasible, and each $C$ queried by Equation (7) is a convex optimization.

Implications. Thm. 3 provides a fairness certificate for Problem 1 under two assumptions: (1) The loss function is bounded (by $M$). This assumption holds for several typical losses such as the 0-1 loss and the JSD loss. (2) The distribution shift between training and test distributions within each subpopulation is bounded by $\bar\gamma_{s,y}$, where $\bar\gamma_{s,y}$ is determined by the model's statistics on $P$. In practice, this additional distance bound assumption generally holds, since $\bar\gamma_{s,y}$ is usually large for common choices of $\ell$. In Thm. 3, we exploit three types of statistics of $h_\theta(\cdot)$ on $P$ to compute the fairness certificates: the expectation $E_{s,y}$, the variance $V_{s,y}$, and the density $p_{s,y}$, all of which are at the subpopulation level, and a high-confidence estimation of them based on finite samples is tractable (Section 3.4). Using Thm. 3, after determining the region granularity $T$, we can provide a fairness certificate for Problem 1 by solving $T^{S+C}$ convex optimization problems, each of which has $S \cdot C$ decision variables. Note that the computation cost is independent of $h_\theta$, and therefore we can numerically compute the certificate for large DNN models used in practice. Specifically, when $S = 2$ (binary sensitive attribute) or $C = 2$ (binary classification), which is common in the fairness evaluation setting, we can construct the grid for only one dimension $k_1$ or $r_1$, and use $1 - k_1$ or $1 - r_1$ for the other dimension. Thus, for the typical setting $S = 2$, $C = 2$, we only need to solve $T^2$ convex optimization problems. Note that for Problem 2, our certificate in Thm. 2 is tight, whereas for Problem 1, our certificate in Thm. 3 is not. This is because in Problem 1, extra distribution shift exists within each subpopulation, i.e., $\Pr(X_o \mid Y, X_s)$ changes from $P$ to $Q$, and to bound such shift we need to leverage Thm. 2.2 in [47], which has no tightness guarantee. Future work providing tighter bounds than [47] can be seamlessly incorporated into our framework to tighten our fairness certificate for Problem 1.

3.4 Dealing with Finite Sampling Error

In Section 3.2 and Section 3.3, we presented Thm. 2 and Thm. 3, which provide computable fairness certificates for the sensitive shifting and general shifting scenarios, respectively. In these theorems, we need to know the following quantities related to the training distribution $P$ and the trained model $h_\theta(\cdot)$:
$$E_{s,y} = \mathbb{E}_{(X,Y)\sim P_{s,y}}[\ell(h_\theta(X), Y)],\quad V_{s,y} = \mathbb{V}_{(X,Y)\sim P_{s,y}}[\ell(h_\theta(X), Y)],\quad p_{s,y} = \Pr_{(X,Y)\sim P}[X_s = s, Y = y]. \quad (9)$$
Section 3.3 further requires $C_{s,y}$ and $\bar\gamma_{s,y}$, which are functions of $E_{s,y}$ and $V_{s,y}$. However, a practical challenge is that common training distributions do not have an analytical expression that allows us to precisely compute these quantities.
Indeed, we only have access to a finite number of independently drawn samples from $P$, i.e., the training dataset. Thus, we provide high-confidence bounds for $E_{s,y}$, $V_{s,y}$, and $p_{s,y}$ in Lemma D.1 (stated in Appendix D.1). For Thm. 2, we can replace $E_{s,y}$ in the objective by its upper bound and replace the concrete quantities $p_{s,y}$ by interval constraints together with the unity constraint $\sum_{s\in[S]}\sum_{y\in[C]} p_{s,y} = 1$, which again yields a convex optimization that can be effectively solved. For Thm. 3, we compute the confidence intervals of $C_{s,y}$ and $\bar\gamma_{s,y}$, plug either the lower or the upper bounds into the objective (8a) depending on the coefficient, and finally replace the concrete quantities $p_{s,y}$ by interval constraints together with the unity constraint $\sum_{s\in[S]}\sum_{y\in[C]} p_{s,y} = 1$. The resulting optimization is proved to be convex and provides an upper bound for any possible values of $E_{s,y}$, $V_{s,y}$, and $p_{s,y}$ within the confidence intervals. We defer the statements of Thm. 2 and Thm. 3 considering finite sampling error to Appendix D.2. To this point, we have presented our framework for computing high-confidence fairness certificates given access to the model $h_\theta(\cdot)$ and a finite number of samples drawn from $P$.
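The paper's exact confidence bounds are given in Lemma D.1. As an illustration of the kind of estimate involved, the sketch below computes the subpopulation statistics of Equation (9) from samples together with a high-confidence upper bound on $E_{s,y}$; the synthetic losses, the loss range $M$, the confidence level, and the use of the empirical Bernstein inequality of [32] for the mean are all assumptions, and the paper's actual intervals, including those for $V_{s,y}$ and $p_{s,y}$, may differ.

```python
import numpy as np

def empirical_bernstein_upper(losses, M, delta):
    """Upper bound on E[loss] holding with prob. >= 1 - delta for losses in [0, M] (cf. [32])."""
    n = len(losses)
    mean, var = losses.mean(), losses.var(ddof=1)
    return mean + np.sqrt(2 * var * np.log(2 / delta) / n) + 7 * M * np.log(2 / delta) / (3 * (n - 1))

def subpopulation_statistics(losses, s_attr, labels, M=1.0, delta=0.1):
    """Point estimates of (E_{s,y}, V_{s,y}, p_{s,y}) and a high-confidence upper bound on E_{s,y}."""
    stats = {}
    for s in np.unique(s_attr):
        for y in np.unique(labels):
            idx = (s_attr == s) & (labels == y)
            sub = losses[idx]
            stats[(int(s), int(y))] = {
                "E": sub.mean(),
                "V": sub.var(ddof=1),
                "p": idx.mean(),                                   # subpopulation density p_{s,y}
                "E_upper": empirical_bernstein_upper(sub, M, delta),
            }
    return stats

# Toy usage with synthetic 0-1 losses of a pretend model (all inputs are placeholders).
rng = np.random.default_rng(0)
n = 5000
s_attr, labels = rng.integers(0, 2, n), rng.integers(0, 2, n)
losses = (rng.random(n) < 0.2).astype(float)
print(subpopulation_statistics(losses, s_attr, labels))
```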
4 Experiments

In this section, we evaluate the certified fairness under both the sensitive shifting and general shifting scenarios on six real-world datasets. We observe that under sensitive shifting our certified fairness bound is tight (Section 4.1), while the bound is less tight under general shifting (Section 4.2), where it depends on the tightness of the generalization bounds within each subpopulation (details in Section 3.3). In addition, we show that our certification framework can flexibly integrate more constraints on Q, leading to tighter fairness certification (Section 4.3). Finally, we compare our certified fairness bound with the existing distributional robustness bound [43] (Section 4.4), since both consider a shifted distribution, while our bound is optimized with an additional fairness constraint that is challenging to integrate directly into the existing distributional robustness optimization. We show that with the fairness constraint and our optimization approach, our bound is much tighter.

Dataset & Model. We validate our certified fairness on six real-world datasets: Adult [3], Compas [2], Health [19], Lawschool [48], Crime [3], and German [3]. Details on the datasets and data processing steps are provided in Appendix E.1. Following the standard setup of fairness evaluation in the literature [39, 38, 31, 42], we consider the scenario where the sensitive attributes and labels take binary values. A ReLU network with 2 hidden layers of size 20 is used for all datasets.

Figure 1: Certified fairness with sensitive shifting. Grey points are results on generated distributions (Q) and the black line is our fairness certificate based on Thm. 2. We observe that our fairness certificate is usually tight.

Fairness Certification. We perform vanilla model training and then leverage our fairness certification framework to calculate the fairness certificate. Concretely, we input the trained model's information on P, and the framework outputs the fairness certification for both the sensitive shifting and general shifting scenarios following Thm. 2 and Thm. 3, respectively. Code, models, and all experimental data are publicly available at https://github.com/AI-secure/Certified-Fairness.

4.1 Certified Fairness with Sensitive Shifting

Generating Fair Distributions. To evaluate how well our certificates capture the fairness risk in practice, we compare our certification bound with the empirical loss evaluated on 30,000 randomly generated fairness constrained distributions Q shifted from P. The detailed steps for generating fairness constrained distributions Q are provided in Appendix E.2. Under sensitive shifting, since each subpopulation divided by the sensitive attribute and label does not change (Section 2.1), we tune only the proportion of each subpopulation $q_{s,y}$ so that the base rate fairness constraint is satisfied, and then sample from each subpopulation of P individually according to the proportion $q_{s,y}$ (see the sketch at the end of this subsection). In this way, our protocol can generate distributions with different combinations of subpopulation proportions. If the classifier is biased toward one subpopulation (i.e., it achieves high accuracy on that group but low accuracy on the others), the worst-case accuracy over the generated distributions is low, since the proportion of the favored subpopulation in the generated distribution can be low; in contrast, a fair classifier that performs uniformly well on each group achieves high worst-case accuracy (high certified fairness). Therefore, we believe that our protocol can reveal real-world training distribution bias as well as reflect the model's unfairness and the certification tightness in real-world scenarios.

Results. We report the classification error (Error) and BCE loss as evaluation metrics. Figure 1 illustrates the certified fairness on Adult, Compas, Health, and Lawschool under sensitive shifting. More results on two relatively small datasets (Crime, German) are shown in Appendix E.5. From the results, we see that our certified fairness is tight in practice.
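A sketch of the generation protocol described above for sensitive shifting follows. It is an illustration rather than the released code: the group-index dictionary, the Dirichlet draws for the marginals, and the sample counts are assumptions, and the actual protocol additionally tracks the Hellinger distance between P and each generated Q.

```python
import numpy as np

def sample_fair_mixture(groups, n_samples, rng):
    """Resample a base-rate-fair dataset from the subpopulations of P (sensitive shifting).

    groups[(s, y)] holds the indices of training points with Xs = s and Y = y; the subpopulations
    themselves stay intact, only their proportions q_{s,y} = k_s * r_y are changed.
    """
    S = sorted({s for s, _ in groups})
    C = sorted({y for _, y in groups})
    k = rng.dirichlet(np.ones(len(S)))          # random group marginal Pr_Q[Xs = s]
    r = rng.dirichlet(np.ones(len(C)))          # random label marginal Pr_Q[Y = y]
    # q_{s,y} = k_s * r_y gives Pr_Q[Y = y | Xs = s] = r_y for every s, i.e., base rate parity.
    parts = []
    for i, s in enumerate(S):
        for j, y in enumerate(C):
            m = int(round(n_samples * k[i] * r[j]))
            parts.append(rng.choice(groups[(s, y)], size=m, replace=True))
    return np.concatenate(parts)

# Hypothetical usage: a training set of 4000 points split into the four (Xs, Y) subpopulations.
rng = np.random.default_rng(0)
groups = {(s, y): np.arange(1000) + 1000 * (2 * s + y) for s in (0, 1) for y in (0, 1)}
fair_idx = sample_fair_mixture(groups, n_samples=5000, rng=rng)
```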
4.2 Certified Fairness with General Shifting

In the general shifting scenario, we similarly randomly generate 30,000 fair distributions Q shifted from P. Different from sensitive shifting, the distribution conditioned on the sensitive attribute $X_s$ and label $Y$ can also change in this scenario. Therefore, we construct another distribution $Q'$ that is disjoint from P on the non-sensitive attributes and mix P and $Q'$ in each subpopulation individually, guided by mixing parameters that satisfy the fair base rate constraint. Detailed generation steps are given in Appendix E.2. Since the fairness certification for general shifting requires a bounded loss, we select the classification error (Error) and Jensen-Shannon loss (JSD Loss) as evaluation metrics. Figure 2 illustrates the certified fairness with the classification error metric under general shifting. Results for the JSD loss and more results on two relatively small datasets (Crime, German) are in Appendix E.5.

Figure 2: Certified fairness with general shifting. Grey points are results on generated distributions (Q) and the black line is our fairness certificate based on Thm. 3. We observe that our fairness certificate is non-trivial.

4.3 Certified Fairness with Additional Non-Skewness Constraints

In Section 2.1, we discussed that to represent different real-world scenarios we can add more constraints such as Equation (1) to prevent the skewness of Q, and such constraints can be flexibly incorporated into our certification framework. Concretely, for sensitive shifting, we only need to add one more box constraint $0.5 - \Delta_s/2 \le k_s \le 0.5 + \Delta_s/2$, where $\Delta_s$ is a parameter controlling the skewness of Q, which still guarantees convexity. For general shifting, we only need to modify the region partition step, where we split $[0.5 - \Delta_s/2, 0.5 + \Delta_s/2]$ instead of $[0, 1]$. (Note that such modifications are only viable when the sensitive attribute takes binary values, which is the typical scenario in the fairness evaluation literature [39, 38, 31, 42].) The certification results with the additional constraints are in Figures 3(a) and 3(b), which suggest that the stricter the added constraints (i.e., the smaller $\Delta_s$), the tighter the bound. More constraints w.r.t. labels can also be handled by our framework, and the corresponding results as well as results on more datasets are in Appendix E.6.

Figure 3: (a) Sensitive shifting; (b) general shifting; (c) comparison with WRM. Certified fairness with additional non-skewness constraints on the Adult dataset is shown in (a) and (b); $\Delta_s$ controls the skewness of Q ($|\Pr_{(X,Y)\sim Q}[X_s = 0] - \Pr_{(X,Y)\sim Q}[X_s = 1]| \le \Delta_s$); more analysis in Section 4.3. In (c), we compare our certified fairness bound with the distributional robustness bound [43]; more analysis in Section 4.4.

4.4 Comparison with Distributional Robustness Bound

To the best of our knowledge, there is no existing work providing certified fairness for end-to-end model performance. Thus, we compare our bound with a distributional robustness bound, since both consider certain distribution shifts. However, it is challenging to directly integrate the fairness constraints into existing bounds. Therefore, we compare with the state-of-the-art distributional robustness certification WRM [43], which solves a similar optimization problem to ours except for the fairness constraint. For a fair comparison, we construct a synthetic dataset following [43], on which there is a one-to-one correspondence between the Hellinger distance and the Wasserstein distance used by WRM. We randomly select one dimension as the sensitive attribute. Since WRM has additional assumptions on the smoothness of models and losses, we use the JSD loss and a small ELU network with 2 hidden layers of sizes 4 and 2, following their setting. More implementation details are in Appendix E.4. The results in Figure 3(c) suggest that 1) our certified fairness bound is much tighter than WRM, owing to the additional fairness distribution constraint and our optimization framework; 2) with the additional fairness constraint, our certification problem can be infeasible under very small distribution distances, since no fairness constrained distribution Q exists near the skewed original distribution P; and 3) with the fairness constraint, we provide non-trivial fairness certification bounds even when the distribution shift is large.

5 Conclusion

In this paper, we provide the first fairness certification for end-to-end model performance, based on a fairness constrained distribution that has bounded distribution distance from the training distribution. We show that our fairness certification has strong connections with existing fairness notions such as group parity, and we provide an effective framework to calculate the certification under different scenarios. We provide both theoretical and empirical analyses of our fairness certification.

Acknowledgements. MK, LL, and BL are partially supported by the NSF grant No.1910100, NSF CNS No.2046726, C3 AI, and the Alfred P. Sloan Foundation. YL is partially supported by the NSF grants IIS-2143895 and IIS-2040800.
References

[1] Aws Albarghouthi, Loris D'Antoni, Samuel Drews, and Aditya V. Nori. FairSquare: probabilistic verification of program fairness. Proceedings of the ACM on Programming Languages, 1(OOPSLA):1–30, 2017.
[2] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. In Ethics of Data and Analytics, pages 254–264. Auerbach Publications, 2016.
[3] Arthur Asuncion and David Newman. UCI machine learning repository, 2007.
[4] Mislav Balunović, Anian Ruoss, and Martin Vechev. Fair normalizing flows. arXiv preprint arXiv:2106.05937, 2021.
[5] Solon Barocas and Andrew D. Selbst. Big data's disparate impact. Calif. L. Rev., 104:671, 2016.
[6] Osbert Bastani, Xin Zhang, and Armando Solar-Lezama. Probabilistic verification of fairness properties via concentration. Proceedings of the ACM on Programming Languages, 3(OOPSLA):1–27, 2019.
[7] Aharon Ben-Tal, Dick Den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341–357, 2013.
[8] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, 50(1):3–44, 2021.
[9] Yatong Chen, Reilly Raab, Jialu Wang, and Yang Liu. Fairness transferability subject to bounded distribution shift. Advances in Neural Information Processing Systems, 2022.
[10] YooJung Choi, Meihua Dang, and Guy Van den Broeck. Group fairness by probabilistic modeling with latent fair decisions. arXiv preprint arXiv:2009.09031, 2020.
[11] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.
[12] Elliot Creager, David Madras, Jörn-Henrik Jacobsen, Marissa Weis, Kevin Swersky, Toniann Pitassi, and Richard Zemel. Flexibly fair representation learning by disentanglement. In International Conference on Machine Learning, pages 1436–1445. PMLR, 2019.
[13] Amit Datta, Michael Carl Tschantz, and Anupam Datta. Automated experiments on ad privacy settings. Proceedings on Privacy Enhancing Technologies, 2015(1):92–112, 2015.
[14] John Duchi and Hongseok Namkoong. Variance-based regularization with convex objectives. The Journal of Machine Learning Research, 20(1):2450–2504, 2019.
[15] John C. Duchi, Peter W. Glynn, and Hongseok Namkoong. Statistics of robust optimization: A generalized empirical likelihood approach. Mathematics of Operations Research, 2021.
[16] Harrison Edwards and Amos Storkey. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897, 2015.
[17] Jeffrey A. Gliner. Reviewing qualitative research: Proposed criteria for fairness and rigor. The Occupational Therapy Journal of Research, 14(2):78–92, 1994.
[18] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29, 2016.
[19] Kaggle Inc. Heritage Health Prize. https://www.kaggle.com/c/hhp, 2022.
[20] Xisen Jin, Francesco Barbieri, Brendan Kennedy, Aida Mostafazadeh Davani, Leonardo Neves, and Xiang Ren. On transferability of bias mitigation effects in language model fine-tuning. arXiv preprint arXiv:2010.12864, 2020.
[21] Philips George John, Deepak Vijaykeerthy, and Diptikalyan Saha. Verifying individual fairness in machine learning models. In Conference on Uncertainty in Artificial Intelligence, pages 749–758. PMLR, 2020.
[22] Thomas Kehrenberg, Myles Bartlett, Oliver Thomas, and Novi Quadrianto. Null-sampling for interpretable and fair representations. In European Conference on Computer Vision, pages 565–580. Springer, 2020.
[23] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016.
[24] Himabindu Lakkaraju, Jon Kleinberg, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. The selective labels problem: Evaluating algorithmic predictions in the presence of unobservables. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 275–284, 2017.
[25] Henry Lam. Robust sensitivity analysis for stochastic systems. Mathematics of Operations Research, 41(4):1248–1275, 2016.
[26] Jiachun Liao, Chong Huang, Peter Kairouz, and Lalitha Sankar. Learning generative adversarial representations (GAP) under fairness and censoring constraints. arXiv preprint arXiv:1910.00411, 2019.
[27] Lydia T. Liu, Max Simchowitz, and Moritz Hardt. The implicit fairness criterion of unconstrained learning. In International Conference on Machine Learning, pages 4051–4060. PMLR, 2019.
[28] Francesco Locatello, Gabriele Abbati, Thomas Rainforth, Stefan Bauer, Bernhard Schölkopf, and Olivier Bachem. On the fairness of disentangled representations. Advances in Neural Information Processing Systems, 32, 2019.
[29] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. arXiv preprint arXiv:1511.00830, 2015.
[30] David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. In International Conference on Machine Learning, pages 3384–3393. PMLR, 2018.
[31] Subha Maity, Debarghya Mukherjee, Mikhail Yurochkin, and Yuekai Sun. Does enforcing fairness mitigate biases caused by subpopulation shift? In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 25773–25784. Curran Associates, Inc., 2021.
[32] Andreas Maurer and Massimiliano Pontil. Empirical Bernstein bounds and sample variance penalization. arXiv preprint arXiv:0907.3740, 2009.
[33] Daniel McNamara, Cheng Soon Ong, and Robert C. Williamson. Costs and benefits of fair representation learning. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES '19, pages 263–270, New York, NY, USA, 2019. Association for Computing Machinery.
[34] Luca Oneto, Michele Donini, Massimiliano Pontil, and Andreas Maurer. Learning fair and transferable representations with theoretical guarantees. In 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), pages 30–39, 2020.
[35] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
[36] Momchil Peychev, Anian Ruoss, Mislav Balunović, Maximilian Baader, and Martin Vechev. Latent space smoothing for individually fair representations. arXiv preprint arXiv:2111.13650, 2021.
[37] Reilly Raab and Yang Liu. Unintended selection: Persistent qualification rate disparities and interventions. Advances in Neural Information Processing Systems, 34, 2021.
[38] Yuji Roh, Kangwook Lee, Steven Whang, and Changho Suh. Sample selection for fair and robust training. Advances in Neural Information Processing Systems, 34, 2021.
[39] Anian Ruoss, Mislav Balunović, Marc Fischer, and Martin Vechev. Learning certified individually fair representations. Advances in Neural Information Processing Systems, 33:7584–7596, 2020.
[40] Mhd Hasan Sarhan, Nassir Navab, Abouzar Eslami, and Shadi Albarqouni. Fairness by learning orthogonal disentangled representations. In European Conference on Computer Vision, pages 746–761. Springer, 2020.
[41] Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, and Joseph Keshet. Fairness in the eyes of the data: Certifying machine-learning models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pages 926–935, 2021.
[42] Shubhanshu Shekhar, Greg Fields, Mohammad Ghavamzadeh, and Tara Javidi. Adaptive sampling for minimax fair classification. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 24535–24544. Curran Associates, Inc., 2021.
[43] Aman Sinha, Hongseok Namkoong, Riccardo Volpi, and John Duchi. Certifying some distributional robustness with principled adversarial training. arXiv preprint arXiv:1710.10571, 2017.
[44] Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, and Stefano Ermon. Learning controllable fair representations. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2164–2173. PMLR, 2019.
[45] Ton Steerneman. On the total variation and Hellinger distance between signed measures; an application to product measures. Proceedings of the American Mathematical Society, 88(4):684–688, 1983.
[46] Caterina Urban, Maria Christakis, Valentin Wüstholz, and Fuyuan Zhang. Perfectly parallel fairness certification of neural networks. Proceedings of the ACM on Programming Languages, 4(OOPSLA):1–30, 2020.
[47] Maurice Weber, Linyi Li, Boxin Wang, Zhikuan Zhao, Bo Li, and Ce Zhang. Certifying out-of-domain generalization for blackbox functions. In International Conference on Machine Learning. PMLR, 2022.
[48] Linda F. Wightman. LSAC national longitudinal bar passage study. LSAC Research Report Series. Law School Admission Council, 1998.
[49] Robin Winter, Floriane Montanari, Frank Noé, and Djork-Arné Clevert. Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chemical Science, 10(6):1692–1701, 2019.
[50] Samuel Yeom and Matt Fredrikson. Individual fairness revisited: Transferring techniques from adversarial robustness. arXiv preprint arXiv:2002.07738, 2020.
[51] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333. PMLR, 2013.
[52] Xueru Zhang, Ruibo Tu, Yang Liu, Mingyan Liu, Hedvig Kjellstrom, Kun Zhang, and Cheng Zhang. How do fair decisions fare in long-term qualification? Advances in Neural Information Processing Systems, 33:18457–18469, 2020.
[53] Han Zhao, Amanda Coston, Tameem Adel, and Geoffrey J. Gordon. Conditional learning of fair representations. arXiv preprint arXiv:1910.07162, 2019.
[54] Han Zhao and Geoff Gordon. Inherent tradeoffs in learning fair representations. Advances in Neural Information Processing Systems, 32, 2019.

Checklist

1. For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
[Yes]
(b) Did you describe the limitations of your work? [Yes] We clearly specify the assumptions and application scopes, where they exist, of all our results.
(c) Did you discuss any potential negative societal impacts of your work? [Yes] We discuss broader impacts in Appendix A.
(d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
(a) Did you state the full set of assumptions of all theoretical results? [Yes]
(b) Did you include complete proofs of all theoretical results? [Yes] We include complete proofs for all results in the appendix.
3. If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] We include them in Appendix E.3.
(c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [Yes] Indeed, we rigorously compute the confidence intervals due to the finite sampling (see Section 3.4), and guarantee that all fairness certificates reported in the paper hold with probability 90%.
(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] We provide them in Appendix E.3.
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
(a) If your work uses existing assets, did you cite the creators? [Yes]
(b) Did you mention the license of the assets? [Yes]
(c) Did you include any new assets either in the supplemental material or as a URL? [Yes]
(d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [No] We only use public data.
(e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [No] We only use public data.
5. If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]