# Learning Certified Individually Fair Representations

Anian Ruoss, Mislav Balunović, Marc Fischer, Martin Vechev
Department of Computer Science, ETH Zurich
anruoss@ethz.ch, {mislav.balunovic, marc.fischer, martin.vechev}@inf.ethz.ch

**Abstract**

Fair representation learning provides an effective way of enforcing fairness constraints without compromising utility for downstream users. A desirable family of such fairness constraints, each requiring similar treatment for similar individuals, is known as individual fairness. In this work, we introduce the first method that enables data consumers to obtain certificates of individual fairness for existing and new data points. The key idea is to map similar individuals to close latent representations and leverage this latent proximity to certify individual fairness. That is, our method enables the data producer to learn and certify a representation where, for a data point, all similar individuals are at $\ell_\infty$-distance at most $\epsilon$, thus allowing data consumers to certify individual fairness by proving $\epsilon$-robustness of their classifier. Our experimental evaluation on six real-world datasets and several fairness constraints demonstrates the expressivity and scalability of our approach.

## 1 Introduction

The increased use of machine learning in sensitive domains (e.g., crime risk assessment [1], ad targeting [2], and credit scoring [3]) has raised concerns that methods learning from data can reinforce human bias, discriminate, and lack fairness [4–6]. Moreover, data owners often face the challenge that their data will be used in (unknown) downstream applications, potentially indifferent to fairness concerns [7]. To address this challenge, the paradigm of learning fair representations has emerged as a promising approach to obtain data representations that preserve fairness while maintaining utility for a variety of downstream tasks [8, 9]. The recent work of McNamara et al. [10] has formalized this setting by partitioning the landscape into: a data regulator who defines fairness for the particular task at hand, a data producer who processes sensitive user data and transforms it into another representation, and a data consumer who performs predictions based on the new representation. In this setting, a machine learning model $M: \mathbb{R}^n \to \mathbb{R}^o$ is composed of two parts: an encoder $f_\theta: \mathbb{R}^n \to \mathbb{R}^k$, provided by the data producer, and a classifier $h_\psi: \mathbb{R}^k \to \mathbb{R}^o$, provided by the data consumer, with $\mathbb{R}^k$ denoting the latent space. The data regulator selects a definition of fairness that the model $M$ should satisfy.

Most work so far has explored two main families of fairness definitions [11]: statistical and individual. Statistical notions define specific groups in the population and require that particular statistics, computed based on model decisions, be equal for all groups. Popular notions of this kind include demographic parity [12] and equalized odds [13]. While these notions do not require any assumptions on the data and are easy to certify, they offer no guarantees for individuals or other subgroups in the population [14]. In contrast, individual notions of fairness [12] are desirable as they explicitly require that similar individuals in the population are treated similarly.

**Key challenge.** A central challenge then is to enforce individual fairness in the setting described above.
That is, we must both learn an individually fair representation and certify that individual fairness is actually satisfied across the end-to-end model $M$, without compromising the independence of the data producer and the data consumer.

**This work.** In this work, we propose the first method for addressing the above challenge. At a high level, our approach is based on the observation that recent advances in training machine learning models with logical constraints [15], together with new methods for proving that constraints are satisfied [16], open the possibility of learning certified individually fair models. Concretely, we identify a practical class of individual fairness definitions captured via declarative fairness constraints. Such a fairness constraint is a binary similarity function $\varphi: \mathbb{R}^n \times \mathbb{R}^n \to \{0, 1\}$, where $\varphi(x, x')$ evaluates to 1 if and only if two individuals $x$ and $x'$ are similar (e.g., if all their attributes except for race are the same). By working with declarative constraints, data regulators can now express interpretable, domain-specific notions of similarity, a problem known to be challenging [8, 17–21]. Given the fairness constraint $\varphi$, we can now train an individually fair representation and use it to obtain a certificate of individual fairness for the end-to-end model. For training, the data producer can employ our framework to learn an encoder $f_\theta$ with the goal that two individuals satisfying $\varphi$ should be mapped close together in $\ell_\infty$-distance in latent space. As a consequence, individual fairness can then be certified for a data point in two steps: first, the data producer computes a convex relaxation of the latent set of similar individuals and passes it to the data consumer. Second, the data consumer certifies individual fairness by proving local robustness within the convex relaxation. Importantly, the data consumer can now perform modular certification: it does not need to know the fairness constraint $\varphi$ or the concrete data point $x$. Our experimental evaluation on several datasets and fairness constraints shows a substantial increase (up to 72.6%) in certified individuals (unseen during training) when compared to standard representation learning.

**Main contributions.** Our key contributions are:

- A practical family of similarity notions for individual fairness defined via interpretable logical constraints.
- A method to learn individually fair representations (defined in an expressive logical fragment), which comes with provable certificates.
- An end-to-end implementation of our method in an open-source tool called LCIFR, together with an extensive evaluation on several datasets, constraints, and architectures. We make LCIFR publicly available at https://github.com/eth-sri/lcifr.

## 2 Overview

This section provides a high-level overview of our approach, with the overall flow shown in Figure 1. As introduced earlier, our setting consists of three parties. The first party is a data regulator who defines similarity measures for the input and the output, denoted as $\varphi$ and $\mu$, respectively. The properties $\varphi$ and $\mu$ are problem-specific and can be expressed in a rich logical fragment which we describe later in Section 4. For example, for classification tasks $\mu$ could denote equal classification (i.e., $\mu(M(x), M(x')) = 1 \Leftrightarrow M(x) = M(x')$) or classifying $M(x)$ and $M(x')$ to the same label group; for regression tasks $\mu$ could evaluate to 1 if $|M(x) - M(x')| \leq 0.1$ and 0 otherwise.
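To make the roles of $\varphi$ and $\mu$ concrete, the following is a minimal illustrative sketch (not taken from LCIFR): the feature layout, the attribute index sets, and the bound `ALPHA` are assumptions made only for this example.

```python
ALPHA = 0.1  # assumed bound on how much numerical attributes may differ

def phi(x, x_prime, cat_idx, race_idx, num_idx):
    """Toy input similarity: all categorical attributes except race coincide,
    and every numerical attribute differs by at most ALPHA."""
    cat_equal = all(x[i] == x_prime[i] for i in cat_idx if i != race_idx)
    num_close = all(abs(x[j] - x_prime[j]) <= ALPHA for j in num_idx)
    return cat_equal and num_close

def mu(y, y_prime):
    """Toy output similarity for classification: equal predicted labels."""
    return y == y_prime
```

Constraints of exactly this declarative shape are what Section 4 later translates into differentiable losses (for training) and what Section 5 encodes as mixed-integer linear constraints (for certification).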
We focus on equal classification in the classification setting for the remainder of this work. The goal of treating similar individuals as similarly as possible can then be formulated as finding a classifier $M$ which maximizes

$$\mathbb{E}_{x \sim \mathcal{D}} \left[ \forall x' \in \mathbb{R}^n\colon \varphi(x, x') \Rightarrow \mu(M(x), M(x')) \right], \tag{1}$$

where $\mathcal{D}$ is the underlying data distribution (we assume a logical expression evaluates to 1 if it is true and to 0 otherwise). As usual in machine learning, we approximate this quantity with the empirical risk, by computing the percentage of individuals $x$ from the test set for which we can certify that

$$\forall x' \in S_\varphi(x)\colon \mu(M(x), M(x')), \tag{2}$$

where $S_\varphi(x) = \{x' \in \mathbb{R}^n \mid \varphi(x, x')\}$ denotes the set of all points similar to $x$. Note that $S_\varphi(x)$ generally contains an infinite number of individuals.

Figure 1: Conceptual overview of our framework. The left side shows the component corresponding to the data producer, who learns an encoder $f_\theta$ which maps the entire set of individuals $S_\varphi(x)$ that are similar to individual $x$, according to the similarity notion $\varphi$, to points near $f_\theta(x)$ in the latent space. The data producer then computes an $\ell_\infty$-bounding box $B_\infty(z, \epsilon)$ around the latent set of similar individuals $f_\theta(S_\varphi(x))$ with center $z = f_\theta(x)$ and radius $\epsilon$ and passes it to the data consumer. The data consumer receives the latent representation $z$ and radius $\epsilon$, trains a classifier $h_\psi$, and certifies that the entire $\ell_\infty$-ball centered around $z$ with radius $\epsilon$ is classified the same (the green region indicates the fair output region).

In Figure 1, $S_\varphi(x)$ is represented as a brown shape, and $x$ is shown as a single point inside of $S_\varphi(x)$. The key idea of our approach is to train the encoder $f_\theta$ to map the point $x$ and all points $x' \in S_\varphi(x)$ close to one another in the latent space with respect to $\ell_\infty$-distance, specified as

$$\varphi(x, x') \Rightarrow \|f_\theta(x') - f_\theta(x)\|_\infty \leq \delta, \tag{3}$$

where $\delta$ is a tunable parameter of the method, determined in agreement between producer and consumer (we could also use another $\ell_p$-norm). If the encoder indeed satisfies Equation (3), the data consumer, potentially indifferent to the fairness constraint, can then train a classifier $h_\psi$ independently of the similarity notion $\varphi$. Namely, the data consumer only has to train $h_\psi$ to be robust to perturbations up to $\delta$ in $\ell_\infty$-norm, which can be solved via standard min-max optimization, discussed in Section 4. We now explain our end-to-end inference with provable certificates for the encoder $f_\theta$ and classifier $h_\psi$.

**Processing the producer model.** Given a data point $x$, we first propagate both $x$ and its set of similar points $S_\varphi(x)$ through the encoder, as shown in Figure 1, to obtain the latent representations $z = f_\theta(x)$ and $f_\theta(S_\varphi(x))$. As Equation (3) may not hold for the particular $x$ and $\delta$ due to the stochastic nature of training, we compute the smallest $\ell_\infty$-bounding box of radius $\epsilon$ such that $f_\theta(S_\varphi(x)) \subseteq B_\infty(z, \epsilon) := \{z' \mid \|z' - z\|_\infty \leq \epsilon\}$. This $\ell_\infty$-bounding box with center $z$ and radius $\epsilon$ is shown in orange in Figure 1.

**Processing the consumer model.** Next, we provide the latent representation $z$ and the radius $\epsilon$ to the data consumer. The data consumer then knows that all points similar to $x$ lie in the $\ell_\infty$-ball of radius $\epsilon$, but does not need to know the similarity constraint $\varphi$ nor the particular shape $f_\theta(S_\varphi(x))$. The key observation is the following: if the data consumer can prove that its classifier $h_\psi$ is robust to $\ell_\infty$-perturbations up to $\epsilon$ around $z$, then the end-to-end classifier $M = h_\psi \circ f_\theta$ satisfies individual fairness at $x$ with respect to the similarity rule $\varphi$ imposed by the data regulator.
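The following sketch illustrates this division of labor under simplifying assumptions: `f_theta`, `sample_similar`, and the linear consumer weights `w`, `b` are placeholders; the sampled radius is only a heuristic lower bound on the true $\epsilon$ (Section 5.1 computes it exactly via a MILP); and the closed-form robustness check exploits the fact that the experiments in Section 6 use logistic regression, i.e., a linear $h_\psi$.

```python
import numpy as np

def producer_box(f_theta, x, sample_similar, n_samples=1000):
    """Heuristic estimate of (z, eps): z = f_theta(x) and the l_inf radius of
    a box around z covering sampled similar individuals. Sampling only yields
    a lower bound on the true eps; Section 5.1 computes eps exactly via MILP."""
    z = f_theta(x)
    eps = 0.0
    for _ in range(n_samples):
        x_prime = sample_similar(x)  # some x' with phi(x, x') = 1 (placeholder)
        eps = max(eps, float(np.max(np.abs(f_theta(x_prime) - z))))
    return z, eps

def consumer_certify_linear(w, b, z, eps):
    """Exact l_inf-robustness check for a binary linear classifier
    h_psi(z') = sign(w . z' + b): every z' in B_inf(z, eps) receives the
    same label iff the margin |w . z + b| exceeds eps * ||w||_1."""
    return abs(float(np.dot(w, z)) + b) > eps * float(np.linalg.norm(w, 1))
```

For general (non-linear) consumer networks, the same question is answered by the MILP-based robustness check of Section 5.2.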
There are two central technical challenges we need to address. The first challenge is how to train an encoder to satisfy Equation (3) while not making any domain-specific assumptions about the point $x$ or the similarity constraint $\varphi$. The second challenge is how to provide a certificate of individual fairness for $x$, which requires both computing the smallest radius $\epsilon$ such that $f_\theta(S_\varphi(x)) \subseteq B_\infty(z, \epsilon)$, as well as certifying $\ell_\infty$-robustness of the classifier $h_\psi$. To train an encoder, we build on Fischer et al. [15], who provide a translation from logical constraints $\varphi$ to a differentiable loss function. The training of the encoder network can then be formulated as a min-max optimization problem, which alternates between (i) searching for counterexamples $x' \in S_\varphi(x)$ that violate Equation (3), and (ii) training $f_\theta$ on the counterexamples. We employ gradient descent to minimize a joint objective composed of a classification loss and the constraint loss obtained from translating Equation (3). Once no more counterexamples are found, we can conclude that the encoder empirically satisfies Equation (3). We discuss the detailed procedure in Section 4.

We compute a certificate of individual fairness in two steps. First, to provide guarantees on the latent representation generated by the encoder $f_\theta$, we solve the optimization problem

$$\epsilon = \max_{x' \in S_\varphi(x)} \|z - f_\theta(x')\|_\infty.$$

Recall that the set $S_\varphi(x)$ generally contains an infinite number of individuals $x'$, and thus this optimization problem cannot be solved by simple enumeration. In Section 5 we show how this optimization problem can be encoded as a mixed-integer linear program (MILP) and solved using off-the-shelf MILP solvers. After obtaining $\epsilon$, we certify local robustness of the classifier $h_\psi$ around $z = f_\theta(x)$ by proving (using MILP) that for each $z'$ with $\|z' - z\|_\infty \leq \epsilon$, the classification results $h_\psi(z')$ and $h_\psi(z)$ coincide. Altogether, this implies that the overall model $M = h_\psi \circ f_\theta$ satisfies individual fairness for $x$. Finally, note that since the bounding box $B_\infty(z, \epsilon)$ is a convex relaxation of the latent set of similar individuals $f_\theta(S_\varphi(x))$, the number of individuals for which we can obtain a certificate is generally lower than the number of individuals that actually satisfy Equation (2).

## 3 Related Work

We now discuss work most closely related to ours.

**Learning fair representations.** There has been a long line of work on learning fair representations. Zemel et al. [8] introduced a method to learn fair representations that ensure group fairness and protect sensitive attributes. Such representations, invariant to sensitive attributes, can also be learned using variational autoencoders [22], adversarial learning [9, 23], or disentanglement [24]. Zemel et al. [8] and Madras et al. [9] also consider the problem of fair transfer learning, which we investigate in our work. Song et al. [25] used duality to unify some of the mentioned work under the same framework. McNamara et al. [26] derived theoretical guarantees for learning fair representations. Their guarantees require statistics of the data distribution and consist of probabilistic bounds for individual and group fairness: for a new data point from the same distribution, the constraint will hold with a certain probability. In contrast, we obtain a certificate for a fixed data point, which ensures that the fairness constraint holds (independently of the other data points).
Most work so far focuses on learning representations that satisfy statistical notions of fairness, but there has also been some recent work on learning individually fair representations. These works learn fair representations with alternative definitions of individual fairness based on Wasserstein distance [19, 27], fairness graphs [18], or distance measures [17]. A different line of work has investigated learning the fairness metric from data [19–21, 28]. In contrast, we define individual fairness via interpretable logical constraints. Finally, recent works [29–31] studied the task of learning representations that are robust to (adversarial) perturbations, i.e., to all similar individuals in our case, though not in the context of fairness. Many of the above methods for learning (individually) fair representations employ nonlinear components [8, 17], graphs [18], or sampling [22, 24] and thus cannot be efficiently certified, unlike the neural networks that we consider in our work. While we focus on learning fair representations, other lines of work have investigated individual fairness in the context of clustering [32, 33], causal inference [34–37], composition of individually fair classifiers [38, 39], and differential privacy (DP) [12, 40, 41]. The close relationship between individual fairness and DP has been discussed in previous work (see, e.g., [12]). However, DP crucially differs from our work in that it obtains a probabilistic fairness guarantee, similar to McNamara et al. [26] mentioned above, whereas we compute absolute fairness guarantees for every data point. The most natural way to employ DP in a representation learning approach like LCIFR would be to make the data producer model $f_\theta$ differentially private for a neighborhood that encodes $S_\varphi$, by adding noise inside the computation of $f_\theta$. If one can achieve DP for the neighborhood $S_\varphi$ (a non-trivial challenge), the data consumer model can then be seen as a post-processing step, which, with the right robustness certificate, yields a probabilistic guarantee of Equation (2).

**Certification of neural networks.** Certification of neural networks has become an effective way to prove that these models are robust to adversarial perturbations. Certification approaches are typically based on SMT solving [42], abstract interpretation [43], mixed-integer linear programming [16], or linear relaxations [44–46]. Three concurrent works also investigate the certification of individual fairness [47–49]. However, these works try to certify a global individual fairness property, i.e., whether for a given classifier there exists an input which is not treated individually fairly, whereas we focus on local individual fairness, i.e., for every concrete data point we certify whether the model is individually fair or not. Moreover, these works consider similarity notions that are less expressive than, and can be captured by, our logical constraints. Finally, they only consider certifying fairness of existing models, while we also focus on learning fair representations. In our work, we investigate modular certification. For the data producer, we need to propagate the input shape through both logical operators (e.g., conjunctions and disjunctions) and the neural network. While in our work we use a MILP encoding, other approaches could also be applied by crafting specialized convex relaxations.
For example, if our approach is applied to learn individually fair representations of complex data such as images, where encoder networks are usually larger than for the tabular data that we consider here, one could leverage the certification framework from Singh et al. [46]. On the data consumer side, any of the above approaches could be applied, as they are all designed to certify the $\ell_\infty$-robustness which we consider in our work.

## 4 Learning Individually Fair Representations

We now present our method for learning individually fair representations with respect to the property $\varphi$. To illustrate our method, we consider the case where the regulator proposes the similarity constraint

$$\varphi(x, x') := \bigwedge_{i \in \mathrm{Cat} \setminus \{\mathrm{race}\}} (x_i = x'_i) \;\wedge\; \bigwedge_{j \in \mathrm{Num}} |x_j - x'_j| \leq \alpha.$$

According to $\varphi$, individual $x'$ is considered similar to $x$ if: (i) all categorical attributes except for race are equal to those of $x$, and (ii) all numerical attributes (e.g., income) of $x$ and $x'$ differ by at most $\alpha$. Thus, under $\varphi$, the similarity of individuals $x$ and $x'$ does not depend on their respective races. Note that since $\varphi$ is binary, $x$ and $x'$ are either considered similar or not, which is in line with the typical use case in classification, where two individuals are either classified to the same label or not. Moreover, such logical formulas (of reasonable size) are generally considered human-readable and are thus investigated in the interpretable machine learning community (e.g., for decision trees [50]).

**Enforcing individual fairness.** To learn a representation that satisfies $\varphi$, we build on the recent work DL2 [15]. Concretely, we aim to enforce the following constraint on the encoder $f_\theta$ used by the data producer:

$$\varphi(x, x') \Rightarrow \|f_\theta(x) - f_\theta(x')\|_\infty \leq \delta, \tag{4}$$

where $\delta$ is a tunable constant, determined in agreement between the data producer and the data consumer. With DL2, this implication can be translated into a non-negative, differentiable loss $\mathcal{L}(\varphi)$ such that $\mathcal{L}(\varphi)(x, x') = 0$ if and only if the implication is satisfied. Here, we denote $\omega(x, x') := \|f_\theta(x) - f_\theta(x')\|_\infty \leq \delta$ and translate the constraint in Equation (4) as

$$\mathcal{L}(\varphi \Rightarrow \omega) = \mathcal{L}(\neg\varphi \vee \omega) = \mathcal{L}(\neg\varphi) \cdot \mathcal{L}(\omega),$$

where negations are propagated through constraints via standard logic. Moreover, we have

$$\mathcal{L}(\omega)(x, x') = \mathcal{L}\left(\|f_\theta(x) - f_\theta(x')\|_\infty \leq \delta\right) = \max\{\|f_\theta(x) - f_\theta(x')\|_\infty - \delta,\, 0\}.$$

Similarly, conjunctions $\mathcal{L}(\varphi' \wedge \varphi'')$ would be translated as $\mathcal{L}(\varphi') + \mathcal{L}(\varphi'')$, and we refer interested readers to the original work [15] for further details on the translation. Using this differentiable loss, the data producer can now approximate the problem of finding an encoder $f_\theta$ that maximizes the probability that the constraint $\varphi \Rightarrow \omega$ is satisfied for all individuals via the following min-max optimization problem (defined in two steps). First, we find a counterexample

$$x^* = \arg\min_{x' \in S_\varphi(x)} \mathcal{L}(\neg(\varphi \Rightarrow \omega))(x, x'),$$

where $S_\varphi(x) = \{x' \in \mathbb{R}^n \mid \varphi(x, x')\}$ denotes the set of all individuals similar to $x$ according to $\varphi$. Then, in the second step, we find the parameters $\theta$ that minimize the constraint loss at $x^*$:

$$\arg\min_\theta \; \mathbb{E}_{x \sim \mathcal{D}} \left[ \mathcal{L}(\varphi \Rightarrow \omega)(x, x^*) \right].$$

Note that in the outer loop, we are finding parameters $\theta$ that minimize the loss of the original constraint from Equation (4), while in the inner loop, we are finding a counterexample $x^*$ by minimizing the loss corresponding to the negation of this constraint. We use Adam [51] for optimizing the outer problem. For the inner minimization problem, Fischer et al. [15] further refine the loss by excluding constraints that have closed-form analytical solutions, e.g., $\max\{\|x - x'\|_\infty - \delta, 0\}$, which can be minimized by projecting $x'$ onto the $\ell_\infty$-ball of radius $\delta$ around $x$.
The resulting objective is thus

$$x^* = \arg\min_{x' \in C} \mathcal{L}(\rho)(x, x'),$$

where $C$ is the convex set defined by the excluded constraints and $\rho$ is $\neg(\varphi \Rightarrow \omega)$ without the respective constraints. It has been shown [52] that such an objective can be efficiently solved with projected gradient descent (PGD). DL2 does not provide a meaningful translation for categorical constraints, which are essential to fairness, and we derive a relaxation method for training with categorical constraints in Appendix A.

**Predictive utility of the representation.** Recall that our method is modular in the sense that the data producer and the data consumer models are learned separately. Thus, the data producer needs to ensure that the latent representation remains informative for downstream applications (represented by the data consumer model $h_\psi$). To that end, the data producer additionally trains a classifier $q: \mathbb{R}^k \to \mathbb{R}^o$ that tries to predict the target label $y$ from the latent representation $z = f_\theta(x)$. Thus, the data producer seeks to jointly train the encoder $f_\theta$ and classifier $q$ to minimize the combined objective

$$\arg\min_{f_\theta, q} \; \mathbb{E}_{x, y} \left[ \mathcal{L}_C(q(f_\theta(x)), y) + \gamma\, \mathcal{L}_F(x, f_\theta(x)) \right], \tag{5}$$

where $\mathcal{L}_C$ is any suitable classification loss (e.g., cross-entropy), $\mathcal{L}_F$ is the fairness constraint loss obtained via DL2, and the hyperparameter $\gamma$ balances the two objectives. We empirically investigate the impact of the loss balancing factor $\gamma$ on the accuracy-fairness tradeoff in Appendix B.

**Training a robust classifier $h_\psi$.** We assume the encoder $f_\theta$ has been trained to maintain predictive utility and satisfy Equation (4). Recall that, given this assumption, the data consumer who wants to ensure that her classifier $h_\psi$ is individually fair only needs to ensure local robustness of the classifier for perturbations up to $\delta$ in $\ell_\infty$-norm. This is a standard problem in robust machine learning [53] and can be solved via min-max optimization, recently found to work well for neural network models [52]:

$$\min_\psi \; \mathbb{E}_{z \sim \mathcal{D}_z} \left[ \max_{\|\pi\|_\infty \leq \delta} \mathcal{L}_C(h_\psi(z + \pi), y) \right],$$

where $\mathcal{D}_z$ is the latent distribution obtained by sampling from $\mathcal{D}$ and applying the encoder $f_\theta$, and $\mathcal{L}_C$ is a suitable classification loss. The optimization alternates between: (i) trying to find a perturbation $\pi$ with $\|\pi\|_\infty \leq \delta$ that maximizes $\mathcal{L}_C(h_\psi(z + \pi), y)$, and (ii) updating $\psi$ to minimize $\mathcal{L}_C(h_\psi(z + \pi), y)$ under such worst-case perturbations $\pi$. While the theoretical necessity of training for local robustness is clear, we empirically investigate its effect on accuracy and certifiable fairness in Appendix C.

## 5 Certifying Individual Fairness

In this section we discuss how the data consumer can compute a certificate of individual fairness for its model $h_\psi$ trained on the latent representation (as described in Section 4 above). We split this process into two steps: (i) the data producer propagates a data point $x$ through the encoder to obtain $z = f_\theta(x)$ and computes the radius $\epsilon$ of the smallest $\ell_\infty$-ball around $z$ that contains the latent representations of all similar individuals $f_\theta(S_\varphi(x))$, i.e., $f_\theta(S_\varphi(x)) \subseteq B_\infty(z, \epsilon)$, and (ii) the data consumer checks whether all points in the latent space that differ by at most $\epsilon$ from $z$ are classified to the same label, i.e., $h_\psi(z) = h_\psi(z')$ for all $z' \in B_\infty(z, \epsilon)$. We now discuss both of these steps.

### 5.1 Certifying Latent Similarity

To compute the minimum $\epsilon$ which ensures that $f_\theta(S_\varphi(x)) \subseteq B_\infty(z, \epsilon)$, the data producer models the set of similar individuals $S_\varphi(x)$ and the encoder $f_\theta$ as a mixed-integer linear program (MILP).

**Modeling $S_\varphi$ as a MILP.** We use an example to demonstrate the encoding of logical constraints with a MILP.
Consider an individual $x$ that has two categorical features $x_1 = [1, 0, \dots, 0]$ and $x_2 = [0, \dots, 0, 1]$ and one numerical feature $x_3$, with the following constraint for similarity:

$$\varphi(x, x') := (x_1 = x'_1) \wedge (|x_3 - x'_3| \leq \alpha).$$

Here $x$ is an individual from the test dataset and can be treated as a constant, while $x'$ is encoded using mixed-integer variables. For every categorical feature $x'_i$ we introduce $k$ binary variables $v_i^l$ with $l = 1, \dots, k$, where $k$ is the number of distinct values this categorical feature can take. For the fixed categorical feature $x'_1$, which is equal to $x_1$, we add the constraints $v_1^1 = 1$ and $v_1^l = 0$ for $l = 2, \dots, k$. To model the free categorical feature $x'_2$ we add the constraint $\sum_l v_2^l = 1$, thereby enforcing it to take on exactly one of the $k$ potential values. Finally, the numerical attribute $x'_3$ can be modeled by adding a corresponding variable $v_3$ with the two constraints $v_3 \geq x_3 - \alpha$ and $v_3 \leq x_3 + \alpha$. It can be easily verified that our encoding of $S_\varphi$ is exact. Consider now a fairness constraint including disjunctions, i.e., $\varphi := \varphi_1 \vee \varphi_2$. To model such a disjunction we introduce two auxiliary binary variables $v_1$ and $v_2$ with the constraints $v_i = 1 \Leftrightarrow \varphi_i(x, x') = 1$ for $i = 1, 2$ and $v_1 + v_2 \geq 1$.

**Handling general constraints.** The encodings demonstrated on these two examples can be applied to general constraints $\varphi$. A full formalization of our encoding is found in Appendix D.

**Modeling $f_\theta$ as a MILP.** To model the encoder we employ the method from Tjeng et al. [16], which is exact for neural networks with ReLU activations. We recall that a ReLU computes $\max\{x, 0\}$ for some input $x$. Given an upper and lower bound on $x$, i.e., $x \in [l, u]$, we can encode the output of the ReLU exactly via a case distinction: (i) if $u \leq 0$, add a variable with upper and lower bound $0$ to the MILP, (ii) if $l \geq 0$, add a variable with upper and lower bounds $u$ and $l$, respectively, to the MILP, and (iii) if $l < 0 < u$, add a variable $v$ and a binary indicator $i$ to the MILP in addition to the following constraints: $0 \leq v \leq u \cdot i$, $x \leq v \leq x - l \cdot (1 - i)$, and $i = 1 \Leftrightarrow 0 \leq x$. Finally, given the MILP formulation of $S_\varphi$ and $f_\theta$, we can compute $\epsilon$ by solving the following $k$ MILP instances (where $k$ is the dimension of the latent space):

$$\hat{\epsilon}_j = \max_{x' \in S_\varphi(x)} \left| f_\theta^{(j)}(x) - f_\theta^{(j)}(x') \right|.$$

We compute the final result as $\epsilon = \max\{\hat{\epsilon}_1, \hat{\epsilon}_2, \dots, \hat{\epsilon}_k\}$.

### 5.2 Certifying Local Robustness

The data consumer obtains a point in latent space $z$ and a radius $\epsilon$. To obtain a fairness certificate, the data consumer certifies that all points in the latent space at $\ell_\infty$-distance at most $\epsilon$ from $z$ are mapped to the same label as $z$. This amounts to solving the following MILP optimization problem for each logit $h_\psi^{(y')}$ with label $y'$ different from the true label $y$:

$$\max_{z' \in B_\infty(z, \epsilon)} h_\psi^{(y')}(z') - h_\psi^{(y)}(z').$$

If the solution of the above optimization problem is less than zero for each $y' \neq y$, then robustness of the classifier $h_\psi$ is provably established. Note that the data consumer can employ the same methods as the data producer to encode the classifier as a MILP [16] and benefit from any corresponding advancements in solving MILP instances in the context of neural network certification, e.g., [54].
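To illustrate how the encodings of this section might be set up in practice, here is a minimal sketch of the $S_\varphi$ encoding from Section 5.1 using `gurobipy` (an assumption of this example; our formulation is solver-agnostic). The feature sizes, the concrete values of $x$, and the bound $\alpha$ are made up for illustration, and the encoder and objective are only indicated in comments.

```python
import gurobipy as gp
from gurobipy import GRB

k1, k2 = 4, 3          # assumed numbers of categories for x1 and x2
x1 = [1, 0, 0, 0]      # one-hot value of the fixed categorical feature x1
x3, alpha = 0.7, 0.1   # numerical feature of x and the similarity bound

m = gp.Model("S_phi")

# Fixed categorical feature x1': forced to coincide with x1.
v1 = m.addVars(k1, vtype=GRB.BINARY, name="v1")
m.addConstrs((v1[l] == x1[l] for l in range(k1)), name="fix_x1")

# Free categorical feature x2': exactly one of its k2 values is active.
v2 = m.addVars(k2, vtype=GRB.BINARY, name="v2")
m.addConstr(v2.sum() == 1, name="onehot_x2")

# Numerical feature x3': |x3' - x3| <= alpha.
v3 = m.addVar(lb=x3 - alpha, ub=x3 + alpha, name="v3")

# In the full encoding, the encoder f_theta would be added on top of
# (v1, v2, v3) via the ReLU constraints of Tjeng et al. [16], and each latent
# coordinate would be maximized/minimized to obtain the radii eps_j. Here we
# solve a dummy objective just to show the end-to-end solver call.
m.setObjective(v3, GRB.MAXIMIZE)
m.optimize()
print("optimal x3' =", v3.X)
```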
We now formalize our certificate, which allows the data consumer to prove individual fairness of $M$ once it is given $z$ and $\epsilon$ by the data producer:

**Theorem 1 (Individual fairness certificate).** Suppose $M = h_\psi \circ f_\theta$ with data point $x$ and similarity notion $\varphi$. Furthermore, let $z = f_\theta(x)$, $S_\varphi(x) = \{x' \in \mathbb{R}^n \mid \varphi(x, x')\}$, and $\epsilon = \max_{x' \in S_\varphi(x)} \|z - f_\theta(x')\|_\infty$. If

$$\max_{z' \in B_\infty(z, \epsilon)} h_\psi^{(y')}(z') - h_\psi^{(y)}(z') < 0$$

for all labels $y'$ different from the true label $y$, then for all $x' \in S_\varphi(x)$ we have $M(x) = M(x')$.

*Proof.* Provided in Appendix E.

Table 1: Accuracy and certified individual fairness. We compare the accuracy and percentage of certified individuals with a baseline obtained from setting the loss balancing factor $\gamma = 0$. LCIFR produces a drastic increase in certified individuals while only incurring a minor decrease in accuracy.

| Constraint | Dataset | Accuracy (%) Base | Accuracy (%) LCIFR | Certified (%) Base | Certified (%) LCIFR |
|---|---|---|---|---|---|
| NOISE | Adult | 83.0 | 81.4 | 59.0 | 97.8 |
| | Compas | 65.8 | 63.4 | 32.1 | 79.0 |
| | Crime | 84.4 | 83.1 | 7.4 | 66.9 |
| | German | 76.5 | 74.0 | 71.0 | 97.5 |
| | Health | 80.8 | 81.1 | 75.4 | 97.8 |
| | Law School | 84.4 | 84.6 | 57.9 | 69.2 |
| CAT | Adult | 83.3 | 83.1 | 79.9 | 100 |
| | Compas | 65.6 | 66.3 | 90.9 | 100 |
| | Crime | 84.4 | 83.9 | 78.3 | 100 |
| | German | 76.0 | 75.5 | 88.5 | 100 |
| | Health | 80.7 | 80.9 | 64.1 | 99.8 |
| | Law School | 84.4 | 84.4 | 25.6 | 51.1 |
| CAT + NOISE | Adult | 83.3 | 81.3 | 47.5 | 97.6 |
| | Compas | 65.6 | 63.7 | 30.9 | 75.6 |
| | Crime | 84.4 | 81.5 | 6.2 | 63.3 |
| | German | 76.0 | 70.0 | 68.0 | 95.5 |
| | Health | 80.7 | 80.7 | 24.7 | 97.3 |
| | Law School | 84.4 | 84.5 | 11.6 | 28.9 |
| ATTRIBUTE | Adult | 83.0 | 80.9 | 49.3 | 94.6 |
| | German | 76.5 | 73.5 | 65.0 | 96.5 |
| | Law School | 84.3 | 86.9 | 46.4 | 62.6 |
| QUANTILES | Law School | 84.2 | 84.2 | 56.5 | 76.9 |

## 6 Experimental Evaluation

We implement our method in a tool called LCIFR and present an extensive experimental evaluation. We consider a variety of datasets: Adult [55], Compas [56], Crime [55], German [55], Health (https://www.kaggle.com/c/hhp), and Law School [57], which we describe in detail in Appendix F. We perform the following preprocessing on all datasets: (i) normalize numerical attributes to zero mean and unit variance, (ii) one-hot encode categorical features, (iii) drop rows and columns with missing values, and (iv) split into train, test, and validation sets. Although we only consider datasets with binary classification tasks, we note that our method straightforwardly extends to the multiclass case. We perform all experiments on a desktop PC using a single GeForce RTX 2080 Ti GPU and a 16-core Intel Core i9-9900K CPU @ 3.60 GHz. We make all code, datasets, and preprocessing pipelines publicly available at https://github.com/eth-sri/lcifr to ensure reproducibility of our results.

**Experiment setup.** We model the encoder $f_\theta$ as a neural network, and we use logistic regression as the classifier $h_\psi$. We perform a grid search over model architectures and loss balancing factors $\gamma$, which we evaluate on the validation set. As a result, we consider $f_\theta$ with 1 hidden layer of 20 neurons (except for Law School, where we do not have a hidden layer) and a latent space of dimension 20. We fix $\gamma$ to 10 for Adult, Crime, and German, to 1 for Compas and Health, and to 0.1 for Law School. We provide a more detailed overview of the model architectures and hyperparameters in Appendix G.

**Fairness constraints.** We propose a range of different constraints to which we apply our method. These constraints define the similarity between two individuals based on their numerical attributes (NOISE), categorical attributes (CAT), or combinations thereof (CAT + NOISE). Furthermore, we consider more involved similarity notions based on disjunctions (ATTRIBUTE) and on quantiles of certain attributes to counter subordination between social groups [18] (QUANTILES). A full formalization of our constraints is found in Appendix H.
Table 2: Accuracy and percentage of certified individuals for transferable representation learning on the Health dataset with the CAT + NOISE constraint. The transfer labels are omitted during training and the data producer objective is augmented with a reconstruction loss. This allows the data consumer to achieve high accuracies and certification rates across a variety of (potentially unknown) tasks.

| Task label | Accuracy (%) | Certified (%) |
|---|---|---|
| Original (Charlson Index) | 73.8 | 96.9 |
| MSC2A3 | 73.7 | 86.1 |
| METAB3 | 75.4 | 93.6 |
| ARTHSPIN | 75.4 | 93.7 |
| NEUMENT | 73.8 | 97.1 |
| RESPR4 | 72.4 | 98.4 |

**Applying our method in practice.** We assume that the data regulator has defined the above constraints. First, we act as the data producer and learn a representation that enforces the individual fairness constraints using our method from Section 4. After training, we compute $\epsilon$ for every individual data point in the test set and pass it to the data consumer along with the latent representation of the entire dataset, as described in Section 5.1. Second, we act as the data consumer and use our method from Section 4 to learn a locally robust classifier from the latent representation. Finally, to obtain a certificate of individual fairness, we use $\epsilon$ to certify the classifier via our method from Section 5.2. In Table 1 we compare the accuracy and percentage of certified individuals (i.e., the empirical approximation of a lower bound on Equation (1)) with a baseline encoder and classifier obtained from standard representation learning (i.e., $\gamma = 0$). We do not compare with other approaches for learning individually fair representations since they either consider a different similarity metric or employ nonlinear components that cannot be efficiently certified. It can be observed that LCIFR drastically increases the percentage of certified individuals across all constraints and datasets. We would like to highlight the relatively low (albeit still significantly higher than baseline) certification rate for the Law School dataset. This is due to the relatively small loss balancing factor $\gamma = 0.1$, which only weakly enforces the individual fairness constraint during training. Finally, we report the following mean certification runtime per input, averaged over all constraints: 0.29 s on Adult, 0.35 s on Compas, 1.23 s on Crime, 0.28 s on German, 0.68 s on Health, and 0.02 s on Law School, showing that our method is computationally efficient. We show that our method scales to larger networks in Appendix I.

**Fair transfer learning.** We follow Madras et al. [9] to demonstrate that our method is compatible with transferable representation learning. We again consider the Health dataset, for which the original task is to predict the Charlson Index. To demonstrate transferability, we omit the primary condition group labels from the set of features and try to predict them from the latent representation without explicitly optimizing for this task. To that end, the data producer additionally learns a decoder $g(z)$, which tries to predict the original attributes $x$ from the latent representation, so that the representation retains more than just task-specific information about the Charlson Index. This amounts to adding a reconstruction loss $\mathcal{L}_R(x, g(f_\theta(x)))$ (e.g., $\ell_2$) to the objective in Equation (5). Assuming that our representations are in fact transferable, the data consumer is now free to choose any classification objective. We note that our certification method straightforwardly extends to all possible prediction tasks, allowing the data consumer to obtain fairness certificates regardless of the objective. Here, we let the data consumer train classifiers both for the original task and to predict the 5 most common primary condition group labels.
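To make the augmented producer objective concrete, the following is a rough PyTorch-style sketch of a single training step; it is not LCIFR's actual implementation. The networks, the counterexample `x_adv` (assumed to come from a PGD-style search over $S_\varphi(x)$ as described in Section 4), and the reconstruction weight `lam` are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def producer_step(encoder, clf_q, decoder, opt, x, y, x_adv, delta, gamma, lam):
    """One joint training step for the data producer: classification loss,
    DL2-style fairness loss on a counterexample x_adv from S_phi(x), and a
    reconstruction loss that keeps the representation transferable."""
    z = encoder(x)
    loss_cls = F.cross_entropy(clf_q(z), y)              # L_C in Eq. (5)
    # Translated constraint loss: max(||f(x) - f(x')||_inf - delta, 0).
    gap = (z - encoder(x_adv)).abs().amax(dim=1)
    loss_fair = torch.clamp(gap - delta, min=0).mean()   # L_F in Eq. (5)
    loss_rec = F.mse_loss(decoder(z), x)                 # reconstruction term
    loss = loss_cls + gamma * loss_fair + lam * loss_rec
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The consumer side is symmetric: $h_\psi$ is trained with the same kind of step, but with the inner maximization over $\ell_\infty$-perturbations of $z$ of radius $\delta$, as described in Section 4, instead of the fairness term.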
We display the accuracy and percentage of certified data points on all tasks in Table 2. The table shows that our learned representation transfers well across tasks while additionally providing provable individual fairness guarantees.

## 7 Conclusion

We introduced a novel end-to-end framework for learning representations with provable certificates of individual fairness. We demonstrated that our method is compatible with related paradigms such as fair transfer learning. Our evaluation across different datasets and fairness constraints demonstrates the practical effectiveness of our method.

## Broader Impact

Methods that learn from data can potentially produce unfair outcomes by reinforcing human biases or discriminating against specific groups. We illustrate how our method can be employed to address these issues with an example due to Cisse and Koyejo [7]. Consider a company with several teams working with the same data to build models. Although the individual teams may not care about the fairness of their models, the company needs to comply with ethical or legal requirements. In this setting, our framework enables the company to obtain such certificates from every team in a minimally invasive and modular fashion without compromising downstream utility. Although individual fairness is a desirable property, it is far from sufficient to provide any ethical guarantees. For example, treating all individuals similarly badly does not conflict with individual fairness. Our method thus depends on the assumption that all involved parties act reasonably. That is, the data regulator needs to take all ethical aspects and future societal consequences into consideration when designing the similarity property. However, even a diligent data regulator may unconsciously encode biases in the similarity measure. Moreover, our approach breaks down with an adversarial data producer that either explicitly learns a discriminatory representation or simply fails to respect the defined similarity notion. Finally, the case where the data consumer acts adversarially has been investigated in previous work [9] and can be mitigated to some extent.

## Acknowledgments and Disclosure of Funding

We would like to thank Anna Chmurovič for her help with initial investigations on combining fairness and differentiable logic during her ETH Student Summer Research Fellowship. We also thank the anonymous reviewers for their insightful comments and suggestions. None of the authors received third party funding or third party support to pursue this work during the last 36 months prior to submission.

## References

[1] Tim Brennan, William Dieterich, and Beate Ehret. Evaluating the predictive validity of the COMPAS risk and needs assessment system. Criminal Justice and Behavior, 2009.
[2] Amit Datta, Michael Carl Tschantz, and Anupam Datta. Automated experiments on ad privacy settings. Proceedings on Privacy Enhancing Technologies, 2015.
[3] Amir E. Khandani, Adlar J. Kim, and Andrew W. Lo. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 2010.
[4] Latanya Sweeney. Discrimination in online ad delivery. Queue, 2013.
[5] Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam Tauman Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems 29, 2016.
[6] Solon Barocas and Andrew D. Selbst. Big data's disparate impact. California Law Review, 2016.
[7] Moustapha Cisse and Sanmi Koyejo. Fairness and representation learning. NeurIPS Invited Talk, 2019.
[8] Richard S. Zemel, Yu Wu, Kevin Swersky, Toniann Pitassi, and Cynthia Dwork. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning, 2013.
[9] David Madras, Elliot Creager, Toniann Pitassi, and Richard S. Zemel. Learning adversarially fair and transferable representations. In Proceedings of the 35th International Conference on Machine Learning, 2018.
[10] Daniel McNamara, Cheng Soon Ong, and Robert C. Williamson. Provably fair representations. arXiv preprint arXiv:1710.04394, 2017.
[11] Alexandra Chouldechova and Aaron Roth. The frontiers of fairness in machine learning. arXiv preprint arXiv:1810.08810, 2018.
[12] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 2012.
[13] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems 29, 2016.
[14] Michael J. Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In Proceedings of the 35th International Conference on Machine Learning, 2018.
[15] Marc Fischer, Mislav Balunovic, Dana Drachsler-Cohen, Timon Gehr, Ce Zhang, and Martin T. Vechev. DL2: Training and querying neural networks with logic. In Proceedings of the 36th International Conference on Machine Learning, 2019.
[16] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In 7th International Conference on Learning Representations, 2019.
[17] Preethi Lahoti, Krishna P. Gummadi, and Gerhard Weikum. iFair: Learning individually fair data representations for algorithmic decision making. In 35th IEEE International Conference on Data Engineering, 2019.
[18] Preethi Lahoti, Krishna P. Gummadi, and Gerhard Weikum. Operationalizing individual fairness with pairwise fair representations. Proc. VLDB Endow., 2019.
[19] Mikhail Yurochkin, Amanda Bower, and Yuekai Sun. Training individually fair ML models with sensitive subspace robustness. In 8th International Conference on Learning Representations, 2020.
[20] Hanchen Wang, Nina Grgic-Hlaca, Preethi Lahoti, Krishna P. Gummadi, and Adrian Weller. An empirical study on learning fairness metrics for COMPAS data with human supervision. arXiv preprint arXiv:1910.10255, 2019.
[21] Christina Ilvento. Metric learning for individual fairness. In 1st Symposium on Foundations of Responsible Computing, 2020.
[22] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard S. Zemel. The variational fair autoencoder. In 4th International Conference on Learning Representations, 2016.
[23] Harrison Edwards and Amos J. Storkey. Censoring representations with an adversary. In 4th International Conference on Learning Representations, 2016.
[24] Elliot Creager, David Madras, Jörn-Henrik Jacobsen, Marissa A. Weis, Kevin Swersky, Toniann Pitassi, and Richard S. Zemel. Flexibly fair representation learning by disentanglement. In Proceedings of the 36th International Conference on Machine Learning, 2019.
[25] Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, and Stefano Ermon. Learning controllable fair representations. In The 22nd International Conference on Artificial Intelligence and Statistics, 2019.
[26] Daniel McNamara, Cheng Soon Ong, and Robert C. Williamson. Costs and benefits of fair representation learning. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019.
[27] Rui Feng, Yang Yang, Yuehan Lyu, Chenhao Tan, Yizhou Sun, and Chunping Wang. Learning fair representations via an adversarial framework. arXiv preprint arXiv:1904.13341, 2019.
[28] Debarghya Mukherjee, Mikhail Yurochkin, Moulinath Banerjee, and Yuekai Sun. Two simple ways to learn individual fairness metrics from data. arXiv preprint arXiv:2006.11439, 2020.
[29] Shivam Garg, Vatsal Sharan, Brian Hu Zhang, and Gregory Valiant. A spectral view of adversarially robust features. In Advances in Neural Information Processing Systems 31, 2018.
[30] A. Pensia, V. Jog, and P. Loh. Extracting robust and accurate features via a robust information bottleneck. IEEE Journal on Selected Areas in Information Theory, 2020.
[31] Sicheng Zhu, Xiao Zhang, and David Evans. Learning adversarially robust representations via worst-case mutual information maximization. arXiv preprint arXiv:2002.11798, 2020.
[32] Christopher Jung, Sampath Kannan, and Neil Lutz. A center in your neighborhood: Fairness in facility location. arXiv preprint arXiv:1908.09041, 2019.
[33] Sepideh Mahabadi and Ali Vakilian. Individual fairness for k-clustering. arXiv preprint arXiv:2002.06742, 2020.
[34] Matt J. Kusner, Joshua R. Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems 30, 2017.
[35] Lu Zhang, Yongkai Wu, and Xintao Wu. A causal framework for discovering and removing direct and indirect discrimination. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017.
[36] David Madras, Elliot Creager, Toniann Pitassi, and Richard S. Zemel. Fairness through causal awareness: Learning causal latent-variable models for biased data. In Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019.
[37] Yoichi Chikahara, Shinsaku Sakaue, Akinori Fujino, and Hisashi Kashima. Learning individually fair classifier with path-specific causal-effect constraint. arXiv preprint arXiv:2002.06746, 2020.
[38] Cynthia Dwork and Christina Ilvento. Fairness under composition. In 10th Innovations in Theoretical Computer Science Conference, 2018.
[39] Cynthia Dwork, Christina Ilvento, and Meena Jagadeesan. Individual fairness in pipelines. In 1st Symposium on Foundations of Responsible Computing, 2020.
[40] Matthew Jagielski, Michael J. Kearns, Jieming Mao, Alina Oprea, Aaron Roth, Saeed Sharifi-Malvajerdi, and Jonathan Ullman. Differentially private fair learning. In Proceedings of the 36th International Conference on Machine Learning, 2019.
[41] Depeng Xu, Shuhan Yuan, and Xintao Wu. Achieving differential privacy and fairness in logistic regression. In Companion of The 2019 World Wide Web Conference, 2019.
[42] Guy Katz, Clark W. Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In Computer Aided Verification - 29th International Conference, 2017.
[43] Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin T. Vechev. AI2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy, 2018.
[44] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems 31, 2018.
[45] Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin T. Vechev. An abstract domain for certifying neural networks. Proc. ACM Program. Lang., 2019.
[46] Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin T. Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems 32, 2019.
[47] Caterina Urban, Maria Christakis, Valentin Wüstholz, and Fuyuan Zhang. Perfectly parallel fairness certification of neural networks. arXiv preprint arXiv:1912.02499, 2019.
[48] Samuel Yeom and Matt Fredrikson. Individual fairness revisited: Transferring techniques from adversarial robustness. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020.
[49] Philips George John, Deepak Vijaykeerthy, and Diptikalyan Saha. Verifying individual fairness in machine learning models. In Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, 2020.
[50] Finale Doshi-Velez and Been Kim. Considerations for Evaluation and Generalization in Interpretable Machine Learning. Springer International Publishing, 2018.
[51] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, 2015.
[52] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In 6th International Conference on Learning Representations, 2018.
[53] Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski. Robust Optimization. Princeton University Press, 2009.
[54] Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin T. Vechev. Boosting robustness certification of neural networks. In 7th International Conference on Learning Representations, 2019.
[55] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.
[56] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias, 2016.
[57] F. Linda Wightman. LSAC national longitudinal bar passage study, 2017.
[58] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, 2019.
[59] Rüdiger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In Automated Technology for Verification and Analysis - 15th International Symposium, 2017.