# SoFaiR: Single Shot Fair Representation Learning

Xavier Gitiaux and Huzefa Rangwala
George Mason University
{xgitiaux, rangwala}@gmu.edu

**Abstract.** To avoid discriminatory uses of their data, organizations can learn to map the data into a representation that filters out information related to sensitive attributes. However, all existing methods in fair representation learning generate a fairness-information trade-off, and to achieve different points on the fairness-information plane, one must train different models. In this paper, we first demonstrate that fairness-information trade-offs are fully characterized by rate-distortion trade-offs. We then use this key result to propose SoFaiR, a single-shot fair representation learning method that generates, with one trained model, many points on the fairness-information plane. Besides its computational savings, our single-shot approach is, to the best of our knowledge, the first fair representation learning method that explains what information is affected by changes in the fairness / distortion properties of the representation. Empirically, we find on three datasets that SoFaiR achieves fairness-information trade-offs similar to those of its multi-shot counterparts.

## 1 Introduction

Machine learning algorithms increasingly support decision-making systems in contexts where outcomes have long-term implications for the subject's well-being. A growing body of evidence finds that algorithms can either replicate or exacerbate existing social biases against some demographic groups. This evidence spans many domains, including recidivism risk assessment [ProPublica, 2016], face recognition [Buolamwini and Gebru, 2018], education data mining [Gardner et al., 2019], and medical diagnosis [Pfohl et al., 2019]. As a result, organizations that collect data are increasingly scrutinized for potentially discriminatory uses of the data by downstream applications. A flexible solution for the data-science pipeline is to control unfair uses of the data before its ingestion by a machine learning algorithm. Fair representation learning [Zemel et al., 2013] follows this paradigm. It is a data pre-processing method that encodes the data into a representation or code Z, while removing its correlations with sensitive attributes S.

Current approaches in fair representation learning [Zemel et al., 2013; Madras et al., 2018; Gitiaux and Rangwala, 2021a; Creager et al., 2019] generate a fairness-information trade-off that is fixed at training time and cannot be adjusted afterwards. This limits the deployment of fair representation learning approaches. For example, in medical applications, a user may need to adjust the content of the representation at test time depending on whether gender is an appropriate feature for the downstream task at hand. On one hand, for a downstream application that predicts cardiovascular risk, gender is an important and appropriate feature that should be part of the representation of the data. On the other hand, for a downstream application that predicts payment of medical bills, gender should be irrelevant to the outcome and thus filtered out of the representation. With existing methods in fair representation learning, the user would have to re-train a fair encoder-decoder to meet each request. At issue are computational costs and a lack of consistency between released representations, since the user cannot explain what changes occur between each data product it releases.
This paper introduces SoFaiR, a single shot fair representation learning method that generates an unfairness-distortion curve with one single trained model. We first show that we can derive unfairness-distortion curves from rate-distortion curves. We can control the mutual information I(Z, S) between representation and sensitive attributes by encoding X into a bitstream and controlling its entropy. We then construct a gated architecture that partially masks the bitstream conditional on the value of the Lagrangian multiplier in the rate-distortion optimization problem. The mask adapts to the fairness-information trade-off targeted by the user, who can explore the entire unfairness-distortion curve at test time by progressively unmasking bits. For example, for a downstream medical application in which gender is sensitive and needs to be filtered out, the user sets the Lagrangian multiplier at test time to its largest value, which lowers the resolution of the representation and, in a binary basis, masks the rightmost tail of the bitstream.

Besides saving on computational costs, SoFaiR allows users to interpret what type of information is affected by movements along unfairness-distortion curves. Moving upward along the curve unmasks bits in the tail of the bitstream and thus increases the resolution of the representation encoded in a binary basis. By correlating these unmasked bits with data features, the practitioner has at hand a simple method to explore what information related to the features is added to the representation as its fairness properties degrade.

Empirically, we demonstrate on three datasets that, at a cost constant in the number of points on the curve, SoFaiR constructs unfairness-distortion curves comparable to those produced by existing multi-shot approaches, whose cost increases linearly with the number of points. On the benchmark Adults dataset, we find that increasingly removing information related to gender degrades first how the representation encodes working hours; then, relationship status and type of professional occupation; and finally, marital status.

Our contributions are as follows: (i) we formalize fairness-information trade-offs in unsupervised fair representation learning with unfairness-distortion curves and show a tractable connection with rate-distortion curves; (ii) we propose a single-shot fair representation learning method that controls the fairness-information trade-off at test time while training a single model; and (iii) we offer a method to interpret how improving or degrading the fairness properties of the resulting representation affects the type of information it encodes. Proofs of theoretical results are in the appendix. Additional experimental details and results are in the supplementary file of the long version, available at https://arxiv.org/abs/2204.12556. The code is publicly available at https://github.com/Gitiauxx/SoFaiR.

**Related Work.** A growing body of machine learning literature explores how algorithms can adversely impact some demographic groups (e.g., individuals self-identified as Female or African-American); see [Chouldechova and Roth, 2018] for a review. This paper is most closely related to methods that transform the data into a fair representation. Most of the current literature focuses on supervised techniques that tailor the representations to a specific downstream task (e.g., [Madras et al., 2018; Edwards and Storkey, 2016; Moyer et al., 2018; Gupta et al., 2021; Jaiswal et al., 2020]).
However, practical implementations of fair representation learning would occur in unsupervised settings where organizations cannot anticipate all downstream uses of the data. This paper contributes to unsupervised fair representation learning (e.g., [Gitiaux and Rangwala, 2021b]) by (i) formalizing the fairness-information trade-off in a distortion-rate phase diagram, which extends compression-based approaches (e.g., [Gitiaux and Rangwala, 2021a]); and (ii) proposing an adaptive technique that allows a single trained model to output as many points as desired on an unfairness-distortion curve.

The implementation of SoFaiR relates to approaches in rate-distortion that learn adaptive encoders and vary the compression rate at test time (e.g., [Theis et al., 2017; Choi et al., 2019]). We borrow soft-quantization techniques and entropy coding to solve the rate-distortion problem derived from the fair representation learning objective. Our adaptive mask relates to the gain function in [Cui et al., 2020], which selects channels depending on the targeted bit rate. We rely on successive refinement methods from information theory (e.g., [Kostina and Tuncel, 2019]) that use a common encoder for all points on the unfairness-distortion curve and add new information by appending bits to an initially coarse representation. To our knowledge, this is the first work to implement a deep learning multi-resolution quantization and to apply it to single-shot fair representation learning.

Figure 1: Unfairness-distortion curve I(D) vs. rate-distortion curve R(D). The unfairness-distortion curve I(D) can be deduced from the rate-distortion curve R(D) by a vertical shift of $D - H(X|S)$ when the distortion is less than $D^*$.

## 2 Problem Statement

### 2.1 Preliminaries

Consider a population of individuals represented by features $X \in \mathcal{X}$ and sensitive attributes $S \in \mathcal{S} \subseteq \{0, 1\}^{d_s}$, where $d_s \ge 1$ is the dimension of the sensitive attribute space. The objective of unsupervised fair representation learning is to map features $X \in \mathcal{X}$ into a $d$-dimensional representation $Z \in \mathcal{Z}$ such that (i) Z maximizes the information related to X, but (ii) minimizes the information related to the sensitive attributes S. We control the fairness properties of the representation Z via its mutual information I(Z, S) with S. I(Z, S) is an upper bound on the demographic disparity of any classifier using Z as input [Gupta et al., 2021]. We control the information contained in Z by constraining a distortion $d(X, \{Z, S\})$ that measures how much information is lost when using data reconstructed from Z and S instead of the original X. Therefore, fair representation learning is equivalent to solving the following unfairness-distortion problem:

$$I(D) = \min_{f} \; I(Z, S) \quad \text{s.t.} \quad d(X, \{Z, S\}) \le D, \tag{1}$$

where $f: \mathcal{X} \to \mathcal{Z}$ is an encoder. The unfairness-distortion function I(D) defines the minimum mutual information between Z and S a user can expect when encoding the data with a distortion less than or equal to D. The unfairness-distortion problem (1) implies a fairness-information trade-off: lower values of the distortion constraint D degrade the fairness properties of Z by increasing I(D). The objective of this paper is, given data X, to obtain the unfairness-distortion function I(D) with a single encoder-decoder architecture.
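To illustrate why controlling I(Z, S) matters downstream, the sketch below audits the demographic parity gap of a classifier trained on a released representation Z; by the bound cited above [Gupta et al., 2021], driving I(Z, S) to zero forces this gap toward zero for any such classifier. This is our own illustration rather than part of the paper's pipeline; the helper name and the use of scikit-learn are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def demographic_parity_gap(Z, s, y):
    """Train a downstream classifier on the representation Z and report the
    gap in positive prediction rates between the two groups defined by the
    binary sensitive attribute s (demographic disparity)."""
    clf = LogisticRegression(max_iter=1000).fit(Z, y)
    y_hat = clf.predict(Z)
    rate_s0 = y_hat[s == 0].mean()  # positive rate in group s = 0
    rate_s1 = y_hat[s == 1].mean()  # positive rate in group s = 1
    return abs(rate_s1 - rate_s0)
```

Auditing a representation this way before and after tightening the distortion constraint D gives an empirical counterpart to the unfairness-distortion curve defined in (1).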
### 2.2 Unfairness-Distortion Curves

Rate-distortion theory characterizes the minimum average number of bits R(D) needed to represent X by a code Z while the expected distortion incurred to reconstruct X from the code is less than D. We show how to derive unfairness-distortion functions I(D) from rate-distortion functions R(D).

**Theorem 2.1.** Suppose that the distortion is given by $d(X, \{Z, S\}) = \mathbb{E}[-\log p(x|z, s)]$. Then the unfairness-distortion function I(D) is equal to $R(D) + D - C$ if $\partial R / \partial D \le -1$ and 0 otherwise. $C = H(X|S)$ is a constant that does not depend on D, but only on the data. Moreover, I(D) is a non-increasing convex function.

**Phase diagram.** Figure 1 shows a graphical interpretation of Theorem 2.1 in the (D, R) plane. $(D^*, R^*)$ denotes the point on the rate-distortion curve where $\partial R / \partial D = -1$. For $D \le D^*$, the rate-distortion curve lies above the line defined by $R + D = H(X|S)$, and the gap between R(D) and this line is I(Z, S). For $D > D^*$, the rate-distortion curve coincides with the line $R + D = H(X|S)$ and the unfairness-distortion curve is the horizontal axis. We call the regime $D^* \le D \le H(X|S)$ the fair-encoding limit, where the distortion is below its upper limit but Z is independent of the sensitive attributes S.

**Information bottleneck.** Theorem 2.1 implies that fairness-distortion trade-offs are fully characterized by rate-distortion trade-offs. A fundamental result in rate-distortion theory [Tishby, 1999] shows that the rate-distortion function is given by the information bottleneck

$$R(D) = \min_{f} \; I(X, Z) \quad \text{s.t.} \quad d(X, \{Z, S\}) \le D. \tag{2}$$

By solving this information bottleneck with $d(X, \{Z, S\}) = H(X|Z, S)$ and invoking Theorem 2.1, we can recover the unfairness-distortion function I(D). [Gitiaux and Rangwala, 2021a] provide an intuition for this result: controlling the mutual information I(Z, X) allows one to control I(Z, S), because an encoder would not waste code length to represent information related to the sensitive attributes, since the sensitive attributes are provided directly as an input to the decoder.

We can write the information bottleneck in its Lagrangian form as

$$\min_{f} \; \beta I(Z, X) + \mathbb{E}[-\log p(x|z, s)]. \tag{3}$$

The coefficient β relates to the inverse of the slope of the rate-distortion curve: $\partial R / \partial D = -1/\beta$. Each value of β generates a different point along the rate-distortion curve and thus, by Theorem 2.1, a different point along the unfairness-distortion curve. Higher values of β lead to representations with a lower bit rate and lower mutual information with S. To explore an unfairness-distortion curve, existing multi-shot strategies are prohibitively expensive, as they learn a new encoder f for each value of β. Moreover, they cannot interpret how changes in β affect the representation generated by the encoder.

## 3 Method: Single-Shot Unfairness-Distortion Curves

We propose a single-shot method, SoFaiR, to generate with one model as many points as desired on the unfairness-distortion curve. An encoder f, common to all values of β, encodes the data into a $d$-dimensional latent variable $e \in [0, 1]^d$. We quantize each dimension $e_j$ of the latent variable with a resolution $r_j(\beta)$: we transform $e_j$ into a quantized representation $z_j(\beta) = [e_j \, r_j(\beta)] / r_j(\beta)$, where $[\cdot]$ denotes the rounding operation and $r_j(\cdot)$ is a decreasing function of β.
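As a concrete illustration of this β-dependent quantization, here is a minimal NumPy sketch, written by us rather than taken from the released implementation. It uses the bit allocation $a_j(\beta) = A(1 - \tanh(h_a(e)_j \, \beta))$ defined in Section 3.2 below, with a placeholder identity map for $h_a$ and a simple floor of the fractional bit budget in place of the paper's soft mask.

```python
import numpy as np

A = 8  # maximum number of bits per latent dimension

def quantize(e, beta, h_a=lambda e: e):
    """Quantize each latent dimension e_j in [0, 1] with a beta-dependent
    resolution r_j(beta) = 2**a_j(beta), a_j(beta) = A*(1 - tanh(h_a(e)_j * beta))."""
    a = A * (1.0 - np.tanh(h_a(e) * beta))   # fractional bit budget per dimension
    n_bits = np.floor(a).astype(int)         # simplification: floor instead of soft mask
    r = 2.0 ** n_bits                        # resolution r_j(beta)
    return np.round(e * r) / r               # z_j(beta) = [e_j r_j(beta)] / r_j(beta)

e = np.array([0.7])
print(quantize(e, beta=0.0))   # all 8 bits kept -> ~0.699
print(quantize(e, beta=0.5))   # a ~ 5.3 -> 5 bits kept -> 0.6875
```

Larger β shrinks the bit budget, so the same encoder output e is released at a coarser resolution, which is exactly the lever SoFaiR exposes at test time.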
Figure 2: SoFaiR generates interpretable shifts along the unfairness-distortion curve. For a point $z_1$, SoFaiR learns a mask $m_1$ that hides bits in the tail of each dimension of the representation. By relaxing the mask first to $m_2$ and then to $m_3$, the number of bits used to represent the data increases from $a_1$ to $a_2$ and then $a_3$; the representation moves to $z_2$ and then $z_3$, which reduces the distortion at the expense of degraded fairness properties. $z_1$, $z_2$ and $z_3$ differ only by their masked bits (black squares).

### 3.1 Interpretability

To maintain an interpretable relation between $z(\beta)$ and $z(\beta')$ for $\beta' < \beta$, we write $r_j(\beta) = 2^{a_j(\beta)}$, where $a_j(\cdot)$ is a decreasing function of β for $j = 1, \ldots, d$. Each dimension $z_j(\beta)$ of the quantized representation is then encoded into $a_j(\beta)$ bits. Moreover, for $\beta' < \beta$, each dimension j of the representation $z(\beta')$ is made of the same $a_j(\beta)$ bits as $z_j(\beta)$, followed by $a_j(\beta') - a_j(\beta)$ additional bits. Each dimension $z_j(\beta)$ of the quantized representation is encoded into $a_j(\beta)$ bits $b_{j,1}, b_{j,2}, \ldots, b_{j,a_j(\beta)}$, where $b_{j,l} \in \{0, 1\}$ for $l = 1, \ldots, a_j(\beta)$. For $\beta' < \beta$ and for $j = 1, \ldots, d$, we have

$$z_j(\beta') = z_j(\beta) + \sum_{l = a_j(\beta) + 1}^{a_j(\beta')} b_{j,l} \, 2^{-l}.$$

Therefore, we have a tractable and interpretable relation between $z_j(\beta')$ and $z_j(\beta)$. This construction allows relaxing fairness constraints and decreasing distortion by unmasking additional bits for each dimension of the representation. Figure 2 shows an example for a 2-dimensional representation. A user who has released $z_1$ with high distortion and low mutual information I(Z, S) reduces distortion at the cost of fairness by unmasking one bit for the first dimension and two bits for the second, generating $z_2$.

### 3.2 Quantization

We assign a maximum number of bits $A > 0$ to encode each dimension of the representation. We apply a function $h_e$ to map the $d$-dimensional latent variable e into $[0, 1]^{d \times A}$ and then apply a rounding operator $[h_e(e)]$ to generate a $d \times A$ matrix, each row encoding a dimension of the representation with A bits (see Figure 2 with A = 3). For each dimension j, we implement $a_j(\cdot)$ by applying a function $h_a$ that maps e into a $d$-dimensional vector in $\mathbb{R}_+^d$ and by computing $a_j(\beta) = A\,(1 - \tanh(h_a(e)_j \, \beta))$.

Figure 3: Unfairness-distortion curves for a) DSprites, b) Adults-Gender, c) Adults-Race-Gender (left) and d) Heritage.

For each value of β and each row of the matrix $[h_e(e)]$, we mask all the entries in position $l > a_j(\beta)$: for each row j and each column l, we compute a soft mask $m_{j,l}(\beta) = \sigma(a_j(\beta) - l)$, where σ denotes a sigmoid activation; and then, we apply a rounding operator $[m_{j,l}(\beta)]$ to our soft mask. For example, suppose that we encode in at most $A = 8$ bits the embedding value $e = 0.7$ and that $h_a(e) = e$. For $\beta = 0$, we use all the bits ($a(0) = 8$) and $z = 0.699$; for $\beta = 0.5$, $a = 8(1 - \tanh(0.5 \times 0.7)) \approx 5.3$ and we use only 5 bits, with $z = 0.6875$.

The binarization caused by the rounding operation $[\cdot]$ is not differentiable. We follow [Theis et al., 2017] and use a gradient pass-through approach that replaces $[\cdot]$ with the identity during the backward pass of back-propagation, while keeping the rounding operation during the forward pass.

### 3.3 Entropy Estimation

In our implementation, encoding and quantization are deterministic and Z is completely determined by X: $H(Z|X) = 0$ and $I(Z, X) = H(Z)$. To estimate the entropy of the representation Z, we use an auto-regressive factorization and write the discrete distribution $P(z|\beta)$ as $P(z|\beta) = \prod_{j=1}^{d} P(z_j \mid z_{<j}, \beta)$.
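Returning to the masking and rounding of Section 3.2, the sketch below shows one way to implement the gradient pass-through rounding and the bit mask $m_{j,l}(\beta) = \sigma(a_j(\beta) - l)$ in PyTorch. It is our own minimal version, not the released SoFaiR code, and the function names are ours.

```python
import torch

def round_ste(x):
    """Round in the forward pass; use the identity gradient in the backward
    pass (the pass-through trick following Theis et al., 2017)."""
    return x + (torch.round(x) - x).detach()

def bit_mask(a, A):
    """Soft mask m_{j,l} = sigmoid(a_j - l) over bit positions l = 1..A,
    binarized with the same pass-through rounding.
    a: (batch, d) tensor of bit budgets per dimension."""
    l = torch.arange(1, A + 1, dtype=a.dtype)              # bit positions
    return round_ste(torch.sigmoid(a.unsqueeze(-1) - l))   # (batch, d, A)

# The worked example above: A = 8, h_a(e) = e = 0.7, beta = 0.5
# gives a = 8 * (1 - tanh(0.35)) ~ 5.3, so 5 bits survive the mask.
A = 8
e = torch.tensor([[0.7]])
a = A * (1 - torch.tanh(e * 0.5))
print(bit_mask(a, A))   # -> [[[1., 1., 1., 1., 1., 0., 0., 0.]]]
```

Multiplying the $d \times A$ bit matrix $[h_e(e)]$ by this mask zeroes the tail bits while keeping the whole pipeline differentiable during training.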
Let $g^*$ be a solution of the minimization (6) for $D \ge D^*$. Note that a solution of (6) for $D^*$ respects the constraint of the minimization (6) for D and thus, $I(D^*) \ge I(D)$. Let $D'$ denote $H_{g^*}(X|Z, S)$. Then, by definition of the rate-distortion objective value (7), we have

$$I(D) = I_{g^*}(Z, X) + D' - H(X|S) \ge R(D') + D' - H(X|S). \tag{11}$$

If $D' < D^*$, then we already know that $I(D') = R(D') + D' - H(X|S)$ and that $I(D') > I(D^*) \ge I(D)$. Moreover, by inequality (11), $I(D) \ge I(D')$; thus $I(D') > I(D) \ge I(D')$, which is a contradiction.

If $D' = D^*$, we already know that $I(D) \ge R(D^*) + D^* - H(X|S) = I(D^*) \ge I(D)$ and thus that $I(D) = I(D^*)$.

It remains to look at the case $D' > D^*$, with $D' \in (D^*, D]$. By convexity of R(D) we have

$$R(D') - R(D^*) \ge \frac{\partial R(D^*)}{\partial D}\,(D' - D^*) \overset{(a)}{=} D^* - D',$$

where (a) comes from the fact that $\partial R(D^*) / \partial D = -1$. It results by inequality (11) that $I(D) \ge R(D^*) + D^* - H(X|S)$. Moreover, we already know that $R(D^*) + D^* - H(X|S) = I(D^*)$. Hence $I(D^*) \ge I(D) \ge I(D^*)$, which proves the equality in Lemma A.2.

**Lemma A.4.** Let $\bar{D} = H(X|S)$. We have $I(\bar{D}) = 0$.

*Proof.* Consider an encoder g that generates a random variable Z independent of X. Then $H_g(X|Z, S) = \bar{D}$ and $I_g(Z, X) = 0$. Therefore, g respects the constraint of the minimization (6) for $\bar{D}$ and $I(\bar{D}) \le I_g(Z, X) + H_g(X|Z, S) - H(X|S) = 0$. Hence, $I(\bar{D}) = 0$.

By combining Lemmas A.2 and A.4, we can show that $I(D) = 0$ for $D \ge D^*$.

### A.2 Entropy Estimation

We follow a standard approach in rate-distortion [Theis et al., 2017; Choi et al., 2019] and approximate the discrete distribution $p(z_j \mid z_{<j}, \beta)$.
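Since the appendix picks up the entropy-estimation discussion of Section 3.3, here is a minimal PyTorch sketch of an autoregressive entropy model matching the factorization $P(z|\beta) = \prod_j P(z_j \mid z_{<j}, \beta)$. The architecture, class name and binning scheme are our own assumptions, not the paper's released implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoregressiveEntropyModel(nn.Module):
    """Estimate H(Z) ~ E[-log P(z | beta)] with the autoregressive
    factorization P(z | beta) = prod_j P(z_j | z_{<j}, beta).
    Each conditional is a categorical distribution over the 2**A
    possible quantized values of z_j."""

    def __init__(self, d, A):
        super().__init__()
        self.d, self.A = d, A
        # one small conditional network per dimension j, fed (z_{<j}, beta)
        self.conditionals = nn.ModuleList(
            [nn.Sequential(nn.Linear(j + 1, 32), nn.ReLU(), nn.Linear(32, 2 ** A))
             for j in range(d)]
        )

    def forward(self, z, beta):
        # z: (batch, d) quantized codes in [0, 1]; beta: (batch, 1)
        nats = 0.0
        for j, net in enumerate(self.conditionals):
            context = torch.cat([z[:, :j], beta], dim=1)   # z_{<j} and beta
            logits = net(context)                          # scores over 2**A symbols
            target = (z[:, j] * 2 ** self.A).long().clamp(max=2 ** self.A - 1)
            nats = nats + F.cross_entropy(logits, target, reduction="none")
        return nats.mean() / math.log(2.0)                 # estimated bits per sample

model = AutoregressiveEntropyModel(d=4, A=8)
z, beta = torch.rand(16, 4), torch.full((16, 1), 0.5)
print(model(z, beta))   # differentiable estimate of H(Z)
```

Because encoding is deterministic, this estimate of H(Z) also stands in for the I(Z, X) term in the Lagrangian (3), so the mask and the entropy model can be trained jointly with the reconstruction loss.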