# Flexibly Fair Representation Learning by Disentanglement

Elliot Creager (1,2), David Madras (1,2), Jörn-Henrik Jacobsen (2), Marissa A. Weis (2,3), Kevin Swersky (4), Toniann Pitassi (1,2), Richard Zemel (1,2)

(1) University of Toronto, (2) Vector Institute, (3) University of Tübingen, (4) Google Research. Correspondence to: Elliot Creager.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

We consider the problem of learning representations that achieve group and subgroup fairness with respect to multiple sensitive attributes. Taking inspiration from the disentangled representation learning literature, we propose an algorithm for learning compact representations of datasets that are useful for reconstruction and prediction, but are also flexibly fair, meaning they can be easily modified at test time to achieve subgroup demographic parity with respect to multiple sensitive attributes and their conjunctions. We show empirically that the resulting encoder, which does not require the sensitive attributes for inference, enables the adaptation of a single representation to a variety of fair classification tasks with new target labels and subgroup definitions.

1. Introduction

Machine learning systems are capable of exhibiting discriminatory behaviors against certain demographic groups in high-stakes domains such as law, finance, and medicine (Kirchner et al., 2016; Aleo & Svirsky, 2008; Kim et al., 2015). These outcomes are potentially unethical or illegal (Barocas & Selbst, 2016; Hellman, 2018), and behoove researchers to investigate more equitable and robust models. One promising approach is fair representation learning: the design of neural networks using learning objectives that satisfy certain fairness or parity constraints in their outputs (Zemel et al., 2013; Louizos et al., 2016; Edwards & Storkey, 2016; Madras et al., 2018). This is attractive because neural network representations often generalize to tasks that are unspecified at train time, which implies that a properly specified fair network can act as a group-parity bottleneck that reduces discrimination in unknown downstream tasks.

Current approaches to fair representation learning are flexible with respect to downstream tasks but inflexible with respect to sensitive attributes. While a single learned representation can adapt to the prediction of different task labels y, the single sensitive attribute a shared across all tasks must be specified at train time. Mis-specified or overly constraining train-time sensitive attributes could negatively affect performance on downstream prediction tasks. Can we instead learn a flexibly fair representation that can be adapted, at test time, to be fair to a variety of protected groups and their intersections? Such a representation should satisfy two criteria. First, the structure of the latent code should facilitate simple adaptation, allowing a practitioner to easily adapt the representation to a variety of fair classification settings, where each task may have a different task label y and sensitive attributes a. Second, the adaptations should be compositional: the representations can be made fair with respect to conjunctions of sensitive attributes, to guard against subgroup discrimination (e.g., a classifier that is fair to women but not to Black women over the age of 60).
This type of subgroup discrimination has been observed in commercial machine learning systems (Buolamwini & Gebru, 2018). In this work, we investigate how to learn flexibly fair representations that can be easily adapted at test time to achieve fairness with respect to sets of sensitive groups or subgroups. We draw inspiration from the disentangled representation literature, where the goal is for each dimension of the representation (also called the latent code) to correspond to no more than one semantic factor of variation in the data (for example, independent visual features like object shape and position) (Higgins et al., 2017; Locatello et al., 2019). Our method uses multiple sensitive attribute labels at train time to induce a disentangled structure in the learned representation, which allows us to easily eliminate their influence at test time. Importantly, at test time our method does not require access to the sensitive attributes, which can be difficult to collect in practice due to legal restrictions (Elliot et al., 2008; DCCA, 1983). The trained representation permits simple and composable modifications at test time that eliminate the influence of sensitive attributes, enabling a wide variety of downstream tasks.

We first provide proof-of-concept by generating a variant of the synthetic DSprites dataset with correlated ground-truth factors of variation, which is better suited to fairness questions. We demonstrate that even in the correlated setting, our method is capable of disentangling the effect of several sensitive attributes from data, and that this disentanglement is useful for fair classification tasks downstream. We then apply our method to a real-world tabular dataset (Communities & Crime) and an image dataset (CelebA), where we find that our method matches or exceeds the fairness-accuracy tradeoff of existing disentangled representation learning approaches on a majority of the evaluated subgroups.

Figure 1. Data flow at train time (1a) and test time (1b) for our model, Flexibly Fair VAE (FFVAE). (a) FFVAE learns the encoder distribution q(z, b|x) and decoder distributions p(x|z, b), p(a|b) from inputs x and multiple sensitive attributes a. The disentanglement prior structures the latent space by encouraging low MI(b_i, a_j) ∀ i ≠ j and low MI(b, z), where MI(·) denotes mutual information. (b) The FFVAE latent code [z, b] can be modified by discarding or noising out sensitive dimensions {b_j}, which yields a latent code [z, b′] independent of groups and subgroups derived from the sensitive attributes {a_j}. A held-out label y can then be predicted with subgroup demographic parity.

2. Background

Group Fair Classification. In fair classification, we consider labeled examples (x, a, y) ~ p_data, where y ∈ Y are labels we wish to predict, a ∈ A are sensitive attributes, and x ∈ X are non-sensitive attributes. The goal is to learn a classifier ŷ = g(x, a) (or ŷ = g(x)) which is predictive of y and achieves certain group fairness criteria w.r.t. a. These criteria are typically written as independence properties of the various random variables involved. In this paper we focus on demographic parity, which is satisfied when the predictions are independent of the sensitive attributes: ŷ ⊥ a. It is often impossible or undesirable to satisfy demographic parity exactly (i.e., achieve complete independence).
In this case, a useful metric is the demographic parity distance:

$$\Delta_{DP} = \big|\, \mathbb{E}[\bar{y} = 1 \mid a = 1] - \mathbb{E}[\bar{y} = 1 \mid a = 0] \,\big| \qquad (1)$$

where ȳ is a binary prediction derived from the model output ŷ. When Δ_DP = 0, demographic parity is achieved; in general, lower Δ_DP implies less unfairness.

Our work differs from the fair classification setup as follows: we consider several sensitive attributes at once, and seek fair outcomes with respect to each, individually as well as jointly (cf. subgroup fair classification, Kearns et al. (2018); Hébert-Johnson et al. (2018)); also, we focus on representation learning rather than classification, with the aim of enabling a range of fair classification tasks downstream.

Fair Representation Learning. In order to flexibly deal with many label and sensitive attribute sets, we employ representation learning to compute a compact but predictively useful encoding of the dataset that can be flexibly adapted to different fair classification tasks. As an example, if we learn a function f that achieves independence in the representations, z ⊥ a with z = f(x, a) or z = f(x), then any predictor derived from this representation will also achieve the desired demographic parity, ŷ ⊥ a with ŷ = g(z). The fairness literature typically considers binary labels and sensitive attributes: A = Y = {0, 1}. In this case, approaches like regularization (Zemel et al., 2013) and adversarial regularization (Edwards & Storkey, 2016; Madras et al., 2018) are straightforward to implement. We want to address the case where a is a vector with many dimensions. Group fairness must be achieved for each of the dimensions in a (age, race, gender, etc.) and their combinations.

VAE. The vanilla Variational Autoencoder (VAE) (Kingma & Welling, 2013) is typically implemented with an isotropic Gaussian prior p(z) = N(0, I). The objective to be maximized is the Evidence Lower Bound (a.k.a. the ELBO),

$$\mathcal{L}_{VAE}(p, q) = \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] - D_{KL}\big[q(z|x)\,\|\,p(z)\big],$$

which bounds the data log likelihood log p(x) from below for any choice of q. The encoder and decoder are often implemented as Gaussians

$$q(z|x) = \mathcal{N}\big(z \mid \mu_q(x), \sigma_q(x)\big), \qquad p(x|z) = \mathcal{N}\big(x \mid \mu_p(z), \sigma_p(z)\big),$$

whose distributional parameters are the outputs of neural networks μ_q(·), σ_q(·), μ_p(·), σ_p(·), with the covariances typically exhibiting diagonal structure. For modeling binary-valued pixels, a Bernoulli decoder p(x|z) = Bernoulli(x | θ_p(z)) can be used. The goal is to maximize L_VAE, which is made differentiable by reparameterizing samples from q(z|x) w.r.t. the network parameters.

β-VAE. Higgins et al. (2017) modify the VAE objective:

$$\mathcal{L}_{\beta VAE}(p, q) = \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] - \beta\, D_{KL}\big[q(z|x)\,\|\,p(z)\big].$$

The hyperparameter β allows the practitioner to encourage the variational distribution q(z|x) to reduce its KL divergence to the isotropic Gaussian prior p(z). With β ≥ 1 this objective remains a valid lower bound on the data likelihood. This gives greater control over the model's adherence to the prior. Because the prior factorizes per dimension, p(z) = ∏_j p(z_j), Higgins et al. (2017) argue that increasing β yields disentangled latent codes in the encoder distribution q(z|x). Broadly speaking, each dimension of a properly disentangled latent code should capture no more than one semantically meaningful factor of variation in the data. This allows the factors to be manipulated in isolation by altering the per-dimension values of the latent code.
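For concreteness, the β-VAE objective above can be written as the following minimization of the negative ELBO for a diagonal-Gaussian encoder and a Bernoulli decoder. This is a minimal illustrative sketch under stated assumptions, not the authors' implementation; the `encoder` and `decoder` modules and their interfaces are hypothetical.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, encoder, decoder, beta=4.0):
    """Negative beta-VAE ELBO, assuming diagonal-Gaussian q(z|x) and Bernoulli p(x|z).

    Hypothetical interfaces:
      encoder(x) -> (mu, logvar)   # parameters of q(z|x) = N(mu, diag(exp(logvar)))
      decoder(z) -> x_logits       # Bernoulli logits for p(x|z)
    """
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick

    # Reconstruction term E_q(z|x)[log p(x|z)], one-sample Monte Carlo estimate
    recon = -F.binary_cross_entropy_with_logits(
        decoder(z), x, reduction="none"
    ).flatten(1).sum(1)

    # Analytic KL(q(z|x) || N(0, I)), summed over latent dimensions
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(1)

    # Maximizing the beta-ELBO is equivalent to minimizing its negation;
    # beta = 1 recovers the vanilla VAE objective.
    return (beta * kl - recon).mean()
```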
Disentangled autoencoders are often evaluated by their sample quality in the data domain, but we instead emphasize the role of the encoder as a representation learner to be evaluated on downstream fair classification tasks.

FactorVAE and β-TCVAE. Kim & Mnih (2018) propose a different variant of the VAE objective:

$$\mathcal{L}_{FactorVAE}(p, q) = \mathcal{L}_{VAE}(p, q) - \gamma\, D_{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big).$$

The main idea is to encourage factorization of the aggregate posterior q(z) = E_{p_data(x)}[q(z|x)] so that z_i correlates with z_j if and only if i = j. The authors propose a simple trick to generate samples from the aggregate posterior q(z) and its marginals {q(z_j)} using shuffled minibatch indices, then approximate the D_KL(q(z) ‖ ∏_j q(z_j)) term using the cross-entropy loss of a classifier that distinguishes between the two sets of samples, which yields a mini-max optimization (a brief code sketch of this trick appears below). Chen et al. (2018) show that the D_KL(q(z) ‖ ∏_j q(z_j)) term above, a.k.a. the total correlation of the latent code, can be naturally derived by decomposing the expected KL divergence from the variational posterior to the prior:

$$\mathbb{E}_{p_{data}(x)}\big[D_{KL}(q(z|x)\,\|\,p(z))\big] = D_{KL}\big(q(z|x)p_{data}(x)\,\|\,q(z)p_{data}(x)\big) + D_{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big) + \sum_j D_{KL}\big[q(z_j)\,\|\,p(z_j)\big].$$

They then augment the decomposed ELBO to arrive at the same objective as Kim & Mnih (2018), but optimize using a biased estimate of the marginal probabilities q(z_j) rather than with the adversarial bound on the KL between the aggregate posterior and its marginals.

3. Related Work

Most work in fair machine learning deals with fairness with respect to single (binary) sensitive attributes. Multi-attribute fair classification was recently the focus of Kearns et al. (2018), with empirical follow-up (Kearns et al., 2019), and Hébert-Johnson et al. (2018). Both papers define the notion of an identifiable class of subgroups, and then obtain fair classification algorithms that are provably as efficient as the underlying learning problem for this class of subgroups. The main difference is the underlying metric: Kearns et al. (2018) use statistical parity, whereas Hébert-Johnson et al. (2018) focus on calibration. Building on the multi-accuracy framework of Hébert-Johnson et al. (2018), Kim et al. (2019) develop a new algorithm to achieve multi-group accuracy via a post-processing boosting procedure.

The search for independent latent components that explain observed data has long been a focus of the probabilistic modeling community (Comon, 1994; Hyvärinen & Oja, 2000; Bach & Jordan, 2002). In light of the increased prevalence of neural network models in many data domains, the machine learning community has renewed its interest in learned features that disentangle semantic factors of data variation. The introduction of the β-VAE (Higgins et al., 2017), as discussed in Section 2, motivated a number of subsequent studies that examine why adding additional weight to the KL divergence of the ELBO encourages disentangled representations (Alemi et al., 2018; Burgess et al., 2017). Chen et al. (2018), Kim & Mnih (2018), and Esmaeili et al. (2019) argue that decomposing the ELBO and penalizing the total correlation increases disentanglement in the latent representations. Locatello et al. (2019) conduct extensive experiments comparing existing unsupervised disentanglement methods and metrics. They conclude pessimistically that learning disentangled representations requires inductive biases and possibly additional supervision, but identify fair machine learning as a potential application where additional supervision is available by way of sensitive attributes.
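As referenced above, the FactorVAE shuffling trick of Kim & Mnih (2018) can be sketched as follows: "fake" samples from the product of marginals ∏_j q(z_j) are obtained by independently permuting each latent dimension across the batch, and a binary adversary's logits provide an estimate of the total correlation. The discriminator interface here is an assumption for illustration, not the reference implementation.

```python
import torch

def permute_dims(z):
    """Approximate samples from prod_j q(z_j) by shuffling each latent
    dimension independently across the batch (Kim & Mnih, 2018)."""
    batch_size, dim = z.shape
    out = torch.empty_like(z)
    for j in range(dim):
        out[:, j] = z[torch.randperm(batch_size), j]
    return out

def total_correlation_estimate(z, discriminator):
    """Adversarial estimate of D_KL(q(z) || prod_j q(z_j)).

    `discriminator(z)` is assumed to return two logits per sample,
    [logit_joint, logit_marginals]; their difference approximates
    log q(z) - log prod_j q(z_j), whose mean estimates the total correlation.
    """
    logits = discriminator(z)
    return (logits[:, 0] - logits[:, 1]).mean()

# The discriminator itself is trained with a standard cross-entropy loss to
# distinguish aggregate-posterior samples z from permuted samples
# permute_dims(z.detach()), which yields the mini-max game described above.
```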
Our work is the first to consider multi-attribute fair representation learning, which we accomplish by using sensitive attributes as labels to induce a factorized structure in the aggregate latent code. Bose & Hamilton (2018) proposed a compositional fair representation of graph-structured data. Kingma et al. (2014) previously incorporated (partially observed) label information into the VAE framework to perform semi-supervised classification. Several recent VAE variants have incorporated label information into latent variable learning for image synthesis (Klys et al., 2018) and single-attribute fair representation learning (Song et al., 2019; Botros & Tomczak, 2018; Moyer et al., 2018). Designing invariant representations with non-variational objectives has also been explored, including in reversible models (Ardizzone et al., 2019; Jacobsen et al., 2019).

4. Flexibly Fair VAE

We want to learn fair representations that, beyond being useful for predicting many test-time task labels y, can be adapted simply and compositionally for a variety of sensitive attribute settings a after training. We call this property flexible fairness. Our approach to this problem involves inducing structure in the latent code that allows for easy manipulation. Specifically, we isolate information about each sensitive attribute to a specific subspace, while ensuring that the latent space factorizes these subspaces independently.

Notation. We employ the following notation:
- x ∈ X: a vector of non-sensitive attributes, for example the pixel values in an image or a row of features in a tabular dataset;
- a ∈ {0, 1}^{N_a}: a vector of binary sensitive attributes;
- z ∈ R^{N_z}: the non-sensitive subspace of the latent code;
- b ∈ R^{N_b}: the sensitive subspace of the latent code.[1]

For example, we can express the VAE objective in this notation as

$$\mathcal{L}_{VAE}(p, q) = \mathbb{E}_{q(z,b|x,a)}\big[\log p(x, a|z, b)\big] - D_{KL}\big[q(z, b|x, a)\,\|\,p(z, b)\big].$$

In learning a flexibly fair representation [z, b] = f([x, a]), we aim to satisfy two general properties: disentanglement and predictiveness. We say that [z, b] is disentangled if its aggregate posterior factorizes as q(z, b) = q(z) ∏_j q(b_j), and predictive if each b_i has high mutual information with the corresponding a_i. Note that under the disentanglement criterion the dimensions of z are free to co-vary together, but must be independent of all sensitive subspaces b_j. We have also specified factorization of the latent space in terms of the aggregate posterior q(z, b) = E_{p_data(x)}[q(z, b|x)], to match the global independence criteria of group fairness.

[1] In our experiments we used N_b = N_a (the same number of sensitive attributes as sensitive latent dimensions) to model binary sensitive attributes, but categorical or continuous sensitive attributes can also be accommodated.

Desiderata. We can formally express our desiderata as follows:
- z ⊥ b_j ∀ j (disentanglement of the non-sensitive and sensitive latent dimensions);
- b_i ⊥ b_j ∀ i ≠ j (disentanglement of the various different sensitive dimensions);
- MI(a_j, b_j) is large ∀ j (predictiveness of each sensitive dimension);

where MI(u, v) = E_{p(u,v)}[log p(u, v) / (p(u) p(v))] represents the mutual information between random vectors u and v. We note that these desiderata differ in two ways from the standard disentanglement criteria.
The predictiveness requirements are stronger: they allow for the injection of external information into the latent representation, requiring the model to structure its latent code to align with that external information. However, the disentanglement requirement is less restrictive, since it allows for correlations between the dimensions of z. Since those are the non-sensitive dimensions, we are not interested in manipulating them at test time, and so we have no need to constrain them.

If we satisfy these criteria, then it is possible to achieve demographic parity with respect to some a_i by simply removing the dimension b_i from the learned representation, i.e., using [z, b] \ b_i instead. We can alternatively replace b_i with independent noise. This adaptation procedure is simple and compositional: if we wish to achieve fairness with respect to a conjunction of binary attributes[2] a_i ∧ a_j ∧ a_k, we can simply use the representation [z, b] \ {b_i, b_j, b_k}. By comparison, while FactorVAE may disentangle the dimensions of the aggregate posterior, q(z) = ∏_j q(z_j), it does not automatically satisfy flexible fairness, since the representations are not predictive and cannot necessarily be easily modified along the attributes of interest.

[2] ∧ and ∨ represent the logical "and" and "or" operations, respectively.

Distributions. We propose a variation of the VAE which encourages our desiderata, building on methods for disentanglement and encouraging predictiveness. Firstly, we assume a variational posterior that factorizes across z and b:

$$q(z, b|x) = q(z|x)\, q(b|x). \qquad (2)$$

The parameters of these distributions are implemented as neural network outputs, with the encoder network yielding a tuple of parameters for each input: (μ_q(x), σ_q(x), b_q(x)) = Encoder(x). We then specify q(z|x) = N(z | μ_q(x), σ_q(x)) and q(b|x) = δ(b − b_q(x)) (i.e., b is non-stochastic).[3]

[3] We experimented with several distributions for modeling b|x stochastically, but modeling this uncertainty did not help optimization or downstream evaluation in our experiments.

Secondly, we model reconstruction of x and prediction of a separately using a factorized decoder:

$$p(x, a|z, b) = p(x|z, b)\, p(a|b) \qquad (3)$$

where p(x|z, b) is the decoder distribution suitably chosen for the inputs x, and p(a|b) = ∏_j Bernoulli(a_j | σ(b_j)) is a factorized binary classifier that uses b_j as the logit for predicting a_j (σ(·) represents the sigmoid function). Note that the p(a|b) factor of the decoder requires no extra parameters. Finally, we specify a factorized prior p(z, b) = p(z)p(b), with p(z) a standard Gaussian and p(b) Uniform.

Learning Objective. Using the encoder and decoder as defined above, we present our final objective:

$$\mathcal{L}_{FFVAE}(p, q) = \mathbb{E}_{q(z,b|x)}\big[\log p(x|z, b) + \alpha \log p(a|b)\big] - \gamma\, D_{KL}\Big(q(z, b)\,\Big\|\,q(z)\prod_j q(b_j)\Big) - D_{KL}\big[q(z, b|x)\,\|\,p(z, b)\big]. \qquad (4)$$

It comprises the following four terms, respectively: a reconstruction term, which rewards the model for successfully modeling non-sensitive observations; a predictiveness term, which rewards the model for aligning the correct latent components with the sensitive attributes; a disentanglement term, which rewards the model for decorrelating the latent dimensions of b from each other and from z; and a dimension-wise KL term, which rewards the model for matching the prior in the latent variables. We call our model FFVAE, for Flexibly Fair VAE (see Figure 1 for a schematic representation). The hyperparameters α and γ control aspects relevant to the flexible fairness of the representation.
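To make the structure of Equation 4 concrete, the following is a minimal sketch of how the four terms could be assembled for one minibatch. It is illustrative only: the `encoder`, `decoder`, and `adversary` interfaces, and the treatment of the deterministic b under the prior, are assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def ffvae_loss(x, a, encoder, decoder, adversary, alpha=1.0, gamma=1.0):
    """Negated FFVAE objective (Eq. 4) for minimization.

    Hypothetical interfaces:
      encoder(x) -> (mu, logvar, b)   # z ~ N(mu, diag(exp(logvar))), b deterministic
      decoder(z, b) -> x_logits       # Bernoulli logits for p(x|z, b)
      adversary(z, b) -> ratio        # approximates log q(z,b) / [q(z) prod_j q(b_j)]
    """
    mu, logvar, b = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample

    # Reconstruction term log p(x|z, b), here for Bernoulli-modeled inputs
    recon = -F.binary_cross_entropy_with_logits(
        decoder(z, b), x, reduction="none"
    ).flatten(1).sum(1)

    # Predictiveness term log p(a|b): b_j is used directly as the logit for a_j
    pred = -F.binary_cross_entropy_with_logits(b, a.float(), reduction="none").sum(1)

    # Disentanglement (total correlation) term, via the adversary's density-ratio estimate
    tc = adversary(z, b).squeeze(-1)

    # Dimension-wise KL of q(z|x) to the standard Gaussian prior p(z); the Uniform
    # prior on the deterministic b contributes only a constant and is omitted here.
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(1)

    elbo = recon + alpha * pred - gamma * tc - kl
    return -elbo.mean()
```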
α controls the alignment of each a_j to its corresponding b_j (predictiveness), whereas γ controls the aggregate independence in the latent code (disentanglement). The γ-weighted total correlation term is realized by training a binary adversary to approximate the log density ratio log[ q(z, b) / (q(z) ∏_j q(b_j)) ]. The adversary attempts to classify between true samples from the aggregate posterior q(z, b) and fake samples from the product of the marginals q(z) ∏_j q(b_j) (see Appendix A for further details). If a strong adversary can do no better than random chance, then the desired independence property has been achieved.

We note that our model requires the sensitive attributes a at training time but not at test time. This is advantageous, since these attributes can often be difficult to collect from users due to practical and legal restrictions, particularly for sensitive information (Elliot et al., 2008; DCCA, 1983).

5. Experiments

5.1. Evaluation Criteria

We evaluate the learned encoders with an auditing scheme on held-out data. The overall procedure is as follows:
1. Split the data into a training set (for learning the encoder) and an audit set (for evaluating the encoder).
2. Train an encoder/representation using the training set.
3. Audit the learned encoder: freeze the encoder weights and train an MLP to predict some task label given the (possibly modified) encoder outputs on the audit set.

To evaluate various properties of the encoder we conduct three types of auditing tasks (fair classification, predictiveness, and disentanglement) which vary in task label and representation modification. The fair classification audit (Madras et al., 2018) trains an MLP to predict y (held out from encoder training) given [z, b] with the appropriate sensitive dimensions removed, and evaluates accuracy and Δ_DP on a test set; we repeat this for a variety of demographic subgroups derived from the sensitive attributes (a sketch of this audit appears below). The predictiveness audit trains a classifier C_i to predict the sensitive attribute a_i from b_i alone. The disentanglement audit trains a classifier C_\i to predict the sensitive attribute a_i from the representation with b_i removed (e.g., [z, b] \ b_i). If C_i has low loss, our representation is predictive; if C_\i has high loss, it is disentangled.

5.2. Synthetic Data

DSprites Unfair Dataset. The DSprites dataset[4] contains 64 × 64-pixel images of white shapes against a black background, and was designed to evaluate whether learned representations have disentangled sources of variation. The original dataset has several categorical factors of variation (Scale, Orientation, XPosition, YPosition) that combine to create 700,000 unique images. We binarize the factors of variation to derive sensitive attributes and labels, so that many images now share any given attribute/label combination (see Appendix B for details). In the original DSprites dataset, the factors of variation are sampled uniformly. However, in fairness problems we are often concerned with correlations between attributes and the labels we are trying to predict (otherwise, achieving low Δ_DP is aligned with standard classification objectives). Hence, we sampled an unfair version of this data (DSprites Unfair) with correlated factors of variation; in particular, Shape and XPosition correlate positively. A non-trivial fair classification task would then be, for instance, learning to predict shape without discriminating against inputs from the left side of the image.
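As a concrete illustration of the fair classification audit described in Section 5.1, the sketch below drops the sensitive latent dimensions from a frozen encoder's outputs, trains a downstream classifier, and reports accuracy together with the demographic parity distance of Equation 1. All names and settings (the `z`/`b` arrays, the split size, the MLP configuration) are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def demographic_parity_distance(y_hat, a):
    """|E[y_hat = 1 | a = 1] - E[y_hat = 1 | a = 0]|  (Equation 1)."""
    return abs(y_hat[a == 1].mean() - y_hat[a == 0].mean())

def fair_classification_audit(z, b, y, a, sensitive_dims):
    """Audit frozen encoder outputs [z, b] on held-out data.

    z, b           : encoder outputs on the audit set (arrays of shape [n, dim])
    y              : held-out task label; a : sensitive attribute, used only for evaluation
    sensitive_dims : indices of b to drop (one per attribute in the group or conjunction)
    """
    keep = [j for j in range(b.shape[1]) if j not in set(sensitive_dims)]
    rep = np.concatenate([z, b[:, keep]], axis=1)      # modified representation [z, b] \ {b_j}

    n_train = len(rep) // 2                            # simple split of the audit set
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    clf.fit(rep[:n_train], y[:n_train])

    y_hat = clf.predict(rep[n_train:])
    accuracy = (y_hat == y[n_train:]).mean()
    dp = demographic_parity_distance(y_hat, a[n_train:])
    return accuracy, dp
```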
[4] https://github.com/deepmind/dsprites-dataset

Figure 2. Fairness-accuracy tradeoff curves on the DSprites Unfair dataset. Panels: (a) a = Scale, (b) a = Shape, (c) a = Shape ∧ Scale, (d) a = Shape ∨ Scale. We sweep a range of hyperparameters for each model and report Pareto fronts. The optimal point is the top-left corner, which represents perfect accuracy and fairness. MLP is a baseline classifier trained directly on the input data. For each model, encoder outputs are modified to remove information about a. y = XPosition for each plot.

Figure 3. Black and pink dashed lines respectively show the FFVAE disentanglement audit (the higher the better) and predictiveness audit (the lower the better) as a function of α. These audits use a_i = Shape (see text for details). The blue line is a reference value: the log loss of a classifier that predicts a_i from the other five DSprites factors of variation (FoV) alone, ignoring the image; it represents the amount of information about a_i inherent in the data.

Baselines. To test the utility of our predictiveness prior, we compare our model to β-VAE (a VAE with a coefficient β > 1 on the KL term) and FactorVAE, which have disentanglement priors but no predictiveness prior. We can also think of these as FFVAE with α = 0. To test the utility of our disentanglement prior, we also compare against a version of our model with γ = 0, denoted CVAE. This is similar to the class-conditional VAE (Kingma et al., 2014), with sensitive attributes as labels; this model encourages predictiveness but not disentanglement.

Fair Classification. We perform the fair classification audit using several group/subgroup definitions for models trained on DSprites Unfair (see Appendix D for training details), and report fairness-accuracy tradeoff curves in Fig. 2. In these experiments, we used Shape and Scale as our sensitive attributes during encoder training. We perform the fair classification audit by training an MLP to predict y = XPosition (which was not used in the representation learning phase) given the modified encoder outputs, and repeat for several sensitive groups and subgroups. We modify the encoder outputs as follows: when our sensitive attribute is a_i, we remove the associated dimension b_i from [z, b]; when the attribute is a conjunction of a_i and a_j, we remove both b_i and b_j. For the baselines, we simply remove the latent dimension most correlated with a_i, or the two dimensions most correlated with the conjunction.

We sweep a range of hyperparameters to produce the fairness-accuracy tradeoff curve for each model. In Fig. 2, we show the Pareto front of these models: the points in (Δ_DP, accuracy)-space for which no other point is better along both dimensions (a sketch of this filtering step appears below). The optimal result is the top-left corner (perfect accuracy and Δ_DP = 0). Since we have a 2-D sensitive input space, we show results for four different sensitive attributes derived from these dimensions: {a = Shape, a = Scale, a = Shape ∨ Scale, a = Shape ∧ Scale}. Recall that Shape and XPosition correlate in the DSprites Unfair dataset. Therefore, for sensitive attributes that involve Shape, we expect to see an improvement in Δ_DP. For sensitive attributes that do not involve Shape, we expect that our method does not hurt performance at all: since the attributes are uncorrelated in the data, the optimal predictive solution also has Δ_DP = 0. When group membership a is uncorrelated with the label y (Fig. 2a), all models achieve high accuracy and low Δ_DP (a and y are successfully disentangled).
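The Pareto fronts referenced above keep only the hyperparameter settings that are not dominated in (Δ_DP, accuracy)-space; a minimal sketch of that filtering step follows, with a hypothetical list of (Δ_DP, accuracy) pairs as input.

```python
def pareto_front(points):
    """Return the points not dominated in (dp, accuracy)-space.

    A point dominates another if it has lower-or-equal demographic parity
    distance AND higher-or-equal accuracy, with at least one strict inequality.
    `points` is a list of (dp, acc) tuples, one per hyperparameter setting.
    """
    front = []
    for dp, acc in points:
        dominated = any(
            dp2 <= dp and acc2 >= acc and (dp2 < dp or acc2 > acc)
            for dp2, acc2 in points
        )
        if not dominated:
            front.append((dp, acc))
    return sorted(front)

# Example with three hypothetical settings: the third is dominated by the second.
print(pareto_front([(0.02, 0.91), (0.05, 0.95), (0.06, 0.90)]))
# -> [(0.02, 0.91), (0.05, 0.95)]
```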
When a correlates with y by design (Fig. 2b), we see the clearest improvement of FFVAE over the baselines, with an almost complete reduction in Δ_DP and very little accuracy loss. The baseline models are all unable to improve Δ_DP by more than about 0.05, indicating that they have not effectively disentangled the sensitive information from the label. In Figs. 2c and 2d, we examine conjunctions of sensitive attributes, assessing FFVAE's ability to flexibly provide multi-attribute fair representations. Here FFVAE exceeds or matches the baselines' accuracy at a given Δ_DP almost everywhere; by disentangling information from multiple sensitive attributes, FFVAE enables flexibly fair downstream classification.

Disentanglement and Predictiveness. Fig. 3 shows the FFVAE disentanglement and predictiveness audits (see above for a description of this procedure). This result aggregates audits across all FFVAE models trained in the setting of Figure 2b. The classifier loss is cross-entropy, which is a lower bound on the mutual information between the input and target of the classifier. We observe that increasing α helps both predictiveness and disentanglement in this scenario. In the disentanglement audit, larger α makes predicting the sensitive attribute from the modified representation (with b_i removed) more difficult. The horizontal dotted line shows the log loss of a classifier that predicts a_i from the other DSprites factors of variation (including labels not available to FFVAE); this baseline reflects the correlation inherent in the data. We see that when α = 0 (i.e., FactorVAE), it is slightly more difficult than this baseline to predict the sensitive attribute. This is due to the disentanglement prior. However, increasing α > 0 increases the disentanglement benefits of FFVAE beyond what is present in FactorVAE. This shows that encouraging predictive structure can help disentanglement by isolating each attribute's information in particular latent dimensions. Additionally, increasing α improves predictiveness, as expected from the objective formulation. We further evaluate the disentanglement properties of our model in Appendix E using the Mutual Information Gap metric (Chen et al., 2018).

5.3. Communities & Crime

Dataset. Communities & Crime[5] is a tabular UCI dataset containing neighborhood-level population statistics; 120 such statistics are recorded for each of the 1,994 neighborhoods. Several attributes encode demographic information that may be protected. We chose three as sensitive: racePctBlack (% of the neighborhood population that is Black), blackPerCap (average per capita income of Black residents), and pctNotSpeakEnglWell (% of the neighborhood population that does not speak English well). We follow the same train/eval procedure as with DSprites Unfair: we train FFVAE with the sensitive attributes and evaluate using naive MLPs to predict a held-out label (violent crimes per capita) on held-out data.

Fair Classification. This dataset presents a more difficult disentanglement problem than DSprites Unfair. The three sensitive attributes we chose in Communities & Crime were somewhat correlated with each other, a natural artefact of using real (rather than simulated) data. We note that, in general, the disentanglement literature does not provide much guidance on disentangling correlated attributes. Despite this obstacle, FFVAE performed reasonably well in the fair classification audit (Fig. 4).
[5] http://archive.ics.uci.edu/ml/datasets/communities+and+crime

Figure 4. Communities & Crime subgroup fairness-accuracy tradeoffs. Panels: (a) a = R, (b) a = B, (c) a = P, (d) a = R ∨ B, (e) a = R ∨ P, (f) a = B ∨ P, (g) a = R ∧ B, (h) a = R ∧ P, (i) a = B ∧ P. Sensitive attributes: racePctBlack (R), blackPerCap income (B), and pctNotSpeakEnglWell (P). y = violentCrimesPerCapita.

It achieved higher accuracy than the baselines in general, likely due to its ability to incorporate side information from a during training. Among the baselines, FactorVAE tended to perform best, suggesting that achieving a factorized aggregate posterior helps with fair classification. While our method does not outperform the baselines on every conjunction, its relatively strong performance on a difficult tabular dataset shows the promise of using disentanglement priors in designing robust subgroup-fair machine learning models.

5.4. Celebrity Faces

Dataset. The CelebA[6] dataset contains over 200,000 images of celebrity faces. Each image is associated with 40 human-labeled binary attributes (Oval Face, Heavy Makeup, etc.). We chose three attributes, Chubby, Eyeglasses, and Male, as sensitive attributes[7], and report fair classification results on the 3 groups and 12 two-attribute-conjunction subgroups only (for brevity we omit three-attribute conjunctions). To our knowledge this is the first exploration of fair representation learning algorithms on the CelebA dataset. As in the previous sections, we train the encoders on the train set, then evaluate the performance of MLP classifiers trained on the encoded test set.

[6] http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

[7] We chose these attributes because they co-vary relatively weakly with each other (compared with other attribute triplets), but strongly with other attributes. Nevertheless, the rich correlation structure amongst all attributes makes this a challenging fairness dataset; it is difficult to achieve high accuracy and low Δ_DP.

Figure 5. CelebA subgroup fair classification results. Sensitive attributes: Chubby (C), Eyeglasses (E), and Male (M). Panels (a)-(c) show the single attributes C, E, and M; panels (d)-(o) show the twelve two-attribute conjunction subgroups formed by the pairs (C, E), (C, M), and (E, M). y = Heavy Makeup.

Fair Classification. We follow the fair classification audit procedure described above, where the held-out label Heavy Makeup, which was not used at encoder train time, is predicted by an MLP from the encoder representations. When training the MLPs we take a fresh encoder sample for each minibatch (statically encoding the dataset with one encoder sample per image induced overfitting). We found that training the MLPs on encoder means (rather than samples) increased accuracy, but at the cost of a very unfavorable Δ_DP. We also found that FactorVAE-style adversarial training does not scale well to this high-dimensional problem, so we instead optimize Equation 4 using the biased estimator from Chen et al. (2018) (a sketch of this estimator is given below). Figure 5 shows Pareto fronts that capture the fairness-accuracy tradeoff for FFVAE and β-VAE. While neither method dominates in this challenging setting, FFVAE achieves a favorable fairness-accuracy tradeoff across many of the subgroups. We believe that using sensitive attributes as side information gives FFVAE an advantage over β-VAE in predicting the held-out label.
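For reference, a minimal sketch of the biased minibatch-weighted-sampling estimator of Chen et al. (2018) mentioned above is given here, assuming a diagonal-Gaussian q(z|x); the variable names and the use of a single combined latent code are illustrative assumptions rather than the exact estimator used in our implementation.

```python
import math
import torch

def total_correlation_mws(z, mu, logvar, dataset_size):
    """Biased minibatch estimate of the total correlation KL(q(z) || prod_k q(z_k)).

    z, mu, logvar : [batch, dim] samples and Gaussian parameters of q(z|x) for one minibatch.
    dataset_size  : number of training examples N (used in the importance weight).
    """
    batch = z.shape[0]
    # log q(z_i | x_j) for every pair (i, j) and every dimension: shape [batch, batch, dim]
    log_q_zij = -0.5 * (
        math.log(2 * math.pi)
        + logvar.unsqueeze(0)
        + (z.unsqueeze(1) - mu.unsqueeze(0)).pow(2) / logvar.exp().unsqueeze(0)
    )
    log_norm = math.log(batch * dataset_size)
    # log q(z_i): treat the minibatch as a weighted sample from the aggregate posterior
    log_q_z = torch.logsumexp(log_q_zij.sum(dim=2), dim=1) - log_norm
    # log prod_k q(z_i^k): apply the same estimator per dimension, then sum
    log_prod_q_zk = (torch.logsumexp(log_q_zij, dim=1) - log_norm).sum(dim=1)
    return (log_q_z - log_prod_q_zk).mean()
```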
In some cases (e.g., a = E ∧ M) FFVAE achieves better accuracy at all Δ_DP levels, while in others (e.g., a = C ∧ E) FFVAE did not find a low-Δ_DP solution. We believe CelebA, with its high-dimensional data and rich label correlations, is a useful test bed for subgroup-fair machine learning algorithms, and we are encouraged by the reasonably robust performance of FFVAE in our experiments.

6. Discussion

In this paper we discussed how disentangled representation learning aligns with the goals of subgroup-fair machine learning, and presented a method for learning a structured latent code using multiple sensitive attributes. The proposed model, FFVAE, provides flexibly fair representations, which can be modified simply and compositionally at test time to yield a representation that is fair with respect to multiple sensitive attributes and their conjunctions, even when test-time sensitive attribute labels are unavailable. Empirically, we found that FFVAE disentangled sensitive sources of variation in synthetic image data, even in the challenging scenario where attributes and labels correlate. Our method compared favorably with baseline disentanglement algorithms on downstream fair classification tasks by achieving better parity for a given accuracy budget across several group and subgroup definitions. FFVAE also performed well on the Communities & Crime and CelebA datasets, although none of the models performed robustly across all possible subgroups in the real-data setting. This result reflects the difficulty of subgroup-fair representation learning and motivates further work on this topic.

There are two main directions of interest for future work. First is the question of fairness metrics: a wide range of fairness metrics beyond demographic parity have been proposed (Hardt et al., 2016; Pleiss et al., 2017). Understanding how to learn flexibly fair representations with respect to other metrics is an important step in extending our approach. Second, robustness to distributional shift presents an important challenge in the context of both disentanglement and fairness. In disentanglement, we aim to learn independent factors of variation. Most empirical work on evaluating disentanglement has used synthetic data with uniformly distributed factors of variation, but this setting is unrealistic. Meanwhile, in fairness, we hope to learn from potentially biased data distributions, which may suffer from both undersampling and systemic historical discrimination. We might wish to imagine hypothetical unbiased data or compute robustly fair representations, but must do so given the data at hand. While learning fair or disentangled representations from real data remains a challenge in practice, we hope that this investigation serves as a first step towards understanding and leveraging the relationship between the two areas.

References

Alemi, A., Poole, B., Fischer, I., Dillon, J., Saurous, R. A., and Murphy, K. Fixing a broken ELBO. In Proceedings of the 35th International Conference on Machine Learning, 2018.

Aleo, M. and Svirsky, P. Foreclosure fallout: the banking industry's attack on disparate impact race discrimination claims under the fair housing act and the equal credit opportunity act. Public Interest Law Journal, 18(1):1-66, 2008. URL https://www.bu.edu/pilj/files/2015/09/18-1AleoandSvirskyArticle.pdf.

Ardizzone, L., Kruse, J., Wirkert, S., Rahner, D., Pellegrini, E. W., Klessen, R. S., Maier-Hein, L., Rother, C., and Köthe, U.
Analyzing inverse problems with invertible neural networks. In International Conference on Learning Representations, 2019.

Bach, F. R. and Jordan, M. I. Kernel independent component analysis. Journal of Machine Learning Research, 3(Jul):1-48, 2002.

Barocas, S. and Selbst, A. D. Big data's disparate impact. Calif. L. Rev., 104:671, 2016.

Bose, A. J. and Hamilton, W. L. Compositional fairness constraints for graph embeddings. Relational Representation Learning Workshop, Neural Information Processing Systems 2018, 2018.

Botros, P. and Tomczak, J. M. Hierarchical VampPrior variational fair auto-encoder. In ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models, 2018.

Buolamwini, J. and Gebru, T. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency, pp. 77-91, 2018.

Burgess, C. P., Higgins, I., Pal, A., Matthey, L., Watters, N., Desjardins, G., and Lerchner, A. Understanding disentangling in β-VAE. In 2017 NIPS Workshop on Learning Disentangled Representations, 2017.

Chen, T. Q., Li, X., Grosse, R. B., and Duvenaud, D. K. Isolating sources of disentanglement in variational autoencoders. In Advances in Neural Information Processing Systems, pp. 2610-2620, 2018.

Comon, P. Independent component analysis, a new concept? Signal Processing, 36(3):287-314, 1994.

DCCA. Division of Consumer and Community Affairs, 2011-07. 12 CFR Supplement I to Part 202 - Official Staff Interpretations. https://www.law.cornell.edu/cfr/text/12/appendix-Supplement_I_to_part_202, 1983.

Edwards, H. and Storkey, A. Censoring representations with an adversary. In International Conference on Learning Representations, 2016.

Elliot, M. N., Fremont, A., Morrison, P. A., Pantoja, P., and Lurie, N. A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity. Health Services Research, 2008.

Esmaeili, B., Wu, H., Jain, S., Bozkurt, A., Siddharth, N., Paige, B., Brooks, D. H., Dy, J., and van de Meent, J.-W. Structured disentangled representations. In The 22nd International Conference on Artificial Intelligence and Statistics, 2019.

Hardt, M., Price, E., Srebro, N., et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pp. 3315-3323, 2016.

Hébert-Johnson, U., Kim, M., Reingold, O., and Rothblum, G. Multicalibration: Calibration for the (computationally-identifiable) masses. In Proceedings of the 35th International Conference on Machine Learning, 2018.

Hellman, D. Indirect discrimination and the duty to avoid compounding injustice. Foundations of Indirect Discrimination Law, 2018.

Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. β-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2017.

Hyvärinen, A. and Oja, E. Independent component analysis: algorithms and applications. Neural Networks, 13(4-5):411-430, 2000.

Jacobsen, J.-H., Behrmann, J., Zemel, R., and Bethge, M. Excessive invariance causes adversarial vulnerability. In International Conference on Learning Representations, 2019.

Kearns, M., Neel, S., Roth, A., and Wu, Z. S. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In Proceedings of the 35th International Conference on Machine Learning, 2018.

Kearns, M. J., Neel, S., Roth, A., and Wu, Z. S. An empirical study of rich subgroup fairness for machine learning.
In Conference on Fairness, Accountability and Transparency, 2019.

Kim, H. and Mnih, A. Disentangling by factorising. In Proceedings of the 35th International Conference on Machine Learning, 2018.

Kim, M. P., Ghorbani, A., and Zou, J. Multiaccuracy: Black-box post-processing for fairness in classification. In AAAI Conference on AI, Ethics, and Society, 2019.

Kim, S.-E., Paik, H. Y., Yoon, H., Lee, J. E., Kim, N., and Sung, M.-K. Sex- and gender-specific disparities in colorectal cancer risk. World Journal of Gastroenterology, 21(17):5167-5175, 2015.

Kingma, D. and Ba, J. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

Kingma, D. P., Mohamed, S., Rezende, D. J., and Welling, M. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, pp. 3581-3589, 2014.

Kirchner, L., Mattu, S., Larson, J., and Angwin, J. Machine bias: There's software used across the country to predict future criminals. And it's biased against Blacks. May 2016. URL https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

Klys, J., Snell, J., and Zemel, R. Learning latent subspaces in variational autoencoders. In Advances in Neural Information Processing Systems, pp. 6443-6453, 2018.

Kusner, M. J., Loftus, J., Russell, C., and Silva, R. Counterfactual fairness. In Advances in Neural Information Processing Systems 30, 2017.

Locatello, F., Bauer, S., Lucic, M., Gelly, S., Schölkopf, B., and Bachem, O. Challenging common assumptions in the unsupervised learning of disentangled representations. In Proceedings of the 36th International Conference on Machine Learning, 2019.

Louizos, C., Swersky, K., Li, Y., Welling, M., and Zemel, R. The variational fair autoencoder. In International Conference on Learning Representations, 2016.

Madras, D., Creager, E., Pitassi, T., and Zemel, R. Learning adversarially fair and transferable representations. In Proceedings of the 35th International Conference on Machine Learning, 2018.

Moyer, D., Gao, S., Brekelmans, R., Galstyan, A., and Ver Steeg, G. Invariant representations without adversarial training. In Advances in Neural Information Processing Systems, pp. 9101-9110, 2018.

Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J., and Weinberger, K. Q. On fairness and calibration. In Advances in Neural Information Processing Systems, 2017.

Rothenhäusler, D., Meinshausen, N., Bühlmann, P., and Peters, J. Anchor regression: heterogeneous data meets causality. arXiv preprint arXiv:1801.06229, 2018.

Song, J., Kalluri, P., Grover, A., Zhao, S., and Ermon, S. Learning controllable fair representations. In The 22nd International Conference on Artificial Intelligence and Statistics, 2019.

Zemel, R., Wu, Y., Swersky, K., Pitassi, T., and Dwork, C. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning, 2013.