# Actionable Ethics through Neural Learning

Daniele Rossini,¹ Danilo Croce,² Sara Mancini,¹ Massimo Pellegrino,¹ Roberto Basili²
¹PricewaterhouseCoopers Italy, ²University of Rome, Tor Vergata
{daniele.rossini, sara.mancini, massimo.pellegrino}@pwc.com, {basili, croce}@info.uniroma2.it

## Abstract

While AI is going to produce a great impact on society, its alignment with human values and expectations is an essential step towards correctly harnessing the potential of AI for good. There is a corresponding growing need for mature and established technical standards that enable the assessment of an AI application as the evaluation of its graded adherence to formalized ethics. This clearly depends on methods to inject ethical awareness at all stages of the development and use of an AI application. For this reason we introduce the notion of Embedding Principles of ethics by Design (EPbD) as a comprehensive inductive framework. Although it extends to generic AI applications, it mainly aims at learning ethical behaviour through numerical optimization, i.e. deep neural models. The core idea is to support ethics by integrating automated reasoning over formal knowledge with induction from ethically enriched training data. A deep neural network is proposed here to model both the functional and the ethical conditions characterizing a target decision. In this way, the discovery of latent ethical knowledge is enabled and made available to the learning process. The application of the above framework to a banking problem, i.e. AI-driven Digital Lending, is used to show how accurate classification can be achieved without neglecting the ethical dimension. Results over existing datasets demonstrate that the ethical compliance of the sources can be used to produce models able to optimally fine-tune the balance between business and ethical accuracy.

## Introduction

The penetration of Artificial Intelligence systems into everyday life promises major changes and the opening of new opportunities (Craglia 2018). However, this enthusiasm also brings concerns about the risks it poses to human society and the chances of misuse. Unacceptable behaviours are triggered by several issues, ranging from design misspecifications (Amodei et al. 2016), to limited robustness with respect to adversarial attacks (Goodfellow, Shlens, and Szegedy 2014), to unfair treatments (O'Neil 2016) and controversies on AI experimentation itself (Bird et al. 2016). As the alignment with human values and expectations is an essential step towards a correct harnessing of AI potential for good (Smuha 2019), research about ethics in AI aiming at mitigating such issues is an active area (Bostrom and Yudkowsky 2014; Boddington 2017). Performing audit-like, i.e. post-hoc, ethical validation of a deployed AI system is certainly a possible approach, but it hardly constitutes a reliable guarantee: the space of possible input states, especially in evolved systems, may be too big to allow for exhaustive exploration. Moreover, the conditions of testing and real scenarios may inherently exhibit significant discrepancies, or the required data may be insufficient or unavailable.
For example, consider a bank launching a Digital Lending solution: it offers short-term loans by exploiting a machine learning algorithm that grants or denies the loan based on the risk associated with the user profile. Here, the ethical implications span many dimensions, e.g. fairness, transparency and data privacy, all socially relevant aspects. Worse, a satisfactory a-posteriori validation would be hard. For example, it would be complex to assess the system's performance on false negatives, e.g. requests rejected for lack of data about the applicant's financial history, as such information would likely not be available after the bank's rejection. While it seems mandatory to guarantee adherence to acceptable levels of ethical compliance, this goal clearly depends on methods to inject ethical awareness at all stages of the development and use of an AI application. For this reason, we consider the notion of Embedding Principles of ethics by Design (EPbD) for a target AI application.

In this work, we thus propose a framework for EPbD that, although it extends to generic AI applications, mainly focuses on learning ethical behaviour by numerical optimization, i.e. through a deep neural model (Goodfellow, Bengio, and Courville 2016). The core idea is to model ethics as automated reasoning over formal descriptions of the AI system, e.g. based on ontologies, while making it available during the learning stage. Note that our approach does not induce an ethical set of rules from a collection of observable behaviours; it is rather the opposite. In fact, our approach takes for granted an explicit formulation of ethical principles (as done, for example, in previous work (Bonnemains, Saurel, and Tessier 2018; Vanderelst and Winfield 2018)) and focuses on a form of ethical learning as external alignment, i.e. learning from others (Kleiman-Weiner, Saxe, and Tenenbaum 2017). It uses ethical evidence inferred from an ethical ontology to guide model selection in deep learning. The resulting deep neural network proposed here jointly models the functional as well as the ethical conditions characterizing the underlying decision making. In this way, the discovery of latent ethical knowledge, i.e. hidden information in the data that is meaningful under the ethical perspective, is enabled and made available to the learning process. Instead of relying on simulation to proceed in ethical decisions (Vanderelst and Winfield 2018), in our framework the specific learning goal is the integrated acquisition of high-quality inference abilities that simultaneously reflect ethical expectations. The target is a learning machine able to select the best decisions among those that are also ethically sustainable. This objective is achieved by enriching the original input space with dimensions corresponding to ethical properties, obtained through further reasoning or discovery over the input features, in order to reformulate the learning function so that it prefers decisions as trade-off choices between operational efficiency and ethical compliance. Specific loss functions depending on ethical principles are introduced to account for compliance with the reference Knowledge Bases, and they are used in a multitask learning framework to jointly optimize the model.

The rest of this work is organized as follows. First, we introduce the concept of Embedding Principles of Ethics by Design.
Then, we discuss how this notion can be specialized for the neural learning paradigm and propose a model, the Ethical by Design Neural Network, that is able to accommodate ethical learning. Last, we present results from an experimental investigation of a Digital Lending task and point to future research areas.

## Computational Ethics: Embedding Ethical Principles by Design

Ethics does not constitute a monolithic and coherent ensemble of concepts and norms: expectations over acceptable or unacceptable behaviors greatly diversify across nations, communities and industry sectors, often generating tensions between ethical principles and opposing hierarchies of values (Awad et al. 2018). In general, the following knowledge should be supplied: a top ontology, describing commonsense knowledge and concepts that are cross-domain (e.g. the concepts of PERSON, GENDER, ...); a business domain ontology, describing task-specific concepts (e.g. LOAN), such as the FIBO ontology (Bennett 2013) w.r.t. the lending use case targeted in this work; a socio-political component, in which specific situations regarding the cultural context should specialize all the others; and an ethical component defining core norms and constraints for ethical behaviours based on domain and social concepts.

A prerequisite of any ethical framework in AI is the availability of the ethical component, which we call here the Ethical Ontology (EO). It provides a description of the data the AI system is trained on, the corresponding concepts and individuals in the business domain and the corresponding ethics that rule business decisions. Ethics should allow at least to sort any decision of the targeted AI system according to degrees of "ethicality". It can be modeled as a set of Abstract Ethical Principles, denoted by $E_\Gamma$, where $\Gamma$ is a propositional logic formula to be read as: "$E_\Gamma$ is an ethical principle in force" or, alternatively, "The agent considers it unethical to allow or cause $\Gamma$ (to happen)". Consequently, the Ethical Ontology (EO) is organized into a set of Ethical Dimensions whose effect is to determine the properties, i.e. the Ethical Features (EF), of individual decisions. While business features are the observable properties, e.g. the SEX, RELIGION, or AGE of a person requesting a loan, examples of ethical features are connected to abstract notions such as SOCIAL INCLUSIVENESS or GENDER EQUALITY. The abstract ethical principles must be enforced through Ethical Rules: these constrain individual features and determine the degree of ethicality of principles over their domains. Ethical Rules usually target (i.e. define and manipulate) one or more features and assign values (or better, establish probability distributions) across the feature domains. These rules are termed truth-makers (TM), as they account for the possibly uncertain ethical state of the world regarding individual decisions. Ethical models are thus distributions across (usually discrete) domains, whose values are useful to specify thresholds and ethical ranges: these suggest when deviations from the underlying high-level principles become unacceptable. Ethical features usually reflect the context and the dataset's properties (e.g. Gender in the Lending use case) onto which Ethical Rules (such as Gender Prejudice) constrain sensitive information. An ethical feature is characterized by a domain and by an inner topological structure, i.e. the admissible values and usually a graded estimate of their acceptance levels.
In the proposed computational ethics scenario, ethical rules thus trigger truth-makers to automatically compute the basic distributions of ethical features over the underlying domains. Ethical assessment is thus a two-step approach: first, truth-makers are used to reason about the ethical features, and then the overall ethical status, as a function of the overall set of ethical features, is determined. In the first step the ethical signature of an instance is derived and in the second step its final ethical status is computed. A probabilistic approach is here adopted: probability mass functions over the related domains describe the individual features in the ethical signature and then support the final acceptability decision. In the next sections we formally define a quantitative model for these ethical aspects, able to support optimization criteria for neural induction.

## Neural Learning under ethical constraints

A learning machine usually searches for the hypothesis function $h(\vec{x};\theta)$ which best approximates some target function, according to two major principles. Accuracy is the function measuring the adherence of the hypothesis $h$ to the target concept: $h(\cdot)$ is designed to minimize the empirical error, i.e. the error on training data, so that $h(\vec{x};\theta) = y$. Moreover, $h(\cdot)$ must be as simple as possible in order to avoid overfitting the training evidence. Regularization is the principle imposed to suitably select the model from the function family: here constraints on the parameter vector $\theta$ are imposed. In analogy with the above view, we introduce a further dimension that we call Ethicality. We propose to model ethical principles as constraints in the selection of a hypothesis $h(\vec{x};\theta)$. In other words, a machine-learning-based agent can be made ethical (by design) only if the process used to enumerate and select useful hypothesis functions is constrained to make use of ONLY the ones that are ethically sustainable. This naturally gives rise to a multitask view, since the task of learning to replicate business decisions is different from the task of learning ethically sustainable decisions. A joint approach is here proposed, based on a specific formulation of the loss functions (a minimal code sketch of the combined loss is given below).

**DEFINITION (Ethical Loss function).** Given the response $h(\vec{x};\theta)$ of a learning machine to a training instance $(\vec{x}, y)$, the loss $L(y, h(\vec{x};\theta))$ of an Embedding Principles of ethics by Design (EPbD) approach is made of two independent components, i.e.

$$L(y, h(\vec{x};\theta)) = L_F(y, h(\vec{x};\theta)) + \beta\, L_E(y, h(\vec{x};\theta))$$

where $L_F$ is the monotonic non-decreasing function minimizing (at least) the empirical error of $h(\cdot;\cdot)$ and $L_E$ is an ethical cost function that estimates the compliance of $h(\vec{x};\theta)$ with the ethical principles.

In order to model the ethical cost function $L_E(\cdot)$ we need a quantitative definition of the ethical features as they are represented by the Ethical Ontology EO.

### The essence of ethical features

The $i$-th training instance is described by a set of attributes $f_j(i)$, i.e. its observable features such as AGE, and corresponds to a classification $d(i) \in \{C_1, \ldots, C_K\}$, giving rise to a pair $(\langle f_1(i), \ldots, f_n(i)\rangle, d(i)) = (\vec{f}(i), d(i))$. These properties describe cases and trigger ethical issues, i.e. world states in specific conditions: risks, such as the unfairness implied by refusing loans to minorities (e.g. women), as well as opportunities, such as the impact of lending on the well-being of special social categories (e.g. women with children). Notice that one ethical attribute (e.g. unfairness w.r.t. minorities) depends in general on multiple observable variables (e.g. SEX or NUMBEROFCHILDREN), and such attributes are not fully independent of each other.
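To make the loss definition above concrete, the following minimal NumPy sketch combines a functional term and an ethical term weighted by $\beta$. It is only an illustration of the shape of the EPbD loss, not the authors' implementation: all names are ours, and the use of cross-entropy for both terms anticipates the choice made later in the architecture section.

```python
import numpy as np

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    predicted = np.clip(predicted, eps, 1.0)
    return -np.sum(target * np.log(predicted))

def epbd_loss(y_business, y_hat_business, y_ethical, y_hat_ethical, beta=1.0):
    """L = L_F + beta * L_E, following the EPbD loss definition.

    y_business / y_hat_business: gold and predicted distributions over decisions.
    y_ethical  / y_hat_ethical : gold and predicted ethical joint distributions
                                 (decision, benefit, risk), flattened to vectors.
    beta: weight of the ethical term.
    """
    l_f = cross_entropy(y_business, y_hat_business)   # empirical (business) error
    l_e = cross_entropy(y_ethical, y_hat_ethical)     # ethical compliance cost
    return l_f + beta * l_e

# Hypothetical toy sizes: two decision classes, a 2x2 ethical joint.
y_b     = np.array([1.0, 0.0])
y_hat_b = np.array([0.8, 0.2])
y_e     = np.array([0.4, 0.1, 0.4, 0.1])
y_hat_e = np.array([0.3, 0.2, 0.3, 0.2])
print(epbd_loss(y_b, y_hat_b, y_e, y_hat_e, beta=0.5))
```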
First, we thus need a specific and separate set of further features $\vec{e}(i) = (e_1(i), \ldots, e_m(i))$, modeling such ethical aspects explicitly. Here $\vec{e}(i)$ describes the general ethical judgment about an individual case $i$ and is the result of ethical reasoning over a case $\vec{f}(i)$ and its decision $d(i)$. Two different classes of ethical features, i.e. ethical risk factors and ethical opportunities, can be defined, as they play different roles in ethically biased training. Ethical risk factors, denoted by $\vec{e}^{\,r}(i) = (e^r_1(i), \ldots, e^r_k(i))$, are individual ethical dimensions of world states that must be avoided in order to meet ethical constraints: they are features whose quantitative assignment is to be minimized in order to meet ethical expectations. Ethical opportunities correspond to aspects of world states that must be favoured in order to meet ethical constraints. Opportunity level factors, denoted by $\vec{e}^{\,o}(i) = (e^o_1(i), \ldots, e^o_k(i))$, are features (e.g. GENDER EQUALITY) whose quantitative assignment is to be maximized in order to meet ethical expectations.

Ethical induction depends on how risks and opportunities contribute to the overall ethical signature $\vec{es}(i)$ of an individual case $i$. The training data set $T$ includes a reference (gold) feature vector $\hat{i} = \vec{f}(i)\,\|\,\vec{es}(i)$ that concatenates the original evidence $\vec{f}(i)$ with $\vec{es}(i) = \vec{e}^{\,r}(i)\,\|\,\vec{e}^{\,o}(i)$, expressing all the ethical implications of EO against the decision $d(i)$. The enriched training instances form the overall ethically enriched training set $T_{eth}$, defined as:

$$T_{eth} = \{(\hat{i}, d(i))\}_{i \in T} = \{((\vec{f}(i)\,\|\,\vec{es}(i)),\, d(i))\}_{i \in T}$$

which can suitably support multitask, i.e. business and ethical, learning. Notice that the ethical status of an instance $i$ can be derived as a function of the $\vec{es}(i)$ vector: ethical states are a discrete set of categories defined by thresholding over risk and opportunity distributions. In order to synthesize the ethical description of an instance, the overall benefit and risk of an instance form a pair of stochastic variables $(B, R)$ whose values are derived from the probability distributions of the individual opportunity levels ($e^o_j$) and risk factors ($e^r_k$), respectively. In the future, trained systems are expected not to promote/suggest decisions $d(i)$ that result in an ethical status of future instances $i$ that is less than mildly ethical. This graded judgment will be made dependent on the $(B, R)$ states derived from the probability distributions in the signature $\vec{es}(i)$.

### Ethical Features and Inductive Reasoning

Risk factors and opportunity levels, described by $\vec{es}(i)$, express how individual observable features $f_j(\cdot)$ trigger ethical aspects. Truth-makers in the ethical ontology EO act on observable features $f_j(i)$ (e.g., SEX = female) and determine corresponding values for the ethical features (e.g. the $es_j$ that represents the $j$-th ethical dimension). These assignments are determined by complex reasoning chains, possibly depending on multiple features or multiple instances. Individual risks and opportunities correspond to dimensions that can be assigned multiple times by different truth-makers. Probabilistic restrictions over the domains of risks and opportunities allow the ethical signature to be represented vectorially.
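Before detailing how truth-makers populate these distributions, a small sketch of the resulting data layout may help. All sizes and values below are purely illustrative assumptions: one risk factor and one opportunity level, each described by a probability distribution over five values, are concatenated to the observable features to form an element of $T_{eth}$.

```python
import numpy as np

# Hypothetical sizes: 20 observable business features, 1 risk factor and
# 1 opportunity level, each described by a distribution over m = 5 values.
f_i = np.random.rand(20)                          # observable features f(i)

e_risk = np.array([0.70, 0.20, 0.05, 0.03, 0.02])   # P(risk = VERY LOW .. VERY HIGH)
e_opp  = np.array([0.02, 0.03, 0.05, 0.20, 0.70])   # P(opportunity = VERY LOW .. VERY HIGH)

es_i = np.concatenate([e_risk, e_opp])            # ethical signature es(i) = e_r || e_o
enriched_i = np.concatenate([f_i, es_i])          # i_hat = f(i) || es(i)

d_i = 0                                           # gold business decision for instance i
t_eth_entry = (enriched_i, d_i)                   # one element of the enriched set T_eth
print(enriched_i.shape)                           # (30,) = 20 business + 2 * 5 ethical dims
```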
Whenever an instance $i \in T_{eth}$ activates one or more rules in EO, the truth-makers set the corresponding $k$-th ethical opportunity or risk factor $es_k(i)$ to the predicted status of the $k$-th ethical dimension. Multiple rules may affect the same ethical factor, and a cumulative effect is obtained. We thus model the ethical signature vector with as many values as are foreseen in the corresponding domain of a risk or opportunity factor: if $B$ is the number of opportunities, $R$ is the number of risks and $V$ is the number of values in their domains, the overall number of ethical risk and opportunity dimensions is $(B + R) \cdot V$.

An instance-decision pair implies ethical consequences, i.e. ethical risks and ethical opportunities, that are not hard-cut. They can be captured by graded judgments along the ethical dimensions, e.g. probability distributions over the reference domain. While other design choices are in principle possible, we propose to discretize every ethical dimension over the same domain $V$ defined by a finite, closed and ordered set: $V = \{v_i \in \mathbb{R} : 0 \le v_1 < \ldots < v_m \le 1\}$. In particular, for both benefits and risks, we fixed $m = 5$ and limit values to the $[0, 1]$ range. The following five labels can be adopted, {VERY LOW, LOW, MILD, HIGH, VERY HIGH}, corresponding to the numerical values $v_1 = 0.1$, $v_2 = 0.25$, $v_3 = 0.5$, $v_4 = 0.75$ and $v_5 = 0.9$.

**The role of truth-makers.** Truth-makers are the rules of the EO ontology that actively determine the ethical profile of the instance-decision pair $(i, d(i))$. In particular, given a pair $(i, d(i))$, a truth-maker $tm$ determines a probability distribution over the set of benefit and risk dimensions. For every $tm$, ethical dimension $e_j(i)$ and possible ethical value $v_k \in V$ the following probability is defined:

$$P(e_j(i) = v_k \mid (i, d(i)), tm) \qquad j, k = 1, \ldots, 5$$

which expresses the evaluation of the truth-maker $tm$ on the instance $i$ given the decision $d(i)$, along the $k$-th value of the $j$-th ethical dimension. A truth-maker thus assigns probabilities to the ethical signature of an individual $i$ for all possible combinations of business characteristics $\vec{f}(i)$ and decisions $d(i)$. (If no truth-maker is triggered by an instance, the uniform probability distribution is used, i.e. $P(e_j(i) = v_k \mid (i, d(i)), tm) = \frac{1}{m}$ over the values $v_k$ and the different ethical features $j$.) Multiple truth-makers can contribute to a given ethical feature $e_j(i)$, each individually biasing the overall probability $P(e_j(i))$. When all truth-makers are fired, the resulting ethical signature over an instance $i$ and its decision $d(i)$ consists of:

$$\forall j, k: \quad \sum_{tm} P(tm \mid EO)\; P(e_j(i) = v_k \mid (i, d(i)), tm)$$

Notice that when a training instance is defined, the unique decision $d(i)$ is available and one unique ethical signature results. During classification no final $d(i)$ is available, and the estimates of the ethical implications must be available for all the different target classes $d_1, \ldots, d_l$. From the signatures we can then derive the final ethical status. Notice also that individual decisions over an input $i$ correspond to probabilities along all the dimensions determined by decisions, risks and opportunities. A factor $y_{ljk}$ estimates the probability of the joint event $(d(i), B, R)$ corresponding to $i$. By assuming independence, each element $y_{ljk}$ estimates the following:

$$y_{ljk} = P(d(i) = d_l)\; P(B = v_j)\; P(R = v_k) = P(d_l)\, B_j\, R_k \tag{1}$$

The collective benefit $B$ is obtained as a joint probability distribution:

$$B_j = P(B = v_j) = \sum_{t} P(e^o_t(i) = v_j \mid (i, d(i)))\; P(e^o_t(i))$$
where $P(e^o_t(i))$ is the probability of the $t$-th ethical feature in describing the collective benefit $B$. Similarly, the risk $R$ is modeled as a joint probability distribution whose components are defined by:

$$R_k = P(R = v_k) = \sum_{t} P(e^r_t(i) = v_k \mid (i, d(i)))\; P(e^r_t(i))$$

The variables $y_{ljk}$ control the impact of risks and opportunities during training and can be assigned to specific neurons.

**Gold standard for Ethics: Ethical landmarks.** Given the ethical signature $\vec{es}(i)$ of an instance $i$, we can reason about its ethicality. Two specific points in the ethical domain can be defined as references for a quantitative measure of ethical sustainability and unacceptability. The probability mass function reserving most of the probability to $v_5$ = VERY HIGH for the ethical benefits while minimizing the probability of the ethical risks ($v_1$ = VERY LOW) is by definition the ethical optimum ($OPT_{eth}$). Dually, we define the ethical minimum ($MIN_{eth}$) as the probability distribution that reserves most probability to the minimum opportunity value $v_1$ and maximal probability to the maximal risk $v_5$. During training, every ethical signature is optimized to be close to the ethical optimum and far from the ethical minimum.

**DEFINITION (Ethical compliance).** An instance-decision pair $(i, d(i))$ is ethically compliant with EO iff:

$$dist(\vec{es}(i, d), MIN_{eth}) \ge dist(\vec{es}(i, d), OPT_{eth})$$

where $\vec{es}(i, d)$ is the ethical signature of $i$ given the decision $d$ and $dist$ is a valid distance over probability distributions.

## Embedding Ethics as Multitask Neural Learning

Once a quantitative model for ethics is available, through observable features, risk factors and opportunity levels as probability distributions over the finite domain $V$, neural learning is enabled. An ethical neural architecture should be able to use dependencies among observable features as triggers of the target business decisions, but also to actively recognize dependencies between ethical and observable features, i.e. the ethical consequences implied by some features. In this perspective, back-propagation has the aim of optimizing both the business accuracy and the ethical compliance. For this reason, we propose the adoption of a multi-strategy learning approach with the cascading (i.e. stacking) of different (sub)networks. The proposed network is composed of three main processing stages, as shown in Figure 1. In the first stage the input vector $\vec{x}$ is fed to a series of fully connected layers, namely the Ethics Encoder. Its role is to learn combinations of input features able to capture the relationship between business observations and, possibly, their ethical consequences (i.e. ethical features). Later stages of the network can exploit this ethics encoding without resorting back to the EO. This component is not directly optimized through a loss function; rather, it receives penalties by back-propagation from later layers. It can be seen as a sort of pre-training stage. Formally, it corresponds to:

$$\Phi(\vec{x}) = g_1(W_1 \vec{x} + \vec{\theta}_1) = \hat{y}_1 \in \mathbb{R}^{d_1}$$

where $W_1 \in \mathbb{R}^{(n+2mj) \times d_1}$ and $\vec{\theta}_1 \in \mathbb{R}^{n+2mj}$ are parameters to be optimized and $d_1$ is a network meta-parameter. The second stage comprises two MLPs that are independently trained to learn two different tasks: estimating the correct decision distribution, under the sole business perspective, and reconstructing the ethical consequences of such decisions. The Business Expert DNN and the Ethics Expert DNN are responsible for the first and the second task, respectively. Note that they receive the same input, that is, the vector emitted by the first stage of the architecture.
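Before describing the individual sub-networks, the following sketch illustrates the quantities they are trained against: the collective benefit $B$ and risk $R$ obtained by collapsing per-feature distributions, the joint factors $y_{ljk}$ of Equation 1, and the compliance test against $OPT_{eth}$ and $MIN_{eth}$. Array names, shapes and the use of the Euclidean distance are our assumptions; the paper only requires a valid distance over probability distributions.

```python
import numpy as np

def collapse(feature_dists, feature_priors):
    """Mixture of per-feature distributions into a single distribution
    (the collective benefit B or risk R), following B_j = sum_t P(e_t = v_j|.) P(e_t)."""
    feature_dists = np.asarray(feature_dists)    # shape (num_features, m)
    feature_priors = np.asarray(feature_priors)  # shape (num_features,)
    mixed = feature_priors @ feature_dists       # weighted sum over features
    return mixed / mixed.sum()                   # safeguard renormalisation

def joint_factors(decision_dist, benefit, risk):
    """y_ljk = P(d_l) * B_j * R_k (Equation 1, independence assumption)."""
    return np.einsum('l,j,k->ljk', decision_dist, benefit, risk)

def is_compliant(signature, opt_eth, min_eth):
    """dist(es, MIN_eth) >= dist(es, OPT_eth); Euclidean distance as a stand-in."""
    return np.linalg.norm(signature - min_eth) >= np.linalg.norm(signature - opt_eth)

# Hypothetical signature: one risk factor and two opportunity levels over m = 5 values.
risk_dists = np.array([[0.6, 0.3, 0.1, 0.0, 0.0]])
opp_dists  = np.array([[0.0, 0.0, 0.1, 0.3, 0.6],
                       [0.0, 0.1, 0.2, 0.4, 0.3]])
R = collapse(risk_dists, [1.0])
B = collapse(opp_dists, [0.5, 0.5])

y = joint_factors(np.array([0.7, 0.3]), B, R)    # shape (K=2, m, m)

# Landmarks over the (risk || benefit) signature layout used in this sketch.
OPT = np.concatenate([[1.0, 0.0, 0.0, 0.0, 0.0],    # risk mass on VERY LOW
                      [0.0, 0.0, 0.0, 0.0, 1.0]])   # benefit mass on VERY HIGH
MIN = np.concatenate([[0.0, 0.0, 0.0, 0.0, 1.0],
                      [1.0, 0.0, 0.0, 0.0, 0.0]])
signature = np.concatenate([R, B])
print(y.shape, is_compliant(signature, OPT, MIN))
```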
**The Business Expert (BE) DNN.** As it is entrusted with emitting business decisions without any direct penalization for unsatisfactory ethical consequences, it can be seen as the final layers of an ethics-agnostic sub-network, modeled as

$$BE(\Phi(\vec{x})) = BE(\hat{y}_1) = g_2(W_2 \hat{y}_1 + \vec{\theta}_2) = \hat{y}_2 \in \mathbb{R}^{K}$$

where $K$ is the number of output categories, i.e. decisions. The estimator is then optimized by a standard cross-entropy loss function over the predicted distribution $\hat{y}_2$ against the gold distribution $d(i) = \vec{y}_B$.

**The Ethical Expert (EE) DNN.** Its role is to reconstruct the ethical signature for each pair $(\vec{x}_{input}, d)$. It processes the encoding from the first stage and outputs a vector which represents the joint probability of the triplet (decision, benefit, risk) under maximal entropy of the business decision distribution and the independence assumption. Here, the EE is modeled as

$$EE(\Phi(\vec{x})) = EE(\hat{y}_1) = g_3(W_3 \hat{y}_1 + \vec{\theta}_3) = \hat{y}_3 \in \mathbb{R}^{K \cdot m^2}$$

where $K$ is the number of possible decisions and $m$ is the number of possible values for ethical benefits and risks. As in Equation 1, each element $\hat{y}^{ijk}_3$ in the output vector should reconstruct

$$y^{ijk}_3 = u_d\; P(e^o_j)\; P(e^r_k)$$

where $u_d$ is the expected value of the uniform distribution over the possible decisions and the probabilities for benefits and risks are the ones in the corresponding ethical signature. Then, the cross-entropy loss function $L_{Er}$ is applied to compute the ethics recognition loss over the predicted $\hat{y}_3$ against the gold distribution encoded in the vector $\vec{y}_3$.

**Ethics-Aware (EA) Deep Neural Network.** Similarly to the EE network, it is responsible for estimating the joint probability of each possible triplet $(d_i, b_j, r_k)$. However, here $P(D = d_i)$ is directly derived from the gold standard, while the probabilities for benefits and risks are extracted from $OPT_{eth}$ for ethically compliant decisions and from $MIN_{eth}$ for non-compliant ones, i.e.:

$$y^{ijk}_5 = P(d_i)\, P(b^{opt}_j)\, P(r^{opt}_k) \quad \text{if } (\vec{x}, d_i) \in D^+$$
$$y^{ijk}_5 = P(d_i)\, P(b^{min}_j)\, P(r^{min}_k) \quad \text{if } (\vec{x}, d_i) \in D^-$$

where $D^+$ and $D^-$ are the sets of ethically compliant and non-compliant decisions for $\vec{x}$, respectively, according to EO. Overall, this sub-network is described by

$$EA([\hat{y}_2; \hat{y}_3]) = EA(\hat{y}_4) = g_4(W_4 \hat{y}_4 + \vec{\theta}_4) = \hat{y}_5 \in \mathbb{R}^{K \cdot m^2}$$

At this stage, as for the Ethics Expert, the error is updated by computing the Ethical Loss $L_E$, which is again the cross-entropy between $\vec{y}_5$ and $\hat{y}_5$. Note that this formulation does not directly promote ethically sustainable decisions; it rather encourages the network to pair them with highly beneficial and low-risk ethical consequences.

The final business decision of our network is determined by a decision policy over risks and opportunities. Here we define two possible policies.

**Ethics-Unconstrained (EU) policy.** The final decision is derived simply by summing up all probability contributions of the triplets $(i, j, k)$ where $i$ is fixed, i.e.

$$\hat{d}^{EU} = \arg\max_{d_i} P_{EU}(d_i), \qquad P_{EU}(d_i) = \sum_{j,k} P(d_i, v_j, v_k) \tag{2}$$

**Ethics-Constrained (EC) policy.** Here a probability $P(d_i, be_j, ri_k)$ contributes to $P(d_i)$ only if $be_j$ and $ri_k$ satisfy membership constraints, i.e.

$$\hat{d}^{EC} = \arg\max_{d_i} P_{EC}(d_i), \qquad P_{EC}(d_i) = \sum_{v_j \in V^+,\, v_k \in V^-} P(d_i, v_j, v_k) \tag{3}$$

where we set $V^+$ = {HIGH, VERY HIGH} for benefits and $V^-$ = {LOW, VERY LOW} for risks. As we will see in the experimental evaluation, the above network is able to learn from a business point of view (through the loss $L_F$) consistently with the EO (through the ethical loss $L_E$), while promoting ethically sustainable business decisions.
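Both decision policies reduce the $K \times m \times m$ output tensor of the EA head to one score per decision; the sketch below illustrates them. The tensor layout (decision, benefit, risk) and all names are assumptions made for illustration.

```python
import numpy as np

VALUES = ['VERY LOW', 'LOW', 'MILD', 'HIGH', 'VERY HIGH']

def decide_unconstrained(y5):
    """Ethics-Unconstrained (EU) policy: sum all (benefit, risk) cells per decision."""
    return int(np.argmax(y5.sum(axis=(1, 2))))

def decide_constrained(y5, good_benefits=('HIGH', 'VERY HIGH'),
                       good_risks=('LOW', 'VERY LOW')):
    """Ethics-Constrained (EC) policy: only cells with high benefit and low risk
    contribute to a decision's score."""
    b_idx = [VALUES.index(v) for v in good_benefits]
    r_idx = [VALUES.index(v) for v in good_risks]
    scores = y5[:, b_idx][:, :, r_idx].sum(axis=(1, 2))
    return int(np.argmax(scores))

# Hypothetical EA-DNN output for K = 2 decisions over a 5x5 (benefit, risk) grid.
rng = np.random.default_rng(0)
y5 = rng.random((2, 5, 5))
y5 /= y5.sum()
print(decide_unconstrained(y5), decide_constrained(y5))
```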
## Balancing business and ethical adequacy

Different contexts and applications may require different trade-offs between the prescriptions of the ethics system and the behavioural patterns induced from historical data. While it may be possible to balance such trade-offs by changing the distributions of benefits and risks derived from the truth-makers, this would be cumbersome from a practical point of view and, more importantly, it may make it difficult to compare different models, as the underlying feature spaces would be related but different. More manageable methods, as discussed here, consist in acting only on the joint distributions used to train the EA-DNN.

**Smoothing business decisions.** Gold standards usually provide unique decisions, that is, sharp probability distributions across business decisions. However, these are not guaranteed to be ethical. Smoothing is needed to allow unethical cases to be neglected, so that probabilities for decisions $d_i$ different from the gold standard decision are non-zero. Laplace smoothing is applied across the $K$ different classes:

$$\hat{d}_i = \frac{d_i + \alpha}{1 + K\alpha}$$

**Tweaking between ethics and business accuracy.** Through a similar technique, it is possible to tune the emphasis on ethical consequences by applying a tweaking factor $\beta$ to the probabilities of benefits and risks in the joint probability of the $(d, B, R)$ triple, i.e.:

$$P_\beta(d_i, B_j, R_k) = P(d(i) = d_l)\; \big(P(B = v_j)\; P(R = v_k)\big)^{\beta}$$

Here the influence of ethics becomes weaker as $\beta \to 0$. The above equation corresponds to the input of the network and establishes the influence of the ethical information on the NN through the corresponding impact on the loss functions $L_{Er}$ and $L_E$.

Figure 1: The architecture of the Ethical by Design Neural Network.
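A minimal sketch of the smoothing and tweaking operations just described is given below, under the assumption that the tweaking exponent applies to the product of the benefit and risk probabilities and that the resulting joint is renormalised; names and sizes are illustrative.

```python
import numpy as np

def laplace_smooth(decision_onehot, alpha):
    """Smoothed gold decision distribution: d_i_hat = (d_i + alpha) / (1 + K * alpha)."""
    K = len(decision_onehot)
    return (decision_onehot + alpha) / (1.0 + K * alpha)

def tweaked_joint(decision_dist, benefit, risk, beta):
    """P_beta(d, B_j, R_k) = P(d) * (P(B=v_j) * P(R=v_k)) ** beta.
    As beta -> 0 the ethical factors flatten out and ethics loses influence."""
    ethics = np.einsum('j,k->jk', benefit, risk) ** beta
    joint = np.einsum('l,jk->ljk', decision_dist, ethics)
    return joint / joint.sum()        # renormalise so it remains a distribution

# Hypothetical gold decision (class 0 of K = 2) and ethical distributions over m = 5 values.
d = laplace_smooth(np.array([1.0, 0.0]), alpha=0.3)
B = np.array([0.02, 0.03, 0.05, 0.20, 0.70])
R = np.array([0.70, 0.20, 0.05, 0.03, 0.02])
print(tweaked_joint(d, B, R, beta=0.5).shape)   # (2, 5, 5)
```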
## Experimental Evaluation: Ethical Risk Assessment in Banking

We ran an extensive evaluation of the proposed framework on the German Credit dataset (GC), publicly available from the University of California-Irvine machine learning repository (Dua and Graff 2017). Here the task is to predict whether a loan request carries a low ($C_0$) or high ($C_1$) risk of default (i.e. the requester not paying back the loan) based on 20 different attributes, some of which are domain-specific (e.g. PREVIOUS CREDIT HISTORY or ACCOUNT BALANCE) while others are more general (e.g. the AGE of the requester or the NUMBEROFPEOPLEUNDERMAINTENANCE). Despite its small size (only 1000 instances) and strong class unbalance (700 instances labeled as the low-risk profile $C_0$), this dataset is appealing for testing ethics learning approaches, as it represents a real-world problem (King, Feng, and Sutherland 1995) and offers many attributes upon which ethical rules can be defined. We defined and experimented with different ethical policies and truth-makers but, due to space limitations, we focus on one particularly simple ethical ontology (EO1), which includes two truth-makers: MOTHERHOOD FOSTERING ($tm_{MF}$), favouring (lending decisions representing) women with children and, to a lesser extent, men with at least 2 children, and CULTURAL INCLUSIVENESS ($tm_{CI}$), favouring foreign workers. Details on the truth-makers are given in Appendix A (supplemental material in the submitted version). The ethical values $V = \{0.1, 0.25, 0.5, 0.75, 0.9\}$ are used to express {VERY LOW, LOW, MILD, HIGH, VERY HIGH}. Due to the strong unbalance between the target classes (70%-30%), we report business performance according to the average F1-measure, $\mu F_1 = \frac{F1_{C_0} + F1_{C_1}}{2}$. The overall ethical compliance EComp of the data set, given the ontology EO1, is computed as the percentage of ethically compliant instances, according to the gold standard decision, i.e. $\frac{|D^+|}{|D^+| + |D^-|}$. This corresponds to EComp = 0.70, which suggests that historical data alone cannot be used to promote ethics. It is clear how joint adherence to the EO1 ethics and to business optimality requires a complex trade-off: it in fact requires neglecting some training cases in order to improve ethical compliance. A possible straightforward measure of the trade-off between ethics and business accuracy is the parametrized Acceptability factor $EAcc_\gamma$, the weighted average between $\mu F_1$ and the ethical compliance EComp:

$$EAcc_\gamma = \gamma\, \mu F_1 + (1 - \gamma)\, EComp \tag{4}$$

where $\gamma \in [0, 1]$ can be adjusted according to the relative importance of the two terms. The $EAcc_\gamma$ measure, when the superiority of ethics is imposed by $\gamma = 0.2$ over the GC dataset, provides the strong baseline for ethical training given by $EAcc_{.2} = 0.76$. Such a gold standard $EAcc_\gamma$ is a useful reference measure for comparing ethical neural models.

**Experimental Set-Up.** The chosen architecture for the EbDNN has an Ethics Encoder with 2 layers, where the first layer has the same size as the input and the second has dimension 400; the Business Expert has 1 layer with output dimension equal to $K$. Both the Ethics Expert and the Ethics-Aware DNN have 1 layer with $K \cdot m^2$ neurons (where $K$ is the number of classes and $m$ the number of ethical values). Non-linearity is applied through the relu function at each layer, except for the last layer of each component associated with a loss function, where a softmax is computed. A dropout rate of 0.2 is applied on each layer. To cope with the limited number of instances, we applied 10-fold cross validation, training each model for 1000 epochs with a standard batch size of 256 through the Adam optimizer (models were implemented in Python using Tensorflow). Various settings of the smoothing and tweaking factors, $(\alpha, \beta) \in \{0.1, 0.3, 0.6, 1.0\} \times \{0.01, 0.05, 0.10, 0.20, 0.35, 0.50, 0.75, 1.00\}$, were applied to systematically study their impact. We fed each model alternatively with the enriched input vector, i.e. $\vec{f}(i)\,\|\,\vec{es}(i)$, or only with the business observables $\vec{f}(i)$. No significant difference was observed, as the EE-DNN seems able to robustly reconstruct ethical signatures across all settings. In the rest of the experiments, we thus trained models only over $\vec{f}(i)$. To provide a fair comparison with a standard learning framework, a simple Multi-Layer Perceptron (MLP) with 2 layers and 320 units per layer was trained over $\vec{f}(i)$ only: it achieves an accuracy of 76.21%, comparable to state-of-the-art results on this dataset (Ratanamahatana and Gunopulos 2002). It corresponds to a business performance of $\mu F_1 = 66.4\%$ with an ethical compliance EComp = 77.8%. The ethical acceptance is thus $EAcc_{.2} = 75.5\%$, which does not improve on the gold standard, as expected: it provides a second comparative reference as an ethically unaware system.

| System $(\alpha, \beta)$ | $\mu F_1$ | Eth.Compl. | $EAcc_{0.2}$ |
|---|---|---|---|
| $EA_{EU}$(0.3, 0.5) | 63.1% | 79.6% | 76.3% |
| $EA_{EU}$(0.3, 0.01) | 63.9% | 78.8% | 75.8% |
| $EA_{EC}$(0.1, 0.5) | 41.2% | 100.0% | 88.2% |
| $EA_{EC}$(0.1, 0.2) | 53.8% | 93.0% | 85.2% |
| $EA_{EC}$(0.1, 0.01) | 61.7% | 78.4% | 75.1% |
| $EA_{EC}$(0.3, 0.1) | 60.6% | 85.1% | 80.2% |
| MLP | 66.4% | 77.8% | 75.5% |

Table 1: $\mu F_1$, Eth.Compl. and $EAcc_\gamma$ ($\gamma = 0.2$) for different configurations of the EA model.

Figure 2: The trends of the ethical compliance EComp of the outcome of the EA-DNN as a function of the tweaking factor $\beta$. While MLP and Gold Standard refer to ethically unaware methods, the plots represent several smoothing $\alpha$ parameters.
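As a rough illustration of the set-up described above, the following tf.keras sketch wires the three stages together. It is not the authors' code: the exact placement of dropout, the activations of the intermediate heads and the (unit) weighting of the three cross-entropy losses are assumptions.

```python
import tensorflow as tf

def build_ebd_nn(n_features, K=2, m=5, dropout=0.2):
    """Sketch of the EbDNN topology described in the set-up (layer names are ours).
    Three heads share the Ethics Encoder; BE, EE and EA are trained with
    cross-entropy losses (L_F, L_Er and L_E respectively)."""
    x = tf.keras.Input(shape=(n_features,), name='business_features')

    # Stage 1: Ethics Encoder (2 fully connected layers).
    h = tf.keras.layers.Dense(n_features, activation='relu')(x)
    h = tf.keras.layers.Dropout(dropout)(h)
    h = tf.keras.layers.Dense(400, activation='relu')(h)
    h = tf.keras.layers.Dropout(dropout)(h)

    # Stage 2: Business Expert and Ethics Expert on the shared encoding.
    be = tf.keras.layers.Dense(K, activation='softmax', name='be')(h)
    ee = tf.keras.layers.Dense(K * m * m, activation='softmax', name='ee')(h)

    # Stage 3: Ethics-Aware head over the concatenated expert outputs.
    ea_in = tf.keras.layers.Concatenate()([be, ee])
    ea = tf.keras.layers.Dense(K * m * m, activation='softmax', name='ea')(ea_in)

    model = tf.keras.Model(inputs=x, outputs=[be, ee, ea])
    model.compile(optimizer='adam',
                  loss={'be': 'categorical_crossentropy',    # L_F
                        'ee': 'categorical_crossentropy',    # L_Er
                        'ea': 'categorical_crossentropy'})   # L_E
    return model

model = build_ebd_nn(n_features=20)
model.summary()
```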
**Evaluating ethics-aware learning.** Table 1 reports the performance of both the baseline MLP and of the EA models under different $(\alpha, \beta)$ settings and decision policies. The trade-off between ethical and business performance is largely improved by the EA models in all configurations. The gains in ethical compliance of the EA models w.r.t. the baselines are significant, while the losses in business performance are relatively small. The effect of both factors $(\alpha, \beta)$ can be observed in Figure 2. As $\beta$ increases, ethics plays a stronger role and the model's behaviour deviates from that of a purely business-driven predictor. The smoothing factor $\alpha$ plays a complementary role: stronger smoothing corresponds to markedly more ethical behaviour, even for smaller $\beta$. Notice how, even for high $\alpha$ values, at lower $\beta$'s ($\le 0.1$) every EA model starts to exhibit unethical choices. The fully enforced ethics network $EA_{EC}$ with $(\alpha, \beta) = (0.1, 0.5)$ achieves the maximal ethical compliance with less than 20% loss in terms of $\mu F_1$. Note that the unconstrained decision policy, i.e. the $EA_{EU}$ model, is not sensitive to the tweaking factor: for $\beta = 0.5$ and $\beta = 0.2$ the performance is basically the same.

Figure 3 plots the ethical acceptability $EAcc_\gamma$ (with $\gamma = 0.2$) restricted to the test cases where the MLP provides non-ethical decisions. The robustness of the ethics-aware networks is striking. The progressive deviation from ethical sustainability is visually captured in Figure 4, where the ethical signature of each prediction is projected onto the plane with axes $(\mathbb{E}[be], \mathbb{E}[ri])$ (the point size is proportional to the number of projections falling on that point): as $\beta$ decreases, more and more points are mapped into the half-plane where the expected value of the risk is higher than the expected benefit. Overall, the experimental evaluation confirms that the embedding of ethical principles into the decision function of the model can be effectively modulated through the fine-tuning of $\beta$ and, to a lesser extent, $\alpha$, and through the application of the proper decision policy.

Figure 3: $EAcc_\gamma$ for $\gamma = 0.2$ of the EA model over non-ethical decisions of the gold standard: the performance of the constrained ($EA_{EC}$) and unconstrained ($EA_{EU}$) networks and of the baseline MLP is reported against Eth.Compl. and $\mu F_1$ values.

Figure 4: Projections of the ethically-constrained EA-DNN's predictions for $\beta = 0.5$ (top) and $0.01$ (bottom): point size is proportional to the number of projected predictions. The blue dashed line corresponds to ethically neutral choices, i.e. the expectation about benefits equals that about risks; hence the upper-left half-plane includes all ethically non-compliant decisions.

## Conclusions

In this work, we proposed a deep learning framework for the acquisition of high-quality inferences that simultaneously reflect ethical expectations. The experimental evaluation suggests that the framework is effective and that it allows fine-tuning of the balance between the business and the ethical perspective through the smoothing and tweaking methods. This work represents an early exploration of the framework's potential; future directions are therefore rich, ranging from the definition of more complex ethics to the application to more challenging inference tasks, as well as to different learning paradigms (e.g. Reinforcement Learning).
## Appendix A: a Simple Ethical Ontology for the Lending Case

During the experimental evaluation, we tested the learning framework against a very essential ethics EO1, enforced by 2 truth-makers: MOTHERHOOD FOSTERING ($tm_{MF}$) and CULTURAL INCLUSIVENESS ($tm_{CI}$), both promoting the low risk-profile decision as ethically preferable when certain conditions are met. If the instance description does not satisfy the conditions of any rule, then the default behaviour is to assign (:=) probability distributions centered in (B := MILD, R := MILD). In the following, $req$ stands for the loan request and $(B := v_i, R := v_j)$ indicates the assignment of Gaussian distributions centered in $v_i$ and $v_j$ to benefits $B$ and risks $R$, respectively. Due to limited space, here we report the details of $tm_{MF}$ only.

MOTHERHOOD FOSTERING ($tm_{MF}$):

- sex(req, female) ∧ maintainedPeople(req, X) ∧ X ≥ 2 ∧ loanRisk(req, low) → (B := VERY HIGH, R := VERY LOW)
- sex(req, female) ∧ maintainedPeople(req, 1) ∧ loanRisk(req, low) → (B := HIGH, R := LOW)
- ¬sex(req, female) ∧ maintainedPeople(req, X) ∧ X ≥ 2 ∧ loanRisk(req, low) → (B := HIGH, R := MILD)
- sex(req, female) ∧ maintainedPeople(req, X) ∧ X ≥ 2 ∧ loanRisk(req, high) → (B := VERY LOW, R := VERY HIGH)
- sex(req, female) ∧ maintainedPeople(req, 1) ∧ loanRisk(req, high) → (B := LOW, R := HIGH)
- ¬sex(req, female) ∧ maintainedPeople(req, X) ∧ X ≥ 2 ∧ loanRisk(req, high) → (B := LOW, R := MILD)

Consider, for example, the two instances partially represented in Table 2, i.e. case (1), representing a female requester with 1 maintained person and associated with a high-risk profile, and case (2), representing a male requester with 2 maintained people and associated with a low-risk profile. The ethical signatures $es_i$ derived from the triggering of rules (6) and (5) of $tm_{MF}$, respectively, will promote the decision regarding (2) as beneficial (i.e. high ethical benefit and low ethical risk), while they will penalize with similar intensity the decision regarding case (1).

| REQ | SEX | MAINTAINEDP. | LOANRISK |
|---|---|---|---|
| 1 | female | 1 | high ($C_1$) |
| 2 | male | 2 | low ($C_0$) |

Table 2: Two (partial) examples of instances in the German Credit dataset.
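A possible reading of $tm_{MF}$ as code is sketched below. The discretised Gaussian-like assignment and its spread, the treatment of the two non-female rules, and all function and variable names are our assumptions made for illustration.

```python
import numpy as np

VALUES = {'VERY LOW': 0.1, 'LOW': 0.25, 'MILD': 0.5, 'HIGH': 0.75, 'VERY HIGH': 0.9}

def peaked(label, values=(0.1, 0.25, 0.5, 0.75, 0.9), spread=0.15):
    """Discretised Gaussian-like distribution centred on the value named by `label`."""
    centre = VALUES[label]
    weights = np.exp(-0.5 * ((np.array(values) - centre) / spread) ** 2)
    return weights / weights.sum()

def tm_motherhood_fostering(sex, maintained_people, loan_risk):
    """Sketch of tm_MF: maps an instance-decision pair to (benefit, risk) distributions.
    The conditions follow the appendix rules; the default falls back to (MILD, MILD)."""
    if sex == 'female' and maintained_people >= 2:
        b, r = ('VERY HIGH', 'VERY LOW') if loan_risk == 'low' else ('VERY LOW', 'VERY HIGH')
    elif sex == 'female' and maintained_people == 1:
        b, r = ('HIGH', 'LOW') if loan_risk == 'low' else ('LOW', 'HIGH')
    elif sex != 'female' and maintained_people >= 2:
        b, r = ('HIGH', 'MILD') if loan_risk == 'low' else ('LOW', 'MILD')
    else:
        b, r = 'MILD', 'MILD'          # default when no rule fires
    return peaked(b), peaked(r)

benefit, risk = tm_motherhood_fostering('female', 1, 'high')   # case (1) of Table 2
print(benefit.round(2), risk.round(2))
```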
## References

Amodei, D.; Olah, C.; Steinhardt, J.; Christiano, P. F.; Schulman, J.; and Mané, D. 2016. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.

Awad, E.; Dsouza, S.; Kim, R.; Schulz, J.; Henrich, J.; Shariff, A.; Bonnefon, J.-F.; and Rahwan, I. 2018. The moral machine experiment. Nature 563.

Bennett, M. 2013. The financial industry business ontology: Best practice for big data. Journal of Banking Regulation 14(3-4):255-268.

Bird, S.; Barocas, S.; Crawford, K.; and Wallach, H. 2016. Exploring or exploiting? Social and ethical implications of autonomous experimentation in AI. In Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT-ML), New York University, 4.

Boddington, P. 2017. Towards a Code of Ethics for Artificial Intelligence. Springer.

Bonnemains, V.; Saurel, C.; and Tessier, C. 2018. Embedded ethics: some technical and ethical challenges. Ethics and Information Technology.

Bostrom, N., and Yudkowsky, E. 2014. The ethics of artificial intelligence. Cambridge University Press. 316-334.

Craglia, M. 2018. Artificial Intelligence: A European Perspective. Publications Office of the European Union.

Dua, D., and Graff, C. 2017. UCI machine learning repository.

Goodfellow, I.; Bengio, Y.; and Courville, A. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.

Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. In International Conference on Learning Representations.

King, R. D.; Feng, C.; and Sutherland, A. 1995. Statlog: comparison of classification algorithms on large real-world problems. Applied Artificial Intelligence: An International Journal 9(3):289-333.

Kleiman-Weiner, M.; Saxe, R.; and Tenenbaum, J. B. 2017. Learning a commonsense moral theory. Cognition 167:107-123. Moral Learning.

O'Neil, C. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York, NY, USA: Crown Publishing Group.

Ratanamahatana, C. A., and Gunopulos, D. 2002. Scaling up the naive Bayesian classifier: Using decision trees for feature selection. In Workshop on Data Cleaning and Preprocessing (DCAP 2002), at IEEE Int'l Conf. Data Mining, ICDM 2002. Citeseer.

Smuha, N. 2019. Ethics guidelines for trustworthy AI. Publications Office of the European Union.

Vanderelst, D., and Winfield, A. 2018. An architecture for ethical robots inspired by the simulation theory of cognition. Cognitive Systems Research 48:56-66. Cognitive Architectures for Artificial Minds.