# Unit Selection with Causal Diagram

Ang Li, Judea Pearl

Cognitive Systems Laboratory, Department of Computer Science, University of California, Los Angeles, Los Angeles, California, USA. {angli, judea}@cs.ucla.edu

The unit selection problem aims to identify a set of individuals who are most likely to exhibit a desired mode of behavior, for example, selecting individuals who would respond one way if encouraged and a different way if not encouraged. Using a combination of experimental and observational data, Li and Pearl derived tight bounds on the benefit function, that is, the payoff/cost associated with selecting an individual with given characteristics. This paper shows that these bounds can be narrowed significantly (enough to change decisions) when structural information is available in the form of a causal model. We address the problem of estimating the benefit function using observational and experimental data when specific graphical criteria are assumed to hold.

## Introduction

The unit selection dilemma arises in many areas of industry, marketing, and health science. For example, in customer relationship management (Berson, Smith, and Thearling 1999; Lejeune 2001; Hung, Yen, and Wang 2006; Tsai and Lu 2009), it is useful to know which customers are going to churn but might reconsider if encouraged to stay. Due to the high expense of such initiatives, management is forced to limit inducement to customers who are most likely to exhibit the behavior of interest. As another example, in online advertising, companies are interested in identifying users who would click on an advertisement if and only if it is highlighted (Yan et al. 2009; Bottou et al. 2013; Li et al. 2014; Sun et al. 2015).
The challenge in identifying these users stems from the fact that the desired response pattern is not observed directly but rather is defined counterfactually in terms of what the individual would do under hypothetical unrealized conditions. For example, when we observe that a user has clicked on a highlighted advertisement, we do not know whether they would click on that same advertisement if it were not highlighted.

The benefit function for the unit selection problem was defined by Li and Pearl (Li and Pearl 2019), and it properly captures the nature of the desired behavior. Using a combination of experimental and observational data, Li and Pearl derived tight bounds of the benefit function. The only assumption is that the treatment has no effect on the population-specific characteristics. However, Li-Pearl's derivation does not leverage information from auxiliary covariates, if such information is available. Mueller, Li, and Pearl (Mueller, Li, and Pearl 2021) recently proposed using covariate information and the causal structure to narrow the bounds of the probability of necessity and sufficiency. Dawid et al. (Dawid, Musio, and Murtas 2017) also proposed using covariate information to narrow the bounds of the probability of necessity. A similar approach might be used for the benefit function. Most crucially, the information provided by covariates and their causal structure may result in a reversal of decision (relative to not considering such covariates).

Copyright 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Consider the following motivating scenario: a carwash company wants to offer a discount to employees of company A. The offer can only be presented to the entire company A; the carwash company will not be able to provide a discount to a specific group inside company A. The carwash company's manager seeks to maximize total profit, including nonimmediate profit.
The management estimates the benefit of selecting each type of customer as follows:

- Complier (i.e., offering the discount to a customer who would use the carwash service if they received the discount, but would not otherwise): $100, as the profit is $140 but the discount is $40.
- Always-taker (i.e., offering the discount to a customer who would use the carwash service regardless of whether they received the discount): -$60, as the customer would use the service anyway (so the company loses the value of the discount and incurs an extra cost of $20 because the always-taker may require additional discounts in the future).
- Never-taker (i.e., offering the discount to a customer who would never use the carwash service regardless of whether they received the discount): $0, as the cost of issuing the discount is negligible.
- Defier (i.e., offering the discount to a customer who would not use the carwash service if they received the discount, but would use the carwash service otherwise): -$140, as the customer is lost due to the discount.

The manager of the carwash company has both experimental and observational data, including customer age, collected from company A. The manager wants to know what the average profit will be if all of company A's employees are given the discount.

The Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

Based on Li-Pearl's model, it is easy to see that the benefit vector for the aforementioned example is (100, -60, 0, -140), and a corresponding benefit function can be defined as the objective function. Li-Pearl's model can then obtain the bounds of the benefit function using experimental and observational data. The model, however, does not take into account the covariate information (customer age) and the causal structure. In this paper, we show how the information included in such covariates and their causal structure can be used to narrow the bounds of the benefit function in Li-Pearl's model.
Most importantly, the narrower bounds can sometimes flip the decision.

## Preliminaries

In this section, we review Li and Pearl's benefit function of the unit selection problem (Li and Pearl 2019). Individual behavior was classified into four response types, labeled complier, always-taker, never-taker, and defier. Suppose the benefits of selecting one individual in each category are \(\beta, \gamma, \theta, \delta\), respectively (i.e., the benefit vector is \((\beta, \gamma, \theta, \delta)\)). They defined the objective function of the unit selection problem as the average benefit gained per individual. Suppose \(a\) and \(a'\) are binary treatments, \(r\) and \(r'\) are binary outcomes, and \(c\) are population-specific characteristics. The objective function (i.e., the benefit function) is then the following (if the goal is to evaluate the average benefit gained per individual for a specific population \(c\), \(\operatorname{argmax}_c\) can be dropped):

\[
\operatorname*{argmax}_c \; \beta P(r_a, r'_{a'} \mid c) + \gamma P(r_a, r_{a'} \mid c) + \theta P(r'_a, r'_{a'} \mid c) + \delta P(r'_a, r_{a'} \mid c).
\]

Using a combination of experimental and observational data, Li and Pearl established the most general tight bounds on this benefit function (which we refer to as Li-Pearl's Theorem in the rest of the paper). The only constraint is that the population-specific characteristics are not a descendant of the treatment. However, covariate information (if available, such as age in the motivating example of the previous section) is not considered.

In this paper, we present three common cases of covariates and their causal structures, together with theorems that show how information about the covariates and their causal structures can narrow the bounds of the benefit function. The improvement of the bounds is sometimes significant and can change the decisions compared to Li-Pearl's Theorem.

## Selection Criteria with Causal Diagrams

We present three common cases of the covariates and their causal structures in this section. For each case, we provide a theorem for estimating the benefit function in such a case.
The proofs of all theorems are in the appendix. In any causal diagram of this paper, a dotted line between A and B means that either A affects B, B affects A, or A and B are independent; a dotted line with an arrow from A to B means that either A affects B or A and B are independent.

### Causal Diagram with Non-descendant Covariates

Theorem 1 provides bounds for the benefit function when a set \(Z\) of variables can be measured that satisfies only one condition: the population-specific variables \(C\) and the covariates \(Z\) contain no descendant of \(X\). This condition is important because if \(X\) is set to \(x\) and \(C \cup Z\) contains a descendant of \(X\), then \(C \cup Z\) could be altered and \(P(y_x \mid z, c)\) would be another unmeasurable counterfactual term. If the descendant is independent of \(Y_x\), then \(P(y_x \mid z, c)\) would be measurable, but the descendant would not contribute to any narrowing of the bounds. These bounds are always contained within the bounds of the benefit function in Li-Pearl's Theorem.

Theorem 1. Given a causal diagram \(G\) and a distribution compatible with \(G\), let \(Z \cup C\) be a set of variables that does not contain any descendant of \(X\) in \(G\). Then the benefit function \(f(c) = \beta P(y_x, y'_{x'} \mid c) + \gamma P(y_x, y_{x'} \mid c) + \theta P(y'_x, y'_{x'} \mid c) + \delta P(y_{x'}, y'_x \mid c)\) is bounded as follows:

\[
\begin{aligned}
W + \sigma U \le f(c) \le W + \sigma L \quad &\text{if } \sigma < 0,\\
W + \sigma L \le f(c) \le W + \sigma U \quad &\text{if } \sigma > 0,
\end{aligned}
\]

where \(\sigma, W, L, U\) are given by

\[
\begin{aligned}
\sigma &= \beta - \gamma - \theta + \delta,\\
W &= (\gamma - \delta) P(y_x \mid c) + \delta P(y_{x'} \mid c) + \theta P(y'_{x'} \mid c),\\
L &= \sum_z \max\left\{
\begin{array}{l}
0,\\
P(y_x \mid z, c) - P(y_{x'} \mid z, c),\\
P(y \mid z, c) - P(y_{x'} \mid z, c),\\
P(y_x \mid z, c) - P(y \mid z, c)
\end{array}
\right\} \times P(z \mid c),\\
U &= \sum_z \min\left\{
\begin{array}{l}
P(y_x \mid z, c),\\
P(y'_{x'} \mid z, c),\\
P(y, x \mid z, c) + P(y', x' \mid z, c),\\
P(y_x \mid z, c) - P(y_{x'} \mid z, c) + P(y, x' \mid z, c) + P(y', x \mid z, c)
\end{array}
\right\} \times P(z \mid c).
\end{aligned}
\]

Notably, \(C\) can be interpreted as the population-specific variables, and \(Z\) as the attributes within each population. Moreover, the bounds provided above are never worse than Li-Pearl's bounds (see the proof in the appendix). In addition, if \(\sigma = 0\), the Gain Equality is satisfied in Li-Pearl's model, and the benefit function is no longer bounded but given by a point estimate.
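The common shape \(W + \sigma L\), \(W + \sigma U\) of these bounds can be motivated by a short calculation. As a sketch, write \(\mathrm{PNS}(c)\) for \(P(y_x, y'_{x'} \mid c)\) (a shorthand introduced only for this illustration) and use the fact that every individual belongs to exactly one of the four response types. Marginalizing over the response types gives

\[
\begin{aligned}
P(y_x, y_{x'} \mid c) &= P(y_x \mid c) - \mathrm{PNS}(c),\\
P(y'_x, y'_{x'} \mid c) &= P(y'_{x'} \mid c) - \mathrm{PNS}(c),\\
P(y'_x, y_{x'} \mid c) &= P(y_{x'} \mid c) - P(y_x \mid c) + \mathrm{PNS}(c).
\end{aligned}
\]

Substituting these into \(f(c)\) and collecting terms yields, with \(\sigma\) and \(W\) as defined in Theorem 1,

\[
f(c) = W + \sigma \, \mathrm{PNS}(c).
\]

Here \(W\) contains only experimental quantities, so bounding \(f(c)\) amounts to bounding \(\mathrm{PNS}(c)\) between \(L\) and \(U\); the sign of \(\sigma\) decides which of the two becomes the lower end. When \(\sigma = 0\) the unidentified term drops out and \(f(c) = W\), which is why the result is then a point estimate.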
### Causal Diagram with Mediators

#### Partial Mediators

Figure 1: Mediator Z with direct effects of X on Y.

In Figure 1, the partial mediator \(Z\) is a descendant of \(X\); thus, we cannot use Theorem 1. However, the absence of confounders (other than the population-specific variables \(C\)) between \(Z\) and \(Y\) and between \(X\) and \(Y\) permits us to bound the benefit function as follows:

Theorem 2. Given a causal diagram \(G\) and a distribution compatible with \(G\), let \(Z\) be a set of variables such that \(\forall x, x' \in X : x \neq x'\), \((Y_x \perp\!\!\!\perp X \cup Z_{x'} \mid Z_x, C)\) in \(G\), and \(C\) does not contain any descendant of \(X\) in \(G\). Then the benefit function \(f(c) = \beta P(y_x, y'_{x'} \mid c) + \gamma P(y_x, y_{x'} \mid c) + \theta P(y'_x, y'_{x'} \mid c) + \delta P(y_{x'}, y'_x \mid c)\) is bounded as follows:

\[
\begin{aligned}
W + \sigma U \le f(c) \le W + \sigma L \quad &\text{if } \sigma < 0,\\
W + \sigma L \le f(c) \le W + \sigma U \quad &\text{if } \sigma > 0,
\end{aligned}
\]

where \(\sigma, W, L, U\) are given by

\[
\begin{aligned}
\sigma &= \beta - \gamma - \theta + \delta,\\
W &= (\gamma - \delta) P(y_x \mid c) + \delta P(y_{x'} \mid c) + \theta P(y'_{x'} \mid c),\\
L &= \max\left\{
\begin{array}{l}
0,\\
P(y_x \mid c) - P(y_{x'} \mid c),\\
P(y \mid c) - P(y_{x'} \mid c),\\
P(y_x \mid c) - P(y \mid c)
\end{array}
\right\},\\
U &= \min\left\{
\begin{array}{l}
P(y_x \mid c),\\
P(y'_{x'} \mid c),\\
P(y, x \mid c) + P(y', x' \mid c),\\
P(y_x \mid c) - P(y_{x'} \mid c) + P(y, x' \mid c) + P(y', x \mid c),\\
\sum_z \sum_{z'} \min\{P(y \mid z, x, c),\, P(y' \mid z', x', c)\} \times \min\{P(z_x \mid c),\, P(z'_{x'} \mid c)\}
\end{array}
\right\}.
\end{aligned}
\]

In the last term of \(U\), \(z\) and \(z'\) each range over all values of \(Z\). Although the lower bound is unchanged from that in Li-Pearl's Theorem, the upper bound contains a vital additional argument to the min function (i.e., the last term in the min function of \(U\)). This new term can significantly reduce the upper bound. The remaining terms are included because the bounds of Li-Pearl's Theorem are sometimes superior. The same holds for the following theorem.

#### Pure Mediators

Figure 2: Mediator Z with no direct effects of X on Y.

Figure 2 is a special case of Figure 1, in which \(X\) has no direct effects on \(Y\). The resulting bounds for the benefit function are as follows:

Theorem 3. Given the causal diagram \(G\) in Figure 2 and a distribution compatible with \(G\), and given that \(C\) does not contain any descendant of \(X\), the benefit function \(f(c) = \beta P(y_x, y'_{x'} \mid c) + \gamma P(y_x, y_{x'} \mid c) + \theta P(y'_x, y'_{x'} \mid c) + \delta P(y_{x'}, y'_x \mid c)\) is bounded as follows:
\[
\begin{aligned}
W + \sigma U \le f(c) \le W + \sigma L \quad &\text{if } \sigma < 0,\\
W + \sigma L \le f(c) \le W + \sigma U \quad &\text{if } \sigma > 0,
\end{aligned}
\]

where \(\sigma, W, L, U\) are given by

\[
\begin{aligned}
\sigma &= \beta - \gamma - \theta + \delta,\\
W &= (\gamma - \delta) P(y_x \mid c) + \delta P(y_{x'} \mid c) + \theta P(y'_{x'} \mid c),\\
L &= \max\left\{
\begin{array}{l}
0,\\
P(y_x \mid c) - P(y_{x'} \mid c),\\
P(y \mid c) - P(y_{x'} \mid c),\\
P(y_x \mid c) - P(y \mid c)
\end{array}
\right\},\\
U &= \min\left\{
\begin{array}{l}
P(y_x \mid c),\\
P(y'_{x'} \mid c),\\
P(y, x \mid c) + P(y', x' \mid c),\\
P(y_x \mid c) - P(y_{x'} \mid c) + P(y, x' \mid c) + P(y', x \mid c),\\
\sum_z \sum_{z' \neq z} \min\{P(y \mid z, c),\, P(y' \mid z', c)\} \times \min\{P(z \mid x, c),\, P(z' \mid x', c)\}
\end{array}
\right\}.
\end{aligned}
\]

Notably, the core term that Theorem 3 adds to the upper bound (i.e., the last term in the min function of \(U\)) requires only observational data.

## Examples

In this section, we use two cases to show how the presented theorems can be applied and how they affect decisions.

### Company Selection

Consider the motivating example in the introduction section. Let \(A = a\) denote the event that a customer receives the discount, \(A = a'\) the event that a customer does not receive the discount, \(R = r\) the event that a customer uses the service, \(R = r'\) the event that a customer does not use the service, \(C = c\) a customer of company A, \(Z = z\) a younger customer (age 50 or below), and \(Z = z'\) an older customer (age above 50). The model is shown in Figure 3.

Figure 3: Company selection model.

Based on Li-Pearl's model, it is easy to see that the benefit vector is (100, -60, 0, -140) (see the introduction section). Therefore, the benefit function is:

\[
\operatorname*{argmax}_c \; 100 P(r_a, r'_{a'} \mid c) - 60 P(r_a, r_{a'} \mid c) + 0 P(r'_a, r'_{a'} \mid c) - 140 P(r'_a, r_{a'} \mid c). \tag{1}
\]

The manager of the carwash company collected the data listed in Tables 1 and 2 from company A. By Li-Pearl's Theorem, the bounds of the benefit function are \([-0.423, 2.832]\) (see the appendix for details), and the midpoint is 1.205.
This suggests that the carwash company would gain $1.205 in profit from each individual in company A if it offers company A's employees the discount. Moreover, most of the bounded interval is positive, which provides more confidence that the conclusion is correct.

| | Discount | No Discount |
|---|---|---|
| First age group | 45 out of 101 used the service (44.6%) | 5 out of 101 used the service (5.0%) |
| Second age group | 248 out of 249 used the service (99.6%) | 179 out of 249 used the service (71.9%) |
| Overall | 293 out of 350 used the service (83.7%) | 184 out of 350 used the service (52.6%) |

Table 1: Experimental data collected by the carwash company. 350 customers were forced to receive the discount and 350 customers were forced not to receive the discount.

| | Discount | No Discount |
|---|---|---|
| First age group | 90 out of 152 used the service (59.2%) | 9 out of 50 used the service (18.0%) |
| Second age group | 157 out of 159 used the service (98.7%) | 239 out of 339 used the service (70.5%) |
| Overall | 247 out of 311 used the service (79.4%) | 248 out of 389 used the service (63.8%) |

Table 2: Observational data collected by the carwash company. 700 customers were given access to the discount and could choose for themselves whether to obtain it (note that a customer may still not use the service even if they obtained the discount).

However, Li-Pearl's Theorem only uses the overall data in Tables 1 and 2 (i.e., customer age is not considered). If we instead apply Theorem 1 to the data in Tables 1 and 2, the bounds of the benefit function are \([-0.168, -0.077]\) (see the appendix for details), with the midpoint at \(-0.123\). This suggests that if the carwash company offers the discount to company A's employees, it will lose $0.123 in profit per individual. Notably, the upper bound (\(-0.077\)) is negative, implying that the carwash company must lose profit if it offers the discount to company A's employees, regardless of how the bounds are used.
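The comparison between the pooled bounds and the age-stratified bounds of Theorem 1 can be retraced from the table counts. The following is a minimal sketch, not the authors' appendix computation: the names `GROUPS`, `pns_bounds`, `pooled`, and `benefit_bounds` are hypothetical, the probabilities are taken directly as the raw fractions in Tables 1 and 2, and \(P(z \mid c)\) is assumed to be each age group's share of the 700 observational customers.

```python
from fractions import Fraction as F

# Counts per age group, copied from Tables 1 and 2 (names are illustrative):
# exp = (users with discount, n with discount, users without, n without)
# obs = (users among takers, takers, users among non-takers, non-takers)
GROUPS = [
    {"exp": (45, 101, 5, 101), "obs": (90, 152, 9, 50)},
    {"exp": (248, 249, 179, 249), "obs": (157, 159, 239, 339)},
]
BENEFIT = (100, -60, 0, -140)  # (beta, gamma, theta, delta)


def pns_bounds(exp, obs):
    """Bounds on P(y_x, y'_x') within one group, plus its observational size."""
    y_x, y_xp = F(exp[0], exp[1]), F(exp[2], exp[3])
    n = obs[1] + obs[3]
    p_xy, p_xyn = F(obs[0], n), F(obs[1] - obs[0], n)    # P(y, x), P(y', x)
    p_xpy, p_xpyn = F(obs[2], n), F(obs[3] - obs[2], n)  # P(y, x'), P(y', x')
    p_y = p_xy + p_xpy
    lower = max(0, y_x - y_xp, p_y - y_xp, y_x - p_y)
    upper = min(y_x, 1 - y_xp, p_xy + p_xpyn, y_x - y_xp + p_xpy + p_xyn)
    return lower, upper, n


def pooled(groups, key):
    """Sum the count tuples of all groups element-wise."""
    return tuple(sum(col) for col in zip(*(g[key] for g in groups)))


def benefit_bounds(groups, benefit, stratify):
    """Bounds W + sigma*L, W + sigma*U, with or without the covariate."""
    beta, gamma, theta, delta = benefit
    sigma = beta - gamma - theta + delta
    exp_all, obs_all = pooled(groups, "exp"), pooled(groups, "obs")
    y_x, y_xp = F(exp_all[0], exp_all[1]), F(exp_all[2], exp_all[3])
    w = (gamma - delta) * y_x + delta * y_xp + theta * (1 - y_xp)
    if stratify:  # Theorem 1: weight per-group bounds by P(z | c)
        parts = [pns_bounds(g["exp"], g["obs"]) for g in groups]
        total = sum(n for _, _, n in parts)
        low = sum(l * F(n, total) for l, _, n in parts)
        up = sum(u * F(n, total) for _, u, n in parts)
    else:  # no covariate: bounds from the pooled data only
        low, up, _ = pns_bounds(exp_all, obs_all)
    ends = (w + sigma * low, w + sigma * up)
    return min(ends), max(ends)  # handles either sign of sigma


li_pearl = benefit_bounds(GROUPS, BENEFIT, stratify=False)
theorem1 = benefit_bounds(GROUPS, BENEFIT, stratify=True)
print([float(v) for v in li_pearl])
print([float(v) for v in theorem1])
```

From the raw counts this sketch gives exactly \([-2/5, 20/7] \approx [-0.400, 2.857]\) without age and \([-1/7, -2/35] \approx [-0.143, -0.057]\) with age. These are close to, but not identical with, the bounds reported in the text, which come from the appendix computation that is not reproduced here; the qualitative conclusion (a wide, mostly positive interval collapsing to a narrow negative one) is the same.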
### Effective Patients of a Drug

When a pharmaceutical company develops a new drug, it seeks to identify patients so as to maximize the difference between the number of effective patients and the number of ineffective patients. The causal diagram is shown in Figure 4. For the benefit vector, the pharmaceutical company assigns 1 to a complier, because the complier is the patient cured by the drug, and -1 to an always-taker, a never-taker, and a defier, because they are all ineffective patients. The benefit vector is then (1, -1, -1, -1).

| | Drug | No Drug |
|---|---|---|
| Low blood pressure (\(z\)) | 375 out of 405 recovered (92.6%) | 159 out of 481 recovered (33.1%) |
| High blood pressure (\(z'\)) | 17 out of 183 recovered (9.3%) | 3 out of 6 recovered (50.0%) |
| Combined data | 392 out of 588 recovered (66.7%) | 162 out of 487 recovered (33.3%) |

Table 3: Results of an observational study (30-year-old males) of a new drug, with post-treatment blood pressure taken into account.

Figure 4: A graphical model representing the effects of a new drug, with A representing drug usage, R representing recovery, Z representing blood pressure (measured at the end of the study), and C representing the population-specific variables (gender and age).

Let \(A = a\) denote the event that a patient takes the drug, \(A = a'\) the event that a patient does not take the drug, \(R = r\) the event that a patient recovers, \(R = r'\) the event that a patient does not recover, \(Z = z\) low blood pressure (measured at the end of the study), \(Z = z'\) high blood pressure, and \(C\) (a set of variables) the population-specific characteristics (gender and age) of a patient. The benefit function is then

\[
\operatorname*{argmax}_c \; P(r_a, r'_{a'} \mid c) - P(r_a, r_{a'} \mid c) - P(r'_a, r'_{a'} \mid c) - P(r'_a, r_{a'} \mid c). \tag{2}
\]

The pharmaceutical company records the recovery rates of 70000 patients who were given access to the drug (i.e., an observational study).
For each group of patients of the same gender and age, they record the number of patients who chose to take the drug and their recovery rate, and the number of patients who chose not to take the drug and their recovery rate. For example, the results for 30-year-old male patients (1075 patients) are shown in Table 3.

Note that the data in Table 3 are observational. Experimental data are not yet available. However, the set \(\{C\}\) satisfies the back-door criterion for both \((A, Z)\) and \((A, R)\) (Pearl 1995). By Pearl's adjustment formula, the experimental quantities needed are: \(P(r_a \mid c) = P(r \mid a, c) = 0.6667\), \(P(r_{a'} \mid c) = P(r \mid a', c) = 0.3326\), \(P(z_a \mid c) = P(z \mid a, c) = 0.6888\), and \(P(z'_{a'} \mid c) = P(z' \mid a', c) = 0.0123\).

First, we apply Li-Pearl's Theorem to the combined data in Table 3 and the above experimental quantities. The bounds of the benefit function are \([-0.3320, 0.3333]\) (see the appendix for details), and the midpoint is 0.0007. This suggests that the drug should be given to 30-year-old males, because the difference between the number of effective patients and the number of ineffective patients per 30-year-old male is positive. Alternatively, one may say that it is hard to decide because the bounded interval is roughly half positive and half negative.

Second, we apply the proposed Theorem 2 to the entire data in Table 3 and the above experimental quantities. The bounds of the benefit function are \([-0.3320, -0.0054]\) (see the appendix for details), and the midpoint is \(-0.1687\). The upper bound drops significantly, from 0.3333 to \(-0.0054\). This suggests that the drug should not be given to 30-year-old males, because the difference between the number of effective patients and the number of ineffective patients per 30-year-old male is negative. Most importantly, the entire bounded interval is negative, so the decision is convincing.
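The two sets of bounds in this example can be retraced from the counts in Table 3. The sketch below is only an illustration under the assumptions stated in the text (back-door adjustment on \(C\), so the experimental quantities equal the conditional observational ones); the names `TABLE3`, `mediator_term`, and the other variables are hypothetical, and the last term of \(U\) in Theorem 2 is summed over all pairs of blood-pressure levels.

```python
from fractions import Fraction as F

# Table 3 counts for 30-year-old males, keyed by blood pressure level:
# (recovered with drug, took drug, recovered without drug, did not take drug)
TABLE3 = {"low": (375, 405, 159, 481), "high": (17, 183, 3, 6)}

took = sum(v[1] for v in TABLE3.values())      # 588
skipped = sum(v[3] for v in TABLE3.values())   # 487
n = took + skipped                             # 1075
rec_drug = sum(v[0] for v in TABLE3.values())  # 392
rec_none = sum(v[2] for v in TABLE3.values())  # 162

# Back-door adjustment on C: P(r_a|c) = P(r|a,c) and P(r_a'|c) = P(r|a',c).
y_x, y_xp = F(rec_drug, took), F(rec_none, skipped)
p_xy, p_xyn = F(rec_drug, n), F(took - rec_drug, n)         # P(y,x), P(y',x)
p_xpy, p_xpyn = F(rec_none, n), F(skipped - rec_none, n)    # P(y,x'), P(y',x')
p_y = p_xy + p_xpy

lower = max(0, y_x - y_xp, p_y - y_xp, y_x - p_y)
upper_general = min(y_x, 1 - y_xp, p_xy + p_xpyn, y_x - y_xp + p_xpy + p_xyn)

# Extra Theorem 2 term: sum over all pairs (z, z') of
# min{P(y|z,x,c), P(y'|z',x',c)} * min{P(z_x|c), P(z'_x'|c)}.
mediator_term = sum(
    min(F(a[0], a[1]), 1 - F(b[2], b[3])) * min(F(a[1], took), F(b[3], skipped))
    for a in TABLE3.values()
    for b in TABLE3.values()
)
upper_theorem2 = min(upper_general, mediator_term)

beta, gamma, theta, delta = 1, -1, -1, -1
sigma = beta - gamma - theta + delta                            # 2 (positive)
w = (gamma - delta) * y_x + delta * y_xp + theta * (1 - y_xp)   # -1

li_pearl = (w + sigma * lower, w + sigma * upper_general)
theorem2 = (w + sigma * lower, w + sigma * upper_theorem2)
print([round(float(v), 4) for v in li_pearl])
print([round(float(v), 4) for v in theorem2])
```

With these inputs the sketch agrees with the reported intervals to four decimal places: the lower bound stays at about \(-0.3320\), while the upper bound moves from about 0.3333 to about \(-0.0054\) once the mediator term enters the min.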
## Simulated Results

In this section, we show how much, in general, the bounds of the benefit function are improved by Theorems 1, 2, and 3 in three simple causal diagrams. For each theorem, we randomly generated 100000 sample distributions (observational and experimental data) compatible with the causal diagram (see the appendix for the generating algorithm). Each sample distribution represents a different instantiation of the population-specific characteristics \(C\) in the model. The generating algorithm ensures that the experimental and observational data satisfy the general relation (i.e., \(P(x, y \mid c) \le P(y \mid do(x), c) \le 1 - P(x, y' \mid c)\)) (Tian and Pearl 2000). We set the benefit vector \((\beta, \gamma, \theta, \delta)\) to the most common choice, \((1, -1, -1, -1)\), to encourage compliers while avoiding always-takers, never-takers, and defiers. For sample distribution \(i\), let \([a_i, b_i]\) be the bounds from the proposed theorems, which consider the covariates and the causal diagram, and \([c_i, d_i]\) be the bounds from Li-Pearl's Theorem, which do not. We summarized the following criteria for each case:

- Average increased lower bound: \(\frac{\sum_i (a_i - c_i)}{100000}\);
- Average decreased upper bound: \(\frac{\sum_i (d_i - b_i)}{100000}\);
- Average gap that did not consider the covariates and the causal diagram: \(\frac{\sum_i (d_i - c_i)}{100000}\);
- Average gap that considered the covariates and the causal diagram: \(\frac{\sum_i (b_i - a_i)}{100000}\);
- Number of sample distributions in which the decision was flipped: \(\sum_i e_i\), where \(e_i = 1\) if \((a_i + b_i)(c_i + d_i) < 0\) and \(e_i = 0\) otherwise;
- Number of sample distributions in which the bounds from the proposed theorems were narrower: \(\sum_i f_i\), where \(f_i = 1\) if \((a_i > c_i)\) or \((b_i < d_i)\) and \(f_i = 0\) otherwise.

### Non-descendant Covariates

For the case of non-descendant covariates compatible with Theorem 1, we randomly generated 100000 sample distributions compatible with the causal diagram in Figure 5.

Figure 5: Causal diagram in which no variable in \(C \cup Z\) is a descendant of \(X\).
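The six summary criteria defined at the start of this section are simple aggregates over pairs of intervals. A minimal sketch of how they could be computed follows; the function name `summarize`, the dictionary keys, and the two toy intervals are hypothetical and are not taken from the simulation.

```python
def summarize(proposed, general):
    """Summary criteria for bounds with covariates vs. bounds without.

    proposed[i] = (a_i, b_i): bounds from the proposed theorems.
    general[i]  = (c_i, d_i): bounds from Li-Pearl's Theorem.
    """
    n = len(proposed)
    pairs = list(zip(proposed, general))
    return {
        "avg_increased_lower": sum(a - c for (a, _), (c, _) in pairs) / n,
        "avg_decreased_upper": sum(d - b for (_, b), (_, d) in pairs) / n,
        "avg_gap_general": sum(d - c for c, d in general) / n,
        "avg_gap_proposed": sum(b - a for a, b in proposed) / n,
        # Midpoints of opposite sign mean the decision flips.
        "decision_flipped": sum((a + b) * (c + d) < 0 for (a, b), (c, d) in pairs),
        "bounds_narrower": sum(a > c or b < d for (a, b), (c, d) in pairs),
    }


# Toy input (made up): the first interval narrows and flips sign,
# the second is left unchanged by the covariate.
proposed = [(-0.2, -0.1), (0.0, 0.5)]
general = [(-0.3, 0.6), (0.0, 0.5)]
summary = summarize(proposed, general)
print(summary)
```

The flip criterion compares the signs of the interval midpoints, since \((a_i + b_i)\) and \((c_i + d_i)\) are twice the respective midpoints.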
The comparison between the proposed Theorem 1 and Li-Pearl's Theorem is summarized in Table 4. The average gap under Li-Pearl's Theorem, which does not consider the covariates and the causal diagram, is 0.4342, while the average gap under Theorem 1, which does, is 0.3352; both the lower bound and the upper bound are improved by roughly 0.05. The decision flipped (i.e., the results of Li-Pearl's Theorem suggest gaining profit while the results of Theorem 1 suggest losing profit, or the reverse) in 920/100000 ≈ 1% of the samples, which means that roughly 1% of applications would reach the wrong decision if the covariates were not considered. The bounds that consider the covariates and the causal diagram are narrower in 93688/100000 ≈ 93.7% of the samples. Therefore, if a set \(Z\) satisfying Theorem 1 is available, the bounds of the benefit function given by the proposed theorem are more useful, as the gap is narrower.

| Criterion | Value |
|---|---|
| Average increased lower bound | 0.0494 |
| Average decreased upper bound | 0.0496 |
| Average gap by Li-Pearl's Theorem | 0.4342 |
| Average gap by Theorem 1 | 0.3352 |
| Decision flipped | 920 |
| Bounds narrower | 93688 |

Table 4: Simulation results of 100000 sample distributions compatible with the causal diagram in Figure 5.

We then randomly picked 100 of the 100000 sample distributions to plot the bounds with and without the covariates and the causal diagram (for readability, we sorted the sample distributions by the general lower bound, which does not consider the covariates and the causal diagram). The results are shown in Figure 6. The bounds of the benefit function are improved in most of the samples when the causal diagram is used.

### Partial Mediators

For the case of partial mediators compatible with Theorem 2, we randomly generated 100000 sample distributions compatible with the causal diagram in Figure 1.
The comparison between the proposed Theorem 2 and Li-Pearl's Theorem is summarized in Table 5. First, the average increased lower bound is 0 because the lower bound in Theorem 2 is exactly the lower bound in Li-Pearl's Theorem; the partial mediator cannot improve the lower bound. The average gaps under Li-Pearl's Theorem and the proposed Theorem 2 are also close, because the bounds of only 12724/100000 ≈ 12.7% of the samples are narrowed by Theorem 2. A rate of 12.7% is worthwhile if the cost of considering the partial mediators is acceptable, and the actual improvement among the narrowed samples is impressive.

Figure 6: Bounds of the benefit function for 100 samples compatible with the causal diagram of Figure 5, where the general bounds are obtained from Li-Pearl's Theorem and the bounds that consider the non-descendant covariate and the causal diagram are obtained from Theorem 1.

We then randomly generated 100000 samples whose bounds are indeed narrowed by the proposed Theorem 2 (same generating algorithm, but we kept generating until we had 100000 narrowed samples). The comparison between the proposed Theorem 2 and Li-Pearl's Theorem on these samples is summarized in Table 6. The average gap without the partial mediator and the causal diagram is 0.5531, while the average gap under Theorem 2 is 0.4768; the upper bound is improved by roughly 0.0764. Therefore, if a set \(Z\) satisfying Theorem 2 is available and costs permit, we should always consider the partial mediators and use Theorem 2.

We then randomly picked 100 of the 100000 narrowed sample distributions to plot the bounds with and without the partial mediator and the causal diagram (for readability, we sorted the sample distributions by the general upper bound, which does not consider the partial mediator and the causal diagram). The results are shown in Figure 7.
The upper bounds of the benefit function are improved significantly among these narrowed cases.

| Criterion | Value |
|---|---|
| Average increased lower bound | 0 |
| Average decreased upper bound | 0.00985 |
| Average gap by Li-Pearl's Theorem | 0.4564 |
| Average gap by Theorem 2 | 0.4465 |
| Decision flipped | 139 |
| Bounds narrower | 12724 |

Table 5: Simulation results of 100000 sample distributions compatible with the causal diagram in Figure 1.

| Criterion | Value |
|---|---|
| Average increased lower bound | 0 |
| Average decreased upper bound | 0.0764 |
| Average gap by Li-Pearl's Theorem | 0.5531 |
| Average gap by Theorem 2 | 0.4768 |
| Decision flipped | 1033 |

Table 6: Simulation results of 100000 narrowed sample distributions compatible with the causal diagram in Figure 1.

### Pure Mediators

For the case of pure mediators compatible with Theorem 3, we randomly generated 100000 sample distributions compatible with the causal diagram in Figure 2.

The comparison between the proposed Theorem 3 and Li-Pearl's Theorem is summarized in Table 7. The average gap under Li-Pearl's Theorem, which does not consider the pure mediator and the causal diagram, is 0.5195, while the average gap under Theorem 3 is 0.3324; the upper bound is improved by roughly 0.187. The lower bound is not improved, because the lower bound in Theorem 3 is exactly the same as in Li-Pearl's Theorem. The decision flipped (i.e., the results of Li-Pearl's Theorem suggest gaining profit while the results of Theorem 3 suggest losing profit, or the reverse) in 459/100000 ≈ 0.46% of the samples, which means that roughly 0.46% of applications would reach the wrong decision if the pure mediators were not considered. The bounds that consider the pure mediator and the causal diagram are narrower in 99996/100000 (more than 99.9%) of the samples. Therefore, if a set \(Z\) satisfying Theorem 3 is available, the bounds of the benefit function given by the proposed theorem are more useful, as the gap is narrower.
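The improvement in the pure-mediator case comes entirely from the last term in the min function of \(U\) in Theorem 3, which uses only observational quantities. As a minimal sketch of that one term, the function below sums over ordered pairs of distinct mediator values; the function name `pure_mediator_term`, the dictionary layout, and the toy probabilities are hypothetical and do not come from the simulation.

```python
def pure_mediator_term(p_y_given_z, p_z_given_x, p_z_given_xp):
    """Sum over z != z' of min{P(y|z,c), P(y'|z',c)} * min{P(z|x,c), P(z'|x',c)}.

    Each argument maps a mediator value to a probability, for a fixed c.
    """
    values = list(p_y_given_z)
    return sum(
        min(p_y_given_z[z], 1 - p_y_given_z[zp])
        * min(p_z_given_x[z], p_z_given_xp[zp])
        for z in values
        for zp in values
        if z != zp
    )


# Toy observational quantities (made up) for a binary mediator.
p_y_given_z = {"z0": 0.2, "z1": 0.9}   # P(y | z, c)
p_z_given_x = {"z0": 0.3, "z1": 0.7}   # P(z | x, c)
p_z_given_xp = {"z0": 0.8, "z1": 0.2}  # P(z | x', c)

term = pure_mediator_term(p_y_given_z, p_z_given_x, p_z_given_xp)
print(round(term, 4))
```

In a full computation this value would be one more candidate inside the min that defines \(U\), alongside the four terms shared with Li-Pearl's Theorem, so it can only lower the upper bound or leave it unchanged.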
We then randomly picked 100 of the 100000 sample distributions to plot the bounds with and without the pure mediator and the causal diagram (for readability, we sorted the sample distributions by the general upper bound, which does not consider the pure mediator and the causal diagram). The results are shown in Figure 8. The bounds of the benefit function are improved in almost all the samples when the causal diagram is used.

Figure 7: Upper bound of the benefit function for 100 narrowed samples compatible with the causal diagram of Figure 1, where the general upper bounds are obtained from Li-Pearl's Theorem and the upper bounds that consider the partial mediator and the causal diagram are obtained from Theorem 2.

Figure 8: Bounds of the benefit function for 100 samples compatible with the causal diagram of Figure 2, where the general bounds are obtained from Li-Pearl's Theorem and the bounds that consider the pure mediator and the causal diagram are obtained from Theorem 3.

| Criterion | Value |
|---|---|
| Average increased lower bound | 0 |
| Average decreased upper bound | 0.1870 |
| Average gap by Li-Pearl's Theorem | 0.5195 |
| Average gap by Theorem 3 | 0.3324 |
| Decision flipped | 459 |
| Bounds narrower | 99996 |

Table 7: Simulation results of 100000 sample distributions compatible with the causal diagram in Figure 2.

## Discussion

In this section, we discuss one more requirement on the covariates \(Z\) in Theorem 1. Note that in the motivating example in the introduction section, the discount must apply to all of company A's employees; the carwash company can only decide whether or not to offer the discount to the entire company A. The carwash company cannot offer the discount to a specific age group in company A.
Otherwise, if the carwash company could offer the discount to a specific age group, the covariates \(Z\) should be treated as population-specific characteristics and merged into \(C\), and Li-Pearl's Theorem should be applied separately to each population-specific group. This requirement is common; for example, an election speech cannot be delivered to only a specific group of people in a region, and an auto show cannot be offered to only a specific group of customers in a region. This requirement does not apply to Theorems 2 and 3 because the mediators occur after the treatment.

## Conclusion

We demonstrated how the bounds of the benefit function in the unit selection problem can be narrowed if covariate information and the associated causal structures are available. We derived three theorems that narrow the bounds of the benefit function under three common graphical conditions. We illustrated that if costs permit, and covariates and causal structures are available, the proposed theorems should always be applied, as narrower bounds help in making accurate decisions. Examples and simulation results are provided to support the proposed theorems.

## Acknowledgements

This research was supported in part by grants from the National Science Foundation [#IIS-2106908], the Office of Naval Research [#N00014-17-S-12091 and #N00014-21-1-2351], and the Toyota Research Institute of North America [#PO000897].

## References

Berson, A.; Smith, S.; and Thearling, K. 1999. Building data mining applications for CRM. McGraw-Hill Professional.

Bottou, L.; Peters, J.; Quiñonero-Candela, J.; Charles, D. X.; Chickering, D. M.; Portugaly, E.; Ray, D.; Simard, P.; and Snelson, E. 2013. Counterfactual reasoning and learning systems: The example of computational advertising. The Journal of Machine Learning Research, 14(1): 3207-3260.

Dawid, P.; Musio, M.; and Murtas, R. 2017. The Probability of Causation. Law, Probability and Risk, (16): 163-179.

Hung, S.-Y.; Yen, D. C.; and Wang, H.-Y. 2006. Applying data mining to telecom churn management. Expert Systems with Applications, 31(3): 515-524.

Lejeune, M. A. 2001. Measuring the impact of data mining on churn management. Internet Research, 11(5): 375-387.

Li, A.; and Pearl, J. 2019. Unit selection based on counterfactual logic. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, 1793-1799. AAAI Press.

Li, L.; Chen, S.; Kleban, J.; and Gupta, A. 2014. Counterfactual estimation and optimization of click metrics for search engines. arXiv preprint arXiv:1403.1891.

Mueller, S.; Li, A.; and Pearl, J. 2021. Causes of Effects: Learning individual responses from population data. arXiv preprint arXiv:2104.13730.

Pearl, J. 1995. Causal diagrams for empirical research. Biometrika, 82(4): 669-688.

Sun, W.; Wang, P.; Yin, D.; Yang, J.; and Chang, Y. 2015. Causal inference via sparse additive models with application to online advertising. In AAAI, 297-303.

Tian, J.; and Pearl, J. 2000. Probabilities of causation: Bounds and identification. Annals of Mathematics and Artificial Intelligence, 28(1-4): 287-313.

Tsai, C.-F.; and Lu, Y.-H. 2009. Customer churn prediction by hybrid neural networks. Expert Systems with Applications, 36(10): 12547-12553.

Yan, J.; Liu, N.; Wang, G.; Zhang, W.; Jiang, Y.; and Chen, Z. 2009. How much can behavioral targeting help online advertising? In Proceedings of the 18th international conference on World Wide Web, 261-270. ACM.