# A kernel test for quasi-independence

Tamara Fernández, Gatsby Unit, University College London, t.a.fernandez@ucl.ac.uk

Wenkai Xu, Gatsby Unit, University College London, xwk4813@gmail.com

Marc Ditzhaus, Department of Statistics, TU Dortmund University, marc.ditzhaus@tu-dortmund.de

Arthur Gretton, Gatsby Unit, University College London, arthur.gretton@gmail.com

We consider settings in which the data of interest correspond to pairs of ordered times, e.g., the birth times of the first and second child, the times at which a new user creates an account and makes the first purchase on a website, and the entry and survival times of patients in a clinical trial. In these settings, the two times are not independent (the second occurs after the first), yet it is still of interest to determine whether there exists significant dependence beyond their ordering in time. We refer to this notion as "quasi-(in)dependence". For instance, in a clinical trial, to avoid biased selection, we might wish to verify that recruitment times are quasi-independent of survival times, where dependencies might arise due to seasonal effects. In this paper, we propose a nonparametric statistical test of quasi-independence. Our test considers a potentially infinite space of alternatives, making it suitable for complex data where the nature of the possible quasi-dependence is not known in advance. Standard parametric approaches, such as the classical conditional Kendall's tau and log-rank tests, are recovered as special cases. The tests apply in the right-censored setting: an essential feature in clinical trials, where patients can withdraw from the study. We provide an asymptotic analysis of our test statistic, and demonstrate in experiments that our test obtains better power than existing approaches, while being more computationally efficient.

## 1 Introduction

Many practical scientific problems require the study of events which occur consecutively in time.
We focus here on the setting where the event times $X$ and $Y$ are only observed if they are in the ordered relationship $X \le Y$. This type of data is commonly known as truncated data, and, in particular, we say that $X$ is right-truncated by $Y$, or $Y$ is left-truncated by $X$. In clinical trials, for example, only patients still alive at the beginning of the study can be recruited, hence the recruitment times $X$ and the survival times $Y$ are ordered. In the field of insurance, a liability claim may be placed at a time $Y$ as a consequence of an incident at a time $X$. In e-commerce, the first purchase by a new user, at time $Y$, can only happen after the time $X$ at which the user registers with the website.

Our goal is to determine whether there exists an association between $X$ and $Y$ in the truncated data setting. Given that $X \le Y$, the times $X$ and $Y$ clearly will not be independent (with the exception of trivial cases in which, for instance, $X$ and $Y$ have disjoint support). Thus, while it is not meaningful to test for statistical independence in the truncated setting, we can still test whether $X$ and $Y$ are uncoupled apart from the fact that $X \le Y$, using the notion of quasi-independence. We will make this notion formal in Section 2.

34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

Figure 1: Channing House dataset: the x-axis shows the entry time to the retirement center, and the y-axis shows the right-censored lifetimes. Events are censored by withdrawal from the center or by the end of the study on July 1, 1975. AIDS dataset: the x-axis shows the incubation time $X$, and the y-axis shows the censored lapse time $Y$, measured from infection to recruitment time. Events are censored by death or by leaving the study. Infected patients were recruited into the study only if they developed AIDS within the study period; therefore, in this dataset, the incubation time $X$ does not exceed the lapse time $Y$. Abortion dataset: the x-axis shows the time of entry to the study, and the y-axis shows the right-censored time to spontaneous abortion. Events are censored due to live birth and induced abortions. All censored times are marked in dark.

Testing for an association between ordered $X$ and $Y$ may be important in making business or medical decisions. In the setting of clinical trials, it is important to ensure that survival times are as "independent" as possible from recruitment times, in order to avoid bias in the recruitment process. In e-commerce, it may be of interest to test whether the purchase time for an item, such as a swimsuit, depends on the registration time, to determine seasonal effects on consumer behaviour and refine advertising strategies.

In statistical modelling, a common working assumption is that $X$ and $Y$ are independent, but can only be observed when $X \le Y$ holds: see e.g. [18, 33, 38], and [22, Chapter 9]. The independence assumption can be weakened to quasi-independence, which is testable, and under which typical methods are still valid [22, 24, 35, 36, 37, 38].

Our tests apply in the setting where $Y$ is right-censored. This is a very common scenario in real-world applications, particularly in clinical trials, where patients may withdraw from the study before their event of interest is observed. In the e-commerce example, there may be registered users that have not yet made a purchase when the study ends. Formally, the data correspond to the triple $(X, T, \Delta)$, where $T = \min\{C, Y\}$ is the minimum of the survival time $Y$ of a given patient and the time $C$ at which said patient leaves the study (or the study ends), and $\Delta = \mathbb{1}\{T = Y\}$. Given the truncated data setting, we have further that $X \le \min\{Y, C\}$. We emphasise that quasi-independence and right-censoring are very different data properties. Quasi-independence is a deterministic hard constraint ($X \le Y$), while right-censoring is a stochastic property of the data (incomplete observations).
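To make the observation scheme $(X, T, \Delta)$ concrete, the following is a minimal sketch of how a left-truncated, right-censored sample could be simulated. The exponential distributions and their rates, as well as the helper names `observe` and `simulate`, are hypothetical choices for illustration only and are not taken from the paper.

```python
import random


def observe(x, y, c):
    """Return the observed triple (x, t, delta), or None if the pair is truncated.

    x: entry (truncation) time, y: event time, c: censoring time.
    """
    t = min(y, c)                 # T = min{C, Y}
    delta = 1 if t == y else 0    # Delta = 1{T = Y}
    if x > t:                     # left truncation: only X <= min{Y, C} is observed
        return None
    return (x, t, delta)


def simulate(n, seed=0):
    """Draw n observed triples; the rates below are illustrative assumptions."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        x = rng.expovariate(1.0)    # hypothetical entry-time distribution
        y = rng.expovariate(0.5)    # hypothetical event-time distribution
        c = rng.expovariate(0.25)   # hypothetical censoring-time distribution
        obs = observe(x, y, c)
        if obs is not None:
            sample.append(obs)
    return sample


data = simulate(1000)
censoring_rate = 1 - sum(delta for _, _, delta in data) / len(data)
```

Every retained triple satisfies $X \le T$ by construction, while the value of $\Delta$ varies randomly from one observation to the next, which mirrors the distinction drawn above between the hard ordering constraint and stochastic censoring.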
Quasi-independence has been widely studied in the statistics community, including for right-censored data: we provide a brief review below (more detailed descriptions of relevant concepts and methods will be provided in subsequent sections). In this work, we propose a nonparametric statistical test for quasi-independence, which applies under right censoring. Our test statistic is a nonparametric generalisation of the log-rank test proposed by [7], where the departure from the null is characterised by functions in a reproducing kernel Hilbert space (RKHS). Consequently, we are able to detect a very rich family of alternatives, including non-monotone alternatives. Our test generalises statistical tests of independence based on the Hilbert-Schmidt Independence Criterion [16], which were adapted to the right-censoring setting in [8, 28]. Due to the additional correlations present in the test statistic under quasi-independence, however, we require new approaches in our analysis of the consistency and asymptotic behaviour of our test statistic, compared with these earlier works.

In Section 2, we introduce the notion of quasi-independence. We next propose an RKHS statistic to measure departures from quasi-independence, and its finite sample estimate from data. We contrast the statistic for quasi-independence with the analogous RKHS statistic for independence, noting the additional sample dependencies on account of the left-truncation. Next, in Section 3, we generalise the quasi-independence statistics to account for the presence of right-censored observations. In Section 4, we provide our main theoretical results: an asymptotic analysis for our test statistic, and a guarantee of consistency under the alternative. To determine the test threshold in practice, we introduce a wild bootstrap procedure. In Section 5 we give a detailed empirical evaluation of our method.
We begin with challenging synthetic datasets exhibiting periodic quasi-dependence, as would be expected for example from seasonal or daily variations, where our approach strongly outperforms the alternatives. Additionally, we show that our test is consistently the best in scenarios in which the censoring percentage is relatively high; see Figure 6. Next, we apply our test statistic to three real-data scenarios, shown in Figure 1: a survival analysis study for residents in the Channing House retirement community in Palo Alto, California [18]; a study of transfusion-related AIDS [24]; and a spontaneous abortion study [26]. For this last dataset, our general-purpose test is able to detect a mode of quasi-dependence discovered by a model that exploits domain-specific knowledge, but not found by alternative general-purpose testing approaches. This was a particular challenge due to the large percentage of censored observations in the abortion dataset; see Figure 6. Proofs of all results are given in the Appendix.

## 2 Quasi-independence

Our goal is to test the null hypothesis of quasi-independence between $X$ and $Y$. Formally, this null hypothesis is characterised as

$$H_0:\ \pi(x, y) = \tilde F_X(x)\,\tilde S_Y(y) \quad \text{for all } x \le y, \tag{1}$$

where $\pi(x, y) = P(X \le x, Y \ge y)$, and $\tilde F_X(x)$ and $\tilde S_Y(y)$ are functions that depend only on $x$ and $y$, respectively. If $X$ and $Y$ are independent, $\tilde F_X(x)$ and $\tilde S_Y(y)$ coincide with $F_X(x) = P(X \le x)$ and $S_Y(y) = P(Y \ge y)$, but in general they may differ. For simplicity, $X$ and $Y$ are assumed continuously distributed on $\mathbb{R}_+$; $f_{XY}$, $f_X$ and $f_Y$ denote the joint density and the corresponding marginals, and $f_{Y|X=x}$ denotes the conditional density of $Y$ given $X = x$. To simplify the notation, we suppose throughout that $X \le Y$ always holds, and thus write $\pi(x, y) = P(X \le x, Y \ge y)$ instead of $\pi(x, y) = P(X \le x, Y \ge y \mid X \le Y)$, as $P(X \le Y) = 1$.
We remark, however, that the ordering $X \le Y$ can be ensured by considering a conditional probability space given $X \le Y$, and restricting calculations of probabilities, expectations, etc. to this space; see [2, 7, 34].

The notion of quasi-independence must not be confused with the notion of independent increments, i.e., $X \perp (Y - X)$. For instance, generate $X$ and $Y$ such that $X \le Y$ by sampling i.i.d. uniform random variables, say $(U_1, U_2)$, in the interval $(0, 1)$, and set $X = U_1$ and $Y = U_2$ for the first pair $(U_1, U_2)$ such that $U_1 \le U_2$. It can be verified that this construction leads to quasi-independent random variables $(X, Y)$, but $X$ and $Y - X$ are not independent, as the distribution of $Y - X$ is constrained by how large the original value of $X$ was: the larger $X$ is, the smaller the value of $Y - X$.

In [7], the authors propose to measure quasi-independence by using a log-rank-type test statistic which estimates $\int_{x \le y} \omega(x, y)\rho(x, y)\,dx\,dy$, where

$$\rho(x, y) = -\pi(x, y)\,\frac{\partial^2 \pi(x, y)}{\partial x\,\partial y} + \frac{\partial \pi(x, y)}{\partial x}\,\frac{\partial \pi(x, y)}{\partial y}, \quad x \le y. \tag{2}$$

The function $\rho$ is originally inspired by the odds ratio proposed by [1] (notwithstanding that $\rho$ is here a difference, rather than a ratio). Under the assumption of quasi-independence, $\rho = 0$, and thus $\int_{x \le y} \omega(x, y)\rho(x, y)\,dx\,dy = 0$. Nevertheless, it may be that $\int_{x \le y} \omega(x, y)\rho(x, y)\,dx\,dy = 0$ even if the quasi-independence assumption is not satisfied, since the quantity depends on the function $\omega$: for instance, it is trivially zero when $\omega = 0$. To avoid choosing a specific weight function $\omega$, we optimise over a class of weight functions, taking an RKHS approach,

$$\Psi = \sup_{\omega \in B_1(H)} \int_{x \le y} \omega(x, y)\rho(x, y)\,dx\,dy, \tag{3}$$

where $B_1(H)$ is the unit ball of a reproducing kernel Hilbert space $H$ with bounded measurable kernel $K : \mathbb{R}_+^2 \times \mathbb{R}_+^2 \to \mathbb{R}$. We refer to the measure $\Psi^2$ as the Kernel Quasi-Independent Criterion (KQIC). It can easily be verified that $\Psi \ge 0$ and that, if $X$ and $Y$ are quasi-independent, then $\Psi = 0$. For $c_0$-universal kernels [30], we have that $\Psi = 0$ if and only if $X$ and $Y$ are quasi-independent: see Theorem 4.2.
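The uniform construction described above can be checked numerically. The sketch below is an illustration under our own assumptions, not the authors' code: the helper names, the sample size, and the evaluation points are hypothetical. For this construction the pair $(X, Y)$ has constant density 2 on $\{0 < x \le y < 1\}$, so $\pi(x, y) = 2x(1 - y)$ for $x \le y$. This factorises as required by the null hypothesis, even though $2x$ is not the marginal distribution function of $X$.

```python
import random


def sample_pair(rng):
    """Draw i.i.d. uniforms until U1 <= U2, then return (X, Y) = (U1, U2)."""
    while True:
        u1, u2 = rng.random(), rng.random()
        if u1 <= u2:
            return u1, u2


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)
pairs = [sample_pair(rng) for _ in range(100_000)]


def pi_hat(x, y):
    """Empirical estimate of pi(x, y) = P(X <= x, Y >= y)."""
    return sum(1 for a, b in pairs if a <= x and b >= y) / len(pairs)


# Compare the empirical pi with the product form 2x(1 - y) at a few points with x <= y.
points = [(0.2, 0.5), (0.3, 0.3), (0.5, 0.9), (0.1, 0.8)]
max_gap = max(abs(pi_hat(x, y) - 2 * x * (1 - y)) for x, y in points)

# Correlation between X and the increment Y - X; clearly negative here.
corr_increment = pearson([x for x, _ in pairs], [y - x for x, y in pairs])
```

In this example `max_gap` stays small, consistent with quasi-independence, while `corr_increment` is close to $-0.5$, so $X$ and $Y - X$ are far from independent.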
Given the i.i.d. sample $((X_i, Y_i))_{i \in [n]}$, we can estimate $\Psi$ via $\Psi_n$, defined as $\Psi_n = \sup_{\omega \in B_1(H)} \frac{1}{n}\sum_{i=1}^{n} \Big( \omega(X_i, Y_i)\,\hat\pi(X_i, Y_i) - \frac{1}{n}\sum_{k=1}^{n} \omega(X_i, Y_k)\,\mathbb{1}\{X_k \le X_i$ [...] $> 0$. (5)

As with the uncensored case, $\tilde f_X$ and $\tilde f_Y$ are not necessarily equal to the marginal densities $f_X$ and $f_Y$. The additional condition $S_{T|X=x}(y) > 0$ ensures that the pair $(x, y)$ is actually observable despite the censoring. The statistic $\Psi$ from Equation (3) is then extended to the censored setting,

$$\Psi_c = \sup_{\omega \in B_1(H)} \int_{x \le y} \omega(x, y)\rho^c(x, y)\,dx\,dy \ge 0,$$

where

$$\rho^c(x, y) = -\pi^c(x, y)\,\frac{\partial^2 \pi^c_1(x, y)}{\partial x\,\partial y} + \frac{\partial \pi^c(x, y)}{\partial x}\,\frac{\partial \pi^c_1(x, y)}{\partial y},$$

and $\pi^c_1(x, y) = P(X \le x, T \ge y, \Delta = 1)$ and $\pi^c(x, y) = P(X \le x, T \ge y)$ for $x \le y$.

Proposition 3.2. We have $\Psi_c = 0$ if the null hypothesis $H_0$ of quasi-independence is fulfilled.

The (updated) estimator for the (new) Kernel Quasi-Independent Criterion $\Psi_c$ is defined by $\Psi_{c,n} = \sup_{\omega \in B_1(H)} \frac{1}{n}\sum_{i=1}^{n} \Big( \Delta_i\,\omega(X_i, T_i)\,\hat\pi^c(X_i, T_i) - \frac{1}{n}\sum_{k=1}^{n} \Delta_k\,\omega(X_i, T_k)\,\mathbb{1}\{X_k \le X_i$ [...] $> q^{WB}_\alpha\}$ to infer $H_0$, where $q^{WB}_\alpha$ denotes the $(1 - \alpha)$-quantile of $(\Psi^{WB}_{c,n})^2$ given the observations $((X_i, \Delta_i, T_i))_{i \in [n]}$.

## 5 Experiments

We perform synthetic experiments followed by real data applications. In the first set of synthetic examples, we replicate the settings studied in [3], where Gaussian copula models were used to create dependencies between $X$ and $Y$. In the second synthetic experiment, we investigate distribution functions $f_{Y|X=x}$ that have a periodic dependence on $x$. We then apply our tests to real-data scenarios such as those studied in [7] and [26].

**Methods.** We implement the proposed quasi-independence test based on the test statistic KQIC given in Equation (7). The kernels are chosen to be Gaussian, with bandwidth optimised by using approximate test power [15, 21]. See Appendix G for details.
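The decision rule stated at the end of Section 4 compares the observed statistic with the $(1 - \alpha)$-quantile of its wild bootstrap replicates. The following is a minimal sketch of that comparison only. How the bootstrap replicates are generated is not shown, and the function names and the particular empirical quantile convention are our own assumptions rather than the authors' implementation.

```python
import math


def upper_quantile(values, alpha):
    """Empirical (1 - alpha)-quantile of values.

    Convention assumed here: the smallest sample value v such that at least a
    (1 - alpha) fraction of the sample is <= v.
    """
    ordered = sorted(values)
    rank = math.ceil((1 - alpha) * len(ordered))  # 1-based rank into the sorted sample
    return ordered[max(rank, 1) - 1]


def reject_null(statistic, bootstrap_statistics, alpha=0.05):
    """Reject H0 when the observed statistic exceeds the bootstrap threshold."""
    return statistic > upper_quantile(bootstrap_statistics, alpha)
```

For instance, `reject_null(stat, boot_stats, alpha=0.05)` would be called with `stat` equal to the squared statistic computed on the data and `boot_stats` a list of its wild bootstrap replicates (both hypothetical inputs here).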
Competing approaches include: WLR, the weighted log-rank test proposed in [7], with weight function chosen equal to $n\hat\pi^c(x, y)$;¹ WLR_SC, the weighted log-rank test proposed in [7], with weight function chosen as suggested by the authors, i.e., $W(x, y) = \int_0^x \hat S_{C_R}((y - u)-)^{-1}\,\hat\pi^c(du, y)$, where $\hat S_{C_R}$ is the Kaplan-Meier estimator associated with the data $((C_i - X_i, 1 - \Delta_i))_{i=1}^n$; M&B, the conditional Kendall's tau statistic modified to incorporate censoring as proposed in [25]; and MinP1 and MinP2, the "minimal p-value selection" tests proposed in [3], which rely on permutations of the observed pairs. A review of these approaches can be found in Appendix F. For the synthetic experiments, we recorded the rejection rate over 200 trials. The wild bootstrap size for KQIC and the permutation size for MinP1 and MinP2 are set to 500.

**Monotonic Dependency.** The first synthetic example from [3] is generated as follows: $X \sim \mathrm{Exp}(5)$ and $Y \sim \mathrm{Weibull}(3, 8.5)$; $(X, Y)$ are then coupled via a 2-dimensional Gaussian copula model with correlation parameter $\rho$. The censoring variable is set to be exponentially distributed and truncation applies. With the copula construction, the magnitude of the correlation parameter $\rho$ is a fair indicator of the degree of dependence, with $\rho = 0$ denoting independence. Rejection rates are reported in Table 1. At $\rho = 0$, the null hypothesis holds, and the rejection rates refer to the Type-I error. All the tests achieve a correct Type-I error around the test level $\alpha = 0.05$. For $\rho \ne 0$, the alternative holds, and the rejection rates correspond to test power (the higher the better). The highest value is in bold. Test results for different censoring rates can be found in the Appendix. Overall, our method outperforms all competing approaches.
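The Gaussian copula construction of the monotonic example can be sketched as follows. This is an illustration under stated assumptions rather than the exact simulation setup: we read $\mathrm{Exp}(5)$ as an exponential with mean 5 and $\mathrm{Weibull}(3, 8.5)$ as shape 3 and scale 8.5 (the parametrisation is not spelled out in the text), we omit the censoring step because its parameters are not given, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def copula_pair(rho, rng, exp_mean=5.0, wb_shape=3.0, wb_scale=8.5):
    """Draw (X, Y) with the assumed marginals, coupled by a Gaussian copula."""
    # Correlated standard normals with correlation rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    # Map to uniforms; clamp away from 0 and 1 to keep the logarithms finite.
    u1 = min(max(STD_NORMAL.cdf(z1), 1e-12), 1.0 - 1e-12)
    u2 = min(max(STD_NORMAL.cdf(z2), 1e-12), 1.0 - 1e-12)
    # Inverse CDFs of the exponential and Weibull marginals.
    x = -exp_mean * math.log(1.0 - u1)
    y = wb_scale * (-math.log(1.0 - u2)) ** (1.0 / wb_shape)
    return x, y


def sample_truncated(n, rho, seed=0):
    """Keep only pairs with X <= Y, mimicking left truncation."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        x, y = copula_pair(rho, rng)
        if x <= y:
            kept.append((x, y))
    return kept


pairs = sample_truncated(500, rho=0.4)
```

Setting `rho=0.0` makes the untruncated pair independent, so the retained pairs are quasi-independent, which corresponds to the null column of Table 1.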
¹ Our test statistic recovers, as a particular case, the square of this log-rank test by choosing $K = 1$.

| Method | $\rho=-0.4$ | $-0.2$ | $0.0$ | $0.2$ | $0.4$ | $\rho=-0.4$ | $-0.2$ | $0.0$ | $0.2$ | $0.4$ |
|---|---|---|---|---|---|---|---|---|---|---|
| | $n=100$ | | | | | $n=200$ | | | | |
| KQIC | **0.93** | **0.46** | 0.06 | **0.42** | **0.86** | **0.99** | **0.67** | 0.05 | **0.63** | **1.00** |
| WLR | 0.80 | 0.33 | 0.10 | 0.18 | 0.66 | 0.94 | 0.52 | 0.06 | 0.32 | 0.94 |
| WLR_SC | 0.85 | 0.42 | 0.03 | 0.24 | 0.74 | 0.93 | 0.53 | 0.06 | 0.43 | 0.99 |
| M&B | 0.64 | 0.22 | 0.02 | 0.16 | 0.74 | 0.94 | 0.28 | 0.03 | 0.42 | 0.92 |
| MinP1 | 0.58 | 0.12 | 0.03 | 0.17 | 0.62 | 0.84 | 0.12 | 0.10 | 0.34 | 0.84 |
| MinP2 | 0.33 | 0.04 | 0.06 | 0.10 | 0.28 | 0.56 | 0.08 | 0.08 | 0.28 | 0.52 |

Table 1: Rejection rates for monotonic dependency models based on the Gaussian copula, with $n = 100$ on the left and $n = 200$ on the right; $\alpha = 0.05$; censoring rate: 50%.

Figure 2: Rejection rate for the V-shaped Gaussian copula model.

Figure 3: Samples from the periodic dependency model for different values of the frequency coefficient $\beta$.

Figure 4: Rejection rate for the periodic dependency model with 25% of the data censored.

Figure 5: Rejection rate for high frequency dependency, with $\alpha = 0.05$ and 40% of the data censored.

Figure 6: Rejection rate for periodic dependencies ($\beta = 5.0$), with $\alpha = 0.05$ and 200 trials.

**V-shaped Dependency.** This synthetic example is taken from [3], where the authors compare the behaviour of their tests against the conditional Kendall's tau test of [25] in detecting non-monotonic dependencies. The following V-shaped dependency structure applies: $X \sim \mathrm{Weibull}(0.5, 4)$; $Y \sim \mathrm{Uniform}[0, 1]$; $(X, |Y - 0.5|)$ is coupled via the 2-dimensional Gaussian copula with correlation coefficient $\rho$ as above. Exponential censoring and truncation apply. Rejection rates are plotted against the correlation coefficient $\rho$ in Figure 2, where KQIC outperforms competing methods.

**Periodic Dependency.** Apart from the V-shaped dependencies studied in [3], we investigate more complicated non-monotonic dependency structures. The data are generated with a periodic dependency structure, $X \sim \mathrm{Exp}(1)$; $Y \mid X \sim \mathrm{Exp}(e^{\cos(2\pi\beta X)})$. The coefficient $\beta$ controls the frequency of the dependence.
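The periodic model just described can be simulated in a few lines. The sketch below rests on assumptions of ours: the argument of $\mathrm{Exp}(\cdot)$ is read as a rate, truncation is imposed by discarding pairs with $X > Y$, censoring is omitted, and the helper names are hypothetical. The exact simulation protocol is described in Appendix G.3 and may differ.

```python
import math
import random


def conditional_rate(beta, x):
    """Rate of Y given X = x, read here as exp(cos(2 * pi * beta * x))."""
    return math.exp(math.cos(2.0 * math.pi * beta * x))


def periodic_pair(beta, rng):
    """Draw X ~ Exp(1) and then Y | X ~ Exp(conditional_rate(beta, X))."""
    x = rng.expovariate(1.0)
    y = rng.expovariate(conditional_rate(beta, x))
    return x, y


def sample_periodic(n, beta, seed=0):
    """Keep only pairs with X <= Y (assumed truncation step)."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        x, y = periodic_pair(beta, rng)
        if x <= y:
            kept.append((x, y))
    return kept


pairs_periodic = sample_periodic(500, beta=3.0)
```

With `beta=0.0` the conditional rate is the constant $e$, so $Y$ does not depend on $X$ before truncation. For larger `beta` the rate oscillates between $e^{-1}$ and $e$ more rapidly in $x$.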
A set of examples with different parameters $\beta$ is shown in Figure 3, with $\beta = 0$ implying independence. Further details are discussed in Appendix G.3. Examining the results in Figure 4, we see that our method outperforms competing approaches. Unlike the correlation coefficient $\rho$ in Gaussian copula models, the coefficient $\beta$ does not directly indicate the amount of dependence; rather, a higher $\beta$ indicates a more difficult problem. Thus, as anticipated, power drops for large values of $\beta$, and the effect is more apparent at low sample sizes. Note in particular that the permutation-based tests [3] are more affected by an increase in the frequency at which dependence occurs, while our test shows more robust behaviour.

**High Frequency Dependency.** In the periodic dependency problem above, the parameter $\beta$ controls the frequency of sinusoidal dependence. At a given sample size, the dependence becomes harder to detect as the frequency $\beta$ increases. We show this visually in Appendix G.3. For problems with high frequency dependence, a larger sample size is required. When the sample size increases, KQIC is able to successfully reject the null at relatively high frequencies (large $\beta$), as shown in Figure 5. At the lower frequency $\beta = 3.0$, WLR_SC has similar test power to KQIC. As the problem gets harder with larger $\beta$, KQIC outperforms WLR_SC. The IMQ kernel has similar test power to the Gaussian kernel on this example. The Type-I error is well controlled; see Table 5 in Appendix G.3.

**Censoring level.** We investigate how our test is affected by the censoring level, in particular when the censoring percentage increases. We analyse performance under both the null and alternative hypotheses. The Type-I error is well controlled for KQIC, and details are reported in Appendix G.5. Under the alternative hypothesis, in Figure 6, we show the rejection rate for different censoring percentages at a fixed sample size. This is done in our periodic dependency setting.
From the plot, we see that KQIC with Gaussian and IMQ kernels is more robust to censoring, with test power starting to drop at 85% censoring for sample size 800. WLR_SC is strongly affected by censoring. WLR is not capable of detecting $H_1$ in this hard problem with high frequency. In addition, we study the test behaviour with dependent censoring, since in Assumption 3.1 only the conditional independence $Y \perp C \mid X$ is required [7]. Detailed results are reported in Appendix G.4.

**Computational cost.** Our proposed test, implemented as described in Appendix E, has a significantly lower runtime than the competing permutation approaches. M&B implements the conditional Kendall's tau statistic, which has a closed-form expression for the null distribution; its runtime is therefore the lowest of all. See Appendix G for details.

| (p-value) | Channing House: Combined | Channing House: Male | Channing House: Female | AIDS | Abortion Times: Combined | Abortion Times: Control | Abortion Times: Treatment |
|---|---|---|---|---|---|---|---|
| KQIC_Gauss | 0.072 | 0.012 | 0.566 | 0.030 | 0.014 | 0.440 | 0.028 |
| KQIC_IMQ | 0.078 | 0.022 | 0.414 | 0.010 | 0.032 | 0.158 | 0.048 |
| WLR | 0.058 | 0.016 | 0.444 | 0.035 | 0.408 | 0.868 | 0.748 |
| WLR_SC | 0.086 | 0.020 | 0.422 | 0.030 | 0.511 | 0.674 | 0.450 |
| MinP1 | 0.084 | 0.036 | 0.396 | 0.012 | 0.584 | 0.584 | 0.452 |
| MinP2 | 0.198 | 0.426 | 0.118 | 0.406 | 0.694 | 0.572 | 0.346 |
| M&B | 0.178 | 0.199 | 0.495 | 0.010 | 0.712 | 0.693 | 0.752 |
| % Events | 0.379 | 0.474 | 0.354 | 0.875 | 0.094 | 0.069 | 0.098 |

Table 2: Real data, with marked results contradicting and supporting the scientific literature.

**Real Data Experiment.** We consider three real data scenarios:

Channing House [18]: contains the recorded entry times and lifetimes of 461 patients (97 men and 364 women). Among them, 268 subjects withdrew from the retirement center, yielding a censoring proportion of 0.62. The data are naturally left-truncated, as only patients who entered the center are observed.

AIDS [24]: the data contain the incubation time and lapse time, measured from infection to recruitment time, for 295 subjects. A censoring proportion of 0.125 occurs due to death or withdrawal from the study. Left truncation applies since only patients that developed AIDS within the study period were recruited; thus only patients with incubation time not exceeding the lapse time were observed.

Abortion [26]: contains the entry time and the spontaneous abortion time for 1186 women (197 in the control group and 989 in the treatment group exposed to Coumarin derivatives). A censoring proportion of 0.906 occurs due to live birth or induced abortions. Delayed entry to the study is substantial in this dataset: 50% of the control cohort entered the study in week 9 or later, while in the treatment group this occurs for 25% of the cohort.

Implementation: For our test we used both Gaussian kernels (KQIC_Gauss) and IMQ kernels (KQIC_IMQ). For competing approaches, the implementation is as discussed at the beginning of this section.

Results: For the Channing House dataset, in Table 2, we observe that all tests agree in not rejecting the null hypothesis for the combined and female groups at the level $\alpha = 0.05$. For the male group, all tests but MinP2 and M&B reject the null hypothesis at $\alpha = 0.05$. Our results agree with [7]. For the AIDS dataset, all tests reach a consensus of rejecting the null, which is consistent with [7], except for MinP2 (marked in blue). For the abortion dataset, our test rejects the null hypothesis, suggesting dependency between the entry time $X$ and the spontaneous abortion time $Y$ in both the treatment group and the combined case (in red). This finding is in accordance with domain knowledge [26], where the presence of this dependence was indicated to be due to the study design. The competing tests were unable to detect this dependence and did not reject the null hypothesis.

## 6 Conclusions

We address the problem of testing for quasi-independence in the presence of left-truncation, as occurs in real-world examples where events are ordered.
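The reading of Table 2 for the abortion dataset can be reproduced mechanically by comparing each p-value with the level $\alpha = 0.05$. The short sketch below copies the abortion columns of Table 2; the variable and function names are ours and purely illustrative.

```python
ALPHA = 0.05

# p-values for the abortion dataset, copied from Table 2
# (columns in order: Combined, Control, Treatment).
abortion_p = {
    "KQIC_Gauss": (0.014, 0.440, 0.028),
    "KQIC_IMQ": (0.032, 0.158, 0.048),
    "WLR": (0.408, 0.868, 0.748),
    "WLR_SC": (0.511, 0.674, 0.450),
    "MinP1": (0.584, 0.584, 0.452),
    "MinP2": (0.694, 0.572, 0.346),
    "M&B": (0.712, 0.693, 0.752),
}
groups = ("Combined", "Control", "Treatment")


def rejecting_methods(p_table, column, alpha=ALPHA):
    """Names of the methods whose p-value in the given column is below alpha."""
    return sorted(name for name, ps in p_table.items() if ps[column] < alpha)


rejections = {g: rejecting_methods(abortion_p, i) for i, g in enumerate(groups)}
```

Only the two KQIC variants fall below the level in the combined and treatment columns, and no method does so for the control group, which matches the discussion of the results above.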
The test is nonparametric and general-purpose, can detect a broad class of departures from the null, and applies even where right-censoring is present. In experiments on challenging synthetic data, our method strongly outperforms the alternatives. On real-life datasets, our method yields results consistent with classical approaches where these apply; however, it also detects quasi-dependence in a case where competing general-purpose approaches fail, and where models based on domain knowledge were needed to establish the result. Our tests are a first step towards the wider challenge of testing quasi-independence in the presence of general physical or causal constraints on the variables, which themselves induce a baseline level of dependence. Many real-world settings, apart from left-truncation, do not have the advantage of a pure null scenario of perfect independence, and tests must be designed in light of these constraints. These directions are an exciting topic for future study.

## 7 Broader Impact

**Potential benefits to society.** Finding dependencies is a key tool in a broad variety of scientific domains, including clinical treatments, demography, business strategy development, and public policy formulation, with applications spanning the natural and social sciences. Our work addresses these questions by studying the dependence relationships between observed data, where the data already have an intrinsic dependence due to natural order. Moreover, the dependence need not be monotonic, but can take a variety of forms. Detecting the dependence of variables in this setting, which corresponds to many real-life scenarios, will allow scientists or policymakers to better understand their data and research problems, and guide the design of future research questions. The dependence detection strategy may be used to detect bias in data collection procedures. Such bias could be avoided by verifying the absence of inadvertent dependency relationships in collected data.
**Potential risks to society.** There are a number of ways in which statistical tests can be misapplied in the wider scientific community, and these must be guarded against. As one example, p-value hacking or failure to correct for multiple testing can result in false positives. In the event that these false positives are surprising or controversial, they can gain considerable traction in the media. In some cases, people's health can be at risk. A second risk, specific to tests of dependence, is for correlation and causation to be confused. Our tests detect correlation; a misunderstanding of such tests, however, might result in false conclusions of cause and effect. There have been especially pernicious instances when using statistics in domains such as crime prediction.

## Acknowledgement

TF, WX, and AG thank the Gatsby Charitable Foundation for the financial support. MD gratefully acknowledges support from the Deutsche Forschungsgemeinschaft (grant no. PA-2409 5-1).

## References

[1] L.L. Chaieb, L.-P. Rivest, and B. Abdous. Estimating survival under a dependent truncation. Biometrika, 93(3):655-669, 2006.

[2] S.H. Chiou, J. Qian, E. Mormino, and R.A. Betensky. Permutation tests for general dependent truncation. Computational Statistics & Data Analysis, 128:308-324, 2018.

[3] S.H. Chiou, J. Qian, E. Mormino, R.A. Betensky, Imaging Biomarkers Lifestyle Flagship Study Australian, Aging Brain Study Harvard, and Alzheimer's Disease Neuroimaging Initiative. Permutation tests for general dependent truncation. Computational Statistics & Data Analysis, 128:308, 2018.

[4] K.P. Chwialkowski and A. Gretton. A kernel independence test for random processes. In International Conference on Machine Learning, 2014.

[5] K.P. Chwialkowski, D. Sejdinovic, and A. Gretton. A wild bootstrap for degenerate kernel tests. In Advances in Neural Information Processing Systems, pages 3608-3616, 2014.

[6] H. Dehling and T. Mikosch. Random quadratic forms and the bootstrap for U-statistics. Journal of Multivariate Analysis, 51(2):392-413, 1994.

[7] T. Emura and W. Wang. Testing quasi-independence for truncation data. Journal of Multivariate Analysis, 101:223-239, 2010.

[8] T. Fernandez, A. Gretton, D. Rindt, and D. Sejdinovic. A kernel log-rank test of independence for right-censored data. arXiv preprint arXiv:1912.03784, 2019.

[9] T. Fernandez and N. Rivera. A reproducing kernel Hilbert space log-rank test for the two-sample problem. arXiv preprint arXiv:1904.05187, 2019.

[10] T. Fernández and N. Rivera. Kaplan-Meier V- and U-statistics. Electronic Journal of Statistics, 14(1):1872-1916, 2020.

[11] K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf. Kernel measures of conditional dependence. In NeurIPS, pages 489-496, 2008.

[12] J. Gorham and L. Mackey. Measuring sample quality with kernels. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1292-1301. JMLR.org, 2017.

[13] A. Gretton. A simpler condition for consistency of a kernel independence test. arXiv preprint arXiv:1501.06103, 2015.

[14] A. Gretton, O. Bousquet, A.J. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In ALT, pages 63-78, 2005.

[15] A. Gretton, K. Fukumizu, Z. Harchaoui, and B.K. Sriperumbudur. A fast, consistent kernel two-sample test. In Advances in Neural Information Processing Systems, pages 673-681, 2009.

[16] A. Gretton, K. Fukumizu, C.H. Teo, L. Song, B. Schölkopf, and A.J. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems, pages 585-592, 2008.

[17] A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, K. Fukumizu, and B.K. Sriperumbudur. Optimal kernel choice for large-scale two-sample tests. In Advances in Neural Information Processing Systems, pages 1205-1213, 2012.

[18] J. Hyde. Testing survival under right censoring and left truncation. Biometrika, 64(2):225-230, 1977.

[19] W. Jitkrittum, H. Kanagawa, P. Sangkloy, J. Hays, B. Schölkopf, and A. Gretton. Informative features for model comparison. In Advances in Neural Information Processing Systems, pages 808-819, 2018.

[20] W. Jitkrittum, Z. Szabó, K.P. Chwialkowski, and A. Gretton. Interpretable distribution features with maximum testing power. In Advances in Neural Information Processing Systems, pages 181-189, 2016.

[21] W. Jitkrittum, W. Xu, Z. Szabó, K. Fukumizu, and A. Gretton. A linear-time kernel goodness-of-fit test. In Advances in Neural Information Processing Systems, pages 262-271, 2017.

[22] J.P. Klein and M.L. Moeschberger. Survival analysis: techniques for censored and truncated data. Springer Science & Business Media, 2006.

[23] V.S. Koroljuk and Yu.V. Borovskich. Theory of U-statistics, volume 273 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1994.

[24] S. Lagakos, L. Barraj, and V. De Gruttola. Nonparametric analysis of truncated survival data, with application to AIDS. Biometrika, 75:515-523, 1988.

[25] E.C. Martin and R.A. Betensky. Testing quasi-independence of failure and truncation via conditional Kendall's tau. Journal of the American Statistical Association, 100:484-492, 2005.

[26] R. Meister and C. Schaefer. Statistical methods for estimating the probability of spontaneous abortion in observational studies analyzing pregnancies exposed to coumarin derivatives. Reproductive Toxicology, 26(1):31-35, 2008.

[27] A. Meynaoui, M. Albert, B. Laurent, and A. Marrel. Adaptive test of independence based on HSIC measures. arXiv preprint arXiv:1902.06441, 2019.

[28] D. Rindt, D. Sejdinovic, and D. Steinsaltz. Nonparametric independence testing for right-censored data using optimal transport. arXiv preprint arXiv:1906.03866, 2019.

[29] D. Sejdinovic, A. Gretton, and W. Bergsma. A kernel test for three-variable interactions. In Advances in Neural Information Processing Systems, pages 1124-1132, 2013.

[30] B.K. Sriperumbudur, K. Fukumizu, and G.R.G. Lanckriet. Universality, characteristic kernels and RKHS embedding of measures. Journal of Machine Learning Research, 12(Jul):2389-2410, 2011.

[31] B.K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G.R.G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11(Apr):1517-1561, 2010.

[32] D.J. Sutherland, H.-Y. Tung, H. Strathmann, S. De, A. Ramdas, A. Smola, and A. Gretton. Generative models and model criticism via optimized maximum mean discrepancy. arXiv preprint arXiv:1611.04488, 2016.

[33] W.-Y. Tsai. Estimation of the survival function with increasing failure rate based on left truncated and right censored data. Biometrika, 75(2):319-324, 1988.

[34] W.-Y. Tsai. Testing the assumption of independence of truncation time and failure time. Biometrika, 77:169-177, 1990.

[35] W.-Y. Tsai, N.P. Jewell, and M.C. Wang. A note on the product-limit estimator under right censoring and left truncation. Biometrika, 74:883-886, 1987.

[36] B.W. Turnbull. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 38:290-295, 1976.

[37] M.C. Wang. Nonparametric estimation from cross-sectional survival data. Journal of the American Statistical Association, 86:130-143, 1991.

[38] M. Woodroofe. Estimating a distribution function with truncated data. The Annals of Statistics, 13:163-177, 1985.