# Optimal Transport-based Conformal Prediction

Gauthier Thurin¹, Kimia Nadjahi¹, Claire Boyer²

**Abstract.** Conformal Prediction (CP) is a principled framework for quantifying uncertainty in black-box learning models, by constructing prediction sets with finite-sample coverage guarantees. Traditional approaches rely on scalar nonconformity scores, which fail to fully exploit the geometric structure of multivariate outputs, such as in multi-output regression or multiclass classification. Recent methods addressing this limitation impose predefined convex shapes for the prediction sets, potentially misaligning with the intrinsic data geometry. We introduce a novel CP procedure that handles multivariate score functions through the lens of optimal transport. Specifically, we leverage Monge-Kantorovich vector ranks and quantiles to construct prediction regions with flexible, potentially non-convex shapes, better suited to the complex uncertainty patterns encountered in multivariate learning tasks. We prove that our approach ensures finite-sample, distribution-free coverage properties, similar to typical CP methods. We then adapt our method to multi-output regression and multiclass classification, and also propose simple adjustments to generate adaptive prediction regions with asymptotic conditional coverage guarantees. Finally, we evaluate our method on practical regression and classification problems, illustrating its advantages in terms of (conditional) coverage and efficiency.

¹CNRS, École Normale Supérieure, Paris, France. ²Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Saclay, France, and Institut universitaire de France. Correspondence to: Gauthier Thurin.

*Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).*

## 1. Introduction

In various domains, including high-stakes applications, state-of-the-art performance is often achieved by black-box machine learning models. As a result, accurately quantifying the uncertainty of their predictions has become a critical priority. Conformal Prediction (CP; Vovk et al., 2005) has emerged as a compelling framework to address this need, by generating prediction sets with coverage guarantees (ensuring they contain the true outcome with a specified confidence level) regardless of the model or data distribution. Most CP methods are thus model-agnostic and distribution-free while being easy to implement, which explains their growing popularity in recent years.

The main idea of CP is to convert a set of non-conformity scores into reliable uncertainty sets using quantiles. Non-conformity scores are empirical measurements of how unusual a prediction is. For example, in regression, the score can be defined as $|\hat y - y|$, where $\hat y \in \mathbb{R}$ is the model's prediction and $y \in \mathbb{R}$ the true response (Lei et al., 2018). These scores are central to the CP framework, as they encapsulate the uncertainty stemming from both the model and the data, directly influencing the size and shape of the resulting prediction sets. Therefore, the quality of the prediction sets hinges on the relevance of the chosen non-conformity score: while a poorly designed score may still achieve the required coverage guarantee, it often leads to overly conservative or inefficient prediction sets, failing to capture the complex patterns of the underlying data distribution (Angelopoulos & Bates, 2023).

Most CP approaches rely on scalar non-conformity scores (e.g., Angelopoulos & Bates, 2023; Romano et al., 2020b; Cauchois et al., 2021; Sesia & Romano, 2021; Lei et al., 2018). Although conceptually simple, such one-dimensional representations can be too restrictive or poorly suited in applications that require multivariate prediction sets.
To circumvent this, recently proposed CP methods seek to incorporate correlations despite the use of scalar scores, by leveraging techniques such as copulas (Messoudi et al., 2021) or ellipsoids (Johnstone & Cox, 2021; Messoudi et al., 2022; Henderson et al., 2024). Nevertheless, these approaches either lack finite-sample coverage guarantees or impose restrictive modeling assumptions that prescribe the shape of the prediction region. Feldman et al. (2023) recently proposed a CP method able to construct more adaptive prediction regions with non-convex shapes, establishing a connection with multivariate quantiles. However, their method (like conformalized quantile regression, Romano et al., 2019) requires intervening in the way the predictor is trained and, therefore, cannot be directly applied to a black-box model.

**Contributions.** In this work, we introduce a novel general CP framework that accommodates multivariate scores, enabling more expressive representations of prediction errors. The core idea is to leverage Monge-Kantorovich (MK) quantiles (Chernozhukov et al., 2017; Hallin et al., 2021), a multivariate extension of traditional scalar quantiles rooted in optimal transport theory, with various applications (see, e.g., Carlier et al., 2016; Hallin, 2022; Rosenberg et al., 2023). MK quantiles are constructed by mapping multidimensional scores onto a reference distribution. The resulting CP framework, called OT-CP for Optimal Transport-based Conformal Prediction, effectively captures the structure and dependencies within multivariate data while ensuring distribution-free ranks, thanks to the distinctive properties of MK quantiles (Deb & Sen, 2023; Hallin et al., 2021). We demonstrate that OT-CP constructs prediction regions with finite-sample coverage guarantees. These hold for any choice of multivariate score function, which makes OT-CP a robust and practical tool to address complex uncertainty quantification tasks.
After presenting the general OT-CP methodology with its theoretical guarantees (Section 2), we apply it to two typical learning tasks: multi-output regression (Section 3) and classification (Section 4). For each of these, we use multivariate score functions which, when integrated in OT-CP, yield prediction regions that effectively capture correlations between the score dimensions. In the context of regression, we also develop an extension of OT-CP that conditionally adapts to input covariates, further enhancing the flexibility of our method. Moreover, we show that this adaptive version provably reaches asymptotic conditional coverage. These two case studies serve a dual purpose: they highlight the versatility and user-friendliness of OT-CP while offering concrete frameworks to evaluate its benefits over existing methods through numerical experiments. In doing so, we believe this lays a solid foundation for future explorations of OT-CP across a wider range of applications. The code used to produce the results in this paper can be accessed at the associated GitHub repository.

## 2. Methodology

### 2.1. Setting

We consider a pre-trained black-box model $\hat f : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ respectively denote the input and output spaces of the learning task. Assume we have access to a set of $n$ exchangeable observations $(X_i, Y_i) \in \mathcal{X} \times \mathcal{Y}$, not used during the training of $\hat f$ and referred to as the calibration set. Consider a score function $s : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}_+^d$ that produces $d \ge 1$ non-conformity scores, measuring the discrepancies between the target $Y_i$ and the prediction $\hat f(X_i)$. Considering a multivariate score in the context of CP departs from typical strategies, which rely on scalar scores. Such multivariate scores can be particularly useful for quantifying uncertainties, as described in the examples below.

**Example 1 (Multi-output regression).** In multi-output regression, both the response $Y$ and the prediction $\hat f(X)$ take values in $\mathbb{R}^d$.
One can consider multivariate scores $s(Y, \hat f(X))$ corresponding to component-wise prediction errors (see Section 3), without the need to aggregate them into a single value (e.g., by considering the mean squared error). Figure 1(a) illustrates 2-dimensional scores in a context of bivariate regression.

**Example 2 (Multiclass classification).** Consider a classification setting with $K \ge 3$ classes and let $\hat\pi(x) = \{\hat\pi_k(x)\}_{k=1}^K$ be the estimated class probabilities returned by a classifier for some input $x$. Denote by $\Delta_y = (\mathbb{1}_{k=y})_{k=1}^K$ the one-hot encoding of a label $y$. A multivariate score can be formed as the component-wise absolute difference
$$s(x, y) = |\hat\pi(x) - \Delta_y| \in \mathbb{R}_+^K. \qquad (1)$$
This score retains $K$-dimensional predictive information, allowing for the exploration of correlations between its components. For instance, when $K = 3$, consider two inputs $x_1$ and $x_2$ with output probabilities $\hat\pi(x_1) = (0.6, 0.4, 0)$ and $\hat\pi(x_2) = (0, 0.4, 0.6)$. For both predictions, assessing the conformity of $y = 2$ with a typical score $1 - \hat\pi_y(x)$ used in CP for classification would return the same value of $0.6$. This potentially ignores that co-occurrences between labels 1 and 2 might be more frequent than between 2 and 3. In contrast, the multivariate alternative (1) distinguishes these two probability profiles, as $s(x_1, y) \neq s(x_2, y)$. This can be more helpful to capture the underlying confusion patterns of the predictor across different label modalities. In the rest of the paper, we denote by $\{S_i\}_{i=1}^n = \{s(X_i, Y_i)\}_{i=1}^n$ the scores computed on the calibration set.

### 2.2. Optimal transport toolbox

In the context of conformal prediction, dealing with multivariate scores implies defining an adequate notion of multivariate quantiles. To do so, we view the non-conformity scores $\{S_i\}_{i=1}^n$ through the empirical distribution $\hat\nu_n = \frac{1}{n}\sum_{i=1}^n \delta_{S_i}$ and leverage optimal transport (OT) tools, more specifically, Monge-Kantorovich quantiles.

**Definition 2.1 (Empirical Monge-Kantorovich ranks, Chernozhukov et al.
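The contrast drawn in Example 2 can be checked numerically. The sketch below (the helper name `multivariate_score` is ours) computes the component-wise score (1) for the two probability profiles of the example, showing that the scalar IP-style score cannot distinguish them while the multivariate score can:

```python
import numpy as np

def multivariate_score(pi_hat, y, K):
    """Component-wise absolute difference between the estimated class
    probabilities and the one-hot encoding of label y, as in Eq. (1)."""
    delta_y = np.eye(K)[y - 1]          # one-hot encoding of label y
    return np.abs(pi_hat - delta_y)

K = 3
pi_x1 = np.array([0.6, 0.4, 0.0])       # probabilities for input x1
pi_x2 = np.array([0.0, 0.4, 0.6])       # probabilities for input x2
y = 2                                   # label whose conformity we assess

s1 = multivariate_score(pi_x1, y, K)    # [0.6, 0.6, 0.0]
s2 = multivariate_score(pi_x2, y, K)    # [0.0, 0.6, 0.6]

# The scalar score 1 - pi_y(x) returns the same value for both inputs...
scalar_1, scalar_2 = 1 - pi_x1[y - 1], 1 - pi_x2[y - 1]
# ...whereas the multivariate score (1) separates the two profiles.
```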
(2017); Hallin et al. (2021)).** Consider the reference rank vectors $\{U_i\}_{i=1}^n$ given by
$$\forall i \in \{1, \ldots, n\}, \quad U_i = \tfrac{i}{n}\,\theta_i, \qquad (2)$$
where the $\theta_i$ are i.i.d. random vectors drawn uniformly on the Euclidean sphere $\mathbb{S}^{d-1} = \{\theta \in \mathbb{R}^d : \|\theta\| = 1\}$. The Monge-Kantorovich rank map is defined for any score $s \in \mathbb{R}^d$ as
$$R_n(s) = \operatorname*{argmax}_{U_i :\, 1 \le i \le n} \ \langle U_i, s\rangle - \psi_n(U_i), \qquad (3)$$
with $\psi_n$ the potential solving the dual of Kantorovich's OT problem, i.e.,
$$\psi_n = \operatorname*{argmin}_{\varphi} \ \frac{1}{n}\sum_{i=1}^n \varphi(U_i) + \frac{1}{n}\sum_{i=1}^n \varphi^*(S_i),$$
where the optimization is performed over the set of lower-semicontinuous convex functions, and $\varphi^*(x) = \sup_u \{\langle x, u\rangle - \varphi(u)\}$ is the Legendre transform of a convex function $\varphi$.

*Figure 1. Ranking multivariate scores using optimal transport: (a) multivariate scores $\{S_i\}_{i=1}^n$; (b) reference rank vectors $\{U_i\}_{i=1}^n$. The colormap encodes how the 2-dimensional scores in (a) are transported onto the reference rank vectors in (b).*

Note that $R_n$ verifies $R_n(S_i) = U_{\sigma_n(i)}$ for $i \in \{1, \ldots, n\}$, where $\sigma_n$ is the solution of the assignment problem
$$\sigma_n = \operatorname*{argmin}_{\sigma \in \mathcal{P}_n} \ \sum_{i=1}^n \|S_i - U_{\sigma(i)}\|^2, \qquad (4)$$
with $\mathcal{P}_n$ the set of all permutations of $\{1, \ldots, n\}$. This transport-based rank map echoes the one-dimensional case, with $\{U_i\}_{i=1}^n$ replacing traditional ranks $\{1, 2, \ldots, n\}$ based on univariate quantile levels $\{\frac{1}{n}, \ldots, 1\}$. By definition, $\|R_n(s)\| \in \{\frac{1}{n}, \ldots, 1\}$ for any $s \in \mathbb{R}^d$. This allows us to introduce a specific ordering of $\mathbb{R}^d$, namely $s_1 \preceq_{R_n} s_2$ if, and only if, $\|R_n(s_1)\| \le \|R_n(s_2)\|$. This multivariate ordering is illustrated in Figure 1. A main virtue is its ability to capture the shape of the underlying probability distribution. Another notable advantage of Definition 2.1 is that the ranks are distribution-free: by construction, $\|R_n(S_1)\|, \ldots, \|R_n(S_n)\|$ correspond to a random permutation of $\{\frac{1}{n}, \ldots, 1\}$, regardless of the distribution of the non-conformity scores $\{S_i\}_{i=1}^n$.
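To make the construction concrete, the following sketch computes empirical MK ranks through the assignment problem (4), using SciPy's Hungarian solver on synthetic placeholder scores (the data and variable names are ours, not the paper's implementation). The final check illustrates the distribution-free property: the rank norms are exactly a permutation of $\{1/n, \ldots, 1\}$.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, d = 200, 2

# Placeholder non-conformity scores standing in for {S_i}.
scores = rng.normal(size=(n, d))

# Reference rank vectors U_i = (i/n) * theta_i, theta_i uniform on the sphere.
theta = rng.normal(size=(n, d))
theta /= np.linalg.norm(theta, axis=1, keepdims=True)
U = (np.arange(1, n + 1)[:, None] / n) * theta

# Assignment problem (4): match each score S_i to one reference vector,
# minimizing the total squared Euclidean transport cost.
cost = ((scores[:, None, :] - U[None, :, :]) ** 2).sum(axis=-1)
_, sigma = linear_sum_assignment(cost)

ranks = U[sigma]                          # R_n(S_i) = U_{sigma_n(i)}
rank_norms = np.linalg.norm(ranks, axis=1)
# rank_norms is a permutation of {1/n, 2/n, ..., 1}, whatever the score law.
```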
This concept of multivariate rank allows the construction of multivariate quantile regions, which play a central role in our methodology, as we will see in the next section.

**Remark 2.2.** The choice of reference rank vectors $\{U_i\}_{i=1}^n$ in Definition 2.1 is flexible and can be tailored to specific needs, provided that $\{S_i\}_{i=1}^n$ and $\{U_i\}_{i=1}^n$ remain independent (Ghosal & Sen, 2022, Remark 3.11). The convention adopted in Definition 2.1 has the merit of fixing ideas and of being appropriate for regression tasks.

### 2.3. OT-based conformal prediction (OT-CP)

We present a new methodology, OT-CP, that leverages optimal transport to perform (split) conformal prediction with multivariate scores. Unlike traditional CP approaches, our method relies on a multivariate perspective to quantify uncertainties and construct prediction regions through Monge-Kantorovich vector quantiles. Given a confidence level $\alpha \in [0, 1]$, the proposed framework consists of three steps:

1. **Multivariate score computation:** Compute the multivariate scores $\{S_i\}_{i=1}^n = \{s(X_i, Y_i)\}_{i=1}^n$ on the calibration set $\{(X_i, Y_i)\}_{i=1}^n$.
2. **Quantile region construction:**
   (a) Split the calibration set into $D_1$ and $D_2$ of respective sizes $n_1$ and $n_2$ such that $n_1 + n_2 = n$;
   (b) Compute the MK rank map $R_{n_1}$ based on $D_1$;
   (c) Construct the MK quantile region
   $$\widehat{Q}_n(\alpha) = \left\{ s : \|R_{n_1}(s)\| \le \rho_{\lceil (n_2+1)\alpha \rceil} \right\}, \qquad (5)$$
   where $\rho_{\lceil (n_2+1)\alpha \rceil}$ is the $\lceil (n_2+1)\alpha \rceil$-th smallest element of $\{\|R_{n_1}(s(X_i, Y_i))\| : (X_i, Y_i) \in D_2\}$.
3. **Prediction set computation:** For a test input $X_{\mathrm{test}}$ from the same distribution as the calibration set, return the prediction region
   $$\widehat{C}_\alpha(X_{\mathrm{test}}) = \left\{ y \in \mathcal{Y} : s(X_{\mathrm{test}}, y) \in \widehat{Q}_n(\alpha) \right\}.$$

The key novelty of OT-CP lies in the use of multivariate scores (step 1) along with the construction of an OT-based confidence region (step 2). By definition, this region leverages multivariate quantiles of the empirical distribution of calibration scores, accounting for correlations between their components.
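The three steps above can be sketched end-to-end as follows. For simplicity, this sketch extends the discrete rank map to new scores by nearest-neighbor lookup among the $D_1$ scores, a practical shortcut rather than the dual formulation (3); all helper names are ours, and the toy data is illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fit_mk_ranks(S1, seed=0):
    """Fit the MK rank map on D1; return a function s -> ||R_{n1}(s)||.
    Out-of-sample ranks use a nearest-neighbor shortcut (assumption)."""
    rng = np.random.default_rng(seed)
    n1, d = S1.shape
    theta = rng.normal(size=(n1, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    U = (np.arange(1, n1 + 1)[:, None] / n1) * theta
    cost = ((S1[:, None, :] - U[None, :, :]) ** 2).sum(-1)
    _, sigma = linear_sum_assignment(cost)        # assignment problem (4)
    norms = np.linalg.norm(U[sigma], axis=1)      # ||R_{n1}(S_i)||

    def rank_norm(s):
        i = np.argmin(((S1 - s) ** 2).sum(-1))    # nearest D1 score
        return norms[i]
    return rank_norm

def calibrate_threshold(rank_norm, S2, alpha):
    """Step 2(c): the ceil((n2+1)*alpha)-th smallest rank norm over D2."""
    r = np.sort([rank_norm(s) for s in S2])
    k = int(np.ceil((len(S2) + 1) * alpha))
    return r[min(k, len(S2)) - 1]

# Toy run: 2-d Gaussian scores, split into D1 / D2.
rng = np.random.default_rng(1)
S = rng.normal(size=(400, 2))
rank_norm = fit_mk_ranks(S[:200])                 # step 2(b) on D1
rho = calibrate_threshold(rank_norm, S[200:], alpha=0.9)
# Membership in Q_n(alpha) for fresh scores (step 3's inner test).
covered = np.mean([rank_norm(s) <= rho for s in rng.normal(size=(1000, 2))])
```

Note the prediction set of step 3 is obtained by scanning candidate outputs $y$ and keeping those whose score passes the `rank_norm(s) <= rho` test.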
Our methodology enables the construction of confidence regions without predefined shapes, thereby aligning better with the underlying data distribution. In step 3, the prediction set for a new input $X_{\mathrm{test}}$ is evaluated through the preimage, by the score function, of the quantile region. This generalizes the one-dimensional case, where classical quantiles are used to construct prediction sets in the form of intervals.

Extending conformal prediction to the multivariate setting using optimal transport quantiles poses non-trivial challenges, as standard distribution-free arguments do not directly apply. To address these subtleties, we introduce a careful splitting strategy (Kuchibhotla, 2020, §3.3.1) in step 2, which helps to provide explicit theoretical coverage guarantees for our OT-CP method.

**Remark 2.3 (Computational aspects).** Our approach requires solving an optimal transport problem between two discrete distributions, each consisting of $n$ points. In the case of univariate scores, this OT problem simplifies to a sorting operation, so one recovers the standard $O(n \log n)$ cost. In the multivariate case, the OT problem is generally solved via linear programming, inducing a computational complexity of $O(n^3)$. For the numerical experiments carried out in this paper, the computational times remain reasonable, as reported in Appendix D.4. In large-scale settings, efficient approximation methods can reduce the computational complexity to $O(n^2)$ (Peyré et al., 2019), as exploited in the entropic OT-CP methodology of Klein et al. (2025).

### 2.4. Coverage guarantees

Next, we show that the prediction regions constructed with OT-CP are valid, meaning they satisfy the coverage property.

**Theorem 2.4 (Coverage guarantee).** Suppose $\{(X_i, Y_i)\}_{i=1}^n \cup \{(X_{\mathrm{test}}, Y_{\mathrm{test}})\}$ are exchangeable. Let $\alpha \in (0, 1)$ be such that $\lceil \alpha(n_2 + 1) \rceil \le n_2$.
The prediction region $\widehat{C}_\alpha$ constructed on $\{(X_i, Y_i)\}_{i=1}^n$ satisfies
$$\alpha \le P\left(Y_{\mathrm{test}} \in \widehat{C}_\alpha(X_{\mathrm{test}})\right) \le \alpha + \frac{n_{\mathrm{ties}}}{n_2 + 1}, \qquad (6)$$
where the probability is taken over the joint distribution of $\{(X_i, Y_i)\}_{i=1}^n \cup \{(X_{\mathrm{test}}, Y_{\mathrm{test}})\}$ and $n_{\mathrm{ties}} \in \{1, \ldots, n_2 + 1\}$ is the maximum number of ties in $\{\|R_{n_1}(s(X_i, Y_i))\| : (X_i, Y_i) \in D_2\} \cup \{\|R_{n_1}(S_{\mathrm{test}})\|\}$ (i.e., each distinct value in this sample appears at most $n_{\mathrm{ties}}$ times).

We present two proof strategies for Theorem 2.4 in Appendices B.1 and B.2. The main challenge lies in extending the desirable properties of univariate quantiles, namely distribution-freeness (see, e.g., Hallin et al., 2021) and stability, to the multivariate setting. However, the stability arguments invoked in standard proofs of the quantile lemma (see, e.g., Tibshirani et al., 2019) do not directly apply here, as they rely on unresolved theoretical questions in optimal transport. To address this difficulty, inspired by Kuchibhotla (2020), we adopt a splitting strategy that enables us to derive a multivariate analogue of the quantile lemma. Indeed, the rank of a new sample score among the $n_2$ calibration scores of $D_2$ is guaranteed to follow a uniform distribution, ensuring valid prediction regions without distributional assumptions.

Theorem 2.4 ensures that, for a given coverage level $\alpha \in (0, 1)$, the true label $Y_{\mathrm{test}}$ belongs to the OT-based prediction region $\widehat{C}_\alpha(X_{\mathrm{test}})$ with probability at least $\alpha$. Moreover, this coverage probability is shown to be of the order of $\alpha$, being upper-bounded by $\alpha + n_{\mathrm{ties}}/(n_2 + 1)$, with $n_2$ the size of the subset $D_2$ and $n_{\mathrm{ties}} \in \{1, \ldots, n_2 + 1\}$. The factor in this upper bound naturally arises from the discrete nature of the MK rank map (3) and our splitting procedure, which may introduce ties in the ranking process. Note that a tie-breaking rule can be applied if ties occur, as usually done in CP (Angelopoulos & Bates, 2023), to enforce $n_{\mathrm{ties}} = 1$.
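As a quick sanity check on the two-sided bound (6), one can run a Monte-Carlo simulation in the univariate special case, where the MK rank map reduces to sorting and continuous scores give $n_{\mathrm{ties}} = 1$. The sample sizes and score distribution below are illustrative choices of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n2, trials = 0.8, 99, 20000
hits = 0
for _ in range(trials):
    cal = rng.normal(size=n2)            # calibration scores on D2
    test = rng.normal()                  # exchangeable test score
    k = int(np.ceil((n2 + 1) * alpha))   # ceil((n2+1)*alpha)-th order stat
    rho = np.sort(cal)[k - 1]
    hits += test <= rho
coverage = hits / trials
# With continuous scores, (6) predicts
# alpha <= coverage <= alpha + 1/(n2+1), i.e. the interval [0.8, 0.81] here.
```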
While OT-CP can be applied to any model and score function, the next sections focus on specific settings to clearly demonstrate its benefits and potential.

## 3. Multi-Output Regression

This section examines the application of OT-CP to multi-output regression. First, we demonstrate how this approach accommodates arbitrary score distributions, enabling the creation of diverse and data-tailored prediction region shapes. Next, we introduce an extension of our method, called OT-CP+, which incorporates conditional adaptivity to input covariates. We demonstrate its effectiveness both empirically and theoretically, establishing an asymptotic coverage guarantee for OT-CP+.

### 3.1. OT-CP can output non-convex prediction sets

For any feature vector $X \in \mathbb{R}^p$ and response vector $Y \in \mathbb{R}^d$, we aim to conformalize the prediction $\hat f(X)$ returned by a given black-box regressor.

**CP methods for multi-output regression.** To further motivate OT-CP in this context, we first review existing conformal strategies, which can be reinterpreted as leaning on multivariate quantile regions. A more comprehensive discussion can be found in Appendix C. One could consider vanilla CP relying on a univariate aggregated score, $s(x, y) = \|y - \hat f(x)\|$. This yields spherical prediction regions $\{\hat f(x)\} + \mathrm{Ball}(\tau_\alpha)$¹, where $\mathrm{Ball}(\tau_\alpha)$ is the Euclidean ball of radius $\tau_\alpha > 0$. One can also treat the $d$ components of $Y \in \mathbb{R}^d$ separately to produce prediction regions based on hyperrectangles, $\{\hat f(x)\} + \prod_{i=1}^d [a_i, b_i]$ (Neeven & Smirnov, 2018). However, these approaches are often ill-suited to accurately capture the geometry of multivariate distributions. In particular, the output prediction sets (whether spheres or hyperrectangles) can be too large when handling anisotropic uncertainty that varies across the output dimensions.
To mitigate this, prior works have introduced scores that account for anisotropy and correlations among the residual dimensions, as with ellipsoidal prediction sets (Messoudi et al., 2020; Johnstone & Cox, 2021; Henderson et al., 2024). Still, this implicitly assumes an elliptical distribution for the non-conformity score, thereby compromising the distribution-free nature of the method.

¹The expression involves the Minkowski sum between two sets: for two sets $A$ and $B$, $A + B = \{a + b : a \in A, b \in B\}$.

*Figure 2. Conformal multi-output regression with OT-CP on simulated data: (a) prediction $x \mapsto \hat f(x)$ and prediction sets $\widehat{C}_\alpha(x)$; (b) scores (residuals) with different quantile regions; (c) marginal coverage; (d) volumes of prediction sets, for OT-CP, RECT and ELL.*

**OT-CP for multi-output regression.** Our strategy consists in applying OT-CP with a multivariate residual as the score,
$$s(x, y) = y - \hat f(x) \in \mathbb{R}^d, \qquad (7)$$
and yields the following prediction regions:
$$\forall x \in \mathbb{R}^p, \quad \widehat{C}_\alpha(x) = \{\hat f(x)\} + \widehat{Q}_n(\alpha). \qquad (8)$$
These sets can take on flexible, arbitrary shapes that adapt to the calibration error distribution and the underlying data geometry. This key advantage is illustrated concretely in our numerical experiments below.

**Numerical experiments.** In what follows, we study a practical regression problem and compare several CP methods described above: OT-CP, forming prediction regions as in (8); a CP approach producing ellipses (ELL; Johnstone & Cox, 2021); and a simple method creating hyperrectangles (RECT; Neeven & Smirnov, 2018), with the miscoverage level adjusted by the Bonferroni correction. We simulate univariate inputs $X \sim \mathrm{Unif}([0, 2])$ with responses $Y \in \mathbb{R}^2$, and we assume that we are given a pre-trained predictor $\hat f(x) = (2x^2, (x + 1)^2)$, $x \in \mathbb{R}$.
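This simulated setup can be sketched as follows; the Gaussian-mixture parameters of the residual are illustrative assumptions of ours (the paper's exact mixture is specified in its Appendix D):

```python
import numpy as np

# Sketch of the simulated bivariate-regression setup: X ~ Unif([0, 2]),
# f_hat(x) = (2x^2, (x+1)^2), and residual scores drawn from a Gaussian
# mixture independent of X (mixture parameters below are placeholders).
rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(0, 2, size=n)
f_hat = np.stack([2 * X**2, (X + 1) ** 2], axis=1)

# Two-component Gaussian-mixture residuals zeta, independent of X.
comp = rng.integers(0, 2, size=n)
means = np.array([[-1.0, 0.0], [1.0, 0.0]])
zeta = means[comp] + 0.3 * rng.normal(size=(n, 2))

Y = f_hat + zeta                  # simulated responses
scores = Y - f_hat                # multivariate residual score (7)
```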
We interpret the score $s(X, Y) = Y - \hat f(X)$ as a random vector $\zeta$ distributed according to a mixture of Gaussians and independent of $X$, meaning that the distribution of $s(X, Y)$ remains unchanged when conditioned on $X$. Quantile regions for $\alpha = 0.9$ are constructed using $n = 1000$ calibration instances. More implementation details can be found in Appendix D. As expected, OT-CP prediction regions exhibit superior adaptability to the distribution of residuals, whereas hyperrectangles and ellipses tend to be overly conservative (Figures 2(a) and 2(b)). We also compare the methods in terms of empirical coverage on test data (Figure 2(c)) and efficiency (volume of prediction regions, Figure 2(d)). While all approaches adhere to the $\alpha$-coverage guarantee, OT-CP achieves greater efficiency, producing smaller and more precise prediction regions. This highlights that MK quantiles help effectively address uncertainty quantification challenges for multi-output regression.

### 3.2. OT-CP+: an adaptive version

So far, the form of the constructed prediction regions (8) does not depend on the input $X$, as illustrated in Figure 2(a). This uniformity stems from computing quantile regions over the distribution of scores $\{S_i\}_{i=1}^n$ marginalized over $\{(X_i, Y_i)\}_{i=1}^n$. In other words, $\{S_i\}_{i=1}^n$ are treated as i.i.d. realizations of $S = Y - \hat f(X)$. As a result, while the quantile regions provided by OT-CP effectively capture the global geometry of the scores, they do not adapt to variations in $X$. This lack of adaptivity is inadequate in applications where prediction uncertainties vary between input examples, as discussed by Foygel Barber et al. (2020). The quest for adaptivity has led to a rich and diverse literature, see Chernozhukov et al. (2021); Gibbs et al. (2023); Sesia & Romano (2021); Romano et al. (2020a) and references therein, focusing mainly on the design of the univariate score function. We also refer to the concomitant work of Dheur et al.
(2025) for a benchmark of CP methods in the case of multi-output regression. In the following, we propose a complementary perspective through conditional MK quantiles on multivariate scores to accommodate input adaptiveness.

**Methodology.** To account for input-dependent uncertainty in the predictions, we introduce OT-CP+, a conformal procedure that computes adaptive MK quantile regions by leveraging multiple-output quantile regression (del Barrio et al., 2024). Consider the calibration data split into $D_1$ and $D_2$, as in step 2 of OT-CP. Given a test point $X_{\mathrm{test}}$, we compute the conditional MK rank map $R_k(\cdot \mid X_{\mathrm{test}})$ based on the $k$-nearest neighbors of $X_{\mathrm{test}}$ in $D_1$ (and a subsample of $k$ reference vectors). Given a coverage level $\alpha \in (0, 1)$, we calibrate the threshold $\rho_{\lceil (n_2+1)\alpha \rceil}$ using the conditional rank maps $R_k(\cdot \mid x)$ for the inputs $x$ associated with $D_2$. The obtained quantile region is given by
$$\widehat{Q}_k(\alpha \mid X_{\mathrm{test}}) = \left\{ s : \|R_k(s \mid X_{\mathrm{test}})\| \le \rho_{\lceil (n_2+1)\alpha \rceil} \right\}.$$
We defer to Appendix A for a detailed methodology. Hence, OT-CP+ relies on the distribution of $s(X, Y)$ given $X$, which is approximated on a neighborhood of $X$. The prediction regions returned by OT-CP+ are thus given by
$$\widehat{C}_{\alpha,k}(X_{\mathrm{test}}) = \{\hat f(X_{\mathrm{test}})\} + \widehat{Q}_k(\alpha \mid X_{\mathrm{test}}). \qquad (9)$$

*Figure 3. Adaptive conformal regression with OT-CP+: (a) adaptive prediction regions; (b) conditional coverage (marginal, $X \in [0.25, 0.5]$, $X \in [1.5, 2]$).*

**Experiments on simulated data.** We first consider a similar setting to that of Section 3.1, where the score $s(X, Y)$ is now distributed as $X\zeta$. Consequently, the variance of the residual increases with $X$, which suggests that wider quantiles should be constructed for larger values of $X$. Figure 3 confirms that OT-CP+ effectively constructs adaptive prediction regions with the desired $\alpha$-coverage.
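The $k$-NN localization at the heart of OT-CP+ can be sketched as follows (helper names are ours; this computes conditional rank norms on the neighbors of a test input, and the toy heteroscedastic data mimics the $X\zeta$ setting):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def conditional_rank_norms(x_test, X1, S1, k, seed=0):
    """Sketch of OT-CP+'s conditional MK ranks: restrict the rank map to
    the k nearest neighbors of x_test in D1 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    nn = np.argsort(np.abs(X1 - x_test))[:k]      # k-NN in the inputs
    S_loc = S1[nn]                                # local scores
    d = S_loc.shape[1]
    theta = rng.normal(size=(k, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    U = (np.arange(1, k + 1)[:, None] / k) * theta  # k reference vectors
    cost = ((S_loc[:, None, :] - U[None, :, :]) ** 2).sum(-1)
    _, sigma = linear_sum_assignment(cost)
    return nn, np.linalg.norm(U[sigma], axis=1)   # ||R_k(S_i | x_test)||

# Toy heteroscedastic data: the residual spread grows with x, as with X*zeta.
rng = np.random.default_rng(1)
X1 = rng.uniform(0, 2, size=500)
S1 = X1[:, None] * rng.normal(size=(500, 2))
nn, norms = conditional_rank_norms(1.0, X1, S1, k=50)
# The local rank norms are again a permutation of {1/k, ..., 1}.
```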
To quantify this more precisely, we evaluate the empirical coverage conditionally on $X$: Figure 3(b) reports box plots of $P(Y_{\mathrm{test}} \in \widehat{C}_\alpha(X_{\mathrm{test}}) \mid X_{\mathrm{test}} \in I)$ for several choices of subsets $I \subset [0, 2]$, showing that OT-CP+ satisfies approximate conditional coverage. Note that the computational time is larger for OT-CP+ than for OT-CP, as it requires solving multiple OT problems (see Figure 12 in Appendix D.4).

**Experiments on real data.** Next, we evaluate OT-CP+ on real datasets sourced from Mulan (Tsoumakas et al., 2011), with dataset statistics summarized in Table 1. We also implement a concurrent CP method (Messoudi et al., 2022), which is an adaptive extension of the previous ellipsoidal approach (Johnstone & Cox, 2021). Specifically, Messoudi et al. (2022) construct ellipsoidal prediction sets that account for local geometry, by estimating the covariance of $Y \mid X$ with the $k$-nearest neighbors ($k$-NN) of $X$. We split each dataset into training, calibration, and testing subsets (50%/25%/25% ratio) and train a random forest model as the regressor. Both methods use a $k$-NN step that selects 10% of the calibration set as neighbors for each test point $X_{\mathrm{test}}$.

*Figure 4. Worst-set coverage on real datasets of two adaptive conformal procedures for multi-output regression: (a) low data regime (at most 1500 total samples: enb, atp1d, jura, rf1, wq, scm20d); (b) medium data regime (9000 total samples: rf1, scm20d).*

As a coverage metric, we consider the worst-set coverage, $\min_{j \in \{1,\ldots,J\}} P(Y_{\mathrm{test}} \in \widehat{C}_\alpha(X_{\mathrm{test}}) \mid X_{\mathrm{test}} \in A_j)$, with $\{A_j\}_{j \in \{1,\ldots,J\}}$ a partition of the input space tailored to the test data. This metric is conceptually similar to the worst-slab coverage (Cauchois et al., 2021), which considers specific partitions in the form of slabs. In our approach, we obtain $J = 5$ regions $\{A_j\}_{j \in \{1,\ldots,5\}}$ by clustering, i.e., employing (i) a random selection of centroids, and (ii) a $k$-NN procedure ensuring that each region contains 10% of the test samples.
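The worst-set coverage metric itself is simple to compute once each test point is assigned to a region; a minimal sketch (helper name and toy data are ours):

```python
import numpy as np

def worst_set_coverage(covered, region):
    """Worst-set coverage: minimum empirical coverage over a partition
    {A_j} of the input space, encoded here by integer region labels."""
    return min(covered[region == j].mean() for j in np.unique(region))

# Toy example with J = 5 regions; region 0 is deliberately under-covered.
rng = np.random.default_rng(0)
region = rng.integers(0, 5, size=2000)          # region label per test point
p = np.where(region == 0, 0.7, 0.9)             # per-region coverage rates
covered = rng.random(2000) < p                  # indicator Y in C_alpha(X)
wsc = worst_set_coverage(covered, region)       # close to 0.7, not 0.9
```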
Empirical results presented in Figure 4 provide evidence supporting the approximate conditional coverage achieved by OT-CP+. Indeed, the worst-set coverage of OT-CP+ remains consistently close to the target level $\alpha = 0.9$ across all datasets, regardless of the sample size or the data dimension. This contrasts with the adaptive ellipsoidal approach, which does not achieve such $\alpha$-coverage and exhibits greater variability. We note, however, that OT-CP+ tends to overcover for the datasets with the lowest sample sizes (namely atp1d and jura, both with around 350 samples; see Table 1). We also report the marginal coverage, volume and computational time in Figure 13: our results show that OT-CP+ satisfies approximate conditional coverage at the price of larger set sizes and runtimes on average, when compared with local ellipsoids.

**Asymptotic conditional coverage.** In the one-dimensional case ($d = 1$), Lei et al. (2018) established the inherent limitation of achieving exact distribution-free conditional coverage in finite samples. However, asymptotic conditional coverage remains attainable under regularity assumptions (Lei et al., 2018; Chernozhukov et al., 2021). OT-CP+ benefits from such a guarantee, leveraging asymptotic properties of quantile regression for MK quantiles (del Barrio et al., 2024). The following assumption is needed.

**Assumption 3.1.** Suppose that $(X_1, Y_1), \ldots, (X_n, Y_n), (X_{\mathrm{test}}, Y_{\mathrm{test}})$ are i.i.d. Assume that, for almost every $x$, the distribution $P_{S|X=x}$ of $s(X_{\mathrm{test}}, Y_{\mathrm{test}})$ given $X_{\mathrm{test}} = x$ is Lebesgue-absolutely continuous on its convex support $\mathrm{Supp}(P_{S|X=x})$. For any $R > 0$, suppose that its density $p(\cdot \mid x)$ verifies, for all $s \in \mathrm{Supp}(P_{S|X=x}) \cap \mathrm{Ball}(R)$,
$$\lambda_{x,R} \le p(s \mid x) \le \Lambda_{x,R}.$$

**Theorem 3.2.** Let $k$ be the number of nearest neighbors used to estimate $R_k(\cdot \mid x)$. Assume that $k \to +\infty$ and $k/n_1 \to 0$ as $n_1 \to +\infty$.
Under Assumption 3.1, the following holds for any $\alpha \in [0, 1]$, as $n_1, n_2 \to +\infty$:
$$P\left(Y_{\mathrm{test}} \in \widehat{C}_{\alpha,k}(X_{\mathrm{test}}) \mid X_{\mathrm{test}}\right) \xrightarrow{P} \alpha, \qquad (10)$$
where $\widehat{C}_{\alpha,k}(x) = \{\hat f(x)\} + \widehat{Q}_k(\alpha \mid x)$ depends on $\hat f$, previously learned on (fixed) training data.

We note that OT-CP+ is not the only way to make OT-CP adaptive: our proposed methodology aims to demonstrate that conditional coverage can be achieved with only a slight modification of the generic OT-CP framework. In addition to being easy to implement, the added $k$-NN step allows us to leverage established results on the consistency of quantiles (Biau & Devroye, 2015; del Barrio et al., 2024), which serve as the foundation for our Theorem 3.2.

## 4. Classification

In this section, we apply OT-CP to multiclass classification. Each data point consists of a feature-label pair $(X, Y) \in \mathbb{R}^p \times \{1, \ldots, K\}$, with $K \ge 3$ the number of classes. The given black-box classifier outputs, for any input $X \in \mathbb{R}^p$, a vector $\hat\pi(X)$ of estimated class probabilities, where the $k$-th component $\hat\pi_k(X)$ is the probability estimate that $X$ belongs to class $k$ (hence, $\sum_{k=1}^K \hat\pi_k(X) = 1$).

**CP methods for classification.** Commonly used scores for multiclass classification include the Inverse Probability (IP), $s(x, y) = 1 - \hat\pi_y(x)$, and the Margin Score (MS), $s(x, y) = \max_{y' \neq y} \hat\pi_{y'}(x) - \hat\pi_y(x)$ (Johansson et al., 2017). IP only considers the probability estimate for the correct class label ($\hat\pi_y(x)$), whereas MS also involves the most likely incorrect class label ($\max_{y' \neq y} \hat\pi_{y'}(x)$). More adaptive options argue in favor of incorporating more class labels in the score function (Romano et al., 2020b; Angelopoulos et al., 2021; Melki et al., 2024). The idea is to rank the labels from highest to lowest confidence (by sorting the probability estimates as $\hat\pi_{(y_1)}(x) \ge \hat\pi_{(y_2)}(x) \ge \cdots \ge \hat\pi_{(y_K)}(x)$), then return the labels such that the total confidence (i.e., the cumulative sum) is at least $\alpha$.
It is worth noting that this strategy stems from a notion of generalized conditional quantile function (Romano et al., 2020b), by analogy with $\inf_{c \in \mathbb{R}} \{c : P(Y \le c \mid X = x) \ge \alpha\}$.

**OT-CP for multiclass classification.** As an alternative CP method for this problem, we propose using OT-CP with the following multivariate score,
$$s(X, Y) = |\Delta_Y - \hat\pi(X)| \in \mathbb{R}_+^K, \qquad (11)$$
where the absolute value is taken component-wise and $\Delta_Y = (\mathbb{1}_{Y=k})_{k=1}^K$ denotes the one-hot encoding of $Y$. One can remark in passing that $\|s(x, y)\|_1 = 2(1 - \hat\pi_y(x))$, which corresponds to the aforementioned IP scalar score.

*Figure 5. The ordering must depend on the chosen scores: (a) center-outward for signed errors $s_i = \Delta_{y_i} - \hat\pi(x_i)$; (b) left-to-right for absolute errors $s_i^+ = |\Delta_{y_i} - \hat\pi(x_i)|$.*

Our OT-CP procedure builds upon generalized quantiles to take into account all the components of $\hat\pi(x)$ (and not only the largest values) and to capture the correlations between them. The score in (11) takes values in $\mathbb{R}_+^K$ and naturally induces a left-to-right ordering. This contrasts with the score function used in our previous application, multi-output regression, where the ordering is center-outward. To further clarify this difference, let us focus on a single component of the score, $s(x, y)_k$, for simplicity. A center-outward interval of the form $[q_{\alpha/2}, q_{1-\alpha/2}]$ applied to $s(x, y)_k$ excludes lower values from $[0, q_{\alpha/2})$ (Figure 5(a)). This exclusion is problematic for the score structure induced by (11), since lower values of $s(x, y)_k$ indicate greater conformity between $x$ and the ground truth $y$. In this context, a left-to-right ordering is more appropriate, as illustrated in Figure 5(b).

*Figure 6. Positive reference ranks for a left-to-right ordering: (a) multivariate scores $\{S_i\}_{i=1}^n$ corresponding to absolute errors in $\mathbb{R}_+^d$; (b) positive reference rank vectors $\{U_i\}_{i=1}^n$.*
The colormap encodes how the two-dimensional scores $\{S_i\}_{i=1}^n$ in (a) are transported onto the reference rank vectors $\{U_i\}_{i=1}^n$ in (b).

A left-to-right ordering can be easily achieved by making a slight adjustment to Definition 2.1: we choose the reference rank vectors as $U_i = \frac{i}{n}\theta_i^+$, where $\theta_i^+$ is uniformly sampled in $\{\theta \in \mathbb{R}_+^d : \|\theta\|_1 = 1\}$. As depicted in Figure 6, the resulting MK ranks reflect the desired left-to-right ordering. It is worth noting that this adjustment is fully compatible with the general definition of MK quantiles, which is flexible enough to accommodate arbitrary reference distributions (see Remark 2.2). Based on this choice of score (11) and reference rank vectors, OT-CP generates the following prediction sets,
$$\widehat{C}_\alpha(x) = \left\{ y \in \{0, 1\}^K : |y - \hat\pi(x)| \in \widehat{Q}_n(\alpha) \right\},$$
where the region $\widehat{Q}_n(\cdot)$ is constructed from $\{U_i\} = \{\frac{i}{n}\theta_i^+\}$.

Numerical experiments. We compare OT-CP against IP, MS and APS scores in terms of worst-case coverage (WSC, measuring conditional coverage, as proposed in Romano et al. (2020b)), efficiency (average size of the predicted set) and informativeness (average number of predicted singletons). More implementation details are given in Appendix D. We start by simulating data according to a Gaussian mixture model, represented in Figure 7(a), and we consider a pretrained classifier based on Quadratic Discriminant Analysis.

Figure 7. Conformal classification by Quadratic Discriminant Analysis on simulated data: (a) simulated data; (b) WSC coverage; (c) average size; (d) informativeness, for IP, MS, APS and OT-CP.

Figures 7(b) to 7(d) outline that OT-CP successfully retains the efficiency and informativeness hallmarks of IP and MS while simultaneously enhancing conditional coverage on $X$, akin to the improvements achieved by APS.
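Returning to the positive reference ranks introduced above, the construction can be sketched as follows. This is a minimal illustration on hypothetical data, not the authors' implementation: we draw $\theta_i^+$ uniformly on the positive part of the unit $\ell_1$-sphere (a flat Dirichlet draw) and compute the empirical MK rank map as an optimal one-to-one assignment, using `scipy.optimize.linear_sum_assignment` as a stand-in for a dedicated OT solver such as those in POT (Flamary et al., 2021).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, d = 200, 2

# Hypothetical nonnegative scores, e.g. component-wise absolute errors.
scores = np.abs(rng.normal(size=(n, d)))

# Reference rank vectors U_i = (i/n) * theta_i^+, with theta_i^+ drawn
# uniformly on {theta in R_+^d : ||theta||_1 = 1} (flat Dirichlet draw).
theta = rng.dirichlet(np.ones(d), size=n)
U = (np.arange(1, n + 1) / n)[:, None] * theta

# Empirical MK rank map: optimal one-to-one assignment between scores
# and reference vectors under the squared Euclidean cost.
cost = ((scores[:, None, :] - U[None, :, :]) ** 2).sum(axis=-1)
_, col = linear_sum_assignment(cost)
ranks = U[col]                      # ranks[i] is the MK rank of scores[i]

# The l1-norm of each rank vector recovers a scalar rank level i/n,
# which induces the left-to-right ordering of the scores.
levels = np.abs(ranks).sum(axis=1)
```

Since the assignment is a permutation of the $U_i$, the resulting rank levels are exactly the grid $\{i/n\}_{i=1}^n$, mirroring the role of univariate ranks.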
These results highlight that OT-CP effectively handles arbitrary probability profiles by leveraging the entire softmax output, rather than relying solely on its sum, to construct more informative and meaningful prediction sets. The relevance of OT-CP is also confirmed on real datasets. In Figure 8 and Figure 9, we present the results for a random forest on MNIST and Fashion-MNIST. Additional numerical experiments are provided in Appendix D. Interestingly, despite not being explicitly designed for this purpose, OT-CP achieves conditional coverage with respect to the label on par with APS, whereas IP and MS fall short, as highlighted in Figure 9. In addition, OT-CP maintains the efficiency and informativeness of IP and MS, offering a convenient balance across all the considered metrics, as one can observe in Figure 8. We finally emphasize that the numerical experiments were designed as prototypes to demonstrate how OT-CP can be seamlessly and effectively adapted to typical classification tasks. The focus is on demonstrating a useful application of our general framework, which already shows several benefits while remaining conceptually simple.

5. Conclusion and perspectives

We have introduced a general and versatile framework for conformal prediction grounded in optimal transport theory. This approach not only revisits classical CP methods based on scalar scores, but also extends easily to handle multivariate scores in a novel and robust manner, thanks to the inherent properties of Monge-Kantorovich quantiles. The OT-CP methodology is flexible, enabling the construction of prediction regions tailored to diverse scenarios, besides being well-suited to capture complex uncertainty structures. However, in the setting of multi-output regression, our approach may be more conservative than the ellipsoidal one. We identify opportunities for improvement in this direction, for instance by employing more suitable reference distributions.
Moreover, the flexibility of the approach, rooted in the Monge-Kantorovich quantile formulation, comes at the cost of increased computational complexity compared to conformal methods based on univariate scores. An alternative transport-based method to achieve better computational and statistical efficiency with respect to the scale of the problem is to use entropic maps, as in the concomitant work by Klein et al. (2025). In addition, the OT-CP methodology could be adapted to new learning tasks. One might think of multi-label classification, where the multivariate score (11) immediately applies by replacing the one-hot encoding with a multi-hot encoding; see, e.g., Katsios & Papadopoulos (2024) for related ellipsoidal inference. One could also explore the development of more sophisticated multivariate scores, potentially building on existing alternatives (e.g., Tumu et al. (2024); Wang et al. (2023); Plassier et al. (2024) for regression; and Angelopoulos et al. (2021); Melki et al. (2024) for classification). Indeed, our numerical experiments demonstrate that basic multivariate scores can outperform classical univariate counterparts, providing a supplementary motivation for pursuing this direction.

Figure 8. Classification metrics on MNIST and Fashion-MNIST, results averaged over the 10 labels: (a) average size; (b) informativeness, for IP, MS, APS and OT-CP.

Figure 9. Label-wise coverage on the K = 10 classes of MNIST and Fashion-MNIST, for IP, MS, APS and OT-CP.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.
Acknowledgments

The authors would like to thank Gilles Blanchard for helpful discussions regarding exchangeability and finite-sample guarantees. This work benefited from state aid managed by the National Research Agency ANR-23-IACL-0008 under France 2030, for the project PR[AI]RIE-PSAI.

References

Angelopoulos, A. N. and Bates, S. Conformal prediction: A gentle introduction. Foundations and Trends in Machine Learning, 16(4):494-591, 2023. ISSN 1935-8237. doi: 10.1561/2200000101.

Angelopoulos, A. N., Bates, S., Jordan, M., and Malik, J. Uncertainty sets for image classifiers using conformal prediction. In International Conference on Learning Representations, 2021.

Biau, G. and Devroye, L. Lectures on the nearest neighbor method, volume 246. Springer, 2015.

Bogachev, V. I. and Ruas, M. A. S. Measure theory, volume 1. Springer, 2007.

Bogoya, J. M., Böttcher, A., and Maximenko, E. A. From convergence in distribution to uniform convergence. Boletín de la Sociedad Matemática Mexicana, 22:695-710, 2016.

Braun, S., Aolaritei, L., Jordan, M. I., and Bach, F. Minimum volume conformal sets for multivariate regression. arXiv preprint arXiv:2503.19068, 2025.

Carlier, G., Chernozhukov, V., and Galichon, A. Vector quantile regression: an optimal transport approach. The Annals of Statistics, 44(3):1165-1192, 2016.

Cauchois, M., Gupta, S., and Duchi, J. C. Knowing what you know: valid and validated confidence sets in multiclass and multilabel prediction. Journal of Machine Learning Research, 22(81):1-42, 2021.

Chernozhukov, V., Galichon, A., Hallin, M., and Henry, M. Monge-Kantorovich depth, quantiles, ranks and signs. The Annals of Statistics, 45(1):223-256, 2017. doi: 10.1214/16-AOS1450.

Chernozhukov, V., Wüthrich, K., and Zhu, Y. Distributional conformal prediction. Proceedings of the National Academy of Sciences, 118(48):e2107794118, 2021.

Deb, N. and Sen, B.
Multivariate rank-based distribution-free nonparametric testing using measure transportation. Journal of the American Statistical Association, 118(541):192-207, 2023.

del Barrio, E., Sanz, A. G., and Hallin, M. Nonparametric multiple-output center-outward quantile regression. Journal of the American Statistical Association, pp. 1-15, 2024.

Dheur, V., Fontana, M., Estievenart, Y., Desobry, N., and Taieb, S. B. Multi-output conformal regression: A unified comparative study with new conformity scores. arXiv preprint arXiv:2501.10533, 2025.

Eisenberg, B. and Gan, S. X. Uniform convergence of distribution functions. Proceedings of the American Mathematical Society, 88(1):145-146, 1983.

Feldman, S., Bates, S., and Romano, Y. Calibrated multiple-output quantile regression with representation learning. Journal of Machine Learning Research, 24(24):1-48, 2023.

Flamary, R., Courty, N., Gramfort, A., Alaya, M. Z., Boisbunon, A., Chambon, S., Chapel, L., Corenflos, A., Fatras, K., Fournier, N., et al. POT: Python Optimal Transport. Journal of Machine Learning Research, 22(78):1-8, 2021.

Foygel Barber, R., Candès, E. J., Ramdas, A., and Tibshirani, R. J. The limits of distribution-free conditional predictive inference. Information and Inference: A Journal of the IMA, 10(2):455-482, 2020. ISSN 2049-8772. doi: 10.1093/imaiai/iaaa017.

Ghosal, P. and Sen, B. Multivariate ranks and quantiles using optimal transport: Consistency, rates, and nonparametric testing. The Annals of Statistics, 50(2):1012-1037, 2022.

Gibbs, I., Cherian, J. J., and Candès, E. J. Conformal prediction with conditional guarantees. arXiv preprint arXiv:2305.12616, 2023.

Hallin, M. Measure transportation and statistical decision theory. Annual Review of Statistics and Its Application, 9(1):401-424, 2022.

Hallin, M. and Paindaveine, D. Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks. The Annals of Statistics, 30(4):1103-1133, 2002.
Hallin, M., del Barrio, E., Cuesta-Albertos, J., and Matrán, C. Distribution and quantile functions, ranks and signs in dimension d: A measure transportation approach. The Annals of Statistics, 49(2):1139-1165, 2021. doi: 10.1214/20-AOS1996.

Henderson, I., Mazoyer, A., and Gamboa, F. Adaptive inference with random ellipsoids through conformal conditional linear expectation. arXiv preprint arXiv:2409.18508, 2024.

Hlávka, Z., Hlubinka, D., and Hudecová, Š. Multivariate quantile-based permutation tests with application to functional data. Journal of Computational and Graphical Statistics, pp. 1-15, 2025.

Izbicki, R., Shimizu, G., and Stern, R. B. CD-split and HPD-split: Efficient conformal regions in high dimensions. Journal of Machine Learning Research, 23(87):1-32, 2022.

Johansson, U., Linusson, H., Löfström, T., and Boström, H. Model-agnostic nonconformity functions for conformal classification. In 2017 International Joint Conference on Neural Networks (IJCNN), pp. 2072-2079. IEEE, 2017.

Johnstone, C. and Cox, B. Conformal uncertainty sets for robust optimization. In Conformal and Probabilistic Prediction and Applications, pp. 72-90. PMLR, 2021.

Katsios, K. and Papadopoulos, H. Multi-label conformal prediction with a Mahalanobis distance nonconformity measure. Proceedings of Machine Learning Research, 230:1-14, 2024.

Klein, M., Bethune, L., Ndiaye, E., and Cuturi, M. Multivariate conformal prediction using optimal transport. arXiv preprint arXiv:2502.03609, 2025.

Kuchibhotla, A. K. Exchangeability, conformal prediction, and rank tests. arXiv preprint arXiv:2005.06095, 2020.

Lei, J., G'Sell, M., Rinaldo, A., Tibshirani, R. J., and Wasserman, L. Distribution-free predictive inference for regression. Journal of the American Statistical Association, 113(523):1094-1111, 2018.

Luo, R. and Zhou, Z. Volume-sorted prediction set: Efficient conformal prediction for multi-target regression.
arXiv preprint arXiv:2503.02205, 2025.

Melki, P., Bombrun, L., Diallo, B., Dias, J., and Da Costa, J.-P. The penalized inverse probability measure for conformal classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3512-3521, 2024.

Messoudi, S., Destercke, S., and Rousseau, S. Conformal multi-target regression using neural networks. In Conformal and Probabilistic Prediction and Applications, pp. 65-83. PMLR, 2020.

Messoudi, S., Destercke, S., and Rousseau, S. Copula-based conformal prediction for multi-target regression. Pattern Recognition, 120:108101, 2021.

Messoudi, S., Destercke, S., and Rousseau, S. Ellipsoidal conformal inference for multi-target regression. In Conformal and Probabilistic Prediction with Applications, pp. 294-306. PMLR, 2022.

Neeven, J. and Smirnov, E. Conformal stacked weather forecasting. In Conformal and Probabilistic Prediction and Applications, pp. 220-233. PMLR, 2018.

Park, J. W., Tibshirani, R., and Cho, K. Semiparametric conformal prediction. arXiv preprint arXiv:2411.02114, 2024.

Peyré, G., Cuturi, M., et al. Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning, 11(5-6):355-607, 2019.

Plassier, V., Fishkov, A., Guizani, M., Panov, M., and Moulines, E. Probabilistic conformal prediction with approximate conditional validity. arXiv preprint arXiv:2407.01794, 2024.

Puri, M. L. and Sen, P. K. Nonparametric methods in multivariate analysis. John Wiley & Sons, New York, 1971.

Romano, Y., Patterson, E., and Candès, E. Conformalized quantile regression. Advances in Neural Information Processing Systems, 32, 2019.

Romano, Y., Barber, R. F., Sabatti, C., and Candès, E. With malice toward none: Assessing uncertainty via equalized coverage. Harvard Data Science Review, 2(2):4, 2020a.

Romano, Y., Sesia, M., and Candès, E. Classification with valid and adaptive coverage.
Advances in Neural Information Processing Systems, 33:3581-3591, 2020b.

Rosenberg, A. A., Vedula, S., Romano, Y., and Bronstein, A. Fast nonlinear vector quantile regression. In The Eleventh International Conference on Learning Representations, 2023.

Sesia, M. and Romano, Y. Conformal prediction using conditional histograms. Advances in Neural Information Processing Systems, 34:6304-6315, 2021.

Sun, S. H. and Yu, R. Copula conformal prediction for multi-step time series prediction. In The Twelfth International Conference on Learning Representations, 2024.

Tibshirani, R. J., Foygel Barber, R., Candès, E., and Ramdas, A. Conformal prediction under covariate shift. Advances in Neural Information Processing Systems, 32, 2019.

Timans, A., Straehle, C.-N., Sakmann, K., Naesseth, C. A., and Nalisnick, E. Max-rank: Efficient multiple testing for conformal prediction. In The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Tsoumakas, G., Spyromitros-Xioufis, E., Vilcek, J., and Vlahavas, I. MULAN: A Java library for multi-label learning. The Journal of Machine Learning Research, 12:2411-2414, 2011.

Tumu, R., Cleaveland, M., Mangharam, R., Pappas, G., and Lindemann, L. Multi-modal conformal prediction regions by optimizing convex shape templates. In 6th Annual Learning for Dynamics & Control Conference, pp. 1343-1356. PMLR, 2024.

Vovk, V., Gammerman, A., and Shafer, G. Algorithmic learning in a random world, volume 29. Springer, 2005.

Wang, Z., Gao, R., Yin, M., Zhou, M., and Blei, D. Probabilistic conformal prediction using conditional random samples. In Ruiz, F., Dy, J., and van de Meent, J.-W. (eds.), Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning Research, pp. 8814-8836. PMLR, 2023.

A. Detailed methodology for OT-CP+

The OT-CP+ procedure is described in detail below.

1.
Multivariate score computation: Compute the multivariate scores $\{s(X_i, Y_i)\}_{i=1}^n$ on the calibration set $\{(X_i, Y_i)\}_{i=1}^n$.

2. Conditional quantile region construction:

(a) Split the calibration set into $D_1$ and $D_2$ of respective sizes $n_1$ and $n_2$ such that $n_1 + n_2 = n$. In what follows, for any $x$, we refer to $R_k(\cdot \mid X = x)$ as the conditional MK rank map based on the $k$-nearest neighbors of $x$ in $D_1$ (using $k$ reference vectors).

(b) For each $(X_i, Y_i) \in D_2$, compute the conditional MK rank map $R_k(\cdot \mid X = X_i)$, then compute $\rho_{\lceil (n_2+1)\alpha \rceil}$ as the $\lceil (n_2+1)\alpha \rceil$-th smallest element of
$$\left\{ \|R_k(s(X_i, Y_i) \mid X = X_i)\| : (X_i, Y_i) \in D_2 \right\}.$$

(c) For any $x$, compute the conditional MK rank map $R_k(\cdot \mid X = x)$ and define the conditional MK quantile region as
$$\widehat{Q}_k(\alpha \mid X = x) = \left\{ s : \|R_k(s \mid X = x)\| \leq \rho_{\lceil (n_2+1)\alpha \rceil} \right\}. \qquad (12)$$

3. Prediction set computation: For a test input $X_{\mathrm{test}}$ drawn from the same distribution as the calibration set, return the prediction region
$$\widehat{C}_{\alpha,k}(X_{\mathrm{test}}) = \left\{ y \in \mathcal{Y} : s(X_{\mathrm{test}}, y) \in \widehat{Q}_k(\alpha \mid X = X_{\mathrm{test}}) \right\}.$$

Hence, the main difference from OT-CP lies in using the procedure by del Barrio et al. (2024) to compute conditional MK rank maps $R_k(\cdot \mid X = x)$, thereby providing greater adaptivity compared to the standard MK rank map $R_n(\cdot)$ defined in (3).

B. Proofs

This section contains the detailed proofs of our theoretical results. For clarity, and without loss of generality, we assume that the calibration set $\{(X_i, Y_i)\}_{i=1}^n$ is split into $D_1 = \{(X_i, Y_i)\}_{i=1}^{n_1}$ and $D_2 = \{(X_{n_1+i}, Y_{n_1+i})\}_{i=1}^{n_2}$, where $n_1 + n_2 = n$.

B.1. Proof of Theorem 2.4 (marginal coverage guarantee)

The proof of Theorem 2.4 consists in extending the reasoning for the traditional quantile lemma (e.g., Lemma 2 in Romano et al. (2019)) to a multivariate setting.

Proof of Theorem 2.4. For $k \in \{1, \ldots, n_2\}$, let $S_{(k,n_2)}$ be the $k$-th smallest score among $\{S_{n_1+i}\}_{i=1}^{n_2}$, where $S_{n_1+i} = s(X_{n_1+i}, Y_{n_1+i})$ and the ordering is induced by the transport map $R_{n_1}$ computed on $D_1$.
Hence,
$$\|R_{n_1}(S_{(1,n_2)})\| \leq \|R_{n_1}(S_{(2,n_2)})\| \leq \cdots \leq \|R_{n_1}(S_{(n_2,n_2)})\|, \qquad (13)$$
which, using the definition of $R_{n_1}$, is equivalently written as
$$S_{(1,n_2)} \preceq_{R_{n_1}} S_{(2,n_2)} \preceq_{R_{n_1}} \cdots \preceq_{R_{n_1}} S_{(n_2,n_2)}. \qquad (14)$$
By construction, $\rho_{\lceil \alpha(n_2+1) \rceil} = \|R_{n_1}(S_{(\lceil \alpha(n_2+1) \rceil, n_2)})\|$, where $\alpha \in (0, 1)$ and $\lceil \alpha(n_2+1) \rceil \leq n_2$, ensuring that the order statistic $\rho_{\lceil \alpha(n_2+1) \rceil}$ is well defined. The quantile region $\widehat{Q}_n(\alpha)$ can be characterized as
$$S_{\mathrm{test}} \in \widehat{Q}_n(\alpha) \iff S_{\mathrm{test}} \preceq_{R_{n_1}} S_{(\lceil \alpha(n_2+1) \rceil, n_2)}. \qquad (15)$$
Denote by $S_{(k,n_2+1)}$ the $k$-th smallest value (with respect to $R_{n_1}$) within $\{S_{n_1+1}, \ldots, S_n, S_{\mathrm{test}}\}$. Here, we view $S_{\mathrm{test}}$ as inserted into the ordered list $\{S_{(k,n_2)}\}_{k=1}^{n_2}$, yielding a new ordered list of size $n_2 + 1$. Then, $S_{\mathrm{test}} = S_{(i_0+1, n_2+1)}$, with
$$S_{(1,n_2)} \preceq_{R_{n_1}} \cdots \preceq_{R_{n_1}} S_{(i_0,n_2)} \preceq_{R_{n_1}} S_{\mathrm{test}} \preceq_{R_{n_1}} S_{(i_0+1,n_2)} \preceq_{R_{n_1}} \cdots \preceq_{R_{n_1}} S_{(n_2,n_2)}. \qquad (16)$$
As a direct consequence of (16), we have:
- for $k \leq i_0$, $S_{(k,n_2+1)} = S_{(k,n_2)}$;
- for $k = i_0 + 1$, $S_{(i_0+1,n_2+1)} = S_{\mathrm{test}} \preceq_{R_{n_1}} S_{(i_0+1,n_2)}$;
- for $k > i_0 + 1$, $S_{(k,n_2+1)} = S_{(k-1,n_2)} \preceq_{R_{n_1}} S_{(k,n_2)}$.

Therefore, for any $k \in \{1, \ldots, n_2\}$, $S_{(k,n_2+1)} \preceq_{R_{n_1}} S_{(k,n_2)}$. Hence, if $S_{\mathrm{test}} \preceq_{R_{n_1}} S_{(k,n_2+1)}$, then $S_{\mathrm{test}} \preceq_{R_{n_1}} S_{(k,n_2)}$. We show that the reciprocal also holds. Assume that $S_{\mathrm{test}} \preceq_{R_{n_1}} S_{(k,n_2)}$. By (16), this implies $k \geq i_0 + 1$, and in this case, $S_{(k,n_2+1)}$ is the larger of $S_{\mathrm{test}}$ and $S_{(k-1,n_2)}$ (with respect to $R_{n_1}$). Putting everything together, we showed that
$$S_{\mathrm{test}} \preceq_{R_{n_1}} S_{(k,n_2)} \iff S_{\mathrm{test}} \preceq_{R_{n_1}} S_{(k,n_2+1)}.$$
Thus, $\mathbb{P}(S_{\mathrm{test}} \preceq_{R_{n_1}} S_{(k,n_2)}) = \mathbb{P}(S_{\mathrm{test}} \preceq_{R_{n_1}} S_{(k,n_2+1)})$. Hence, for any $\alpha \in [0, 1]$,
$$\mathbb{P}\big(S_{\mathrm{test}} \in \widehat{Q}_n(\alpha)\big) = \mathbb{P}\big(S_{\mathrm{test}} \preceq_{R_{n_1}} S_{(\lceil \alpha(n_2+1) \rceil, n_2+1)}\big). \qquad (17)$$
Recall that $n_{\mathrm{ties}}$ denotes the maximum number of ties in $\{\|R_{n_1}(S_{n_1+1})\|, \ldots, \|R_{n_1}(S_{n_1+n_2})\|, \|R_{n_1}(S_{\mathrm{test}})\|\}$. By definition of $S_{(k,n_2+1)}$, the proportion of elements from $\{S_{n_1+1}, \ldots, S_n, S_{\mathrm{test}}\}$ that are less than or equal to $S_{(k,n_2+1)}$ (with respect to $R_{n_1}$) lies between $k/(n_2+1)$ and $(k - 1 + n_{\mathrm{ties}})/(n_2+1)$, due to the possible presence of ties. Since $\{(X_i, Y_i)\}_{i=1}^n$ is assumed to be exchangeable, and $R_{n_1}$ is computed using $D_1$, then $\{\|R_{n_1}(S_{n_1+1})\|, \ldots, \|R_{n_1}(S_{n_1+n_2})\|, \|R_{n_1}(S_{\mathrm{test}})\|\}$ is exchangeable (see Kuchibhotla, 2020, Proposition 3).
Therefore, by (17),
$$\frac{\lceil \alpha(n_2+1) \rceil}{n_2+1} \leq \mathbb{P}\big(S_{\mathrm{test}} \in \widehat{Q}_n(\alpha)\big) \leq \frac{\lceil \alpha(n_2+1) \rceil - 1 + n_{\mathrm{ties}}}{n_2+1},$$
and we can conclude that
$$\alpha \leq \mathbb{P}\big(S_{\mathrm{test}} \in \widehat{Q}_n(\alpha)\big) \leq \alpha + \frac{n_{\mathrm{ties}}}{n_2+1}.$$

B.2. Alternative proof of Theorem 2.4

We provide an alternative proof of Theorem 2.4, which is similar in essence to the previous one but based on another perspective. To this end, we recall a variant of the traditional quantile lemma adapted to our needs (i.e., when there are ties in the ranks) and detail its proof for completeness.

Lemma B.1. (Quantile lemma, Lei et al., 2018) Suppose $(U_1, \ldots, U_{n+1})$ is an exchangeable sequence of random variables in $\mathbb{R}$. Then, for any $\beta \in (0, 1)$,
$$\mathbb{P}\big(U_{n+1} \leq U_{(\lceil \beta(n+1) \rceil)}\big) \geq \beta. \qquad (18)$$
Additionally, assume that the maximum number of ties (i.e., identical values) in $(U_1, \ldots, U_{n+1})$ is $n_{\mathrm{ties}}$. Then,
$$\mathbb{P}\big(U_{n+1} \leq U_{(\lceil \beta(n+1) \rceil)}\big) \leq \beta + \frac{n_{\mathrm{ties}}}{n+1}. \qquad (19)$$
The probabilities are taken over the joint distribution of $(U_1, \ldots, U_{n+1})$.

Proof. By exchangeability of $(U_1, \ldots, U_{n+1})$, for any $i \in \{1, \ldots, n+1\}$,
$$\mathbb{P}\big(U_{n+1} \leq U_{(\lceil \beta(n+1) \rceil)}\big) = \mathbb{P}\big(U_i \leq U_{(\lceil \beta(n+1) \rceil)}\big). \qquad (20)$$
Hence,
$$\mathbb{P}\big(U_{n+1} \leq U_{(\lceil \beta(n+1) \rceil)}\big) = \frac{1}{n+1} \sum_{i=1}^{n+1} \mathbb{P}\big(U_i \leq U_{(\lceil \beta(n+1) \rceil)}\big) \qquad (21)$$
$$= \mathbb{E}\left[\frac{1}{n+1} \sum_{i=1}^{n+1} \mathbb{1}_{U_i \leq U_{(\lceil \beta(n+1) \rceil)}}\right]