# false_discovery_proportion_envelopes_with_mconsistency__cd8405da.pdf

Journal of Machine Learning Research 23 (2024) 1-52 Submitted 7/23; Revised 6/24; Published 9/24

False discovery proportion envelopes with m-consistency

Iqraa Meah iqraa.meah@inserm.fr Center for Research in Epidemiology and Statistic S (CRESS) Universit e Paris Cit e and Universit e Sorbonne Paris Nord, Inserm, INRAE F-75004 Paris, France

Gilles Blanchard gilles.blanchard@universite-paris-saclay.fr Institut Math ematiques de Orsay (IMO) CNRS, Inria, Universit e Paris-Saclay F-91405 Orsay Cedex

Etienne Roquain etienne.roquain@sorbonne-universite.fr Laboratoire de Probabilit es, Statistique et Mod elisation, CNRS, Sorbonne Universit e, Universit e de Paris. Sorbonne Universit e 4 place Jussieu, 75005, Paris

Editor: Gabor Lugosi

We provide new nonasymptotic false discovery proportion (FDP) conﬁdence envelopes in several multiple testing settings relevant for modern high dimensional-data methods. We revisit the multiple testing scenarios considered in the recent work of Katsevich and Ramdas (2020): top-k, preordered (including knockoﬀs), online. Our emphasis is on obtaining FDP conﬁdence bounds that both have nonasymptotical coverage and are asymptotically accurate in a speciﬁc sense, as the number m of tested hypotheses grows. Namely, we introduce and study the property (which we call m-consistency) that the conﬁdence bound converges to or below the desired level α when applied to a speciﬁc reference α-level false discovery rate (FDR) controlling procedure. In this perspective, we derive new bounds that provide improvements over existing ones, both theoretically and practically, and are suitable for situations where at least a moderate number of rejections is expected. These improvements are illustrated with numerical experiments and real data examples. In particular, the improvement is signiﬁcant in the knockoﬀs setting, which shows the impact of the method for a practical use. As side results, we introduce a new conﬁdence envelope for the empirical cumulative distribution function of i.i.d. uniform variables and we provide new power results in sparse cases, both being of independent interest. Keywords: Conﬁdence envelope, False discovery rate, Knockoﬀs, Posthoc inference, Online multiple testing

1. Introduction

1.1 Background

Multiple inference is a crucial issue in many modern, high dimensional, and massive data sets, for which a large number of variables are considered and many questions naturally emerge, either simultaneously or sequentially. Recent statistical inference has thus turned

c 2024 Iqraa Meah, Gilles Blanchard and Etienne Roquain.

License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v23/23-1025.html.

Iqraa Meah, Gilles Blanchard and Etienne Roquain

to designing methods that guard against false discoveries and selection eﬀect, see Cui et al. (2021); Robertson et al. (2023) for recent reviews on that topic. A key quantity is typically the false discovery proportion (FDP), that is, the proportion of false discoveries within the selection (Benjamini and Hochberg, 1995). Among classical methods, ﬁnding conﬁdence bounds on the FDP that are valid after a user data-driven selection ( post hoc FDP bounds), has retained attention since the seminal works of Genovese and Wasserman (2004, 2006); Goeman and Solari (2011). The strategy followed by these works is to build conﬁdence bounds valid uniformly over all selection subsets, which de facto provides a bound valid for any data-driven selection subset. A number of such FDP bounds have been proposed since, either based on a closed testing paradigm (Hemerik et al., 2019; Goeman et al., 2019, 2021; Vesely et al., 2023), a reference family (Blanchard et al., 2020; Durand et al., 2020), or a speciﬁc prior distribution in a Bayesian framework (Perrot-Dock es et al., 2023). It should also be noted that methods providing bounds valid uniformly over some particular selection subsets can also be used to provide bounds valid on any subsets by using an interpolation technique, see, e.g., Blanchard et al. (2020). This is the case for instance for bounds based upon an empirical distribution function conﬁdence band, as investigated by Meinshausen and B uhlmann (2005); Meinshausen (2006); Meinshausen and Rice (2006); D umbgen and Wellner (2023). Loosely, we will refer to such (potentially partial) FDP bounds as FDP conﬁdence envelopes in the sequel. Recently, ﬁnding FDP conﬁdence envelopes has been extended to diﬀerent contexts of interest in Katsevich and Ramdas (2020) (KR below for short), including knockoﬀs (Barber and Cand es, 2015; Cand es et al., 2018) and online multiple testing (Aharoni and Rosset, 2014). For this, their bounds are tailored on particular nested paths , and employ accurate martingale techniques. In addition, Li et al. (2024) have recently investigated speciﬁcally the case of the knockoﬀs setting by using a joint k-FWER error rate control (see also Genovese and Wasserman, 2006; Meinshausen, 2006; Blanchard et al., 2020), possibly in combination with closed testing.

1.2 New insight: m-consistency

The main thrust of this paper is to look at FDP conﬁdence envelopes from the angle of a particular property that we call m-consistency (m denoting the number of hypotheses under consideration). First, recall that the false discovery rate (FDR) is the expectation of the FDP, which is a type I error rate measure with increasing popularity since the seminal work of Benjamini and Hochberg (1995). Informally, an FDP conﬁdence envelope is m-consistent, in relation to a given reference FDR-controlling selection procedure, if the envelope value for the set output by that procedure converges to (or below) the corresponding nominal FDR value, at least asymptotically when the total number m of hypotheses tends to inﬁnity. This property is important for several reasons:

FDR controlling procedures output particular selection sets that are widely used in practice. Hence, it is very useful to provide an accurate FDP conﬁdence bound for these particular rejection sets. This is the case for instance for the commonly used Benjamini-Hochberg (BH) procedure at a level α for which the FDP bound should be close to α, at least in favorable cases;

Consistent FDP envelopes

a zoo of FDP conﬁdence envelopes have been proposed in previous literature, and we see m-consistency as a principled way to discard some of them while putting the emphasis on others;

searching for m-consistency can also lead to new bounds that are accurate for a moderate sample size.

It turns out that most of the existing FDP bounds, while being accurate in certain regimes, are not m-consistent with respect to classical FDR controlling procedures. In particular, this is the case for the Simes bound (Simes, 1986) and those of Katsevich and Ramdas (2020) with respect to the BH procedure, because of a an incompressible constant factor (larger than 1) in front of the FDP estimate. The present paper proposes to ﬁll this gap by introducing new envelopes that are m-consistent. In a nutshell, we replace the constant in front of the FDP estimate by a function that tends to 1 in an asymptotical regime under broad conditions including sparse asymptotical settings (i.e., the proportion of false null hypotheses tends to 0). We stress that this notion of m-consistency only concerns a vanishing gap between the FDP bound and the FDR nominal level of a reference procedure. It does not require that the individual tests are consistent in the usual sense, i.e. we do not assume that they reject a false null hypothesis with probability tending to 1; nor does it require that the reference procedure has full asymptotic power. Let us emphasize again that the envelopes developed in this work have coverage holding in a non-asymptotical sense. Here, m-consistency means that on top of this strong nonasymptotical guarantee, the FDP bound satisﬁes an additional sharpness condition in an asymptotical sense and for some scenarios of interest, including sparse ones.

1.3 Settings

Following Katsevich and Ramdas (2020), we consider the three following multiple testing settings for which a path means a (possibly random) nested sequence of candidate rejection sets:

Top-k: the classical multiple testing setting where the user tests a ﬁnite number m of null hypotheses and observes simultaneously a family of corresponding p-values. This is the framework of the seminal paper of Benjamini and Hochberg (1995) and of the majority of the follow-up papers. In that case, the path is composed of the hypotheses corresponding to the top-k most signiﬁcant p-values (i.e. ranked in increasing order), for varying k.

Pre-ordered: we observe p-values for a ﬁnite set of cardinal m of null hypotheses, which are a priori arranged according to some ordering. In that setting, the signal (if any) is primarily carried by the ordering: alternatives are expected to be more likely to have a small rank. Correspondingly the path in that case is obtained by p-value thresholding (for ﬁxed threshold) of the ﬁrst k hypotheses w.r.t. that order, for varying k. A typical instance is the knockoﬀs setting (Barber and Cand es, 2015; Cand es et al., 2018), where the null hypotheses come from a high-dimensional linear regression model and one wants to test whether each of the m variables is associated with the response. The ordering is data-dependent and comes from an ancillary

Iqraa Meah, Gilles Blanchard and Etienne Roquain

statistic independent of the tests themselves, so that one can argue conditionally and consider the ordering (and path) as ﬁxed.

Online: the null hypotheses come sequentially, and there is a corresponding potentially inﬁnite stream of p-values. An irrevocable decision (reject or not) has to be taken in turn for each new hypothesis, depending on past observations only. The path is naturally deﬁned according to the set of rejections until time t, for varying t. It is markedly diﬀerent from the pre-ordered setting because decisions are irrevocable and the set of nulls is not a pre-speciﬁed ﬁnite set.

Let us introduce notation that encompasses the three settings mentioned above: the set of hypotheses is denoted by H (potentially inﬁnite), the set of null hypotheses H0 is an unknown subset of H, and a path Π = (Rk, k 1) (with convention R0 = ) is an ordered sequence of nested subsets of H that depends only on the observations. A conﬁdence envelope is a sequence (FDPk, k 1) (with convention FDP0 = 0) of random variables valued in [0, 1], depending only on the observations, such that, for some pre-speciﬁed level δ, we have P k 1, FDP(Rk) FDPk 1 δ, (1)

where FDP(Rk) = |Rk H0|

|Rk| 1 is the FDP of the set Rk. In (1), the guarantee is non-asymptotic and uniform in k, which means that it corresponds to conﬁdence bounds valid uniformly over the subsets of the path. Also, the distribution P is relative to the p-value model, which will be speciﬁed further on and depends on the considered framework.

Remark 1 (Interpolation). Any FDP conﬁdence envelope of the type (1) also leads to a post hoc FDP bound valid uniformly for all R H: speciﬁcally, by using the interpolation method (see, e.g., Blanchard et al., 2020; Goeman et al., 2021; Li et al., 2024), if (1) holds then the same property also holds with the sharper bound (] FDPk, k 1) given by

] FDPk = mink k |Rk| |Rk | + |Rk |FDPk

|Rk| 1 FDPk, (2)

due to the fact that the number of false positives in Rk is always bounded by the number of false positives in Rk Rk plus the number of elements of Rk \ Rk .

Particular subsets of Π = (Rk, k 1) that are of interest are those controlling the FDR. Given a nominal level α, a reference procedure chooses a data-dependent ˆkα such that E FDP(Rˆkα) α. Depending on the setting, we consider diﬀerent reference procedures:

Top-k setting: the reference FDR controlling procedure is the Benjamini-Hochberg (BH) step-up procedure, see Benjamini and Hochberg (1995);

Pre-ordered setting: the reference procedure is the Lei-Fithian (LF) adaptive Selective sequential step-up procedure, see Lei and Fithian (2016) (itself being a generalization of the procedure of Li and Barber, 2017);

Online setting: the reference procedure is the (LORD) procedure, see Javanmard and Montanari (2018) and more precisely the improved version of Ramdas et al. (2017).

Consistent FDP envelopes

As announced, for all these procedures, the expectation of the FDP (that is, the FDR) is guaranteed to be below α. Additionally, in many scenarios considered as prototypical in the literature, it has been established that the FDP of a reference procedure concentrates around its expectation as the number of tested hypotheses m tends to inﬁnity, see, e.g., Genovese and Wasserman (2004); Neuvial (2008, 2013). In such a situation, it is thus natural to expect that a conﬁdence envelope on the FDP should asymptotically converge to (or below) the target level α when applied to a reference procedure. This motivates, in complement to the non-asymptotic control (1), the introduction of a notion of m-consistency of an FDP envelope in relation to a reference procedure and a sequence of models, as follows.

Deﬁnition 2 (m-consistency). Let δ, α (0, 1) be ﬁxed. For each m 1, let be

P(m) a multiple testing distribution model over the set of null hypotheses H(m) = {1, . . . , m} and a set of true null hypotheses H(m) 0 H(m) (denote H(m) 1 = H(m) \ H(m) 0 );

Π(m) = (R(m) k , k 1) a possibly random path of nested subsets of H(m);

(FDP (m) k , k 1) a conﬁdence envelope at level 1 δ over that path, i.e. satisfying (1);

ˆk(m) α a procedure choosing a rejection set from the path Π(m) with guaranteed FDR control at level α, i.e. satisfying E(m)h FDP R(m) ˆk(m) α

Then the conﬁdence envelope is said to be m-consistent with respect to the sequence (P(m), m 1) and to the FDR controlling procedure R(m) ˆk(m) α Π(m) at level α if for all ϵ > 0,

FDP (m) ˆk(m) α α ϵ = 0. (3)

When applying this deﬁnition with respect to the reference procedures described above (BH/LF/LORD), we will speak about BH/LF/LORD m-consistency for short. In addition, in the above deﬁnition, P(m) stands for a multiple testing model with m hypotheses that is to be speciﬁed. We will be interested in standard model sequences that represent relevant practical situations, and in particular sparse cases where only a vanishing proportion of null hypotheses are false when m tends to inﬁnity. The above deﬁnition applies transparently for the two ﬁrst considered settings (top-k and pre-ordered). Note that due to (1), we have

P α (0, 1), FDP(Rˆkα) FDPˆkα

Hence, (3) comes as an asymptotical accuracy guarantee in addition to the non-asymptotical coverage property (4). The uniformity in α in (4) allows for choosing α in a post hoc manner, while maintaining the false discovery control, that is, for any data-dependent choice of ˆα, FDP(Rˆkˆα) FDPˆkˆα with probability at least 1 δ. For m-consistency, a similar (but weaker) α-uniformity property can be obtained, see Remark 4 below. In the third setting (online), the deﬁnition applies in the following sense: (3) reads formally as lim m P( )

FDPm α ϵ = 0. (5)

Iqraa Meah, Gilles Blanchard and Etienne Roquain

Here, the distribution of the data P(m) does not depend on m and corresponds to P( ) the joint distribution of the countable sequence of observations. The path Π(m) = (R(m) k , k 1) = (Rk, k 1) does not depend on m and corresponds to the sequence of rejections along the time. Also, ˆk(m) α = m in that setting, that is, the bound is built for Rk for k = m. In contrast with the two ﬁrst settings, the procedure of interest is represented by the path itself rather than by some particular choice ˆk(m) α of k.

Remark 3. A notion of consistency for FDP bounds has been introduced by Goeman et al. (2019, Section 7). In our terminology their notion could be dubbed (n, m)-consistency, as both the number of hypotheses m and a parameter n modelizing signal-to-noise ratio (e.g., size of the underlying sample, or signal strength) grow to inﬁnity. The authors consider the following scenario:

the bound is considered on any set Sm such that, conditional to their index being in the set Sm, the p-values are independently drawn from a mixture

γUnif[0, 1] + (1 γ)Q1,n. (6)

Q1,n approaches a Dirac δ1 distribution as n . (Individual test consistency under the alternative)

|Sm| remains lower bounded by a linear function of m.

Under these assumptions the authors show that the Simes-based closed testing bound for Sm is consistent, in the sense that it converges asymptotically to γ. The above assumptions allow, in the non-sparse case, to take Sm = H(m) 1 and further conclude to the consistency of the bound for any rejection set of size growing linearly with m. The notion considered in the present work is of a diﬀerent nature, as at least one of the above assumptions is not met in the typical scenarios we will consider:

in the non-sparse setting, we do not require individual test consistency and consider ﬁxed signal strength.

in the sparse setting, we consider growing signal strength as m , which implies individual test consistency but in a way that is only sightly above the threshold of global signal detectability. In such a scenario H(m) 1 does not grow linearly with m and neither do the size of rejections sets |Sm| of interest.

we evaluate the bound on rejection sets from reference procedures that do not follow the posited mixture distribution (6). For example, conditional to being rejected by BH the p-values do not follow a Uniform/Alternative independent mixture distribution.

Thus, in the scenarios we will focus on, some ambiguity is asymptotically remaining and one cannot expect full asymptotic signal identiﬁability. In fact, we show (see Section 2.8) that in such a scenario, the Simes-based closed testing bound is not m-consistent in general.

Consistent FDP envelopes

1.4 Contributions

Our ﬁndings are as follows:

In each of the considered settings (top-k, pre-ordered, online), we provide new (nonasymptotical) FDP conﬁdence envelopes that are m-consistent under general conditions, including sparse conﬁgurations, see Proposition 10, Corollary 13 (top-k), Proposition 21, Corollary 24 (pre-ordered) and Proposition 30, Corollary 33 (online). Table 1 provides a summary of the considered procedures in the diﬀerent contexts, including the existing and new ones. It is worth noting that in the top-k setting, the envelope based on the DKW inequality (Massart, 1990) is consistent under moderate sparsity assumptions only, while the new envelope based on Wellner s inequality (Shorack and Wellner, 2009) covers the whole sparsity range (Corollary 13).

As a byproduct, our results provide (non-asymptotical) conﬁdence bounds on the FDP for standard FDR-controlling procedures, that are also asymptotically sharp (mconsistency) and for which a data-driven choice of the level α is allowed. In particular, in the top-k setting, this gives a new sharp conﬁdence bound for the achieved FDP of the BH procedure while tuning the level from the same data, see (19) below.

In the top-k setting, we also theoretically show that the Simes bound can be minconsistent, even after applying a closed-testing improvement (Goeman and Solari, 2011), see Section 2.8. Hence, closed-testing does not solve the m-inconsistency issue in itself. Also, we develop adaptive versions of our bounds (for which the proportion of null hypotheses is simultaneously estimated), which can be seen as an improvement stage that is less computationally demanding than closed-testing, while bringing already a large part of the latter s improvement when combined with the interpolation technique of Remark 1, see Section D.2.

In the pre-ordered setting, including the knockoﬀ case, we introduce new envelopes, called Freedman and KR-U , which are the two ﬁrst (provably) m-consistent conﬁdence bounds in that context to our knowledge. This is an important contribution since the knockoﬀmethod is one of the leading methodologies in the covariate testing literature of the last decade. In addition, KR-U is shown to behave suitably, even for moderate sample size, see Section 5.

In the online setting, the new envelopes, called Freedman and KR-U , provide an additional information on the LORD procedure via an FDP upper-envelope uniformly valid along time, and converging to the prescribed level α on the long run, see Figure 9.

Our study is based on dedicated tools of independent interest, based on uniform versions of classical deviation inequalities, see Corollary 6 (Wellner s inequality), Corollary 42 (Freedman s inequality). Both can be seen as a form of stitching together elementary inequalities, see Howard et al. (2021) for recent developments of this principle. In addition, the Freedman-type bounds in the pre-ordered and online settings are both based on a single martingale-type result, see Theorem 38.

Iqraa Meah, Gilles Blanchard and Etienne Roquain

Simes DKW KR Wellner (new) Freedman (new) KR-U (new) Top-k No Yes No Yes Pre-ordered No Yes Yes Online No Yes Yes

Table 1: m-consistency property (Yes or No) for diﬀerent envelopes, depending on the considered contexts. m-consistent means consistent at least in a particular (reasonable) conﬁguration and with respect to the corresponding reference procedure BH/LF/LORD. Blank means undeﬁned in that context.

Remark 4 (α-uniform m-consistency). In the top-k and pre-ordered setting, it is possible to show slightly more than the convergence (3), namely we can aim for

sup α [α0,1)

FDP (m) ˆk(m) α α o ϵ

for any ﬁxed α0 (0, 1). More precisely, we can easily show that this α-uniform mconsistency property can be obtained in Corollaries 13 and 24 below by monotonicity of the reference procedure. This allows to account for possible data snooping from the user, that is, the consistency property also holds for α = ˆα possibly depending on the data, provided that it is larger than some α0 > 0. However, for the online setting, such uniformity is out of reach since the full path itself already depends on α.

Remark 5 (FDP concentration and m-consistency). Given the FDR control and (3), in cases where the FDP of the reference procedure concentrates around its expectation as m grows (Genovese and Wasserman, 2004; Neuvial, 2008, 2013), we expect that bounds of the form α + m,α,δ with m,α,δ = o(1) should hold in the sense of Deﬁnition 2, and thus m-consistent bounds can be built. In such situations, using m-inconsistent bounds (such as the Simes bound in the top-k setting) is questionable.

2. Results in the top-k case

2.1 Top-k setting

We consider the classical multiple setting where we observe m independent p-values p1, . . . , pm, testing m null hypotheses H1, . . . , Hm. The set of true nulls is denoted by H0, which is of cardinal m0 and we denote π0 = m0/m (0, 1). We assume that the p-values are uniformly distributed under the null, that is, for all i H0, pi U(0, 1). We consider here the task of building a (1 δ)-conﬁdence envelope (1) for the top-k path Rk = {1 i m : pi p(k)}, k = 1, . . . , m. (7)

A rejection set of particular interest is the BH rejection set, given by Rˆkα where

ˆkα = max n k N : [ FDPk α o , [ FDPk = mpk/k, (8)

(with the convention R0 = ).

Consistent FDP envelopes

2.2 Existing envelopes

Let us ﬁrst review the prominent conﬁdence envelopes that have been considered in the literature. Let U1, . . . , Un be n 1 i.i.d. uniform random variables. For δ (0, 1), each of the following (uniform) inequalities holds with probability at least 1 δ:

Simes (1986) (or Robbins, 1954): for all t (0, 1), n 1 Pn i=1 1{Ui t} t/δ.

DKW (Massart, 1990): for all t (0, 1), n 1 Pn i=1 1{Ui t} t+ p

log(1/δ)/2n 1/2.

KR (Katsevich and Ramdas, 2020) (for δ 0.31), for all t (0, 1), n 1 Pn i=1 1{Ui t} log(1/δ) log(1+log(1/δ))(1/n + t).

Taking (U1, . . . , Un) = (pi, i H0), n = m0, and t = p(k) in the bounds above gives the following conﬁdence envelopes (in the sense of (1)) for the top-k path: for k {1, . . . , m},

FDP Simes k := 1 mp(k)

FDP DKW k := 1

0.5 log 1/δ

FDP KR k := 1 log(1/δ) log(1 + log(1/δ))

k + 1/k , (11)

the last inequality requiring in addition δ 0.31. Note that we can slightly improve these bounds by taking appropriate integer parts, but we will ignore this detail further on for the sake of simplicity.

2.3 New envelope

In addition to the above envelopes, this section presents a new one deduced from a new uniform variation of Wellner s inequality (recalled in Lemma 44). Let us ﬁrst deﬁne the function h(λ) = λ(log λ 1) + 1, λ > 1. (12)

Lemma 43 gathers some properties of h, including explicit accurate bounds for h and h 1.

Proposition 6 (Uniform version of Wellner s inequality). Let U1, . . . , Un be n 1 i.i.d. uniform random variables and κ = π2/6. For all δ (0, 1), we have with probability at least 1 δ,

t (0, 1), n 1 n X

i=1 1{Ui t} t h 1 log(κ/δ) + 2 log( log2(1/t) )

for g(t) = 2 log2(1/t) /(1 2 log2(1/t) ) t/2 and h( ) deﬁned by (12). In particular, with probability at least 1 δ,

t (0, 1), n 1 n X

i=1 1{Ui t} t h 1 2 log(κ/δ) + 4 log(1 + log2(1/t))

Iqraa Meah, Gilles Blanchard and Etienne Roquain

The proof of Proposition 6 is given in Section A.1. It immediately leads to the following result.

Theorem 7. In the top-k setting of Section 2.1, the following quantity is a (1 δ)-conﬁdence envelope in the sense of (1) for the top-k path:

FDP Well k := 1

k h 1 2 log(κ/δ) + 4 log 1 + log2(1/p(k))

with κ = π2/6.

Proof We use (14) for (U1, . . . , Un) = (pi, i H0), n = m0, and t = p(k). We conclude by using m0 m and the monotonicity property of Lemma 43.

Remark 8. Denoting by F n(t) the RHS of (14), we can easily check

sup t ((log log n)/n,1)

n F n(t) t p

t log(1 + log2(1/t))

with a constant possibly depending on δ. The iterated logarithm in the denominator is known from classical asymptotic theory (convergence to a Brownian bridge) to be unimprovable for a uniform bound in the vicinity of 0; in this sense the above is a ﬁnite law of the iterated logarithm (LIL) bound (Jamieson et al., 2014).

2.4 FDP conﬁdence bounds for BH and m-consistency

Applying the previous bounds for the particular BH rejection sets Rˆkα (see (8)) leads to the following result.

Corollary 9. In the top-k setting of Section 2.1, for any α, δ (0, 1), the following quantities are (1 δ)-conﬁdence bounds for FDP(Rˆkα), the FDP of the BH procedure at level α:

FDP Simes α := 1 n ˆkα 1 o (α/δ); (16)

FDP DKW α := 1 n ˆkα 1 o

0.5 log 1/δ

FDP KR α := 1 n ˆkα 1 o log(1/δ) log(1 + log(1/δ))

α + 1/(1 ˆkα) ; (18)

FDP Well α := 1 n ˆkα 1 o

2 log(κ/δ) + 4 log 1 + log2 m α(1 ˆkα)

where κ = π2/6, ˆkα denotes the number of rejections of the BH procedure (8) at level α, and where the KR bound requires in addition δ 0.31. Moreover, these bounds are also valid uniformly in α (0, 1), in the sense that

P( α (0, 1), FDP(Rˆkα) FDP Method α ) 1 δ, Method {Simes, DKW, KR, Well},

and thus also when using a post hoc choice α = ˆα of the level.

Consistent FDP envelopes

Proof For (19), we use (14) for (U1, . . . , Un) = (pi, i H0), n = m0, and t = α(1 ˆkα)/m.

Let us now consider the m-consistency property (3), by using BH as reference procedure. Among the four above bounds, it is apparent that Simes and KR are not BH m-consistent, because of the constant in front of α; namely, for all m,

FDP Simes α FDP KR α (1 (cα)) 1 n ˆkα 1 o ,

for some constant c > 1, which implies the BH m-inconsistency for a sequence of model such that ˆkα 1 with an asymptotically non-null probability. By contrast, FDP DKW α and FDP Well α are BH m-consistent in the following sense.

Proposition 10. Let us consider any model sequence P(m) in the top-k setting and denote by ˆkα the number of nulls rejected by the BH procedure at level α. Then the following envelopes are BH m-consistent in the sense of (3):

FDP DKW α if m1/2/ˆkα = o P (1);

FDP Well α if (log log m)/ˆkα = o P (1).

The latter result means that the BH procedure at level α should make enough rejections in order to provide m-consistency. In the two-group model of (Efron et al., 2001) with a ﬁxed proportion of alternatives in (0, 1) (that is, a dense case), under some assumptions on the alternative distribution, and assuming α above a critical value, Chi (2007) showed that ˆkα is asymptotically of the order of m and thus the DKW and Wellner bounds are both m-consistent. Another exemple, including sparse situations, is considered in the next section.

2.5 BH m-consistency in a prototypical model

Deﬁnition 11. The sparse one-sided Gaussian location model of parameter m, b, c, β, denoted as P(m) b,c,β, is deﬁned as follows: pi = Φ(Xi), 1 i m, the Xi s are independent, with Xi N(0, 1) for i H0 and Xi N(µm, 1) otherwise, for µm = 2β log m+b, b > 0, and m1 = |H1| = cm1 β , c (0, 1), β [0, 1).

Note that β = 0 is the dense case for which the alternative mean µm = b > 0 is a ﬁxed quantity, which means that the individual tests do not have full power1, even asymptotically w.r.t. m. By contrast, β > 0 corresponds to the sparse case, for which µm = 2β log m + b tends to inﬁnity. In both cases, the magnitude of alternative means is deﬁned to be on the edge of detectability where the BH procedure has some non-trivial power, see Bogdan et al. (2011); Neuvial and Roquain (2012); Abraham et al. (2021) for instance.

Theorem 12. Let α (0, 1). In the above one-sided Gaussian location model P(m) b,c,β, the

number of rejections ˆkα of the BH procedure is such that, as m grows to inﬁnity,

P(m) b,c,β(t m αˆkα/m t m) 1 2e dm1, for m1/m t m t m m1/m, (20)

1. This setting is in sharp contrast with Goeman et al. (2019, Section 7) which assumed asymptotical full power for the individual tests and the dense case, see also Remark 3.

Iqraa Meah, Gilles Blanchard and Etienne Roquain

for some constant d > 0 (depending on α, β, b), where t m (0, 1) is the unique solution of Gm(t) = 2t/α, t m (0, 1) is the unique solution of Gm(t) = 0.5t/α, and where Gm(t) = (m0/m)t + (m1/m)Φ(Φ 1(t) µm), with Φ(z) = P(Z z), z R.

Theorem 12 is proved in Section A.2. It implies ˆkα P(m) b,c,β m1 β, which leads to the

following result.

Proposition 13. Let us consider the sequence of sparse one-sided Gaussian location models (P(m) b,c,β, m 1) with ﬁxed parameters b R, c (0, 1) and a sparsity parameter β [0, 1), as deﬁned above. Then we have for all α (0, 1),

FDP DKW α α P(m) b,c,β m 1/2+β;

FDP Well α α P(m) b,c,β

log log(m) m 1/2+β/2,

where um P vm stands for um = OP (vm) and vm = OP (um). In particular, concerning the BH m-consistency (3) for the model sequence (P(m) b,c,β, m 1):

the DKW bound (10),(17) is m-consistent when β < 1/2 but fails to be for β 1/2;

the Wellner bound (15),(19) is m-consistent for any arbitrary β (0, 1).

Corollary 13 shows the superiority of the Wellner bound on the DKW bound for achieving the m-consistency property on a particular sparse sequence models: while the DKW bound needs a model dense enough (β < 1/2), the Wellner bound covers the whole sparsity range β (0, 1).

2.6 Adaptive envelopes

Let us consider the following upper-bounds for m0:

ˆm Simes 0 := m inf t (0,δ) Vt 1 t/δ; (21)

ˆm DKW 0 := m inf t (0,1)

C 4(1 t)2 + Vt 1 t

ˆm KR 0 := m inf t (0,1/C ) C + Vt 1 C t; (23)

ˆm Well 0 := m inf t (0,1)

t Ct 2(1 t)2 +

Ct 2(1 t)2 + Vt 1 t

where Vt = Pm i=1 1{pi > t}, C = log(1/δ)/2, C = log(1/δ) log(1+log(1/δ)), Ct = 2 log(κ/δ) + 4 log(1 + log2(1/t)), κ = π2/6. Since Vt/(1 t) corresponds to the so-called Storey estimator Storey (2002), these four estimators can all be seen as Storey-type conﬁdence bounds, each including a speciﬁc deviation term that takes into account the probability error δ. Note that ˆm DKW 0 was already proposed in Durand et al. (2020).

Consistent FDP envelopes

Proposition 14. In the top-k setting of Section 2.1, the envelopes deﬁned by (9), (10), (11) and (15) with m replaced by the corresponding bound ˆm Simes 0 (21), ˆm DKW 0 (22), ˆm KR 0 (23) or ˆm Well 0 (24), respectively, are also (1 δ)-conﬁdence envelopes in the sense of (1) for the top-k path.

We can easily check that these four adaptive envelopes all uniformly improve their own non-adaptive counterpart. The proof of Proposition 14 is provided in Section A.3.

Remark 15. In practice, the bounds ˆm Simes 0 (21), ˆm DKW 0 (22), ˆm KR 0 (23) or ˆm Well 0 (24) can be computed by taking an inﬁmum over t = p(k), 1 k m and by replacing Vt by m k.

Applying Proposition 14 for the BH procedure, this gives rise to the following adaptive conﬁdence bounds.

Corollary 16. In the top-k setting of Section 2.1, for any α, δ (0, 1), the following quantities are (1 δ)-conﬁdence bounds for the FDP of the BH procedure at level α:

FDP Simes-adapt α := 1 α( ˆm Simes 0 /m)/δ; (25)

FDP DKW-adapt α := 1 α( ˆm DKW 0 /m) + ( ˆm DKW 0 )1/2p

0.5 log 1/δ

FDP KR-adapt α := 1 log(1/δ) log(1 + log(1/δ))

α( ˆm KR 0 /m) + 1/(1 ˆkα) ; (27)

FDP Well-adapt α := 1

α( ˆm Well 0 /m) h 1

2 log(κ/δ) + 4 log 1 + log2 m α(1 ˆkα)

α(1 ˆkα) ˆm Well 0 /m

where κ = π2/6, ˆkα denotes the number of rejections of BH procedure (8) at level α, and where the KR-adapt bound requires in addition δ 0.31. Moreover, these bounds are also valid uniformly in α (0, 1) and thus also when using a post hoc choice α = ˆα of the level.

Proof For (28), we use (14) for (U1, . . . , Un) = (pi, i H0), n = m0, t = α(1 ˆkα)/m, and the fact that m0 ˆm Well 0 on the considered event by the proof in Section A.3. The other bounds are proved similarly.

Iqraa Meah, Gilles Blanchard and Etienne Roquain

2.7 Interpolated bounds

According to Remark 1, the coverage (1) is still valid after the interpolation operation given by (2). As a result, the above conﬁdence envelopes can be improved as follows:

] FDP Simes k := min k k{k k + k (mp(k )/δ)}/k; (29)

] FDP DKW k := min k k{k k + k (mp(k ) + m1/2p

0.5 log 1/δ)}/k; (30)

] FDP KR k := min k k

k k + k log(1/δ) log(1 + log(1/δ)) mp(k ) + 1 /k; (31)

] FDP Well k := min k k

mp(k ) h 1 2 log(κ/δ) + 4 log 1 + log2(1/p(k ))

respectively. When applied to BH rejection set, this also provides new conﬁdence bounds ] FDP Simes α , ] FDP DKW α , ] FDP KR α , ] FDP Well α , that can further be improved by replacing m by the corresponding estimator ˆm0.

2.8 Comparison to closed testing based on Simes local tests

Our bounds can be further improved by using a closed-testing approach (Goeman and Solari, 2011) (see Lemma 6 of Goeman et al., 2021 for an explicit formula). It is legitimate to ask if the m-inconsistency of the Simes bound is still true with this reﬁnement. The following result establishes that, as expected, the closed-testing version of Simes bound is still m-inconsistent. It uses the top-k setting of Section 2.1, for which we added random eﬀects for the true/false null hypotheses (two-group model of Efron et al., 2001).

Proposition 17. Consider an i.i.d. mixture model P(m) π0,G of m independent p-values with proportion of nulls π0 and marginal CDF independent of m given by

F(t) = π0t + (1 π0)G(t),

with G the cdf of an (alternative) distribution having continuous decreasing density g on [0, 1] (so that g(0) > 1). Then if α, δ are such that

0 < δ < 1 π0 + (1 π0)g(0) < α < 1,

the Simes-based closed testing bound2 at level δ is not BH(α)-m-consistent in the model P(m) π0,G, but FDP DKW α and FDP Well α are.

Since closed testing bounds are by essence more accurate than adaptive/interpolated ones, Proposition 17 also shows that the versions (25) and (29) of the Simes bound are not BH-m-consistent. Also, numerical experiments suggest that the improvement brought by closed testing is only modest when compared to the adaptive interpolated versions of our bounds (see Section D.2), which are less computationally demanding.

2. See Goeman and Solari (2011); Goeman et al. (2019) for a formal deﬁnition and useful formulations.

Consistent FDP envelopes

3. Results in the pre-ordered case

In this section, we build m-consistent envelopes in the case where the p-values are ordered a priori, which covers the famous knockoﬀ case.

3.1 Pre-ordered setting

Let π : {1, . . . , m} {1, . . . , m} be some ordering of the p-values that is considered as given and deterministic (possibly coming from independent data). The pre-ordered setting is formally the same as the one of Section 2.1, except that the p-value set is explored according to π(1), π(2), . . . , π(m). The rationale behind this is that the alternative null hypotheses H1 = {1, . . . , m}\H0 are implicitly expected to be more likely to have a small rank in the ordering π (although this condition is not needed for the future controlling results to hold). Formally, the considered path is

Rk = {π(i) : 1 i k, pπ(i) s}, k = 1, . . . , m, (33)

for some ﬁxed additional threshold s (0, 1] (possibly determined from independent data) and can serve to make a selection. The aim is still to ﬁnd envelopes (FDPk)k satisfying (1) for this path while being m-consistent. To set up properly the consistency, we should consider an FDR controlling procedure that is suitable in this setting. For this, we consider the Lei Fithian (LF) adaptive Selective sequential step-up procedure (Lei and Fithian, 2016). The latter is deﬁned by Rˆkα where

ˆkα = max n k {0, . . . , m} : [ FDPk α o , [ FDPk = s 1 λ 1 + Pk i=1 1 pπ(i) > λ

1 Pk i=1 1 pπ(i) s , (34)

where λ [0, 1) is an additional parameter. The knockoﬀ setting of Barber and Cand es (2015) can be seen as a particular case of this pre-ordered setting, where the p-values are independent and binary, the ordering is independent of the p-values and s = λ = 1/2. The LF procedure reduces in that case to the classical Barber and Cand es (BC) procedure.

3.2 New conﬁdence envelopes

The ﬁrst envelope is as follows.

Theorem 18. Consider the pre-ordered setting of Section 3.1 with s (0, 1]. For all δ (0, 1), λ [0, 1), the following is a (1 δ)-conﬁdence envelope for the ordered path (33) in the sense of (1):

FDP Freed k := 1

s 1 λ Pk i=1 1 pπ(i) > λ + (νk) Pk i=1 1 pπ(i) s , k 1, (35)

where (u) = 2 εu p

2εu, εu = log((1 + κ)/δ) + 2 log(1 + log2(u 1)), u > 0, κ = π2/6 and ν = s(1 + min(s, λ)/(1 λ)).

Iqraa Meah, Gilles Blanchard and Etienne Roquain

The proof of Theorem 18 is a direct consequence of a more general result (Theorem 38), itself being a consequence of a uniform version of Freedman s inequality (see Section B.2). The second result is based on the KR envelope (Katsevich and Ramdas, 2020):

FDP KR k := 1

a log(1 + 1 δB/a

a + s 1 λ Pk i=1 1 pπ(i) > λ

1 Pk i=1 1 pπ(i) s

where a > 0 is some parameter, B = s/(1 λ) and it is assumed λ s. While the default choice in KR is a = 1, we can build up a new envelope by taking a union bound over a N\{0}:

Theorem 19. Consider the pre-ordered setting of Section 3.1 with s (0, 1]. For all δ (0, 1) and λ [s, 1], the following is a (1 δ)-conﬁdence envelope for the ordered path (33) in the sense of (1):

FDP KR-U k := 1 min a N\{0}

a log(1 + 1 δB/a a B )

a + s 1 λ Pk i=1 1 pπ(i) > λ

1 Pk i=1 1 pπ(i) s

, k 1, (37)

for δa = δ/(κa2), a 1, for B = s/(1 λ), κ = π2/6.

The envelope (37) is less explicit than (35) but has a better behavior in practice, as we will see in the numerical experiments of Section 5.

3.3 Conﬁdence bounds for LF and m-consistency

Recall that the LF procedure (34) is the reference FDR-controlling procedure in this setting. Applying the above envelopes for the LF procedure gives the following conﬁdence bounds.

Corollary 20. In the pre-ordered setting of Section 3.1 with a selection threshold s (0, 1], for any α, δ (0, 1), λ [s, 1] the following quantities are (1 δ)-conﬁdence bounds for the FDP of the LF procedure with parameters s, λ at level α:

FDP KR α := 1

log(1 + 1 δB

B ) (α + 1/(1 ˆrα))

FDP Freed α := 1 α + (νˆkα)/(1 ˆrα) (39)

FDP KR-U α := 1 min 1 a 1 ˆrα

a log(1 + 1 δB/a a B ) (α + a/(1 ˆrα))

for ν = s(1 + s/(1 λ)), B = s/(1 λ), δa = δ/(κa2), a 1, κ = π2/6, ( ) deﬁned in

Theorem 18 and where ˆkα is as in (34) and ˆrα = Pˆkα i=1 1 pπ(i) s denotes the number of rejections of LF procedure at level α. In addition, these bounds are also valid uniformly in α (0, 1) in the sense that

P( α (0, 1), FDP(Rˆkα) FDP Method α ) 1 δ, for Method {KR, Freed, KR-U},

and thus also when using a post hoc choice α = ˆα of the level.

Consistent FDP envelopes

Proof This is direct by applying (36) (a = 1), (35) and (37) to the rejection set Rˆkα.

Let us now study the consistency property (3). It is apparent that KR is never LF m-consistent: namely, for all m 1,

FDP KR α 1 cα,

for some constant c > 1. By contrast, FDP Freed α is LF m-consistent if (νm)/ˆrα tends to 0 in probability, that is, (m log log m)1/2/ˆrα = o P (1). For FDP KR-U α , we always have

FDP KR-U α log(1/δˆa)

ˆa log(1 + 1 δB/ˆa ˆa B )

α + 1/(1 ˆrα)1/2

by considering ˆa = (1 ˆrα)1/2 . By Lemma 45, this provides consistency (3) as soon as 1/ˆrα = o P (1). This is summarized in the next result.

Proposition 21. Let us consider any model sequence P(m) in the pre-ordered setting and denote the rejection number of the LF procedure at level α by ˆrα. Then the following envelopes are LF m-consistent in the sense of (3):

FDP Freed α if (m log log m)1/2/ˆrα = o P (1);

FDP KR-U α if 1/ˆrα = o P (1).

The latter result means that the LF procedure at level α should make enough rejections in order to provide m-consistency. This is exempliﬁed in a particular model in the next section.

3.4 LF m-consistency in the generalized VCT model

We provide here a model example where conditions of Proposition 21 are satisﬁed. We consider the varying coeﬃcient two-groups (VCT) model of Lei and Fithian (2016), that we generalize to the possible sparse case. Here, without loss of generality we assume that the ordering π is identity, that is, π(i) = i for all i {1, . . . , m}. Below, with some abuse, the notation π will be re-used to stick with the notation of Lei and Fithian (2016).

Deﬁnition 22. Let m be a positive integer, β (sparsity parameter) a real in [0, 1), F0, F1 two c.d.f.s on [0, 1] with F0(t) t for all t [0, 1], and π : [0, ) [0, 1) some measurable function (instantaneous signal probability function) with π(0) > 0 and π(x) = π(1) for x 1. The generalized VCT model of parameters m, π, β, F0, F1, denoted as P(m) π,β,F0,F1, is the p-value mixture model where (pk, Hk) [0, 1] {0, 1}, 1 k m, are independent and generated as follows:

the variables Hk, 1 k m, are independent and P(Hk = 1) = πm(k/m), 1 k m, with πm(x) = π(mβx), x 0;

Iqraa Meah, Gilles Blanchard and Etienne Roquain

conditionally on H1, . . . , Hk, the p-values pk, 1 k m, are independent, with pk | {Hk = i} Fi, 1 k m, i {0, 1}.

We denote Π(t) := t 1 R t 0 π(s)ds, with Π(0) = π(0) and

Πm(t) := t 1 Z t

0 πm(s)ds = t 1 Z t

0 π(mβs)ds = m βt 1 Z mβt

0 π(s)ds = Π(mβt)

the expected fraction of signal before time mt. We also let π1 := Πm(1) = R 1 0 πm(s)ds = m βΠ(1) the overall expected fraction of signal. We consider the asymptotic where m tends to inﬁnity and F0, F1 are ﬁxed.

When β = 0, πm, Πm are ﬁxed and we recover the dense VTC model introduced in Lei and Fithian (2016) (also noting that we are slightly more general because F0 is possibly non-uniform and F1 not concave). The above formulation can also handle the sparse case for which β (0, 1) and the probability to generate a signal is shrunk to 0 by a factor mβ. For instance, if π(1) = 0, the model only generates null hypotheses and corresponding p-values pk+1, . . . , pm for k m1 β. We now analyze the asymptotic behavior of the number of rejections of the LF procedure. By following the same heuristic as in Lei and Fithian (2016) (which is justiﬁed by a concentration argument), we have from (34) that for k = mt ,

[ FDPk = s 1 λ 1 + Pk i=1 1{pi > λ}

1 Pk i=1 1{pi s}

Pk i=1(1 πm(i/m)) (1 F0(λ)) + Pk i=1 πm(i/m) (1 F1(λ)) Pk i=1(1 πm(i/m)) F0(s) + Pk i=1 πm(i/m) F1(s)

1 + Πm(t) 1 F1(λ)

1 + Πm(t) F1(s)

s 1 = FDP (mβt),

by assuming F0(s) = s, F0(λ) = λ, F1(s) > s, F1(λ) > λ and by letting

FDP (t) = 1 + Π(t) 1 F1(λ)

1 + Π(t) F1(s)

s 1 , t 0. (41)

By (34), the quantity ˆkα/m1 β should be asymptotically close to

t α = max{t [0, + ) : FDP (t) α}, (42)

with the convention t α = + if the set is not upper bounded. We should however ensure that the latter set is not empty. For this, we let

α = 1 + π(0) 1 F1(λ)

1 + π(0) F1(s)

Consistent FDP envelopes

Hence, ˆrα = Pˆkα i=1 1{pi s}, the number of rejections of the LF procedure, should be close

to Pˆkα i=1(1 πm(i/m)) F0(s) + Pˆkα i=1 πm(i/m) F1(s) ˆkαs m1 βt αs. This heuristic is formalized in the next result.

Theorem 23. Consider a generalized VCT model P(m) π,β,F0,F1 with parameters β, π, F0, F1 (see Deﬁnition 22) and the LF procedure with parameter s, λ (see (34)), with the assumptions:

(i) Π : t [0, ) R+ is continuous, decreasing, and L-Lipschitz;

(ii) F0(s) = s, F0(λ) = λ, F1(s) > s, F1(λ) > λ;

(iii) α > α where α is deﬁned by (43).

Let α = (α + α)/2 (α, α), t α (0, + ] given by (42), t m = t α mβ and let a 1 be an

integer a m1 βt m such that r = 4 a1/4 1 s + 1 1 λ is small enough to provide r (α α)/4.

Then the number of rejections ˆrα = Pˆkα i=1 1{pi s} of the LF procedure (34) is such that

P(m) π,β,F0,F1(ˆrα < r m) 2(2 + a1/2)e 2a1/2, r m = m1 βt m s/2. (44)

In particular, choosing a = 1 + (log m)2 , we have as m grows to inﬁnity, m1 β/ˆrα = OP (1).

Theorem 23 is proved in Section A.5. Condition (ii) is more general that in Lei and Fithian (2016) and allows to handle binary p-values, like in the knockoﬀs situation (for which F0 and F1 are not continuous). The condition (iii) was overlooked in Lei and Fithian (2016), but it is needed to ensure the existence of t α. It reads equivalently

1 λ + α F1(s)

which ensures that the probability to generate a false null hypothesis is suﬃciently large at the beginning of the p-value sequence, with a minimum amplitude function of F1(s) and F1(λ). Note that in the knockoﬀs case where s = λ = 1/2, we have α = 1 π(0)M

1+π(0)M , where M = 2F1(1/2) 1 > 0 can be interpreted as a margin . Hence, the critical level α is decreasing in π(0)M. Hence, the setting is more favorable either when π(0) increases, or when the margin M increases.

Corollary 24. Consider the sequence of generalized VCT models (P(m) π,β,F0,F1, m 1) deﬁned above. Assume that the parameters π, β, F0, F1 satisfy the assumptions of Theorem 23. Then the consistency (3) holds for the sequence (P(m) π,β,F0,F1, m 1) and for any LF procedure using λ s in either of the two following cases:

for the KR-U envelope (37) and the corresponding bound (40).

for the Freedman envelope (35) and the corresponding bound (39) if either λ = s or β < 1/2;

Iqraa Meah, Gilles Blanchard and Etienne Roquain

Proof This is a direct consequence of Theorem 23 because m1 β/ˆrα = OP (1) in that context and ˆrα is nondecreasing in α. To see why the Freedman envelope is consistent when λ = s, we note that in this case ˆkα = Pˆkα i=1 1 pπ(i) s + Pˆkα i=1 1 pπ(i) > λ (1 + αs/(1 λ))(1 ˆrα), hence the quantity (νˆkα)/(1 ˆrα) is o P (1) as 1/ˆrα = o P (1).

Remark 25. Similarly to Section 2.7 in the top-k setting, the bounds KR, Freedman and KR-U can be improved by performing the interpolation operation (2) in the pre-ordered setting.

4. Results in the online case

4.1 Online setting

We consider an inﬁnite stream of p-values p1, p2, . . . testing null hypotheses H1, H2, . . . , respectively. In the online setting, these p-values come one at a time and a decision should be made at each time immediately and irrevocably, possibly on the basis of past decisions. The decision at time k is to reject Hk if pk αk for some critical value αk only depending on the past decisions. An online procedure is thus deﬁned by a sequence of critical values A = (αk, k 1), that is predictable in the following sense

αk+1 Gk = σ(1{pi αi}, i k), k 1.

A classical assumption is that each null p-value is super-uniform conditionally on past decisions, that is, P(pk x | Gk) x, k H0, (46)

where H0 = {k 1 | Hk = 0}. Condition (46) is for instance satisﬁed if the p-values are all mutually independent and marginally super-uniform under the null. For a ﬁxed procedure A, we consider the path

Rk = {1 i k : pi αi}, k 1. (47)

We will also denote

R(k) = |Rk| =

i=1 1{pi αi}, k 1, (48)

the number of rejections before time k of the considered procedure. A typical procedure controlling the online FDR is the LORD procedure

αk = W0γk + (α W0)γk τ1 + α X

j 2 γk τj, (49)

where W0 [0, α], each τj is the ﬁrst time with j rejections, (γk)k is a non-negative ( spending ) sequence with P k 0 γk 1 and γk = 0 for k < 0. The latter has been extensively studied in the literature (Foster and Stine, 2008; Aharoni and Rosset, 2014; Javanmard and Montanari, 2018), and further improved by Ramdas et al. (2017). Under independence

Consistent FDP envelopes

of the p-values and super-uniformity of the p-values under the null, the LORD procedure controls the online FDR in the sense of

sup k 1 E[FDP(Rk)] α,

see Theorem 2 (b) in Ramdas et al. (2017). Here, we consider the diﬀerent (and somehow more demanding) task of ﬁnding a bound on the realized online FDP, by deriving conﬁdence envelopes (1). Note that this will be investigated for any online procedure and not only for LORD, see Section 4.2. Also, we will study the consistency of the envelope for any LORD-type procedure in Section 4.3.

4.2 New conﬁdence envelopes

The ﬁrst envelope is a consequence of the general result stated in Theorem 38.

Theorem 26. In the online setting described in Section 4.1, consider any online procedure A = (αk, k 1) and assume (46). Then for any δ (0, 1), the following is a (1 δ)- conﬁdence envelope for the path (47) in the sense of (1):

FDP Freed A,k := 1

Pk i=1 αi + Pk i=1 αi

1 R(k) , k 1, (50)

where R(k) is given by (48), (u) = 2 εu

2εu, εu = log((1+κ)/δ)+2 log(1 + log2(u 1)), u > 0 and κ = π2/6.

Proof We apply Theorem 38 in the online setting for λ = 0 (and further upper-bounding each term 1 pπ(i) > 0 by 1), π(k) = k, because (66) is satisﬁed by (46).

Next, the envelope of Katsevich and Ramdas (2020) is as follows

FDP KR A,k := 1

log(1/δ) a log(1 + log(1/δ)/a) a + Pk i=1 αi 1 R(k)

for some parameter a > 0 to choose. While the default choice in Katsevich and Ramdas (2020) is a = 1, applying a union w.r.t. a N\{0} provides the following result.

Theorem 27. In the online setting described in Section 4.1 such that (46) is satisﬁed, and for any online procedure A = (αk, k 1), for any δ (0, 1), the following is a (1 δ)- conﬁdence envelope for the path (47) in the sense of (1):

FDP KR-U A,k := 1 min a N\{0}

( log(1/δa) a log(1 + log(1/δa)/a) a + Pk i=1 αi 1 R(k)

, k 1, (52)

where R(k) is given by (48), δa = δ/(κa2), a 1, for κ = π2/6.

Remark 28. Note that in the online setting, the obtained guarantee (1) is not uniform in the procedure A (in contrast with the envelopes in top-k and preordered cases which were uniform in k and thus also in the cut-oﬀprocedure).

Iqraa Meah, Gilles Blanchard and Etienne Roquain

4.3 Conﬁdence envelope for LORD-type procedures and m-consistency

We now turn to the special case of online procedures satisfying the following condition:

i=1 αi α(1 R(k)), k 1. (53)

Classically, this condition is suﬃcient to control the online FDR (if the p-values are independent and under an additional monotonicity assumption), see Theorem 2 (b) in Ramdas et al. (2017). In particular, it is satisﬁed by LORD (49).

Corollary 29. In the online setting described in Section 4.1, consider any online procedure A = (αk, k 1), satisfying (53) for some α (0, 1), and assume (46). Then for any δ (0, 1), the following quantities are (1 δ)-conﬁdence bounds for the FDP of the procedure: for all k 1,

FDP KR α,k := 1 log(1/δ) log(1 + log(1/δ))(α + 1/(1 R(k)) ; (54)

FDP Freed α,k := 1 α + (α(1 R(k)))

, k 1; (55)

FDP KR-U α,k := 1 min a 1

log(1/δa) a log(1 + log(1/δa)/a)(α + a/(1 R(k)) , (56)

for δa = δ/(κa2), a 1, κ = π2/6, ( ) deﬁned in Theorem 27 and where R(k) is given by (48).

Proof This is direct by applying (51) (a = 1), (55) and (56) and by using the inequality (53) in the corresponding bound.

Let us now consider these bounds for the LORD procedure (49), and study the LORD m-consistency property for each envelope FDPα,k, k 1: for all ϵ > 0,

FDPα,k α ϵ = 0. (57)

where the asymptotics is when the time k tends to inﬁnity. Clearly, we have FDP KR α 1 (cα) for all k 1, where c > 1 is a constant. Hence, the KR envelope is not LORD m-consistent. By contrast, it is apparent that both the Freedman envelope and the uniform KR envelope are LORD m-consistent provided that 1/R(k) = o P (1) as k tends to inﬁnity (consider a = p

1 R(k) and use Lemma 45 for the KR-U envelope). This is summarized in the next result.

Proposition 30. Let us consider any online model P for which (46) is satisﬁed and the LORD procedure at level α which rejects R(k) nulls at time k, then the envelopes (FDP Freed α,k , k 1) and (FDP KR-U α,k , k 1) are LORD m-consistent in the sense of (57) provided that 1/R(k) = o P (1) as k tends to inﬁnity.

The latter result means that the LORD procedure at level α should make enough rejections in order m-consistency to be guaranteed. This condition is met in classical online models, as the next section shows.

Consistent FDP envelopes

4.4 LORD m-consistency in a vanilla online model

Deﬁnition 31. The online one-sided Gaussian mixture model of parameters π1, F1, denoted by Pπ1,F1, is given by the i.i.d. p-value stream (pk, Hk) [0, 1] {0, 1}, k 1, with

P(Hk = 1) = π1 for some ﬁxed π1 (0, 1);

p-values are uniform under the null: pk | Hk = 0 U(0, 1);

p-values have the same alternative distribution: pk|Hk = 1 F1, where F1 is the c.d.f. corresponding to the one-sided Gaussian problem, that is, F1(x) = Φ( Φ 1(x) µ), x [0, 1], for some µ > 0.

Here, we make no sparsity assumption: π1 is assumed to be constant across time. This will ensure that the online procedure maintains a chance to make discoveries even when the time grows to inﬁnity.

Theorem 32. Consider the one-sided Gaussian online mixture model and the LORD procedure with W0 (0, α) and a spending sequence γk = 1 k(log(k))γ , γ > 1. Then its rejection number R(k) at time k satisﬁes: for all a (0, 1), k 1,

P(R(k) < k1 a) ck a, (58)

where c is some constant only depending on α ,W0, γ, µ and π1. In particular, k1 a/R(k) = OP (1) when k tends to inﬁnity, for any a > 0.

Theorem 32 is proved in Section A.6.

Corollary 33. Consider the online one-sided Gaussian mixture model Pπ1,F1 deﬁned above and the LORD procedure with W0 (0, α) and a spending sequence γk = 1 k(log(k))γ , k 1 for γ > 1. Then both the Freedman envelope (55) and the uniform KR envelope (56) are consistent in the sense of (57) for the model Pπ1,F1.

Proof This is a direct consequence of Theorem 32, which provides that k1/2/R(k) = OPπ1,F1(1) when k tends to inﬁnity.

Remark 34. Similarly to Section 2.7 in the top-k setting, the bounds KR, Freedman and KR-U can be improved by performing the interpolation operation (2) in the online setting.

5. Numerical experiments

In this section, we illustrate our ﬁndings by conducting numerical experiments3 in each of the considered settings: top-k, pre-ordered and online. Throughout the experiments, the default value for δ is 0.25 and the default number of replications to evaluate each FDP bound is 1000.

Iqraa Meah, Gilles Blanchard and Etienne Roquain

α = 0.05 α = 0.1

α = 0.15 α = 0.2

Figure 1: Top-k dense case (π0 = 0.5, µ = 1.5).

Here, we consider the top-k setting of Section 2.1, for alternative p-values distributed as F1(x) = Φ(Φ 1(x) µ) (one-sided Gaussian location model), and for diﬀerent values of µ and of π0. To investigate the consistency property, we take m varying in the range {10i, 2 i 6}, and we consider the FDP bounds FDP Simes α (16), FDP DKW α (17), FDP KR α (18), FDP Well α (19) for α {0.05, 0.1, 0.15, 0.2}. We also add for comparison the hybrid bound

FDP Hybrid α,δ := min

FDP KR α,δ/2, FDP Well α,δ/2 ,

which also provides the correct coverage while being close to the best between the Wellner and KR bounds. Figure 1 displays boxplots of the diﬀerent FDP bounds in the dense case for which π0 = 1/2, µ = 1.5. When m gets large, we clearly see the inconsistency of the bounds Simes, KR and the consistency of the bounds Wellner, Hybrid, DKW, which corroborates the

3. All our numerical experiments are reproducible from the code provided in the repository https://github.com/iqm15/Consistent FDP.

Consistent FDP envelopes

Figure 2: Top-k sparse case π0 = 1 0.5m 0.25, µ = p

2 log(m) (left) π0 = 1 0.5m 0.55, µ = 10 (larger than p

2 log(106)) (right), α = 0.2.

theoretical ﬁndings (Corollary 13). In sparser scenarios, Figure 2 shows that the consistency is less obvious for the Wellner and Hybrid bounds and gets violated for the DKW bound when m1 m0.55, as predicted from Corollary 13 (regime β 1/2). Overall, the new bounds are expected to be better as the number of rejections gets larger and KR bounds remain better when the number of rejections is expected to be small. The hybrid bound hence might be a good compromise for a practical use. The adaptive versions of the bounds (Section 2.6) are displayed on Figure 3. By comparing the left and the right panels, we see that the uniform improvement can be signiﬁcant, especially for the Wellner and DKW bounds. By contrast, the improvement for KR is slightly worse. This can be explained from Figure 4, that evaluates the quality of the different π0 estimators. DKW, which is close to an optimized Storey-estimator, is the best, followed closely by the Wellner estimator.

Non adaptive Adaptive

Figure 3: Top-k dense case with nonadaptive bounds (left) and adaptive bounds (right) (π0 = 0.5, α = 0.2).

Iqraa Meah, Gilles Blanchard and Etienne Roquain

Figure 4: Boxplots of the estimators ˆπ0 in the top-k dense case (π0 = 0.5, α = 0.2).

Remark 35. For clarity, the bounds are displayed without the interpolation improvement (2) (for top-k and preordered). The ﬁgures are reproduced together with the interpolated bounds in Appendix D for completeness. In a nutshell, the interpolation operation improves signiﬁcantly the bounds mainly when they are not very sharp (typically small m or very sparse scenarios). Hence, while it can be useful in practice, interpolation does not seem particularly relevant to study the consistency phenomenon.

5.2 Pre-ordered

We consider data generated as in the pre-ordered model presented in Section 3.1 and more speciﬁcally as in the VCT model of Section 3.4. The trueness/falseness of null hypotheses are generated independently, and the probability of generating an alternative is decreasing with the position 1 k m, and is given by π(mβ 1k), where π : [0, ) [0, 1) is some function (see below) and β [0, 1) is the sparsity parameter. Once the process of true/false nulls is given, the p-values are generated according to either:

LF setting: π(t) = π1e bt b 1 e b , t 0, so that Π(1) = π1. Here π1 is equal to 0.4 and b, measuring the quality of the prior ordering, is equal to 2. In addition, the alternative p-values are one-sided Gaussian with µ = 1.5. Note that this is the setting considered in the numerical experiments of Lei and Fithian (2016).

Consistent FDP envelopes

α = 0.05 α = 0.1

α = 0.15 α = 0.2

Figure 5: Preordered dense (β = 0) LF setting with LF procedure (s = 0.1α, λ = 0.5).

Knockoﬀsetting: π(t) = 1/2 + (0 1/2( z t

z 1)), t 0, with z > 1 a parameter that determines how slowly the probability of observing signal deteriorates, taken equal to 30. Then, the binary p-values are as follows: under the null pi = 1/2 or 1 with equal probability. Under the alternative, pi = 1/2 with probability 0.9 and pi = 1 otherwise.

In both settings, the dense (resp. sparse) case refers to the sparsity parameter value β = 0 (resp. β = 0.25). We consider the bounds FDP KR α (38), FDP Freed α (39) and FDP KR-U α (40) for the LF procedure across diﬀerent values of (λ, s) {(1/2, 0.1α), (1/2, 1/2)}, m {10i, 2 i 6}, and α {0.05, 0.1, 0.15, 0.2}. The procedure LF with (λ, s) = (1/2, 1/2) is referred to as the Barber and Cand es (BC) procedure.

Figure 5 displays the boxplots of these FDP bounds for the LF procedure with (λ, s) = (1/2, 0.1α) in the LF setting with β = 0 (dense case). It is apparent that KR is not

Iqraa Meah, Gilles Blanchard and Etienne Roquain

α = 0.05 α = 0.1

α = 0.15 α = 0.2

Figure 6: Preordered sparse (β = 0.25) LF setting with LF procedure (s = 0.1α, λ = 0.5).

α = 0.15 α = 0.2

Figure 7: Pre-ordered dense (β = 0) knockoﬀsetting with BC procedure (i.e., LF procedure with s = λ = 0.5).

consistent, while the new bounds Freedman and KR-U are. Also, the bound KR-U is overall the best, losing almost nothing w.r.t. KR when the number of rejections is very small (say m = 100 and α = 0.05 or 0.1) and making a very signiﬁcant improvement otherwise.

Consistent FDP envelopes

α = 0.15 α = 0.2

Figure 8: Pre-ordered sparse (β = 0.25) knockoﬀsetting with BC procedure (i.e., LF procedure with s = λ = 0.5).

Similar conclusions hold for the case of BC procedure, see Figure 7. Next, to stick with a very common scenario, we also investigate the sparse situation where the fraction of signal is small in the data, see Figures 6 and 8. As expected, while the conclusion is qualitatively the same, the rejection number gets smaller so that the consistency is reached for largest values of m (i.e., the convergence is slowed down ).

We now consider the online case, by applying our method to the real data example coming from the International Mice Phenotyping Consortium (IMPC) (Mu noz-Fuentes et al., 2018), which is a consortium interested in the genotype eﬀect on the phenotype. This data is collected in an online fashion for each gene of interest and is classically used in online detection works (see Ramdas et al. (2017) and references therein). Figure 9 displays the FDP time-wise envelopes k 7 FDP KR α,k (54), k 7 FDP Freed α,k (55) and k 7 FDP KR-U α,k (56), for the LORD procedure (49) (W0 = α/2 with the spending sequence γk = k 1.6, k 1). As we can see, the Freedman and KR-U envelopes both tend to the nominal level α, as opposed to the KR envelope, which is well expected from the consistency theory. In addition, KR-U seems to outperform the Freedman envelope and while KR is (slightly) better than KR-U in the initial segment of the process (k < 300), we can see that KR-U gets rapidly more accurate.

5.4 Comparison to Li et al. (2024)

In this section, we compare the performances of the KR-U bound with respect to the recent bounds proposed in Li et al. (2024). For this, we reproduce the high dimensional Gaussian linear regression setting of Section 5.1 (a) therein, which generates binary pvalues by applying the ﬁxed-X sdp knockoﬀs and the signed maximum lambda knockoﬀ

Iqraa Meah, Gilles Blanchard and Etienne Roquain

Figure 9: Online FDP envelopes for LORD applied on IMPC data for four values of α {0.05, 0.1, 0.15, 0.2} (horizontal black bars). The interpolated bounds are displayed for each procedure as a gray dashed line.

statistic of Barber and Cand es (2015). Doing so, the p-values follow the preordered setting of Section 3.1 and thus our bounds are non-asymptotically valid (note however that the p-values do not follow strictly speaking the VCT model of Section 3.4). To be more speciﬁc, the considered Gaussian linear model Y N(Xβ, In) is obtained by ﬁrst generating X and β as follows: the correlated design matrix X of size n m is obtained by drawing n = 1500 i.i.d. samples from the multivariate m-dimensional distribution Nm(0, Σ) where Σi,j = 0.6|i j|, 1 i, j m; the signal vector β Rm is obtained by ﬁrst randomly sampling a subset of {1, . . . , m} of size π1m for the non-zero entries of β and then by setting all non-zero entries of β equal to a/ n for a given amplitude a > 0.

Consistent FDP envelopes

Figure 10: Comparing the envelope ] FDP KR-U k , k 1 given by (2)-(37) (s = λ = 0.5) to those of Li et al. (2024) in the Gaussian linear regression setting of Section 5.4 for m = 1000 (see text for more details).

First, in the spirit of Figure 3 in Li et al. (2024), we display in Figure 10 the envelope (] FDP KR-U k , k 1) given by the interpolation (2) of the envelope (FDP KR-U k , k 1) deﬁned by (37) (with s = λ = 1/2), and compare it to those obtained in Li et al. (2024) (namely, KJI A/B/C/D) for π1 {0.1, 0.5}, a {15, 25}. We also set here δ = 0.05 to stick with the choice of Li et al. (2024) (note that this requires to further calibrate the parameters of their method according to this value of δ) and the number of replications is here only taken equal to 10 for computational reasons. Markedly, the KR-U envelope becomes much better than KR and is competitive w.r.t. KJI A/B/C/D, at least when k is moderately large. As expected, the most favorable case for KR-U is when the signal has a large amplitude and is dense. Second, to stick with the consistency-oriented plots of the previous sections, we also display the corresponding FDP bounds for the BC procedure at level α {0.15, 0.2} in Figure 11. The conclusions are qualitatively similar.

Iqraa Meah, Gilles Blanchard and Etienne Roquain

Figure 11: Comparing the FDP bound ] FDP KR-U ˆkα for ˆkα the BC procedure (34) (s = λ = 0.5) to those of Li et al. (2024) with respect to α {0.15, 0.2} in the Gaussian linear regression setting of Section 5.4 for m = 1000 (see text for more details).

6. Conclusion

The main point of this paper is to provide another point of view on FDP conﬁdence bounds: we introduced a notion of m-consistency, a desirable asymptotical property which should act as a guiding principle when building such bounds, by ensuring that the bound is sharp enough on particular FDR controlling rejection sets. Doing so, some previous bounds were shown to be inconsistent, including the original KR bounds. While some other known FDP conﬁdence bounds, in particular based on the DKW inequality, are m-consistent under certain assumptions, we have introduced new ones shown to satisfy this condition under more general conditions (in particular high sparsity). New bounds based on the classical Wellner/Freedman inequalities showed interesting behaviors, however simple modiﬁcations

Consistent FDP envelopes

of KR bounds Hybrid/KR-U by stitching have been shown to be the most eﬃcient, both asymptotically and for moderate sample size. Overall, this work shows that m-consistency is a simple and fruitful criterion, and we believe that using it will be beneﬁcial in the future to make wise choices among the rapidly increasing literature on FDP bounds.

Acknowledgements

GB acknowledges support from: Agence Nationale de la Recherche (ANR), ANR-19-CHIA0021-01 (Bi SCott E), ANR-21-CE23-0035 (ASCAI), and IDEX REC-2019-044; Deutsche Forschungsgemeinschaft (DFG) - SFB1294 - 318763901. IM and ER have rbeen supported by ANR-21-CE23-0035 (ASCAI) and by the GDR ISIS through the projets exploratoires program (project TASTY). It is part of project DO 2463/1-1, funded by the Deutsche Forschungsgemeinschaft.

K. Abraham, I. Castillo, and E. Roquain. Sharp multiple testing boundary for sparse sequences. ar Xiv preprint ar Xiv:2109.13601, 2021.

E. Aharoni and S. Rosset. Generalized α-investing: deﬁnitions, optimality results and application to public databases. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 76(4):771 794, 2014.

R. F. Barber and E. J. Cand es. Controlling the false discovery rate via knockoﬀs. The Annals of Statistics, 43(5):2055 2085, 2015.

Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B, 57(1):289 300, 1995.

G. Blanchard, P. Neuvial, and E. Roquain. Post hoc conﬁdence bounds on false positives using reference families. The Annals of Statistics, 48(3):1281 1303, 2020.

M. Bogdan, A. Chakrabarti, F. Frommlet, and J. K. Ghosh. Asymptotic bayes-optimality under sparsity of some multiple testing procedures. The Annals of Statistics, 39(3):1551 1579, 2011.

E. Cand es, Y. Fan, L. Janson, and J. Lv. Panning for gold: model-x knockoﬀs for high dimensional controlled variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(3):551 577, 2018.

Z. Chi. On the performance of FDR control: Constraints and a partial solution. The Annals of Statistics, 35(4):1409 1431, 2007.

X. Cui, T. Dickhaus, Y. Ding, and J. C. Hsu. Handbook of multiple comparisons. CRC Press, 2021.

Iqraa Meah, Gilles Blanchard and Etienne Roquain

L. D umbgen and J. A. Wellner. A new approach to tests and conﬁdence bands for distribution functions. The Annals of Statistics, 51(1):260 289, 2023.

G. Durand, G. Blanchard, P. Neuvial, and E. Roquain. Post hoc false positive control for structured hypotheses. Scandinavian journal of Statistics, 47(4):1114 1148, 2020.

B. Efron, R. Tibshirani, J. D. Storey, and V. Tusher. Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc., 96(456):1151 1160, 2001.

D. P. Foster and R. A. Stine. Alpha-investing: a procedure for sequential control of expected false discoveries. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(2):429 444, 2008.

D. A. Freedman. On tail probabilities for martingales. The Annals of Probability, pages 100 118, 1975.

C. Genovese and L. Wasserman. A stochastic process approach to false discovery control. The annals of statistics, 32(3):1035 1061, 2004.

C. R. Genovese and L. Wasserman. Exceedance control of the false discovery proportion. Journal of the American Statistical Association, 101(476):1408 1417, 2006.

J. J. Goeman and A. Solari. Multiple testing for exploratory research. Statistical Science, 26(4):584 597, 2011.

J. J. Goeman, R. J. Meijer, T. J. Krebs, and A. Solari. Simultaneous control of all false discovery proportions in large-scale multiple hypothesis testing. Biometrika, 106(4):841 856, 2019.

J. J. Goeman, J. Hemerik, and A. Solari. Only closed testing procedures are admissible for controlling false discovery proportions. The Annals of Statistics, 49(2):1218 1238, 2021.

J. Hemerik, A. Solari, and J. J. Goeman. Permutation-based simultaneous conﬁdence bounds for the false discovery proportion. Biometrika, 106(3):635 649, 2019.

S. R. Howard, A. Ramdas, J. Mc Auliﬀe, and J. Sekhon. Time-uniform, nonparametric, nonasymptotic conﬁdence sequences. The Annals of Statistics, 49(2):1055 1080, 2021.

K. Jamieson, M. Malloy, R. Nowak, and S. Bubeck. Lil UCB: An optimal exploration algorithm for multi-armed bandits. In Conference on Learning Theory, pages 423 439. PMLR, 2014.

A. Javanmard and A. Montanari. Online rules for control of false discovery rate and false discovery exceedance. The Annals of statistics, 46(2):526 554, 2018.

E. Katsevich and A. Ramdas. Simultaneous high-probability bounds on the false discovery proportion in structured, regression and online settings. The Annals of Statistics, 48(6): 3465 3487, 2020.

L. Lei and W. Fithian. Power of ordered hypothesis testing. In International conference on machine learning, pages 2924 2932. PMLR, 2016.

Consistent FDP envelopes

A. Li and R. F. Barber. Accumulation tests for fdr control in ordered hypothesis testing. Journal of the American Statistical Association, 112(518):837 849, 2017.

J. Li, M. H. Maathuis, and J. J. Goeman. Simultaneous false discovery proportion bounds via knockoﬀs and closed testing. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2024.

P. Massart. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab., 18(3):1269 1283, 1990.

N. Meinshausen. False discovery control for multiple tests of association under general dependence. Scandinavian Journal of Statistics, 33(2):227 237, 2006.

N. Meinshausen and P. B uhlmann. Lower bounds for the number of false null hypotheses for multiple testing of associations under general dependence structures. Biometrika, 92 (4):893 907, 2005.

N. Meinshausen and J. Rice. Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. The Annals of Statistics, 34(1):373 393, 2006.

V. Mu noz-Fuentes, P. Cacheiro, T. F. Meehan, J. A. Aguilar-Pimentel, S. D. M. Brown, A. M. Flenniken, P. Flicek, A. Galli, H. H. Mashhadi, M. Hrabˇe De Angelis, J. K. Kim, K. C. K. Lloyd, C. Mc Kerlie, H. Morgan, S. A. Murray, L. M. J. Nutter, P. T. Reilly, J. R. Seavitt, J. K. Seong, M. Simon, H. Wardle-Jones, A.-M. Mallon, D. Smedley, and H. E. Parkinson. The International Mouse Phenotyping Consortium (IMPC): a functional catalogue of the mammalian genome that informs conservation the IMPC consortium. Conservation Genetics, 3(4):995 1005, 2018.

P. Neuvial. Asymptotic properties of false discovery rate controlling procedures under independence. Electron. J. Statist., 2:1065 1110, 2008.

P. Neuvial. Asymptotic results on adaptive false discovery rate controlling procedures based on kernel estimators. Journal of Machine Learning Research, 14:1423 1459, 2013.

P. Neuvial and E. Roquain. On false discovery rate thresholding for classiﬁcation under sparsity. The Annals of Statistics, 40(5):2572 2600, 2012.

M. Perrot-Dock es, G. Blanchard, P. Neuvial, and E. Roquain. Post hoc false discovery proportion inference under a hidden Markov model. TEST, (32):1365 1391, 2023.

A. Ramdas, F. Yang, M. J. Wainwright, and M. I. Jordan. Online control of the false discovery rate with decaying memory. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.

H. Robbins. A one-sided conﬁdence interval for an unknown distribution function. Annals of Mathematical Statistics, 25(2):409 409, 1954.

Iqraa Meah, Gilles Blanchard and Etienne Roquain

D. S. Robertson, J. M. Wason, and A. Ramdas. Online multiple hypothesis testing. Statistical science, 38(4):557, 2023.

G. R. Shorack and J. A. Wellner. Empirical processes with applications to statistics. SIAM, 2009.

R. J. Simes. An improved Bonferroni procedure for multiple tests of signiﬁcance. Biometrika, 73(3):751 754, 1986.

J. D. Storey. A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3):479 498, 2002.

A. Vesely, L. Finos, and J. J. Goeman. Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(3): 664 683, 2023.

Appendix A. Proofs

A.1 Proof of Proposition 6

For j 1, let δj = δj 2, τj = 2 j and

t [τj, 1], n 1 n X

i=1 1{pi t} t λj

λj = h 1 log(1/δj)

so that by Wellner s inequality, we have P(Aj) 1 δj and with a union bound P( j 1Aj) 1 δπ2/6. Now let t (0, 1) and j0 = min{j 1 : t τj} = min{j 1 : j log2(1/t)}, so that j0 = log2(1/t) 1. This yields

log(1/δj0) = log(1/δ) + 2 log( log2(1/t) ).

On the event j 1Aj, we have, since t [τj0, 1] by deﬁnition,

i=1 1{pi t} t λj0 = th 1 log(1/δj0) nτj0/(1 τj0)

= th 1 log(1/δ) + 2 log( log2(1/t) )

because τj0 = 2 log2(1/t) . The result then comes from replacing δ by δ6/π2.

A.2 Proof of Theorem 12

First let Fm(t) = Φ(Φ 1(t) µm), Ψm(t) = Fm(t)/t and observe that Ψm is continuous decreasing on (0, 1] with lim0 Ψm = + . This implies that t m, t m (0, 1) as described in the statement both exist, with

t m = Ψ 1 m (τm(α/2)), t m = Ψ 1 m (τm(2α)), τm(α) = m

Consistent FDP envelopes

We ﬁrst establish

t m m1/m (59)

t m m1/m. (60)

If β = 0, then m0/m = 1 c, m1/m = c, µm = b, τm > 0, Fm(t) = Φ(Φ 1(t) b), Ψm(t) = Φ(Φ 1(t) b)/t, τm(α) all do not depend on m. Hence, t m and t m are both constant, which establishes (59) and (60). Let us now turn to the sparse case, for which β (0, 1). The inequality (60) follows from the upper bound

0.5t m/α = Gm(t m) t m + m1/m.

For (59), the analysis is slightly more involved. We ﬁrst prove that for m large enough

Φ 1(t m) µm b. (61)

This will establish (59), since it implies Fm(t m) Fm(Φ(µm b)) = Φ( b) > 0 and also t m = (τm(α/2)) 1Fm(t m) m1/m. On the one hand,

Ψm(Φ(µm b)) = Φ( b) Φ(µm b) Φ( b) µm b

φ(µm b) = Φ( b)mβp

because µm b = 2β log m and φ(µm b) = m β, and by using Φ(x) φ(x)/x for all x > 0. On the other hand,

Ψm(t m) = τm(α/2) 2

Hence, for m large enough, we have Ψm(Φ(µm b)) Ψm(t m) = Ψm(Φ(Φ 1(t m))), which in turn implies (61). We now turn to prove the result (20) and follow for a classical concentration argument. Let

ˆGm(t) = m 1 m X

i=1 1{pi t}, t [0, 1],

so that Gm(t) = E ˆGm(t) for all t [0, 1]. Hence, for all t (0, 1),

P(αˆkα/m < t) P ˆGm(t) t/α

= P ˆGm(t) Gm(t) t/α Gm(t) ,

because αˆkα/m = max{t (0, 1) : ˆGm(t) t/α} by deﬁnition of ˆkα. Applying this with t = t m, this gives

P(αˆkα/m < t m) = P ˆGm(t m) Gm(t m) Gm(t m)

exp( cm Gm(t m)) exp( Cm1Fm(t m)),

Iqraa Meah, Gilles Blanchard and Etienne Roquain

for some constant C > 0, by applying Bernstein s inequality. Since Fm(t m) Φ( b) > 0, this gives P(αˆkα/m < t m) e dm1 for m large enough and some constant d > 0. Next, for all t [t m, 1), still applying Bernstein s inequality,

P(αˆkα/m > t)

k=1 1{αk/m > t}P ˆGm(αk/m) Gm(αk/m) k/m Gm(αk/m)

k=1 1{αk/m > t} exp m (k/m Gm(αk/m))2

Gm(αk/m) + (1/3)(k/m Gm(αk/m))

m exp Cmt m ,

because for all αk/m t m, k/m Gm(αk/m) Gm(αk/m) Gm(t m) = 0.5t m/α (given the monotonicity of t 7 Gm(t)/t). Applying this for t = t m (0, 1), we obtain

P(αˆkα/m > t m) e dm1,

because t m t m m1/m. This proves the result.

A.3 Proof of Proposition 14

Let us prove it for the adaptive uniform Wellner envelope (the other ones being either simpler or provable by using a similar argument). The idea is to prove that on an event where the (non-adaptive) Wellner envelope (15) is valid, we also have m0 ˆm Well 0 . The result is implied just by monotonicity (Lemma 43). For this, we come back to apply (14) with (U1, . . . , Un) = (pi, i H0), n = m0. Hence, on an event with probability at least 1 δ, we have for all t (0, 1),

i H0 1{pi t} t h 1 Ct

Ct/(2tm0) 2 ,

where we apply an upper bound coming from Lemma 43. This gives

Vt/m0 1 t 1 + p

Ct/(2tm0) 2 = 1 t p

2t Ct/m0 Ct/(2m0).

As a result, Vt m0(1 t) 2t Ctm0 Ct/2 and thus (1 t)m0 2t Ctm1/2 0 Ct/2 Vt 0, which gives

2t Ct + 4(1 t)(Ct/2 + Vt)

t Ct 2(1 t)2 +

Ct 2(1 t)2 + Vt 1 t

Since this is uniform in t, we can take the minimum over t, which gives the m0 conﬁdence bound ˆm Well 0 .

Consistent FDP envelopes

A.4 Proof of Proposition 17

Classically, the Simes-based closed testing bound on R = BH(α) is trivial (i.e. equal to |R|) if the global intersection hypothesis [m] = {1, . . . , m} is not rejected by the local Simes test, that is, if min1 k m(p(k)/k) > δ/m (Goeman and Solari, 2011). By the assumptions on G (G is concave, G(0) = 0 and G has derivative g(0) at 0), we have G(t) min(1, g(0)t) for all t [0, 1] and thus F(t) min(1, (π0 + (1 π0)g(0))t) for all t [0, 1]. Hence, the p-values are stochastically lower bounded by a Unif[0, γ] variable, with γ = (π0 + (1 π0)g(0)) 1. Let us denote Pγ the joint probability of i.i.d. Unif[0, γ] p-values. We therefore have

P min k (p(k)/k) δ/m Pγ

min k (p(k)/k) δ/m

min k (γ 1p(k)/k) δ/(γm)

where the last equality is from Simes (1986), since under Pγ the rescaled p-values γ 1pk are i.i.d. Unif[0, 1]. Thus, as soon as δ < γ, the Simes-based closed testing bound is trivial with probability bounded away from 0 for all m, and thus cannot be m-consistent.

On the other hand, if α > γ, then there is a non-zero solution t to the equation F(t) = t/α (due to strict concavity of F, this solution is unique). It is well-known (see Chi, 2007) that asymptotically as m , the BH(α) rejection threshold will tend to t in probability. Therefore, bk BH α grows to inﬁnity at a rate of order m in probability (in the sense bk BH α P(m) π0,G m by using the notation of Proposition 13), which by Proposition 10 implies

the BH(α)-consistency of FDP DKW α and FDP Well α .

A.5 Proof of Theorem 23

First note that FDP (t) is an decreasing function of Π(t) because 1 F1(λ)

1 λ < 1 < F1(s)

s , see (41). Since Π(t) is decreasing from π(0) to π(1) = Π(+ ), we have that FDP : [0, + ) [α, α] is continuous increasing, where α = 1 + π(1) 1 F1(λ)

1 λ 1 )/ 1 + π(1) F1(s)

Hence, if α < α, we have 0 < t α < + , t m = t α for m large enough, and thus FDP (t m) = α . If α α, t α = + , t m = mβ and FDP (t m) α . Both cases are considered in what follows. Consider the events

i=1 1{pi > λ} k 1 k X

i=1 P(pi > λ)

i=1 1{pi s} k 1 k X

i=1 P(pi s)

Iqraa Meah, Gilles Blanchard and Etienne Roquain

By Lemma 37, the event Ω1 Ω2 occurs with probability larger than 1 2(2 + a1/2)e 2a1/2. Let

e1 = 1 + Πm(m βt m) 1 F1(λ)

1 λ 1 = 1 + Π(t m) 1 F1(λ)

e2 = 1 + Πm(m βt m) F1(s)

s 1 = 1 + Π(t m) F1(s)

be the numerator and denominator of FDP (t α ), so that e1/e2 = FDP (t m) α . Let k0 = m1 βt m m. Provided that k0 a, we have k 1 0

i=1 P(pi > λ) (1 λ)e1

i=1 πm(i/m) Πm(m βt m)

|(1 F1(λ)) (1 λ)|

i=1 πm(i/m) Πm(k0/m)

+ Πm(k0/m) Πm(m βt m)

1/a + L/m1 β,

by applying Lemma 36 and using that Π( ) is L-Lipschitz. Similarly, k 1 0

i=1 P(pi s) se2

1/a + L/m1 β.

We deduce that on Ω1 Ω2 and when k0 a, we have

[ FDPk0 e1 + 1 a(1 λ) + L m1 β(1 λ) + 1 k0(1 λ) + 1 a1/4(1 λ)

as L m1 βs 1 k0s 1 a1/4s

provided that e2 2r, because e1 1, e2 1, and by considering r as in the statement. Since e1/e2 α α 4r and e2 1 2r by assumption, we have [ FDPk0 α and

thus ˆkα k0 on Ω1 Ω2. The result is proved by noting that ˆrα = Pˆkα i=1 1{pi s} Pk0 i=1 1{pi s} (e2 r)k0s k0s/2 on this event.

Lemma 36. In the setting of Theorem 23, we have for all a 1, m a,

i=1 πm(i/m) Πm(k/m)

Proof First note that because πm is nonnegative continuous decreasing, we have for all k 1,

i=1 πm(i/m) Πm(k/m) = (m/k) Z k/m

0 πm(s)ds (1/k)

i=0 πm(i/m).

Since πm(0) 1, the result is clear.

This following lemma is similar to Lemma 1 in Lei and Fithian (2016).

Consistent FDP envelopes

Lemma 37. Let Xi B(pi), 1 i m, be independent Bernoulli variables for pi [0, 1], 1 i m. Then we have for all a 1 and m a,

(2 + a1/2)e 2a1/2. (63)

Proof By Hoeﬀding s inequality, we have for all x > 0,

i=1 (Hi πm(i/m))

k a e 2kx2 = 2 1 e 2x2 e 2ax2 (2 + 1/x2)e 2ax2.

We deduce the result by considering x = 1/a1/4.

A.6 Proof of Theorem 32

We get inspiration from the power analysis of Javanmard and Montanari (2018). Let c = min(α W0, W0). By deﬁnition (49), the LORD procedure makes (point-wise) more rejections than the procedure given by the critical values

αT = c max{γT τj, j 0}, (64)

where, for any j 1, τj is the ﬁrst time that the procedure makes j rejections, that is,

τj = min{t 0 : R(t) j} (τj = + if the set is empty), (65)

(note that τ0 = 0) for R(T) = PT t=1 1{pt αt}. Let j = τj τj 1 the time between the j-th rejection and the (j 1)-th rejection. It is clear that (R(t))t 1 is a renewal process with holding times ( j)j 1 and jump times (τj)j 1. In particular, the j s are i.i.d. As a result, we have for all r, k 1,

P(R(k) < r) P(τr k) = P( 1 + + r k) r E 1/k,

m 1 P( 1 m) = X

ℓ=1 (1 G(cγℓ)) X

m 1 e m G(cγm).

In addition, since G is concave,

x g (x) = π0 + π1c eµ Φ 1(x) ec

2 log(1/x) (log(1/x))γ+2,

for x small enough and c, c > 0 some constants. This gives for large m M, e m G(cγm) e cmγm(log(1/(cγm)))2+γ e 2 log m, for some M > 0, by the choice made for γm. As a result,

m M e m G(cγm) C+ X

m M e cmγm(log(1/(cγm)))γ+2 C+ X

m 1 e 2 log m = C+π2/6,

for some constant C > 0. This gives

P(R(k) < r) r(C + π2/6)/k.

and taking r = k1 a gives (58).

Iqraa Meah, Gilles Blanchard and Etienne Roquain

Appendix B. Tools of independent interest

B.1 A general envelope for a sequence of tests

An important basis for our work is the following theorem, which has the ﬂavor of Lemma 1 of Katsevich and Ramdas (2020), but based on a diﬀerent martingale inequality, derived from a Freedman type bound (see Section B.2). Also, while the pre-ordered and online settings are diﬀerent, this result can be applied to both settings.

Theorem 38. Consider a potentially inﬁnite set of null hypotheses H1, H2, . . . for the distribution P of an observation X, with associated p-values p1, p2, . . . (based on X). Consider an ordering π(1), π(2), . . . (potentially depending on X) and a set of critical values α1, α2, . . . (potentially depending on X). Let λ [0, 1) be a parameter and assume that there exists a ﬁltration

Fk = σ (π(i))1 i k, (1 pπ(i) αi )1 i k, (1 pπ(i) > λ )1 i k , k 1,

such that for all k 2,

P(pπ(k) t | Fk 1, Hπ(k) = 0) t for all t [0, 1]. (66)

Then, for any δ (0, 1), with probability at least 1 δ, it holds

i=1 (1 Hπ(i))1 pπ(i) αi V k,

i=1 (1 Hπ(i))1 pπ(i) > λ αi

i=1 (1 Hπ(i))νi

where (u) = 2 εu

2εu, εu = log((1 + κ)/δ) + 2 log(1 + log2(u 1)), u > 0, κ = π2/6. and νi = αi(1 + min(αi, λ)/(1 λ)), for i 1.

Proof By Lemma 39, we can apply Corollary 42 (it self coming from Freedman s inequality) with

ξi = (1 Hπ(i))

1 pπ(i) αi Fi(αi)1 pπ(i) > λ

where Fi(αi) and Fi(λ) are deﬁned by (69). First note that ξi 1 =: B almost surely. Let us now prove E(ξ2 i | Fi 1) (1 Hπ(i))νi. (68)

Indeed, assuming ﬁrst αi λ, we have by (66),

E(ξ2 i | Fi 1) = (1 Hπ(i))

E(1 pπ(i) αi | Fi 1) + (Fi(αi))2 E(1 pπ(i) > λ | Fi 1) (1 Fi(λ))2

(1 Hπ(i))(αi + α2 i /(1 λ)) = (1 Hπ(i))νi.

Consistent FDP envelopes

which gives (68). Now, if αi > λ, still by (66),

E(ξ2 i | Fi 1) = (1 Hπ(i))

E(1 pπ(i) αi | Fi 1) + (Fi(αi))2 E(1 pπ(i) > λ | Fi 1) (1 Fi(λ))2

1 Fi(λ)E(1 λ < pπ(i) αi | Fi 1)

= (1 Hπ(i)) Fi(αi) + (Fi(αi))2/(1 Fi(λ)) 2Fi(αi)(Fi(αi) Fi(λ))/(1 Fi(λ))

= (1 Hπ(i))Fi(αi) 1 + (2Fi(λ) Fi(αi))/(1 Fi(λ))

(1 Hπ(i))Fi(αi) 1 + Fi(λ)/(1 Fi(λ)) (1 Hπ(i))νi,

which implies (68) also in that case. Finally, (68) is established, which yields

k 1, Sk 2 p

i=1 (1 Hπ(i))νi + 4εk(δ)

and thus (67).

Lemma 39. In the setting of Theorem 38, let

Fk(αk) = P(pπ(k) αk | Fk 1, Hπ(k) = 0), Fk(λ) = P(pπ(k) λ | Fk 1, Hπ(k) = 0) (69)

the process (Sk)k 1 deﬁned by

i=1 (1 Hπ(i))

1 pπ(i) αi Fi(αi)1 pπ(i) > λ

is a martingale with respect to the ﬁltration (Fk)k 1.

Proof First, Sk is clearly Fk measurable. Second, we have for all k 2,

E(Sk | Fk 1) = E

Sk 1 + (1 Hπ(k))

1 pπ(k) αk Fk(αk)1 pπ(k) > λ

= Sk 1 + (1 Hπ(k))(Fk(αk) Fk(αk)) = Sk 1.

B.2 Uniform-Empirical version of Freedman s inequality

We establish a time-uniform, empirical Bernstein-style conﬁdence bound for bounded martingales. Various related inequalities have appeared in the literature, in particular in the online learning community. The idea is based on stitching together time-uniform bounds that are accurate on diﬀerent segments of (intrinsic) time. The use of the stitching principle has been further pushed and developed into many reﬁnements by Howard et al. (2021),

Iqraa Meah, Gilles Blanchard and Etienne Roquain

who also propose a uniform empirical Bernstein bound as a byproduct. The version given here, based on a direct stitching of Freedman s inequality, has the advantage of being selfcontained with an elementary proof (though the numerical constants may be marginally worse than Howard et al. s). We ﬁrst recall Freedman s inequality in its original version (Freedman, 1975). Let (ξi, Fi)i 1 be a supermartingale diﬀerence sequence, i.e. E[ξi|Fi 1] 0 for all i. Deﬁne Sn := Pn i=1 ξi (then (Sn, Fn) is a supermartingale), and Vn := Pn i=1 Var[ξi|Fi 1].

Theorem 40 (Freedman s inequality; Freedman, 1975, Theorem 4.1). Assume ξi 1 for all i 1. Then for all t, v > 0:

P[Sn t and Vn v for some n 1] exp( ϕ(v, t)), (70)

ϕ(v, t) := (v + t) log 1 + t

We establish the following corollary (deferring the proof for now):

Corollary 41. Assume ξi 1 for all i 1. Then for all δ (0, 1) and v > 0:

2v log δ 1 + log δ 1

2 and Vn v for some n 1 δ. (72)

Following the stitching principle applied to the above we obtain the following.

Corollary 42. Assume ξi B for all i 1, where B is a constant. Put e Vk := (Vk B2) and κ = π2/6. Then for all δ (0, 1/(1 + κ)), with probability at least 1 (1 + κ)δ it holds

k 1 : Sk 2 q

e Vkε(δ, k) + 1

where ε(δ, k) := log δ 1 + 2 log(1 + log2(e Vk/B2)).

Proof Denote v2 j := 2j B2, δj := (j 1) 2δ, j 0, and deﬁne the nondecreasing sequence of stopping times τ 1 = 1 and τj := min k 1 : Vk > v2 j for j 0. Deﬁne the events for j 0:

Aj := k 1 : Sk q

2v2 j log δ 1 j + 1

2B log δ 1 j and Vk v2 j

A j := k with τj 1 k < τj : Sk 2 q

e Vkε(δ, k) + 1

2Bε(δ, k) .

From the deﬁnition of v2 j , δj, we have j = log2(v2 j /B2) for j 1. For j 1, τj 1 k < τj implies e Vk = Vk, v2 j 1 = v2 j /2 < e Vk v2 j , and further

log δ 1 j = log δ 1 + 2 log log2(v2 j /B2) ε(δ, k).

Consistent FDP envelopes

Therefore it holds A j Aj. Furthermore, for j = 0, we have v2 0 = B2, δ0 = δ. Further, if k < τ0 it implies Vk < B2 and therefore e Vk = B2, thus ε(δ, k) = log δ 1. Hence

A 0 k with k < τ0 : Sk 2 q

B2 log δ 1 0 + 1

2B log δ 1 0

2v2 0 log δ 1 0 + 1

2B log δ 1 0 and Vk v2 0

Therefore, since by (72) it holds P[Aj] δj for all j 0:

P h k n : Sk 2 p

Vkε(δ, k) + Bε(δ, k) i = P

j 0 (j 1) 2 3δ.

Proof [Proof of Corollary 41] It can be easily checked that ϕ(v, t) is increasing in t (for v, t > 0). Thus Sn t ϕ(p, (Sn)+) ϕ(p, t). Since ϕ(v, 0) = 0, and limt ϕ(v, t) = , it follows that for any δ (0, 1], there exists a unique real t(v, δ) such that ϕ(v, t(v, δ)) = log δ. It follows that (70) is equivalent to:

v > 0, δ (0, 1] : P[Av,δ] δ, (73)

where Av,δ := {ϕ(v, (Sn)+) log δ and Tn v for some n 1}.

Observe that ϕ(v, t) = vh v+t

v , where h is the function deﬁned by (12). Since h(λ) 2(

λ 1)2 from Lemma 43, we deduce ϕ(v, t) 2( v + t v)2 thus, whenever ϕ(v, (Sn)+) log δ, we have: p

v + (Sn)+ v +

taking squares on both sides entails

2v log δ 1 + log δ 1

proving (72).

Appendix C. Auxiliary results

Lemma 43. The function h deﬁned by (12) is increasing strictly convex from (1, ) to (0, ), while h 1 is increasing strictly concave from (0, ) to (1, ). The functions h and h 1 satisfy the following upper/lower bounds:

λ 1)2 h(λ) (λ 1)2/2, λ > 1

2y h 1(y) (1 + p

y/2)2, y > 0

Iqraa Meah, Gilles Blanchard and Etienne Roquain

In particular, h 1(y) 1 2y+O(y) as y 0. In addition, for any c > 0, x (1, + ) 7 xh 1(c/x) is increasing.

Proof Clearly, h = log, which is positive and increasing on (1, ). This gives the desired property for h and h 1. Next, the bounds can be easily obtained by studying the functions λ 7 (λ 1)2/2 h(λ) and λ 7 h(λ) 2(

λ 1)2. For the last statement, since h 1 is strictly concave and h 1(0) = 1, we have that y (0, ) 7 (h 1(y) 1)/y is decreasing. Since y (0, ) 7 1/y is also decreasing, this gives that y (0, ) 7 h 1(y)/y is decreasing. This gives the last statement.

1.00 1.05 1.10 1.15 1.20 1.25 1.30

0.00 0.01 0.02 0.03 0.04 0.05

0.00 0.01 0.02 0.03 0.04 0.05

1.00 1.10 1.20 1.30

Figure 12: Displaying h (left) and h 1 (right). Bounds of Lemma 43 are displayed in blue.

Lemma 44 (Wellner s inequality, Inequality 2, page 415, with the improvement of Exercise 3 page 418 of Shorack and Wellner, 2009). Let U1, . . . , Un be n 1 i.i.d. uniform random variables. For all λ 1, a [0, 1), we have

t [a, 1] : n 1 n X

i=1 1{Ui t}/t λ

e nah(λ)/(1 a),

for h( ) deﬁned by (12).

Lemma 45. The KR constants in (36) and (51) satisfy, as a ,

a log(1 + 1 δB/a a B ) = 1 + O log(a)

log(1/δa) a log(1 + log(1/δa)/a) = 1 + O log(a)

where δa = cδ/a, c = π2/6 and the O( ) depends only on the constants δ > 0 and B > 0.

Consistent FDP envelopes

α = 0.05 α = 0.1

α = 0.15 α = 0.2

Figure 13: Figure 1 where we have superposed in each case the (median of the) interpolated bounds (star symbols). Top-k dense case (π0 = 0.5, µ = 1.5).

Appendix D. Additional experiments

D.1 Interpolated bounds

We reproduce here the ﬁgures of the numerical experiments in the top-k and preordered settings, by adding the interpolated bounds. On each graph, the median of the generated interpolated bound is marked by a star symbol, which is given in addition to the former boxplot (of the non-interpolated bound). By doing so, we can evaluate the gain brought by the interpolation operation in each case. Note that the interpolated bound is not computed for m 105 for computational cost reasons.

Iqraa Meah, Gilles Blanchard and Etienne Roquain

Figure 14: Figure 2 where we have superposed in each case the (median of the) interpolated bounds (star symbols). Top-k sparse case π0 = 1 0.5m 0.25, µ = 2 log m (left) π0 = 1 0.5m 0.55, µ =10 (right), α = 0.2.

Non adaptive Adaptive

Figure 15: Figure 3 where we have superposed in each case the (median of the) interpolated bounds (star symbols). Top-k dense case with nonadaptive bounds (left) and adaptive bounds (right) (π0 = 0.5, α = 0.2).

D.2 Closed testing bounds

Let us consider the top-k setting with m null hypotheses, and consider any nonnegative sequence ℓi,k [0, 1], 1 i k, 1 k m. Let ℓi,k = ℓi,i for i > k 1 and ℓi,0 = 1. Assume that ℓi,k ℓi,k for 1 k k m for all i 1. It includes the following cases:

Simes: ℓi,k = δi/k;

KR: ℓi,k = log(1+log(1/δ))

log(1/δ) i/k 1/k, (for δ 0.31);

DKW: ℓi,k = i/k p

log(1/δ)/2 k 1/2.

Consistent FDP envelopes

α = 0.05 α = 0.1

α = 0.15 α = 0.2

Figure 16: Figure 5 where we have superposed in each case the (median of the) interpolated bounds (star symbols). Preordered dense (β = 0) LF setting with LF procedure (s = 0.1α, λ = 0.5).

Theorem 46 (Lemma 6 in Goeman et al. (2021)). In the top-k setting, consider any sequence (ℓi,k)i,k as above and assume that for all S H0, P( i {1, . . . , |S|} : p(i:S) ℓi,|S|) δ. Then the closed-testing FDP envelope

FDPk = min 1 k k

1 j k 1 p(j) > ℓk , ˆm0 + k 1 o /k; (74)

ˆm0 = max{0 j m : for all i {1, . . . , j}, p(m j+i) > ℓi,j} (75)

is valid in the sense of (1).

The form of the closed-testing FDP bound (74) turns out to coincide with the adaptive interpolated bounds of Section 2.7 (improved by adding an integer part). This is exempliﬁed in the next result for the Simes sequence.

Lemma 47. Consider the Simes sequence ℓi,k = δi/k. Then, on the event where all p-values are diﬀerent from all thresholds ℓi,k, the closed-testing bound (74) is equal to

k min 1 k k{k k + k ˆm0p(k )/δ },

Iqraa Meah, Gilles Blanchard and Etienne Roquain

α = 0.05 α = 0.1

α = 0.15 α = 0.2

Figure 17: Figure 6 where we have superposed in each case the (median of the) interpolated bounds (star symbols). Preordered sparse (β = 0.25) LF setting with LF procedure (s = 0.1α, λ = 0.5).

α = 0.15 α = 0.2

Figure 18: Figure 7 where we have superposed in each case the (median of the) interpolated bounds (star symbols). Pre-ordered dense (β = 0) knockoﬀsetting with BC procedure (i.e., LF procedure with s = λ = 0.5).

Consistent FDP envelopes

α = 0.15 α = 0.2

Figure 19: Figure 8 where we have superposed in each case the (median of the) interpolated bounds (star symbols). Pre-ordered sparse (β = 0.25) knockoﬀsetting with BC procedure (i.e., LF procedure with s = λ = 0.5).

Lemma 47 shows that, for the Simes threshold, the closed testing bound improves the interpolated one only in the way m0 is estimated. The closed-testing m0 estimator (75) is by essence more accurate than those that we proposed in Section 2.6, but is also more computationally demanding. In addition, in our experiments, the improvement is modest in general, as shown in Figure 20. We see the closed-testing versions of our bounds as advisable when m is small, because the improvement seems to be the most signiﬁcant in that case while the complexity is still low. In addition, this ﬁgure also suggests that the closed testing versions of Simes and KR bounds are m-inconsistent, which corroborates the theoretical ﬁndings of Corollary 17 in the Simes case. Proof Deﬁne U = {u(k ), 1 k k}, with u(k ) = ˆm0p(k )/δ . We have

k min 1 k k{k k + k ˆm0p(k )/δ } = k min 1 k k{k k + ˆm0p(k )/δ }

= k min u U

j=1 1 u(j) u + u o

= k min u U

j=1 1 u(j) > u + u o

= k min u U

j=1 1 u(j) u + 1 + u o

= k min u U

j=1 1 ˆm0p(j)/δ u + 1 + u o

= k min v U+1

j=1 1 p(j) vδ/ ˆm0 + v 1 o .

Iqraa Meah, Gilles Blanchard and Etienne Roquain

Fix now w {1, . . . , k}, and let us prove

j=1 1 p(j) wδ/ ˆm0 + w 1 k min v U+1

j=1 1 p(j) vδ/ ˆm0 + v 1 o . (76)

First observe that for all j {1, . . . , k},

p(j) wδ/ ˆm0 ˆm0p(j)/δ w ˆm0p(j)/δ w u(j) + 1 > w. (77)

Hence if for all v U + 1 we have v > w, (76) is satisﬁed. Otherwise, there is one v U + 1 such that v w and we can consider vw = max{v U + 1 : v w} the maximum of the elements of U + 1 that are below w. From (77), we have for all j {1, . . . , k}, p(j) wδ/ ˆm0 u(j) + 1 > vw p(j) vwδ/ ˆm0, which means Pk j=1 1 p(j) wδ/ ˆm0 = Pk j=1 1 p(j) vwδ/ ˆm0 and thus since w vw, the inequality (76) is also satisﬁed. This establishes in any case

k min v U+1

j=1 1 p(j) vδ/m + v 1 o = k min 1 w k

j=1 1 p(j) wδ/m + w 1 o .

This gives the result.

α = 0.1 α = 0.2

Figure 20: Median of the adaptive bounds of Section 2.6 (plain circles), median of interpolated bounds of Section 2.7 (hollow star), and median of closed-testing bounds given by (74) (asterix) in function of m {100, 1000, 10000}. The closed-testing is only computed for Simes and KR bounds. The simulation setting is the same as the one used for the right panel of Figure 3 (π0 = 0.5).