Fully-Adaptive Composition in Differential Privacy

Justin Whitehouse 1, Aaditya Ramdas 1, Ryan Rogers 2, Zhiwei Steven Wu 1

Abstract

Composition is a key feature of differential privacy. Well-known advanced composition theorems allow one to query a private database quadratically more times than basic privacy composition would permit. However, these results require that the privacy parameters of all algorithms be fixed before interacting with the data. To address this, Rogers et al. (2016) introduced fully adaptive composition, wherein both algorithms and their privacy parameters can be selected adaptively. They defined two probabilistic objects to measure privacy in adaptive composition: privacy filters, which provide differential privacy guarantees for composed interactions, and privacy odometers, time-uniform bounds on privacy loss. There are substantial gaps between advanced composition and existing filters and odometers. First, existing filters place stronger assumptions on the algorithms being composed. Second, these odometers and filters suffer from large constants, making them impractical. We construct filters that match the rates of advanced composition, including constants, despite allowing for adaptively chosen privacy parameters. En route we also derive a privacy filter for approximate zCDP. We also construct several general families of odometers. These odometers match the tightness of advanced composition at an arbitrary, preselected point in time, or at all points in time simultaneously, up to a doubly-logarithmic factor. We obtain our results by leveraging advances in martingale concentration. In sum, we show that fully adaptive privacy is obtainable at almost no loss.

1 Carnegie Mellon University, 2 LinkedIn. Correspondence to: Justin Whitehouse.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

1. Introduction

Differential privacy (Dwork et al., 2006b) is an algorithmic criterion that provides meaningful guarantees of individual privacy for conducting analysis on sensitive data. Intuitively, an algorithm is differentially private if similar inputs induce similar distributions on outputs. More formally, an algorithm A : X → Y is differentially private if, for any set of outcomes G ⊆ Y and any neighboring inputs x, x′ ∈ X,

P(A(x) ∈ G) ≤ e^ϵ P(A(x′) ∈ G) + δ,   (1)

where ϵ and δ are the privacy parameters of the algorithm. A key property of differential privacy is graceful composition. Suppose A_1, . . . , A_n are algorithms such that each A_m is (ϵ_m, δ_m)-differentially private. Advanced composition (Dwork et al., 2010; Kairouz et al., 2015) states that, for any δ′ > 0, the composed sequence of algorithms is (ϵ, δ)-differentially private, where δ = δ′ + Σ_{m≤n} δ_m and

ϵ = √(2 log(1/δ′) Σ_{m≤n} ϵ_m²) + Σ_{m≤n} ϵ_m (e^{ϵ_m} − 1)/(e^{ϵ_m} + 1).

When all privacy parameters are equal and small, we roughly have ϵ = O(√n ϵ_m). Hence, analysts can make use of sensitive datasets with a slow degradation of privacy. However, there is a major disconnect between most existing results on privacy composition and modern data analysis. As analysts view the outputs of algorithms, the future manner in which they interact with the data changes. Advanced composition allows analysts to adaptively select algorithms, but not privacy parameters. In many cases, analysts may wish to choose the subsequent privacy parameters based on the outcomes of the previous private algorithms.
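Before turning to the adaptive setting, the following minimal Python sketch (ours, not from the paper) makes the fixed-parameter advanced composition bound above concrete; the function name and the example parameters are illustrative.

```python
import math

def advanced_composition_epsilon(eps_list, delta_prime):
    """Advanced composition bound (Dwork et al., 2010; Kairouz et al., 2015) for
    composing fixed (eps_m, delta_m)-DP mechanisms, returning the total epsilon."""
    sum_sq = sum(e ** 2 for e in eps_list)
    # leading term: sqrt(2 log(1/delta') * sum_m eps_m^2)
    leading = math.sqrt(2 * math.log(1 / delta_prime) * sum_sq)
    # lower-order term: sum_m eps_m * (e^{eps_m} - 1) / (e^{eps_m} + 1)
    lower_order = sum(e * (math.exp(e) - 1) / (math.exp(e) + 1) for e in eps_list)
    return leading + lower_order

# Example: 100 rounds at eps = 0.1 each, with slack delta' = 1e-6.
eps_rounds = [0.1] * 100
print(advanced_composition_epsilon(eps_rounds, 1e-6))  # ~5.8, vs. 10.0 for basic composition
```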
For example, if an analyst learns, from past computations, that they only need to run one more computation, they should be able to use the remainder of their privacy budget in the final round. Likewise, if an analyst is having a hard time deriving conclusions, they should be allowed to adjust privacy parameters to extend the allowable number of computations. This desideratum has motivated the study of fully adaptive composition, wherein one is allowed to adaptively select the privacy parameters of the algorithms. Rogers et al. (2016) define two probabilistic objects which can be used to ensure privacy guarantees in fully adaptive composition. The first, called a privacy filter, is an adaptive stopping condition that ensures an entire interaction between an analyst and a dataset retains a pre-specified target privacy level, even when the privacy parameters are chosen adaptively. The second, called a privacy odometer, provides a sequence of high-probability upper bounds on how much privacy has been lost up to any point in time. While this work took the first steps towards fully adaptive composition, their filters and odometers suffered from large constants, and the latter suffered from sub-optimal asymptotic rates. We show that, as long as a target privacy level is pre-specified, one can obtain the same rate as advanced composition, including constants. We also construct families of privacy odometers that are not only tighter than the originals but can be optimized for various target levels of privacy. Overall, we show that full adaptivity is not a cost but rather a feature of differential privacy.

1.1. Related Work

Privacy Composition: There is a long line of work on privacy composition. The basic composition theorem states that, when composing private algorithms, the privacy parameters (both ϵ and δ) add up linearly (Dwork et al., 2006b;a; Dwork and Lei, 2009). The advanced composition theorem allows the total ϵ to grow sublinearly with a small degradation in δ (Dwork et al., 2010). Later work (Kairouz et al., 2015; Murtagh and Vadhan, 2016) studies optimal composition, a computationally intractable formula that tightly characterizes the overall privacy of composed mechanisms. More recently, several variants of privacy have been studied, including (zero)-concentrated differential privacy (zCDP) (Bun and Steinke, 2016; Dwork and Rothblum, 2016), Rényi differential privacy (RDP) (Mironov, 2017), and f-differential privacy (f-DP) (Dong et al., 2021). These all exhibit tighter composition results than differential privacy, but for restricted classes of mechanisms. These results do not allow adaptive choices of privacy parameters.

Privacy Filters and Odometers: Rogers et al. (2016) originally introduced privacy filters and odometers, which allow privacy composition with adaptively selected privacy parameters. While their contributions provide a decent approximation of advanced composition, their bounds suffer from large constants, which prevents practical usage. Our work directly improves over these initial results. First, we construct privacy filters essentially matching advanced composition. We also provide flexible families of privacy odometers that outperform those of Rogers et al. (2016). Feldman and Zrnic (2021) leverage RDP to construct Rényi filters, where they require individual mechanisms to satisfy RDP.
Since our proof establishes a new privacy filter for approximate zCDP (Bun and Steinke, 2016), our results also extend to approximate RDP (Papernot and Steinke, 2022), which directly generalizes their Rényi filter. Even though it is also possible to obtain a privacy filter for (ϵ, δ)-DP through Rényi filters (Feldman and Zrnic, 2021), this result requires the stronger assumption that the algorithms being composed satisfy probabilistic (i.e., point-wise) differential privacy (Kasiviswanathan and Smith, 2014). Since converting from differential privacy to probabilistic differential privacy can be costly (see Lemma 2), our filters demonstrate an improvement by avoiding the conversion cost. More recently, Koskela et al. (2022) and Smith and Thakurta (2022) provide privacy filters for Gaussian DP (GDP) (Dong et al., 2021). However, their results do not hold for more general mechanisms under f-DP and therefore cannot handle algorithms with rare catastrophic privacy failure events, in which the privacy loss goes to infinity. Both our (ϵ, δ)-DP filter and our approximate zCDP filter can handle such events. Feldman and Zrnic (2021) and Lécuyer (2021) construct RDP odometers. The former work sequentially composes Rényi filters, and the latter simultaneously runs multiple Rényi filters and takes a union bound. Neither odometer provides high-probability, time-uniform bounds on privacy loss, making these results incomparable to our own. We believe our notion of odometers, which aligns with that of Rogers et al. (2016), is more natural. To prove our results, we leverage time-uniform concentration results for martingales (Howard et al., 2020; 2021). The bounds in these papers directly improve over related self-normalized concentration results (de la Peña et al., 2004; Chen et al., 2014). These latter bounds were leveraged in Rogers et al. (2016) to construct filters and odometers.

1.2. Summary of Contributions

In this work, we provide two primary contributions. We present these results in full rigor following a brief discussion of privacy basics and martingale theory in Section 2.

Privacy Filters: In Theorem 2 of Section 3, we construct privacy filters that match the rate of advanced composition. Our filters significantly improve over those of Rogers et al. (2016). In fact, our proof first derives a privacy filter for approximate zCDP (Bun and Steinke, 2016) (and also approximate RDP (Papernot and Steinke, 2022)), which implies a filter for (ϵ, δ)-differential privacy using known conversion results. Our results then extend the existing filters for pure RDP in Feldman and Zrnic (2021). This extension allows us to capture a broader class of algorithms and avoids the conversion loss when translating bounds between pure RDP and (ϵ, δ)-differential privacy. We state our result in Informal Theorem 1 below.¹

Privacy Odometers: In Theorem 3 of Section 4, we construct improved privacy odometers, that is, sequences of upper bounds on privacy loss which are all simultaneously valid with high probability. Our three families of odometers theoretically and empirically outperform those of Rogers et al. (2016). See Figure 1b for a comparison.

Figure 1 (a: Comparing lower order terms; b: Comparing privacy odometers): Figure 1a compares the lower order terms of advanced composition and our privacy filter. Figure 1b compares the original odometer of Rogers et al. (2016) with our odometers (filter, mixture, and stitched).

Informal Theorem 1 (Improved Privacy Filter).
Fix target privacy parameters ϵ > 0 and δ > 0, and suppose (A_n)_{n≥1} is an adaptively selected sequence of algorithms. Assume that A_n is (ϵ_n, δ_n)-DP conditioned on the outputs of the first n − 1 algorithms, where ϵ_n and δ_n may depend on the outputs of A_1, . . . , A_{n−1}. If a data analyst stops interacting with the data before

√(2 log(1/δ) Σ_{m≤n+1} ϵ_m²) + (1/2) Σ_{m≤n+1} ϵ_m² > ϵ,

then the entire interaction is (ϵ, δ)-DP.

Informal Theorem 1 almost recovers advanced composition when all parameters ϵ_n and δ_n are fixed prior to interacting with the dataset. The only difference is a slight gap in the lower order term, as ϵ(e^ϵ − 1)/(e^ϵ + 1) ≤ (1/2)ϵ². (In fact, the difference between the left and right hand sides is O(ϵ⁴), as can be checked with a Taylor expansion.) Figure 1a demonstrates that this gap is negligible for small values of ϵ, which is the natural setting for differential privacy.

Our second major contribution is the construction of several families of privacy odometers. These odometers give a running bound on privacy loss in settings where a target level of overall privacy is not known. Our constructed odometers are significantly tighter than the originals (Rogers et al., 2016), as can be seen in Figure 1b. Our key insight is to view adaptive privacy composition as depending not on the number of algorithms being composed, but rather on the sum of squares of privacy parameters, Σ_{m≤n} ϵ_m². This shift to looking at intrinsic time allows us to apply recent advances in time-uniform concentration (Howard et al., 2020; 2021) to privacy loss martingales. Overall, our results show that there is essentially no cost for fully adaptive private data analysis.

¹ In Appendix D, we provide an alternative proof for our privacy filter result through reductions to generalized randomized response. While it gives the exact same rates, we believe it could be of independent interest. For example, it may be useful for obtaining filters with rates like the optimal composition (Murtagh and Vadhan, 2016; Kairouz et al., 2015), which used a similar reduction to randomized response in their analysis.

2. Background on Differential Privacy

Throughout, we assume all algorithms map from a space of datasets X to outputs in a measurable space, typically denoted either (Y, G) or (Z, H). For a sequence of algorithms (A_n)_{n≥1}, we often consider the composed algorithm A_{1:n} := (A_1, . . . , A_n). For more background on measure-theoretic matters, as well as on the notion of neighboring datasets, see Appendix A. We start by formalizing a generalization of differential privacy in which the privacy parameters of an algorithm A_n can be functions of the outputs of A_1, . . . , A_{n−1}. In particular, we replace the probabilities in Equation (1) with conditional probabilities given the relevant random variables.

Definition 1 (Conditional Differential Privacy). Suppose A and B are algorithms mapping from a space X to measurable spaces (Y, G) and (Z, H) respectively. Suppose ϵ, δ : Z → R_{≥0} are measurable functions. We say the algorithm A is (ϵ, δ)-differentially private conditioned on B if, for any neighbors x, x′ ∈ X and for all measurable sets G ∈ G, we have

P(A(x) ∈ G | B(x)) ≤ e^{ϵ(B(x))} P(A(x′) ∈ G | B(x)) + δ(B(x)).

For conciseness, we will write either ϵ or ϵ(x) for ϵ(B(x)), and likewise δ or δ(x) for δ(B(x)). In the nth round of adaptive composition, we will set A := A_n and B := A_{1:n−1}. In this setting, the analyst has functions ϵ_n, δ_n : Y^{n−1} → R_{≥0} and takes the nth-round privacy parameters to be ϵ_n(A_{1:n−1}(x)) and δ_n(A_{1:n−1}(x)).
In other words, the analyst uses the outcomes of the first n − 1 algorithms to decide the level of privacy for the nth algorithm, ensuring that A_n is (ϵ_n, δ_n)-differentially private conditioned on A_{1:n−1}. We will also leverage the notion of zero-concentrated differential privacy (zCDP) (Bun and Steinke, 2016), which often provides a cleaner analysis for privacy composition. First, we recall the definition of Rényi divergence.

Definition 2. The Rényi divergence from P to Q of order λ > 1 is defined as

D_λ(P ‖ Q) := (1/(λ − 1)) log E_{Y∼Q}[(dP/dQ(Y))^λ].

The notion of zCDP bounds the Rényi divergence from A(x) to A(x′) for any neighbors x and x′. We will focus on a more general definition called approximate zCDP (Bun and Steinke, 2016; Papernot and Steinke, 2022) that permits a small probability of unbounded Rényi divergence. For the purpose of adaptive composition, we state the conditional counterpart of this definition.²

Definition 3 (Conditional approximate zCDP). Suppose A and B are algorithms with inputs in a space X and outputs in measurable spaces (Y, G) and (Z, H). Suppose δ, ρ : Z → R_{≥0} are measurable. We say the algorithm A is δ-approximate ρ-zCDP conditioned on B if, for any neighboring datasets x, x′, there exist distributions P′, P″, Q′, Q″ such that the conditional outputs are distributed according to the following mixture distributions:

A(x) | B(x) ∼ (1 − δ(B(x))) P′ + δ(B(x)) P″,
A(x′) | B(x) ∼ (1 − δ(B(x))) Q′ + δ(B(x)) Q″,

where for all λ ≥ 1, D_λ(P′ ‖ Q′) ≤ ρ(B(x))λ and D_λ(Q′ ‖ P′) ≤ ρ(B(x))λ. For succinctness, we will write ρ(x) for ρ(B(x)) and δ(x) for δ(B(x)).

² The approximate zCDP definition we state uses the convex mixture formulation adapted from Papernot and Steinke (2022), since it is more convenient for our proof. In Appendix C.1, we show that this definition is equivalent to the original definition in Bun and Steinke (2016).

We will also use the notions of filtrations and martingales.

Filtrations and Martingales: A process (X_n)_{n∈N} is said to be a martingale with respect to a filtration (F_n)_{n∈N} if, for all n ∈ N, (a) X_n is F_n-measurable, (b) E|X_n| < ∞, and (c) E(X_n | F_{n−1}) = X_{n−1}. Correspondingly, (X_n)_{n∈N} is a supermartingale if E(X_n | F_{n−1}) ≤ X_{n−1}. In our context, we consider the natural filtration (F_n(x))_{n∈N} generated by (A_n(x))_{n≥1}. In our proofs, we construct the appropriate (super)martingales so that we can leverage the optional stopping theorem and time-uniform concentration to obtain privacy filters and odometers (Ville, 1939; Howard et al., 2020; 2021). We present a full exposition of the mathematical tools in Appendices A and B.

3. Privacy Filters

We now provide our main results on privacy filters. In general, a privacy filter is a function N that takes the privacy parameters of a sequence of private algorithms as input and decides to stop at some point so that the composition of these algorithms satisfies a pre-specified level of privacy. We will first present a privacy filter for approximate zCDP (Theorem 1), which immediately implies the privacy filter result for (ϵ, δ)-DP (Theorem 2). Since approximate zCDP bounds the Rényi divergence of all orders λ, our proof of Theorem 1 also directly implies a privacy filter for approximate RDP (Papernot and Steinke, 2022), which generalizes the RDP filter of Feldman and Zrnic (2021). Our (ϵ, δ)-DP filter improves on the rate of the original filter presented in Rogers et al. (2016) and matches the rate of advanced composition, which requires pre-fixed choices of privacy parameters.
Even though it is also possible to obtain an (ϵ, δ)-DP filter through the result of Feldman and Zrnic (2021), our privacy filters avoid their conversion costs and provide a tighter bound.³

³ Feldman and Zrnic (2021, Section 4.3) apply Rényi filters to algorithms which satisfy (conditional) probabilistic differential privacy (pDP). In general, a lossy conversion from (ϵ, δ)-DP to pDP is required to apply their filter.

We can now state our general privacy filter in terms of approximate zCDP.

Theorem 1 (Approximate zCDP filter). Let (A_n)_{n≥1} be an adaptive sequence of algorithms, and, for any x, let F = (F_n(x))_{n∈N} be the natural filtration generated by (A_n(x))_{n∈N}. Assume that δ_n, ρ_n are predictable with respect to F, meaning that they are F_{n−1}(x)-measurable. For any n ≥ 1, assume that A_n is δ_n-approximate ρ_n-zCDP conditioned on F_{n−1}(x). Consider the stopping function N : R^∞_{≥0} × R^∞_{≥0} → N given by

N((ρ_n)_{n≥1}, (δ_n)_{n≥1}) := inf{ n : ρ < Σ_{m≤n+1} ρ_m  or  δ < Σ_{m≤n+1} δ_m }.

Then A_{1:N(·)}(·) : X → Y* is δ-approximate ρ-zCDP.

We note that the above theorem immediately implies a privacy filter for approximate RDP, and thus Theorem 1 can be viewed as a strict generalization of the work of Feldman and Zrnic (2021). Further, Theorem 1 implies a privacy filter under (ϵ, δ)-differential privacy. To show this implication, we will use the following conversion results.

Lemma 1 ((Bun and Steinke, 2016)). If A satisfies (ϵ, δ)-DP, then A satisfies δ-approximate (1/2)ϵ²-zCDP. If A satisfies δ-approximate ρ-zCDP, then, for any δ′ > 0, A satisfies (ρ + 2√(ρ ln(1/δ′)), δ + (1 − δ)δ′)-DP.

We can now obtain our (ϵ, δ)-privacy filter by converting the individual approximate differential privacy parameters to approximate zCDP parameters, applying the approximate zCDP filter, and converting the resulting approximate zCDP guarantee back to approximate differential privacy.

Theorem 2 ((ϵ, δ)-DP filter). Suppose (A_n)_{n≥1} is a sequence of algorithms such that, for any n ≥ 1, A_n is (ϵ_n, δ_n)-differentially private conditioned on A_{1:n−1}. Let ϵ > 0 and δ = δ′ + δ″ be target privacy parameters such that δ′ > 0, δ″ ≥ 0. Let N : R^∞_{≥0} × R^∞_{≥0} → N be given by

N((ϵ_n)_{n≥1}, (δ_n)_{n≥1}) := inf{ n : ϵ < √(2 log(1/δ′) Σ_{m≤n+1} ϵ_m²) + (1/2) Σ_{m≤n+1} ϵ_m²  or  δ″ < Σ_{m≤n+1} δ_m }.

Then, the algorithm A_{1:N(·)}(·) : X → Y* is (ϵ, δ)-DP, where N(x) := N((ϵ_n(x))_{n≥1}, (δ_n(x))_{n≥1}).
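To make the stopping rule of Theorem 2 concrete, the following minimal Python sketch (ours, not from the paper) tracks adaptively chosen privacy parameters and reports whether one more mechanism may still be run; the class name and interface are illustrative assumptions.

```python
import math

class DPFilter:
    """Sketch of the (eps, delta)-DP filter of Theorem 2: halt before
    sqrt(2 log(1/delta') * sum eps_m^2) + 0.5 * sum eps_m^2 would exceed eps,
    or before the per-round delta_m's would exhaust delta''."""

    def __init__(self, eps_budget, delta_prime, delta_dprime):
        self.eps_budget = eps_budget
        self.delta_prime = delta_prime      # slack used by the concentration bound
        self.delta_dprime = delta_dprime    # budget for the per-round delta_m's
        self.sum_eps_sq = 0.0
        self.sum_delta = 0.0

    def _eps_spent(self, sum_eps_sq):
        return (math.sqrt(2 * math.log(1 / self.delta_prime) * sum_eps_sq)
                + 0.5 * sum_eps_sq)

    def try_spend(self, eps_n, delta_n):
        """Return True (and record the spend) if running one more
        (eps_n, delta_n)-DP mechanism keeps the interaction (eps, delta)-DP."""
        new_eps_sq = self.sum_eps_sq + eps_n ** 2
        new_delta = self.sum_delta + delta_n
        if self._eps_spent(new_eps_sq) > self.eps_budget or new_delta > self.delta_dprime:
            return False  # filter halts: the budget would be exceeded
        self.sum_eps_sq, self.sum_delta = new_eps_sq, new_delta
        return True

# Example: target (1.0, 2e-6)-DP with delta' = delta'' = 1e-6,
# where the analyst picks each eps_n after seeing earlier outputs.
f = DPFilter(eps_budget=1.0, delta_prime=1e-6, delta_dprime=1e-6)
print(f.try_spend(0.1, 0.0))  # True while the budget allows
```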
Now we provide a proof of Theorem 1. Recall that the output under a privacy filter is a random vector A_{1:N(x)}(x) = (A_1(x), . . . , A_{N(x)}(x)). In our proof, we will also consider the unstopped process A(x) = (A_1(x), A_2(x), . . .).

Proof of Theorem 1. Let x, x′ be neighbors, and let P, Q denote the likelihoods of the observed output A(x) when the inputs are x and x′ respectively. The likelihoods of observing the stopped process A_{1:N(x)}(x) under x and x′ are

P(A_{1:N(x)}(x)) = Π_{n=1}^{N(x)} P(A_n(x) | F_{n−1}(x)),   (3)
Q(A_{1:N(x)}(x)) = Π_{n=1}^{N(x)} Q(A_n(x) | F_{n−1}(x)).   (4)

It suffices to show that the two likelihoods can be decomposed as weighted mixtures of P′ and P″, and of Q′ and Q″, respectively, such that the mixture weights on P′ and Q′ are at least 1 − δ and, for all λ ≥ 1,

max{ D_λ(P′(A_{1:N(x)}(x)) ‖ Q′(A_{1:N(x)}(x))), D_λ(Q′(A_{1:N(x)}(x)) ‖ P′(A_{1:N(x)}(x))) } ≤ ρλ.   (5)

By our assumption of conditional approximate zCDP at each step n, we can write P(A_n(x) | F_{n−1}(x)) and Q(A_n(x) | F_{n−1}(x)) as the following convex combinations:

P(A_n(x) | F_{n−1}(x)) = (1 − δ_n(x)) P′_n(A_n(x) | F_{n−1}(x)) + δ_n(x) P″_n(A_n(x) | F_{n−1}(x)),
Q(A_n(x) | F_{n−1}(x)) = (1 − δ_n(x)) Q′_n(A_n(x) | F_{n−1}(x)) + δ_n(x) Q″_n(A_n(x) | F_{n−1}(x)),

such that, for all λ ≥ 1, we have both

D_λ(P′_n(A_n(x) | F_{n−1}(x)) ‖ Q′_n(A_n(x) | F_{n−1}(x))) ≤ ρ_n(x)λ,   (6)
D_λ(Q′_n(A_n(x) | F_{n−1}(x)) ‖ P′_n(A_n(x) | F_{n−1}(x))) ≤ ρ_n(x)λ.   (7)

Now consider the product measures P′ and Q′ such that, for any n ≥ 1,

P′(A_{1:n}(x)) = Π_{m=1}^{n} P′_m(A_m(x) | F_{m−1}(x)) and Q′(A_{1:n}(x)) = Π_{m=1}^{n} Q′_m(A_m(x) | F_{m−1}(x)).   (8)

We will establish inequality (5). For any fixed λ ≥ 1, consider the following processes:

M_n := Σ_{m=1}^{n} [ log( P′_m(A_m(x) | F_{m−1}(x)) / Q′_m(A_m(x) | F_{m−1}(x)) ) − λ ρ_m(x) ],   (9)
X_n := exp((λ − 1) M_n).   (10)

By Lemma 6, (X_n) is a nonnegative P′-supermartingale with respect to (F_n(x))_{n∈N}. By the optional stopping theorem for nonnegative supermartingales (Lemma 5), we have

E_{P′}[X_{N(x)}] ≤ E_{P′}[X_0] = 1.   (11)

By plugging in the definition of X_n and the stopping criterion of N, we can bound the Rényi divergence D_λ(P′(A_{1:N(x)}(x)) ‖ Q′(A_{1:N(x)}(x))) ≤ ρλ (see Lemma 7), and so inequality (5) holds by symmetry. Finally, by Lemma 8, we can rewrite both P and Q as weighted mixtures containing P′ and Q′, with weights at least 1 − δ. This completes the proof.

4. Privacy Odometers

Previously, we constructed privacy filters that match the rate of advanced composition while allowing both algorithms and privacy parameters to be chosen adaptively. While privacy filters require the total level of privacy to be fixed in advance, it is often desirable to track the privacy loss at all steps without a pre-fixed budget (Ligett et al., 2017). We now study privacy odometers, which provide sequences of upper bounds on accumulated privacy loss that are valid at all points in time simultaneously with high probability.

4.1. Background on Privacy Loss and Odometers

To formally introduce privacy odometers, we first revisit the notion of privacy loss, which measures how much information is revealed about the underlying input dataset. For neighbors x, x′ ∈ X, let p_x and p_{x′} be the densities of A(x) and A(x′) respectively. The privacy loss between A(x) and A(x′) is defined as

L(x, x′) := log( p_x(A(x)) / p_{x′}(A(x)) ).   (12)

By Equation (12), a negative privacy loss suggests that the input is more likely to be x′, and likewise a positive privacy loss suggests that the input is more likely to be x. We now generalize privacy loss to its conditional counterpart.

Definition 4 (Conditional Privacy Loss). Suppose A and B are as in Definition 1. Suppose x, x′ ∈ X are neighbors. Let p_x(· | ·), p_{x′}(· | ·) : Y × Z → R_{≥0} be conditional densities for A(x) and A(x′) respectively given B(x).⁴ The privacy loss between A(x) and A(x′) conditioned on B is given by

L_B(x, x′) := log( p_x(A(x) | B(x)) / p_{x′}(A(x) | B(x)) ).

⁴ To ensure the existence of conditional densities, it suffices to assume that Y and Z are Polish spaces under some metrics d_Y and d_Z, and that G and H are the corresponding Borel σ-algebras associated with d_Y and d_Z (Durrett, 2019). These measurability assumptions are not restrictive, as Euclidean spaces, countable spaces, and Cartesian products of the two satisfy these assumptions.

Suppose A_n is the nth algorithm being run and we have already observed A_{1:n−1}(x) for some unknown input x ∈ X. If we are trying to guess whether x or a neighbor x′ produced the data, we would consider the privacy loss between A_n(x) and A_n(x′) conditioned on A_{1:n−1}(x). It is straightforward to characterize the privacy loss of a composed algorithm A_{1:n} in terms of the privacy losses of its constituent algorithms A_1, . . . , A_n. Namely, by Bayes' rule,

L_{1:n}(x, x′) = Σ_{m≤n} L_m(x, x′),   (13)

where L_m(x, x′) is shorthand for the conditional privacy loss between A_m(x) and A_m(x′) given A_{1:m−1}(x), per Definition 4. A small simulation below illustrates this additive decomposition for a simple mechanism.
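As a sanity check on Equation (13), the following sketch (ours, not from the paper) computes the privacy loss of two non-adaptively composed Laplace mechanisms on neighboring datasets whose query values differ by 1, and confirms that the composed loss is the sum of the per-round losses.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_logpdf(y, mu, scale):
    return -np.log(2 * scale) - np.abs(y - mu) / scale

# Two rounds of the Laplace mechanism on a sensitivity-1 query:
# round m is eps_m-DP with noise scale 1/eps_m, and f(x), f(x') differ by 1.
eps = [0.3, 0.7]
f_x, f_xprime = 10.0, 11.0

outputs = [f_x + rng.laplace(scale=1 / e) for e in eps]            # run on x
per_round_loss = [laplace_logpdf(y, f_x, 1 / e) - laplace_logpdf(y, f_xprime, 1 / e)
                  for y, e in zip(outputs, eps)]                    # L_m(x, x')
total_loss = sum(per_round_loss)                                    # L_{1:2}(x, x')

# Each per-round loss satisfies |L_m| <= eps_m for the Laplace mechanism,
# and the composed loss is simply the sum of the per-round losses.
print(per_round_loss, total_loss)
```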
Equation (13) also holds at arbitrary random times N(x) that depend on the dataset x ∈ X only through observed algorithm outputs. The simple decomposition of privacy loss noted above motivates the study of an alternative, probabilistic definition of differential privacy. Intuitively, an algorithm should be differentially private if, with high probability, the privacy loss is small. More formally, an algorithm A : X → Y is said to be (ϵ, δ)-probabilistically differentially private, or (ϵ, δ)-pDP for short, if, for all neighboring inputs x, x′ ∈ X, we have P(|L(x, x′)| > ϵ) ≤ δ. In the previous line (as well as in the remainder of the section), the randomness in L(x, x′) comes from the randomized algorithm A. Unfortunately, as noted by Kasiviswanathan and Smith (2014) (in which pDP is called point-wise indistinguishability), pDP is a strictly stronger notion than DP. In particular, if an algorithm is (ϵ, δ)-pDP, it is also (ϵ, δ)-DP. The converse in general requires a costly conversion.

Lemma 2 (Conversions between DP and pDP (Kasiviswanathan and Smith, 2014)). If A is (ϵ, δ)-pDP, then A is also (ϵ, δ)-DP. Conversely, if A is (ϵ, δ)-DP, then A is (2ϵ, 2δ/(ϵ e^ϵ))-pDP.

We note that Guingona et al. (2023) have recently shown that other conversion rates from approximate differential privacy to probabilistic differential privacy are possible. However, these conversions require trading off tightness in the parameter ϵ against tightness in the approximation parameter δ. In particular, a fully tight conversion from approximate differential privacy to probabilistic differential privacy is not possible. We will work with the conditional counterpart of probabilistic differential privacy (pDP).

Definition 5 (Conditional Probabilistic Differential Privacy). Suppose A : X → Y and B : X → Z are algorithms, and ϵ, δ : Z → R_{≥0} are measurable. Then A is said to be (ϵ, δ)-probabilistically differentially private conditioned on B if, for any neighbors x, x′ ∈ X, we have

P(|L_B(x, x′)| > ϵ(B(x)) | B(x)) ≤ δ(B(x)).

While in Theorem 2 we assumed that the algorithms being composed were conditionally differentially private, here we need to assume conditional probabilistic privacy. This is because our goal is not differential privacy per se, but rather tight control over the privacy loss. We conjecture that a version of our privacy odometer (in Theorem 3) that replaces pDP by DP and leaves all else identical does not hold. Our intuition for this conjecture is that there exist simple examples of algorithms satisfying (ϵ, δ)-DP that do not satisfy (ϵ, δ)-pDP (see Appendix F, for instance). We believe that, by sequentially composing such algorithms and using anti-concentration results, one can show that some odometers fail to be valid. We leave this as potential future work. In sequential composition, we would assume the nth algorithm A_n is (ϵ_n, δ_n)-pDP conditioned on A_{1:n−1}, with the privacy parameters given as functions of A_{1:n−1}(x). Now we state the definition of a privacy odometer, which provides bounds on privacy loss under arbitrary stopping conditions (e.g., conditions based on model accuracy).
Definition 6 (Privacy Odometer (Rogers et al., 2016)). Let (A_n)_{n≥1} be an adaptive sequence of algorithms such that, for all n ≥ 1, A_n is (ϵ_n, δ_n)-pDP conditioned on A_{1:n−1}. Let (u_n)_{n≥1} be a sequence of functions with u_n : R^n_{≥0} × R^n_{≥0} → R_{≥0}. Let δ ∈ (0, 1) be a target confidence parameter. For x ∈ X and n ≥ 1, define U_n(x) := u_n(ϵ_{1:n}(x), δ_{1:n}(x)). Then (u_n)_{n≥1} is called a δ-privacy odometer if, for all neighbors x, x′ ∈ X, we have

P(∃n ≥ 1 : L_{1:n}(x, x′) > U_n(x)) ≤ δ.

4.2. Improved Privacy Odometers

We construct our privacy odometers in Theorem 3. Our technical centerpiece is time-uniform concentration inequalities for martingales (Ville, 1939; Howard et al., 2020; 2021). For a martingale (M_n)_{n∈N} and confidence level δ > 0, time-uniform concentration inequalities provide bounds (U_n)_{n∈N} satisfying P(∃n ∈ N : M_n > U_n) ≤ δ. Thus, if we can create a martingale from privacy loss, we can use time-uniform concentration to construct odometers. Our proof first considers the case where each A_n is (ϵ_n, 0)-pDP, in which case the privacy loss martingale (M_n)_{n∈N} (Dwork et al., 2010) is given by M_0 = 0 and

M_n := M_n(x, x′) := L_{1:n}(x, x′) − Σ_{m≤n} E[L_m(x, x′) | F_{m−1}(x)].   (14)

We then extend to the case of δ_n ≥ 0 via conditioning. To construct their filters and odometers, Rogers et al. (2016) use self-normalized concentration inequalities (de la Peña et al., 2004; Chen et al., 2014). We instead use advances in time-uniform martingale concentration (Howard et al., 2020; 2021), which yield tighter results.

Theorem 3. Suppose (A_n)_{n≥1} is a sequence of algorithms such that, for any n ≥ 1, A_n is (ϵ_n, δ_n)-pDP conditioned on A_{1:n−1}. Let δ = δ′ + δ″ be a target approximation parameter such that δ′ > 0, δ″ ≥ 0. Define N := N((δ_n)_{n≥1}) := inf{ n ∈ N : δ″ < Σ_{m≤n+1} δ_m } and V_n := Σ_{m≤n} ϵ_m². Define the following:

1. Filter odometer. For any ϵ > 0, let y* > 0 be the solution of √(2 log(1/δ′) y) + y/2 = ϵ. Define the functions (u^F_n)_{n≥1} by

u^F_n(ϵ_{1:n}, δ_{1:n}) := ∞ if n > N, and u^F_n(ϵ_{1:n}, δ_{1:n}) := √(log(1/δ′)/(2y*)) (y* + V_n) + (1/2) V_n otherwise.

2. Mixture odometer. For any γ > 0, define the sequence of functions (u^M_n)_{n≥1} by

u^M_n(ϵ_{1:n}, δ_{1:n}) := ∞ if n > N, and u^M_n(ϵ_{1:n}, δ_{1:n}) := √(2(V_n + γ) log((1/δ′)√((V_n + γ)/γ))) + (1/2) V_n otherwise.

3. Stitched odometer. For any v_0 > 0, define the sequence of functions (u^S_n)_{n≥1} by

u^S_n(ϵ_{1:n}, δ_{1:n}) := ∞ if n > N or V_n < v_0, and u^S_n(ϵ_{1:n}, δ_{1:n}) := 1.7 √(V_n (log log(2V_n/v_0) + 0.72 log(5.2/δ′))) + (1/2) V_n otherwise.

Then any of the sequences (u^F_n)_{n≥1}, (u^M_n)_{n≥1}, or (u^S_n)_{n≥1} is a δ-privacy odometer.

The proof of Theorem 3 can be found in Appendix E. We now provide intuition for our odometers, which are plotted in Figure 3. Our insight is to view odometers not as functions of the number of algorithms being composed, but rather as functions of the intrinsic time Σ_{m≤n} ϵ_m². This reframing allows us to leverage the various time-uniform concentration inequalities discussed in Appendix B. The filter odometer is the tightest odometer when V_n = Σ_{m≤n} ϵ_m² is close to a fixed time y*, but its tightness drops off precipitously when V_n is far from y*. The mixture odometer, which is named after the method of mixtures (Robbins, 1970; de la Peña et al., 2007; Howard et al., 2021), sacrifices tightness at any fixed point in time to obtain overall tighter bounds on privacy loss. This odometer can be numerically optimized, in terms of γ, for tightness at a predetermined value of Σ_{m≤n} ϵ_m². The stitched odometer, whose name derives from Theorem 6, is similarly tight across time. This odometer requires that Σ_{m≤n} ϵ_m² exceed some pre-selected variance v_0 before becoming nontrivial (i.e., finite). The sketch below evaluates all three boundaries numerically.
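The following Python sketch (ours, not from the paper) evaluates the three odometer boundaries of Theorem 3 as functions of the intrinsic time V_n = Σ_{m≤n} ϵ_m², in the pure-DP case where all δ_n are zero; the function names and the example tuning values are illustrative.

```python
import math

def filter_odometer(v, eps_target, delta_p):
    """Line-crossing boundary tuned to be tightest near the intrinsic time y*
    at which a filter with target eps_target would be exhausted."""
    L = math.log(1 / delta_p)
    # y* solves sqrt(2 * L * y) + y / 2 = eps_target
    y_star = (math.sqrt(2 * L + 2 * eps_target) - math.sqrt(2 * L)) ** 2
    return math.sqrt(L / (2 * y_star)) * (y_star + v) + 0.5 * v

def mixture_odometer(v, gamma, delta_p):
    """Normal-mixture (method of mixtures) boundary with tuning parameter gamma."""
    return math.sqrt(2 * (v + gamma) * math.log(math.sqrt((v + gamma) / gamma) / delta_p)) + 0.5 * v

def stitched_odometer(v, v0, delta_p):
    """Stitched (iterated-logarithm-rate) boundary, nontrivial once v >= v0."""
    if v < v0:
        return math.inf
    return 1.7 * math.sqrt(v * (math.log(math.log(2 * v / v0)) + 0.72 * math.log(5.2 / delta_p))) + 0.5 * v

# Compare the three bounds on a grid of intrinsic times, as in Figure 3a.
for v in [0.1, 1.0, 10.0]:
    print(v,
          round(filter_odometer(v, eps_target=1.0, delta_p=1e-6), 3),
          round(mixture_odometer(v, gamma=0.1, delta_p=1e-6), 3),
          round(stitched_odometer(v, v0=0.1, delta_p=1e-6), 3))
```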
Larger values of v0 will yield tighter odometers, albeit at the cost of losing bound validity when accumulated variance is small. With this intuition, we can compare our odometers to the original presented in Rogers et al. (2016). Lemma 3 (Theorem 6.5 in Rogers et al. (2016)). Assume the same setup as Theorem 3, and fix δ = δ + δ , where 1 e δ > 0 and δ 0. Define the sequence of functions (u R n )n 1 by u R n (ϵ1:n, δ1:n) := 2Vn log(110e) + 2 log log(|x|) 2 Vn n N, Vn 1 |x|2 , 1 2 1 |x|2 + Vn 2 log 1 + |x|2Vn log log 4 δ log2(|x|) + 1 where |x| denotes the number of elements in dataset x. Then, (u R n )n 1 is a δ-privacy odometer. Our new odometers improve over the one presented in Lemma 3. First, the above odometer has an explicit dependence on dataset size. In learning settings, datasets are large, degrading the quality of the odometer. Secondly, the tightness of the odometer drops off outside of the interval h 1 |x|2 , 1 i . If any privacy parameter of an algorithm Fully-Adaptive Composition in Differential Privacy (a) Comparing filter odometers (b) Comparing mixture odometers (c) Comparing stitched odometers Figure 2: Comparison of filter, mixture, and stitched odometers plotted as functions of P m n ϵ2 m. We set δ = 10 6 and assume all algorithms being composed are purely differentially private for simplicity. (a) New odometers vs. original (b) New odometers vs. pointwise advanced composition Figure 3: Figure 3a compares our odometers to the original. Figure 3b compares them with advanced composition optimized point-wise. The curve plotted for advanced composition is valid at any fixed time, but not uniformly over time. Our odometers nevertheless provide a close approximation. being composed exceeds 1, the bound becomes significantly looser. Lastly, and perhaps most simply, the form of the odometer is complicated. Our odometers all have relatively straightforward dependence on the intrinsic time P We now examine the rates of all odometers. For simplicity, let v := P m n ϵ2 m. The stitched odometer has a rate of O( p v log log(v)) in its leading term, asymptotically matching the law of the iterated logarithm (Robbins, 1970) up to constants. Both the original privacy odometer and the mixture odometer have a rate of O( p v log (v)), demonstrating worse asymptotic performance. The filter odometer has the worst asymptotic performance, growing linearly as O (v). This does not mean the stitched odometer is the best odometer, since target levels of privacy are often kept small. To empirically compare odometers, it suffices to consider the setting of pure differential privacy, as the odometers identically depend on (δn)n 1. Each presented odometer can be viewed as a function of v, allowing us to compare odometers by plotting their values for a continuum of v. Figure 3a shows that there is no clearly tightest odometer. All odometers, barring the original, dominate for some window of values of v. While the stitched odometer is asymptotically best, the mixture odometer is tighter for small values of v. Likewise, if one knows an approximate target privacy level, the filter odometer is tightest. This behavior is expected from our understanding of martingale concentration (Howard et al., 2020; 2021): there is no uniformly tightest boundary containing (with probability 1 δ) the entire path of a martingale; boundaries that are tight early must be looser later, and vice versa. 
In fact, we conjecture that our bounds are essentially unimprovable in general this conjecture stems from the fact that the time-uniform martingale boundaries employed have error probability essentially equal to δ, which in turn stems from the deep fact that for continuous-path (and thus continuous-time) martingales, Ville s inequality (Fact 4) that underlies the derivation of these boundaries holds with exact equality. Since we operate in discrete-time, the only looseness in Ville s inequality stems from lower-order terms that reflect the possibility that at the stopping time, the value of the stopped martingale may not be exactly the value at the boundary. In Figure 3b, we compare our odometers with advanced composition optimized in a point-wise sense for all values of v simultaneously. This boundary is not a valid odometer, as advanced composition only holds at a prespecified point in intrinsic time v. Our odometers are almost tight with advanced composition for the values of v plotted. Our filter Fully-Adaptive Composition in Differential Privacy odometer lies tangent to the advanced composition curve, as expected from Section 5.2 of Howard et al. (2020). 5. Future Directions There are many open problems related to fully adaptive composition. For example, even though privacy filters has been studied under the notion of Gaussian DP (Smith and Thakurta, 2022; Koskela et al., 2022), privacy filters and odometers have not been studied for general f-DP (Dong et al., 2021). It also has not been investigated whether adaptivity in privacy parameter selection improves the performance of iterative algorithms such as private SGD. Intuitively, it should be beneficial to let the iterates of an algorithm guide future choices of privacy parameters. Optimal composition results (Kairouz et al., 2015; Murtagh and Vadhan, 2016; Zhu et al., 2022) have yet to be considered in a setting where privacy parameters are adaptively selected. In Appendix D, we provide another proof of Theorem 2, which leverages a reduction of private algorithms to generalized randomized response. Since such a reduction was used in the proofs of Kairouz et al. (2015) and Murtagh and Vadhan (2016), we believe this proof can be useful for optimal composition with adaptively chosen privacy parameters. 6. Acknowledgements AR acknowledges support from NSF DMS 1916320 and an ARL Io BT CRA grant. Research reported in this paper was sponsored in part by the DEVCOM Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196 (ARL Io BT CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. ZSW and JW were supported in part by the NSF CNS2120667, NSF Award #2120667, a Cy Lab 2021 grant, a Google Faculty Research Award, and a Mozilla Research Grant. JW acknowledges support from NSF GRFP grants DGE1745016 and DGE2140739. David Blackwell. Equivalent comparisons of experiments. The annals of mathematical statistics, pages 265 272, 1953. Mark Bun and Thomas Steinke. Concentrated differential privacy: simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pages 635 658. Springer, 2016. Shanshan Chen, Zhenping Wang, Wenfei Xu, and Yu Miao. 
Exponential inequalities for self-normalized martingales. Journal of Inequalities and Applications, 2014(289):1 12, 2014. Victor H de la Pena, Michael J Klass, and Tze Leung Lai. Self-normalized processes: exponential inequalities, moment bounds, and iterated logarithm laws. Annals of Probability, pages 1902 1933, 2004. Victor H. de la Pe na, Michael J. Klass, and Tze Leung Lai. Pseudo-maximization and self-normalized processes. Probability Surveys, 4:172 192, 2007. doi: 10.1214/ 07-PS119. Jinshuo Dong, Aaron Roth, and Weijie J Su. Gaussian differential privacy. In Journal of the Royal Statistical Society: Series B, pages 1 35, 2021. Richard Durrett. Probability: theory and examples. Duxbury Press, Belmont, CA, second edition, 1996. ISBN 0-534-24318-5. Rick Durrett. Probability: theory and examples, volume 49. Cambridge university press, 2019. Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, pages 371 380, 2009. Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211 407, 2014. Cynthia Dwork and Guy N. Rothblum. Concentrated differential privacy. Co RR, abs/1603.01887, 2016. Cynthia Dwork, Krishnaram Kenthapadi, Frank Mc Sherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 486 503. Springer, 2006a. Cynthia Dwork, Frank Mc Sherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, pages 265 284, Berlin, Heidelberg, 2006b. Springer Berlin Heidelberg. Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 51 60. IEEE, 2010. Fully-Adaptive Composition in Differential Privacy Vitaly Feldman and Tijana Zrnic. Individual privacy accounting via a R enyi filter. Advances in Neural Information Processing Systems, 2021. Vincent Guingona, Alexei Kolesnikov, Julianne Nierwinski, and Avery Schweitzer. Comparing approximate and probabilistic differential privacy parameters. Information Processing Letters, page 106380, 2023. Steven R. Howard, Aaditya Ramdas, Jon Mc Auliffe, and Jasjeet Sekhon. Time-uniform Chernoff bounds via nonnegative supermartingales. Probability Surveys, 17:257 317, 2020. doi: 10.1214/18-PS321. Steven R. Howard, Aaditya Ramdas, Jon Mc Auliffe, and Jasjeet Sekhon. Time-uniform, nonparametric, nonasymptotic confidence sequences. The Annals of Statistics, 49 (2):1055 1080, 2021. doi: 10.1214/20-AOS1991. Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In International Conference on Machine Learning, pages 1376 1385. PMLR, 2015. Shiva P Kasiviswanathan and Adam Smith. On the semantics of differential privacy: A Bayesian formulation. Journal of Privacy and Confidentiality, 6(1), 2014. Emilie Kaufmann and Wouter M Koolen. Mixture martingales revisited with applications to sequential tests and confidence intervals. Journal of Machine Learning Research, 22(246):1 44, 2021. Antti Koskela, Marlon Tobaben, and Antti Honkela. Individual privacy accounting with gaussian differential privacy. Co RR, abs/2209.15596, 2022. doi: 10.48550/ar Xiv. 2209.15596. URL https://doi.org/10.48550/ ar Xiv.2209.15596. 
Mathias L ecuyer. Practical privacy filters and odometers with R enyi differential privacy and applications to differentially private deep learning. ar Xiv Preprint ar Xiv:2103.01379, 2021. Katrina Ligett, Seth Neel, Aaron Roth, Bo Waggoner, and Steven Z Wu. Accuracy first: Selecting a differential privacy level for accuracy constrained erm. Advances in Neural Information Processing Systems, 30, 2017. Ilya Mironov. R enyi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263 275. IEEE, 2017. Jack Murtagh and Salil Vadhan. The complexity of computing the optimal composition of differential privacy. In Theory of Cryptography Conference, pages 157 175. Springer, 2016. Nicolas Papernot and Thomas Steinke. Hyperparameter tuning with renyi differential privacy. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. Open Review.net, 2022. URL https://openreview.net/forum? id=-70L8lpp9DF. Herbert Robbins. Statistical methods related to the law of the iterated logarithm. The Annals of Mathematical Statistics, 41(5):1397 1409, 1970. Ryan M Rogers, Aaron Roth, Jonathan Ullman, and Salil Vadhan. Privacy odometers and filters: pay-as-you-go composition. In Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016. Adam D. Smith and Abhradeep Thakurta. Fully adaptive composition for gaussian differential privacy. Co RR, abs/2210.17520, 2022. doi: 10.48550/ar Xiv. 2210.17520. URL https://doi.org/10.48550/ ar Xiv.2210.17520. Jean Ville. Etude critique de la notion de collectif. Bull. Amer. Math. Soc, 45(11):824, 1939. Yuqing Zhu, Jinshuo Dong, and Yu-Xiang Wang. Optimal accounting of differential privacy via characteristic function. In International Conference on Artificial Intelligence and Statistics, pages 4782 4817. PMLR, 2022. A. Measure-Theoretic Formalism Below, we provide some measure-theoretic formalisms and details regarding datasets and neighboring relations. Neighboring Datasets: Roughly speaking, an algorithm is differentially private if it difficult to distinguish between output distributions when the algorithm is run on similar inputs. In general, this notion of similarity amongst inputs is defined as a neighboring relation between elements on the input space X. In particular, if two inputs (also referred to as datasets or databases) x, x X satisfy the neighboring relation x x , the we say x and x are neighbors. There are several canonical examples of neighboring relations on the space of inputs X. One example is where X = Xn for some data domain X. The data domain can be viewed as the set of all possible individual entries for a dataset, and the space Xn correspondingly contains all possible n element datasets. In this setting, databases x, x X may be considered neighbors if x and x differ in exactly one entry. Another slightly more general setting is when X = X , i.e., all possible datasets of finite size. In this situation, the earlier notion of neighboring still makes sense. However, in addition, we may say input datasets x and x are neighbors if x can be obtained from x by either adding Fully-Adaptive Composition in Differential Privacy or deleting an element. This is a very natural notion of neighboring, as under such a relation an algorithm would be differentially private if it were difficult to determine the presence or absence of an individual. Our work is agnostic to the precise choice of neighboring relation. 
As such, we choose to leave the notion as general as possible. Algorithms and Random Variables: We will consider algorithms as randomized mappings A : X Y taking inputs from X to some output space Y. To be fully formal, we consider the output space Y as a measurable space (Y, G), where G is some σ-algebra denoting possible events. Recall that a σ-algebra S for a set S is simply a subset of 2S containing S and that is closed under countable union, intersection, and complements. When we say A is an algorithm having inputs in some space X, we really mean A(x) is a Y-valued random variable for any x X. The space X need not have an associated σ-algebra, as algorithm inputs are essentially just indexing devices. Given a sequence of algorithms (An)n 1, (An(x))n 1 is a sequence of Y-valued random variables, for any x X.5 Since we are dealing with the composition of algorithms, we write A1:n(x) as shorthand for the random vector of the first n algorithm outputs, i.e. A1:n(x) = (A1(x), . . . , An(x)). Formally, the random vector A1:n(x) takes output values in the product measurable space (Yn, G n) where G n denotes the n-fold product σ-algebra of G with itself. Likewise, since the number of algorithm outputs one views in fully-adaptive composition may be random, if N is a random time (i.e. a N-valued random variable), we will often consider the random vector A1:N(x) = (A1(x), . . . , AN(x)). Filtrations and Stopping Times: Since privacy composition involves sequences of random outputs, we will use the measure-theoretic notion of a filtration. If we have fixed an input x X, we can assume the random sequence (An(x))n 1 is defined on some probability space (Ω, F, P). Given such a probability space, a filtration (Fn)n N of F is a sequence of σ-algebras satisfying: (i) Fn Fn+1 for all n N, and (ii) Fn F for all n N. Given an arbitrary Y-valued discrete-time stochastic process (Xn)n 1, it is often useful to consider the natural filtration (Fn)n N given by Fn := σ(Xm : m n) and F0 = { , Ω}. Intuitively, a filtration formalizes the notion of accumulating information over time. In particular, in the context of the natural filtration generated by a stochastic process, the nth σ-algebra in the filtration Fn essentially represents the entirety of information contained in the first n random variables. In other words, if one is given Fn, they would know all pos- 5Even if algorithms have different types of outputs (maybe some algorithms have categorical outputs while others output realvalued vectors), Y can still be made appropriately large to contain all possible outcomes. sible events/outcomes that could have occurred up to and including timestep n. Lastly, we briefly mention the notion of a stopping time, as this measure-theoretic object is necessary to define privacy filters. Given a filtration (Fn)n N, a random time N is said to be a stopping time with respect to (Fn)n N if, for any n, the event {N n} Fn. In words, a random time N is a stopping time if given the information in Fn we can determine whether or not we should have stopped by time n. Stopping times are essential to the study of fullyadaptive composition, as a practitioner of privacy will need to use the adaptively selected privacy parameters to determine whether or not to stop interacting with the underlying sensitive database. B. Martingale Inequalities In this appendix, we provide a thorough exposition into the concentration inequalities leveraged in this paper. 
First, at the heart of supermartingale concentration is Ville s inequality (Ville, 1939), which can be viewed as a time-uniform version of Markov s inequality. Lemma 4 (Ville s Inequality (Ville, 1939)). Let (Xn)n N be a nonnegative supermartingale with respect to some filtration (Fn)n N. Then, for any confidence parameter δ (0, 1), we have P n N : Xn EX0 We do not directly leverage Ville s inequality in this work, but all inequalities we use can be directly proven from Lemma 4 (Howard et al., 2020; 2021). In short, each inequality in this supplement is proved by carefully massaging a martingale of interest into a non-negative supermartingale. Another useful tool we will leverage is Doob s optional stopping theorem. Lemma 5 (Optional stopping theorem (Durrett, 1996)). Let (Xn)n N be a nonnegative supermartingale with respect to some filtration (Fn)n N. Then E [Xτ] E [X0] for all stopping times τ that are potentially infinite. For our alternative proof of the privacy filter (in Section D), we leverage the following special case of a recent advance in time-uniform martingale concentration (Howard et al., 2020). The following Theorem 4 is just a special case of the main result in Howard et al. (2020), and we include the proof for completeness. When we say a random variable X is σ2-sub Gaussian conditioned on some sigma-algebra G, we mean that, for all λ 0, E eλX | G eλ2σ2/2. In particular, if X is σ2-sub Gaussian as above, this does not imply that X is σ-sub Gaussian (because the condition is only assumed for λ 0). In general, X can have different behaviors in its left and right tail, see for example the Fully-Adaptive Composition in Differential Privacy discussion of the differing tails of the empirical variance of Gaussians in Howard et al. (2021). Theorem 4. Let (Mn)n N be a martingale with respect to some filtration (Fn)n N such that M0 = 0 almost surely. Moreover, let (σn)n 1 be a (Fn)n N-predictable sequence of random variables such that, conditioned on Fn 1, Mn := Mn Mn 1 is σ2 n-sub Gaussian. Define Vn := P m n σ2 m. Then, we have, for all a, b > 0, P n N : Mn b Proof of Theorem 4. Let (Mn)n N be the martingale listed in the theorem statement. Observe that, for any a, b > 0, the process (Xn)n N given by is a non-negative supermartingale. As such, applying Ville s inequality (Lemma 4) yields P n N : Xn > exp b2 Now, on such event, taking logs and rearranging yields Multiplying both sides by a b finishes the proof. The predictable process (Vn)n N is a proxy for the accumulated variance of (Mn)n N up to any fixed point in time. In particular, the process (Vn)n N can be thought of as yielding the intrinsic time of the process. The free parameters a and b thus allow us to optimize the tightness of the boundary for some intrinsic moment in time. This is ideal for us, as, for the sake of composition, the target privacy parameter ϵ can guide us in finding a point in intrinsic time (that is, in terms of the process (Vn)n N) to optimize for. We discuss how to apply this inequality to prove privacy composition results both in this supplement and in Section 3. We also leverage the following martingale inequalities from Howard et al. (2021) in Section 4, where we construct various families of time-uniform bounds on privacy loss in fully-adaptive composition. These inequalities take on a more complicated form than Theorem 4, but we explain the intuition behind them in the sequel. 
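As a quick empirical illustration of this style of line-crossing bound, the sketch below (ours, not from the paper) uses one standard parameterization, P(∃n : M_n ≥ log(1/δ)/λ + λV_n/2) ≤ δ for a fixed λ > 0, which follows from Ville's inequality applied to the supermartingale exp(λM_n − λ²V_n/2); the exact constants may differ from those in Theorem 4.

```python
import numpy as np

rng = np.random.default_rng(1)
delta, lam, sigma, horizon, trials = 0.05, 1.0, 0.5, 200, 20000

# Gaussian increments are sigma^2-sub-Gaussian; M_n is their running sum and
# V_n = n * sigma^2 is the accumulated intrinsic time.
increments = rng.normal(0.0, sigma, size=(trials, horizon))
M = np.cumsum(increments, axis=1)
V = sigma ** 2 * np.arange(1, horizon + 1)
boundary = np.log(1 / delta) / lam + lam * V / 2

# Fraction of paths that EVER cross the boundary; Ville's inequality says <= delta.
crossing_rate = np.mean((M >= boundary).any(axis=1))
print(crossing_rate, "<=", delta)
```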
The first bound we present relies on the method of mixtures for martingale concentration, which stems back to Robbins work in the 1970s (Robbins, 1970). There are many good resources providing an introduction to the method of mixtures (de la Pe na et al., 2007; Kaufmann and Koolen, 2021; Howard et al., 2021). Theorem 5. Let (Mn)n N be a martingale with respect to some filtration (Fn)n N such that M0 = 0 almost surely. Moreover, let (σn)n 1 be a (Fn)n N-predictable sequence of random variables such that, conditioned on Fn 1, Mn := Mn Mn 1 is σ2 n-sub Gaussian. Define Vn := P m n σ2 m and choose a tuning parameter γ > 0. Then, for any δ > 0, we have v u u t2(Vn + γ) log The next inequality relies on the recent technique of boundary stitching, first presented in Howard et al. (2021). Intuitively, the technique works by breaking intrinsic time that is, time according to the accumulated variance process (Vn)n N into roughly geometrically spaced pieces. Then, one optimizes a tight-boundary in each region and takes a union bound. The actual details are more technical, but are not needed in this work. Theorem 6. Let (Mn)n N be a martingale with respect to (Fn)n N such that M0 = 0 almost surely. Moreover, let (σn)n 1 be a (Fn)n N-predictable sequence of random variables such that, conditioned on Fn 1, both Mn := Mn Mn 1 and Mn are σ2 n-sub Gaussian. Define Vn := P m n σ2 m and choose a starting intrinsic time v0 > 0. Then, for any δ (0, 1), we have n N : Mn 1.7 log log 2Vn + .72 log 5.2 Note that the original version of Theorem 6 as found in Howard et al. (2021) has more free parameters to optimize over, but we have already simplified the expression to make the result more readable. The free parameter v0 > 0 in the above boundary gives the intrinsic time at which the boundary becomes non-trivial (i.e., the tightest available upper bound before Vn v0 is ). We qualitatively compare these bounds in Section 4, wherein we construct various time-uniform bounds on privacy loss processes. For now, Theorem 4 can be thought of as providing a tight upper bound on a martingale at a single point in intrinsic time, providing loose guarantees elsewhere. On the other hand, Theorems 5 and 6 provide decently tight control over a martingale at all points in intrinsic time simultaneously, although at the cost of sacrificing tightness at any given fixed point. Fully-Adaptive Composition in Differential Privacy C. Details in Proof of Approx-z CDP Filter C.1. Equivalence of Approximate z CDP Definitions We will show that our definition of approximate z CDP is equivalent to the original definition of approximate z CDP due to Bun and Steinke (2016). Let us first restate their definition as a condition on a private algorithm A. Condition 1 (Original definition of Bun and Steinke (2016)). For any neighboring datasets x, x , there exist events E and E such that for all λ 1, Dλ(A(x) | E A(x ) | E ) ρλ, Dλ(A(x ) | E A(x) | E) ρλ, P(A(x) E) 1 δ, and P(A(x ) E ) 1 δ. Our definition is adapted from the approximate R enyi differential privacy definition due to Papernot and Steinke (2022). We restate the (unconditional) definition below. Condition 2 (Adapted from Papernot and Steinke (2022)). For any neighboring datasets x, x , there exist distributions P , P , Q , Q such that the outputs are distributed according to the following mixture distributions: A(x) (1 δ)P + δP , A(x ) (1 δ)Q + δQ with for all λ 1, Dλ(P Q ) ρλ and Dλ(P Q ) ρλ. Theorem 7. Conditions 1 and 2 are equivalent. Proof of Theorem 7. Fix any neighbors x, x . 
Suppose an algorithm A satisfies Condition 1 for some events E, E . Then we could let P and Q be the conditional distributions P(A(x) | A(x) E) and P(A(x ) | A(x ) E ) respectively. Then let P(A(x) | A(x) Ec)P(A(x) Ec) + P ( ) (P(A(x) E) (1 δ)) , P(A(x ) | A(x ) E c)P(A(x ) E c) + Q ( ) (P(A(x ) E ) (1 δ)) . Then A(x) is distributed according to the mixture (1 δ)P + δP , and A(x ) is distributed according to the mixture (1 δ)Q + δQ . Thus, A also satisfies condition 2 given that Dλ(P Q ) λρ and Dλ(Q P ) λρ by our assumption of Condition 1. Now suppose A satisfies Condition 2 for some pairs of distributions (P , P ) and (Q , Q ). Then we can view the output distribution of A(x) as generating a Bernoulli random variable C such that with probability (1 δ), C = 1 and A(x) draws an outcome from P and with probability C = 0 and A(x) draws an outcome from P . Similarly, we can view A(x ) as flipping a coin C such that A(x ) draws an outcome from Q when C = 1. Then letting the events E be all the randomness of A(x) such that C = 1 and E be all the randomness of A(x ) such that C = 1 satisfies condition 1. C.2. Missing Proofs Lemma 6. The process {Xn}n 1 defined in (10) is a P - nonnegative supermartingale with respect to (Fn(x))n N. Proof of Lemma 6. For any t 0, EP [Xt+1 | Ft(x)] (λ 1) log P t+1(At+1(x) | Ft(x)) Q t+1(At+1(x) | Ft(x)) λ(λ 1)ρt+1(x) " P t+1(At+1(x) | Ft(x)) Q t+1(At+1(x) | Ft(x)) (λ 1) | Ft(x) exp( λ(λ 1)ρt+1(x)) Xt exp(λ(λ 1)ρt+1(x)) exp( λ(λ 1)ρt+1(x)) where the last inequality follows from the Renyi divergence bound due to approximate z CDP. Lemma 7. Consider measures P and Q defined in (8). Their R enyi divergence satisfies Dλ P (A1:N(x)(x)) Q (A1:N(x)(x)) ρλ. Proof of Lemma 7. By the definition of Xn and that EP [XN(x)] EP [X0] = 1, we have EA1:N(x)(x) P exp (λ 1)MN(x) 1 log P m(Am(x) | Fm 1(x)) Q m(Am(x) | Fm 1(x)) " P (A1:N(x)) Q (A1:N(x)) m N(x) ρm(x) By the definition of stopping time N, we have P m N(x) ρm(x) ρ, which implies the stated Renyi divergence bound. Fully-Adaptive Composition in Differential Privacy Lemma 8. Let likelihood functions P, Q, P , Q be defined in (3), (4), and (8). Then there exists likelihood functions P and Q such that P = (1 δ)P + δP , Q = (1 δ)Q + δQ . Proof of Lemma 8. We will show the decomposition for P, and the proof follows identically for the decomposition of Q. First, we can express the likelihood P(A1:N(x)(x)) as follows: P(A1:N(x)(x)) = n=1 P(An(x) | Fn 1(x)) (1 δn(x))P n(An(x) | Fn 1(x)) + δn(x)P n (An(x) | Fn 1(x)) S {1,...,N(x)} w S f S(A1:N(x)(x)) f S(A1:N(x)(x)) := Y n S P n (An(x) | Fn 1(x)) Y n N(x),n/ S P n(An(x) | Fn 1(x)) and w S = Q n S δn(x) Q n N\S(1 δn(x)) . Note that each f S is a likelihood of the stopped process A1:N(x)(x) under input data set x, and f = P (A1:N(x)(x)). Thus, it suffices to show that w 1 δ almost surely. To see this, we have n N(x) (1 δn(x)) 1 X n N(x) δn(x) 1 δ. D. An Alternative Proof for Theorem 2 As a first step in our alternative proof of Theorem 2, it is easier to consider the case where each algorithm An satisfies conditional (ϵn, δn)-p DP, as this condition provides a highprobability bound on the privacy loss. This allows us to use the martingale machinery in Appendix B to prove tight composition results. Lemma 9. Theorem 2 holds under the stronger assumption that, for any n 1, An is (ϵn, δn)-p DP conditioned on A1:n 1. 
Before we can prove Lemma 9, we need the following bound on the conditional expectation of the privacy loss, which can be immediately obtained from the bound on expected privacy loss presented in Bun and Steinke (2016).

Lemma 10 (Proposition 3.3 in Bun and Steinke (2016)). Suppose A and B are algorithms such that A is ϵ-differentially private conditioned on B. Then, for any input dataset x ∈ X and neighboring dataset x' of x, we have that
\[
\mathbb{E}\left( L(x, x') \mid B(x) \right) \leq \tfrac{1}{2}\big(\epsilon(B(x))\big)^2.
\]

Now, we prove Lemma 9.

Proof of Lemma 9. To begin, we assume that the algorithms (A_n)_{n≥1} satisfy (ϵ_n, 0)-pDP conditioned on A_{1:n−1}. We will show how to remove this assumption on the approximation parameter in the second half of the proof.

Fix an input database x ∈ X. For convenience, we denote by (F_n(x))_{n∈ℕ} the natural filtration generated by (A_n(x))_{n≥1}. Since we have fixed x ∈ X, for notational simplicity, we write ϵ_n for the random variable ϵ_n(A_{1:n−1}(x)) and define δ_n similarly. Additionally, by N we mean the stopping time N((ϵ_n)_{n∈ℕ}, (δ_n)_{n∈ℕ}).

Recall that we have already argued that, for any neighboring dataset x' of x, the process
\[
M_n := M_n(x, x') = L_{1:n}(x, x') - \sum_{m \leq n} \mathbb{E}\left( L_m(x, x') \mid \mathcal{F}_{m-1}(x) \right)
\]
is a martingale with respect to (F_n(x))_{n∈ℕ}. Further observe that its increments ∆M_n := L_n(x, x') − E(L_n(x, x') | F_{n−1}(x)) are ϵ_n²-sub-Gaussian conditioned on F_{n−1}(x). Thus, by Theorem 4, we know that, for any a, b > 0, we have
\[
\mathbb{P}\left( \exists\, n \in \mathbb{N} : M_n \geq b + a V_n \right) \leq \exp(-2ab),
\]
where the process (V_n)_{n∈ℕ} given by V_n := Σ_{m≤n} ϵ_m² is the accumulated variance up to and including time n. Thus, it suffices to optimize the free parameters a and b to prove the result. To do this, consider the following function f : ℝ_{≥0} → ℝ_{≥0} given by
\[
f(y) := \frac{1}{2} y^2 + \sqrt{2 \log(1/\delta')}\, y.
\]
Clearly, f is a quadratic polynomial in y which is strictly increasing on ℝ_{≥0}. In particular, one can readily check that
\[
y^\ast := \sqrt{2 \log(1/\delta') + 2\epsilon} - \sqrt{2 \log(1/\delta')}
\]
solves the equation f(y) = ϵ, where ϵ > 0 is the target privacy parameter. As such, we set
\[
b := y^\ast \sqrt{\tfrac{1}{2}\log(1/\delta')} \qquad \text{and} \qquad a := \frac{1}{y^\ast}\sqrt{\tfrac{1}{2}\log(1/\delta')},
\]
so that exp(−2ab) = δ' and b + a(y*)² = sqrt(2 log(1/δ')) y*. Furthermore, expanding the definition of (M_n)_{n∈ℕ}, we see that the selected parameters yield, with probability at least 1 − δ', for all n ≤ N,
\[
L_{1:n}(x, x') \leq b + a \sum_{m \leq n} \epsilon_m^2 + \sum_{m \leq n} \mathbb{E}\left( L_m(x, x') \mid \mathcal{F}_{m-1} \right)
\leq b + a \sum_{m \leq n} \epsilon_m^2 + \frac{1}{2} \sum_{m \leq n} \epsilon_m^2
\leq b + a (y^\ast)^2 + \frac{1}{2} (y^\ast)^2 = f(y^\ast) = \epsilon,
\]
where the second inequality applies Lemma 10 and the third uses that, by the definition of the stopping time N, Σ_{m≤n} ϵ_m² ≤ (y*)² for all n ≤ N. Thus, we have proven the desired result in the case where all algorithms have δ_n = 0.

Now, we show how to generalize our result to the case where the approximation parameters δ_n are not identically zero. Define the events
\[
A := \left\{ \exists\, n \leq N : L_{1:n}(x, x') > \epsilon \right\} \quad \text{and} \quad B := \left\{ \exists\, n \leq N : L_n(x, x') > \epsilon_n \right\}.
\]
Our goal is to show that, with N defined as in the statement of Theorem 2, P(A) ≤ δ. Simply using Bayes' rule, we have that
\[
\mathbb{P}(A) = \mathbb{P}(A \cap B^c) + \mathbb{P}(A \cap B) \leq \mathbb{P}(A \mid B^c) + \mathbb{P}(B) \leq \delta' + \mathbb{P}(B),
\]
where the second inequality follows from our already-completed analysis in the case that δ_n ≡ 0. Now, we show that P(B) ≤ δ'', which suffices to prove the result as we have, by assumption, δ = δ' + δ''. Define the modified privacy loss random variables (L̃_n(x, x'))_{n∈ℕ} by
\[
\widetilde{L}_n(x, x') := \begin{cases} L_n(x, x') & n \leq N, \\ 0 & \text{otherwise.} \end{cases}
\]
Likewise, define the modified privacy parameter random variables ϵ̃_n and δ̃_n in an identical manner. Then, we can bound P(B) in the following manner:
\[
\mathbb{P}\left( \exists\, n \leq N : L_n(x, x') > \epsilon_n \right)
= \mathbb{P}\left( \exists\, n \in \mathbb{N} : \widetilde{L}_n(x, x') > \widetilde{\epsilon}_n \right)
\leq \sum_{n=1}^{\infty} \mathbb{P}\left( \widetilde{L}_n(x, x') > \widetilde{\epsilon}_n \right)
= \sum_{n=1}^{\infty} \mathbb{E}\, \mathbb{P}\left( \widetilde{L}_n(x, x') > \widetilde{\epsilon}_n \mid \mathcal{F}_{n-1} \right)
\leq \sum_{n=1}^{\infty} \mathbb{E}\, \widetilde{\delta}_n
= \mathbb{E} \sum_{n \leq N} \delta_n \leq \delta''.
\]
Thus, we have proven the desired result in the general case.
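As a quick numerical sanity check of the parameter choice in the proof above (the concrete values of ϵ and δ' below are ours, chosen only for illustration, and the identities are checked under the quantities as written above), the following verifies that y* solves f(y) = ϵ and, equivalently, that the advanced-composition-style boundary sqrt(2 K log(1/δ')) + K/2 equals the target ϵ at intrinsic time K = (y*)²:

import math

eps, delta_prime = 1.0, 1e-6        # illustrative target values only
L = math.log(1.0 / delta_prime)

# The root of f(y) = y**2 / 2 + sqrt(2 * L) * y = eps used in the proof above.
y_star = math.sqrt(2 * L + 2 * eps) - math.sqrt(2 * L)
K = y_star ** 2                     # the corresponding intrinsic time, sum of eps_m ** 2

print(abs(0.5 * y_star ** 2 + math.sqrt(2 * L) * y_star - eps) < 1e-12)  # f(y*) = eps
print(abs(math.sqrt(2 * L * K) + K / 2 - eps) < 1e-12)                   # boundary hits eps at K = (y*)**2

Both checks print True, reflecting the algebraic identity f(y*) = ϵ rather than any numerical coincidence.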
Our key insight above is to view filters as functions of the intrinsic time determined by the privacy parameters, Σ_{m≤n} ϵ_m². Lemma 9 can also be obtained by leveraging the analysis for Rényi filters (Feldman and Zrnic, 2021). However, our approach to proving Theorem 2 has the advantage that it does not require reductions between different modes of privacy. While Lemma 10, which bounds the expected privacy loss, does require some complicated analysis, we only ever need to apply Lemma 9 to instances of randomized response, in which case computing the privacy loss bound is trivial.

We now use Lemma 9 to prove Theorem 2. Recall that Lemma 2 shows that algorithms that satisfy pDP also satisfy DP, but the converse is not true and may require a conversion cost. To avoid this cost, we define the following generalization of randomized response.

Definition 7 (Conditional Randomized Response). Let R := {0, 1, ⊤, ⊥} and let 2^R be the corresponding power set of R. Then, an algorithm R taking inputs in {0, 1} to outputs in the measurable space (R, 2^R) is an instance of (ϵ, δ)-randomized response if, for b ∈ {0, 1}, R(b) outputs the following:
\[
R(b) = \begin{cases}
b & \text{with probability } (1 - \delta)\frac{e^\epsilon}{1 + e^\epsilon}, \\
1 - b & \text{with probability } (1 - \delta)\frac{1}{1 + e^\epsilon}, \\
\top & \text{with probability } \delta \text{ if } b = 1, \\
\bot & \text{with probability } \delta \text{ if } b = 0.
\end{cases}
\]
More generally, suppose B : {0, 1} → Z is a randomized algorithm. For functions ϵ, δ : Z → ℝ_{≥0}, we say R is an instance of (ϵ, δ)-randomized response conditioned on B if, for any true input b ∈ {0, 1} and hypothesized alternative b' ∈ {0, 1}, the conditional probability P(R(b) ∈ · | B(b') = z) is the same as the law of (ϵ(z), δ(z))-randomized response with input bit b.

Conditional (ϵ, δ)-randomized response satisfies both conditional (ϵ, δ)-DP and conditional (ϵ, δ)-pDP. We will leverage the fact that it satisfies both privacy definitions with the same parameters. A surprising result in the non-adaptive setting is that any (ϵ, δ)-DP algorithm can be viewed as a randomized post-processing of (ϵ, δ)-randomized response (Kairouz et al., 2015). We generalize this result to the adaptive, conditional setting below. In the language of Blackwell's comparison of experiments (Blackwell, 1953), instances of randomized response are sufficient for instances of arbitrary DP algorithms, and we prove that the same is true for conditional randomized response and conditionally DP algorithms. In what follows, by a transition kernel ν, we mean that, for any b ∈ Z and r ∈ R, ν(·, r | b) is a probability measure on (Y, G).

Lemma 11 (Reduction to Conditional Randomized Response). Let A and B map from X to measurable spaces (Y, G) and (Z, H), respectively. Suppose A is (ϵ, δ)-differentially private conditioned on B. Fix neighbors x_0, x_1 ∈ X, and let R be an instance of (ϵ, δ)-randomized response conditioned on B', where B' : {0, 1} → Z is the restricted algorithm satisfying B'(b) = B(x_b). Then, there is a transition kernel ν : G × R × Z → [0, 1] such that, for all b, b' ∈ {0, 1},
\[
\mathbb{P}\left( A(x_b) \in \cdot \mid B'(b') \right) = \nu_{b, b'}, \quad \text{where} \quad \nu_{b, b'} = \mathbb{E}\left( \nu(\cdot, R(b) \mid B'(b')) \mid B'(b') \right)
\]
(see Footnote 6). Lemma 11 tells us that the conditional distribution obtained by averaging the kernel ν(·, R(b) | B'(b')) over the randomness in R(b) matches the conditional distribution of A(x_b). To prove Lemma 11, first recall the important fact that any differentially private algorithm can be viewed as a post-processing of randomized response (Kairouz et al., 2015), as stated in Lemma 12 below.

Lemma 12 (Reduction to Randomized Response (Kairouz et al., 2015)). Let the algorithm A : X → Y be (ϵ, δ)-DP, and let R be an instance of (ϵ, δ)-randomized response. Then, for any neighbors x_0, x_1 ∈ X, there is a transition kernel ν : G × R → [0, 1] such that, for b ∈ {0, 1}, we have P(A(x_b) ∈ ·) = ν_b, where ν_b = E ν(·, R(b)) (see Footnote 7).
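For concreteness, here is a minimal sketch of a sampler for the unconditional (ϵ, δ)-randomized response of Definition 7; the function name and the string placeholders "TOP" and "BOT" (standing in for the symbols ⊤ and ⊥) are our own choices. In the conditional case, one would simply call it with the parameters ϵ(z), δ(z) determined by the realized output z of B.

import math
import random

def randomized_response(b, eps, delta, rng=random):
    """Sample from the (eps, delta)-randomized response of Definition 7.

    With probability (1 - delta) the output is the usual eps-randomized
    response of the input bit b; with probability delta the output is a
    special symbol: "TOP" if b = 1 and "BOT" if b = 0.
    """
    u = rng.random()
    p_keep = (1 - delta) * math.exp(eps) / (1 + math.exp(eps))
    p_flip = (1 - delta) / (1 + math.exp(eps))
    if u < p_keep:
        return b            # report the true bit
    if u < p_keep + p_flip:
        return 1 - b        # report the flipped bit
    return "TOP" if b == 1 else "BOT"   # approximation event, probability delta

# e.g., empirical output frequencies on input bit 1:
# from collections import Counter
# Counter(randomized_response(1, eps=1.0, delta=0.05) for _ in range(100000))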
In Lemma 11 of Section 3, we generalized Lemma 12 to the case of conditional differential privacy. To do this, we introduced conditional randomized response in Definition 7. In conditional randomized response, on the event {B = z}, the conditional laws of R(0) and R(1) simply become those of regular randomized response with known privacy parameters ϵ(z) and δ(z). We now prove Lemma 11.

Proof of Lemma 11. Let b, b' ∈ {0, 1} be arbitrary. For any outcome {B'(b') = z}, let P_z(A(x_b) ∈ ·) be the probability measure P(A(x_b) ∈ · | B'(b') = z). In particular, this measure does not depend on the input bit b'. By the assumptions of conditional differential privacy (Definition 1), it follows that, under the probability measure P_z, A(x_b) is (ϵ(z), δ(z))-differentially private. Moreover, it also follows that R is an instance of (ϵ(z), δ(z))-randomized response under P_z. Consequently, Lemma 12 yields the existence of a kernel ν_z such that P_z(A(x_b) ∈ ·) = E_z ν_z(·, R(b)), where the averaged measure is as defined in Footnote 7. Setting ν(·, R(b) | z) := ν_z(·, R(b)), we see that
\[
\mathbb{P}\left( A(x_b) \in \cdot \mid B'(b') = z \right) = \mathbb{E}\left( \nu(\cdot, R(b) \mid z) \mid B'(b') = z \right),
\]
which thus yields
\[
\mathbb{P}\left( A(x_b) \in \cdot \mid B'(b') \right) = \mathbb{E}\left( \nu(\cdot, R(b) \mid B'(b')) \mid B'(b') \right),
\]
where the conditionally averaged measure is as described in Footnote 6 in the main body of the paper. This proves the desired result.

Footnote 6: By ν_{b,b'}(·) := E(ν(·, R(b) | B'(b')) | B'(b')), we mean that ν_{b,b'} is the (random) averaged probability measure
\[
\nu_{b,b'}(\cdot) = \mathbb{P}(R(b) = 1 \mid B'(b'))\, \nu(\cdot, 1 \mid B'(b')) + \mathbb{P}(R(b) = 0 \mid B'(b'))\, \nu(\cdot, 0 \mid B'(b')) + \mathbb{P}(R(b) = \top \mid B'(b'))\, \nu(\cdot, \top \mid B'(b')) + \mathbb{P}(R(b) = \bot \mid B'(b'))\, \nu(\cdot, \bot \mid B'(b')).
\]
Footnote 7: By ν_b(·) := E ν(·, R(b)), we mean that ν_b is the averaged probability measure given by
\[
\nu_b(\cdot) = \mathbb{P}(R(b) = 1)\, \nu(\cdot, 1) + \mathbb{P}(R(b) = 0)\, \nu(\cdot, 0) + \mathbb{P}(R(b) = \top)\, \nu(\cdot, \top) + \mathbb{P}(R(b) = \bot)\, \nu(\cdot, \bot).
\]

Lastly, before proving Theorem 2, we need the following lemma. This lemma essentially tells us that if A is (ϵ, δ)-pDP conditioned on B, and A' is a randomized post-processing algorithm, then releasing the vector (A, A') is also (ϵ, δ)-pDP conditioned on B. Note that this does not contradict the converse direction of Lemma 2, as releasing the output of A' alone may not satisfy conditional (ϵ, δ)-pDP. But once we observe A, since A' is a post-processing, we can glean no more information about the true underlying dataset.

Lemma 13. Suppose A and B are algorithms with inputs in X and outputs in measurable spaces (Y, G) and (Z, H), respectively. Assume A is (ϵ, δ)-pDP conditioned on B. Let (S, S) be a measurable space and suppose µ : S × Y × Z → [0, 1] is a conditional transition kernel. Suppose A' : X → S is an algorithm satisfying
\[
\mathbb{P}\left( A'(x) \in \cdot \mid A(x') = y, B(x') = z \right) = \mu(\cdot, y \mid z), \tag{16}
\]
for all y ∈ Y, z ∈ Z, and x, x' ∈ X. Then, the joint algorithm (A, A') : X → Y × S is also (ϵ, δ)-pDP conditioned on B.

Proof of Lemma 13. Let x, x' ∈ X be arbitrary neighboring datasets. Let q_{x|B} and q_{x'|B} be the corresponding conditional joint densities of (A(x), A'(x)) and (A(x'), A'(x')) given B(x), respectively. Likewise, let p_{x|B} and p_{x'|B} be the corresponding conditional densities of A(x) and A(x') conditioned on B(x), and let q_{x|B,A} and q_{x'|B,A} be the conditional densities of A'(x) and A'(x') given A(x) and B(x). Let L^{(A,A')}_B(x, x') denote the joint privacy loss between (A(x), A'(x)) and (A(x'), A'(x')) given B(x), while L^{(A)}_B(x, x') denotes the privacy loss between A(x) and A(x') given B(x).
We have, using Bayes' rule,
\begin{align*}
L^{(A,A')}_B(x, x') &= \log \frac{q_{x|B}(A(x), A'(x) \mid B(x))}{q_{x'|B}(A(x), A'(x) \mid B(x))}
= \log\left( \frac{p_{x|B}(A(x) \mid B(x))}{p_{x'|B}(A(x) \mid B(x))} \cdot \frac{q_{x|B,A}(A'(x) \mid B(x), A(x))}{q_{x'|B,A}(A'(x) \mid B(x), A(x))} \right) \\
&= \log \frac{p_{x|B}(A(x) \mid B(x))}{p_{x'|B}(A(x) \mid B(x))} = L^{(A)}_B(x, x').
\end{align*}
The first equality on the second line follows from the assumption outlined in Equation (16). More specifically, since we have
\[
\mathbb{P}\left( A'(x) \in \cdot \mid A(x), B(x) \right) = \mu(\cdot, A(x) \mid B(x)) = \mathbb{P}\left( A'(x') \in \cdot \mid A(x), B(x) \right),
\]
it follows that the conditional densities q_{x|B,A} and q_{x'|B,A} are equal almost surely. Since A is (ϵ, δ)-pDP conditioned on B, the result now follows.

We can now prove Theorem 2 using these tools.

Proof of Theorem 2. Fix arbitrary neighbors x_0, x_1 ∈ X. Let (R_n)_{n≥1} be a sequence of algorithms such that R_n is an instance of (ϵ_n, δ_n)-randomized response conditioned on A'_{1:n−1} : {0, 1} → Y^{n−1}, where A'_m : {0, 1} → Y is the restricted algorithm given by A'_m(b) := A_m(x_b), for all m ≥ 1. Lemma 11 guarantees the existence of a sequence of transition kernels (ν_n)_{n≥1}, ν_n : G × R × Y^{n−1} → [0, 1], such that, for all n ≥ 1 and b, b' ∈ {0, 1}, we have P(A'_n(b) ∈ · | A'_{1:n−1}(b')) = ν^{(n)}_{b,b'} almost surely. Here, ν^{(n)}_{b,b'} is the averaged conditional probability, as defined in terms of ν_n in Lemma 11 and Footnote 6. This equality means we can find an underlying probability space (i.e., a coupling) such that the random post-processing draws from the kernel ν_n(·, R_n(b) | A'_{1:n−1}(b')) equal A'_n(b) almost surely, for all n ≥ 1.

Now, for any n ≥ 1, since R_n is an instance of (ϵ_n, δ_n)-randomized response conditioned on A'_{1:n−1}, it follows that R_n is in fact (ϵ_n, δ_n)-pDP conditioned on A'_{1:n−1}. Moreover, this also implies that R_n is (ϵ_n, δ_n)-pDP conditioned on (A'_{1:n−1}, R_{1:n−1}), since, by definition, ϵ_n and δ_n only depend on the realizations of R_{1:n−1} through the outputs of A'_{1:n−1}. By Lemma 13, it follows that, for all n ≥ 1, the algorithm (R_n, A'_n) is (ϵ_n, δ_n)-pDP conditioned on (R_{1:n−1}, A'_{1:n−1}). Thus, by Lemma 9, it follows that the composed algorithm (R_{1:N'(·)}(·), A'_{1:N'(·)}(·)) is (ϵ, δ)-DP, where N'(b) := N(x_b) and ϵ, δ, and N are as outlined in the statement of Theorem 2. Lastly, since differential privacy is closed under arbitrary post-processing (Dwork and Roth, 2014), it follows that A'_{1:N'(·)}(·) is (ϵ, δ)-differentially private. Since x_0 and x_1 were arbitrary neighboring inputs, the result follows; i.e., the composed interaction A_{1:N(·)}(·) is (ϵ, δ)-differentially private.

E. Proof for Privacy Odometers in Theorem 3

We now give the formal proof of the privacy odometers presented in Theorem 3 in Section 4.

Proof of Theorem 3. As in the proof of Lemma 9, we first consider the case where δ_n = 0 for all n ≥ 1. In this case, fix an input dataset x ∈ X and a neighboring dataset x' ∈ X. Let (M_n)_{n∈ℕ} be the corresponding privacy loss martingale as outlined in Equation (14), where we implicitly hide the dependence on x, x', which are fixed. Let (u_n)_{n≥1} be one of the sequences outlined in the theorem statement, and define U_n := u_n(ϵ_{1:n}, δ_{1:n}) for all n ≥ 1, where once again we write ϵ_n and δ_n for ϵ_n(A_{1:n−1}(x)) and δ_n(A_{1:n−1}(x)), respectively. It follows from Theorems 4, 5, and 6 that
\[
\mathbb{P}\left( \exists\, n \in \mathbb{N} : M_n > B_n \right) \leq \delta, \quad \text{for} \quad B_n := U_n - \frac{1}{2}\sum_{m \leq n} \epsilon_m^2.
\]
Recalling that M_n = Σ_{m≤n} {L_m(x, x') − E(L_m(x, x') | F_{m−1}(x))} and that E(L_n(x, x') | F_{n−1}(x)) ≤ (1/2)ϵ_n² for all n ∈ ℕ, it thus follows that
\[
\mathbb{P}\left( \exists\, n \in \mathbb{N} : L_{1:n}(x, x') > U_n \right) \leq \delta,
\]
where (F_n(x))_{n≥1} is again the natural filtration generated by (A_n(x))_{n≥1}. Thus, since the neighbors x, x' were arbitrary, we have shown that (u_n)_{n≥1} is a δ-privacy odometer in the case δ_n = 0 for all n ≥ 1.
To generalize to the case where the δ_n may be nonzero, we apply precisely the same argument used in the second part of the proof of Lemma 9, thus proving the general result.

F. An Algorithm Satisfying (ϵ, δ)-DP but not (ϵ, δ)-pDP

In this appendix, we construct a simple algorithm taking binary inputs that satisfies (ϵ, δ)-DP but not (ϵ, δ)-pDP. In particular, this provides intuition as to why we conjecture that the odometers constructed in Section 4 would not hold under the assumption that the algorithms being composed satisfy (ϵ, δ)-DP in general.

To this end, fix a privacy parameter ϵ > 0 and an approximation parameter δ ∈ (0, 1). Let A : {0, 1} → {0, 1, ⊤, ⊥} be an instance of (ϵ, δ)-randomized response, and let B : {0, 1} → {0, 1} be defined by
\[
B(b) := \begin{cases} 1 & \text{if } A(b) \in \{1, \top\}, \\ 0 & \text{otherwise.} \end{cases}
\]
Since differential privacy is closed under arbitrary post-processing, it follows that the constructed algorithm B is (ϵ, δ)-differentially private. On the other hand, setting x = 1 and x' = 0, we note that, on the event {B(1) = 1},
\[
L_B(1, 0) = \log \frac{\mathbb{P}(B(1) = 1)}{\mathbb{P}(B(0) = 1)}
= \log \frac{\mathbb{P}(A(1) = 1) + \mathbb{P}(A(1) = \top)}{\mathbb{P}(A(0) = 1) + \mathbb{P}(A(0) = \top)}
= \log \frac{\delta + (1 - \delta)\frac{e^\epsilon}{1 + e^\epsilon}}{(1 - \delta)\frac{1}{1 + e^\epsilon}}
= \log\left( \frac{\delta(1 + e^\epsilon)}{1 - \delta} + e^\epsilon \right) > \epsilon.
\]
Since a straightforward calculation yields
\[
\mathbb{P}(B(1) = 1) = (1 - \delta)\frac{e^\epsilon}{1 + e^\epsilon} + \delta > \delta,
\]
we see that B does not satisfy (ϵ, δ)-pDP.
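As a quick numerical illustration of this counterexample (the concrete values ϵ = 1 and δ = 0.05 are ours, chosen only for illustration), the following sketch evaluates the privacy loss on the event {B(1) = 1} and the probability of that event:

import math

eps, delta = 1.0, 0.05  # illustrative values only

# Output probabilities of (eps, delta)-randomized response, as in Definition 7.
p_A1_eq_1 = (1 - delta) * math.exp(eps) / (1 + math.exp(eps))
p_A1_eq_top = delta          # on input 1, the special symbol "top" has probability delta
p_A0_eq_1 = (1 - delta) / (1 + math.exp(eps))
p_A0_eq_top = 0.0            # on input 0, the special symbol is "bot", never "top"

# B(b) = 1 exactly when A(b) is 1 or the "top" symbol.
p_B1 = p_A1_eq_1 + p_A1_eq_top
p_B0 = p_A0_eq_1 + p_A0_eq_top

loss_on_B1 = math.log(p_B1 / p_B0)
print(loss_on_B1 > eps)   # True: the privacy loss exceeds eps on {B(1) = 1}
print(p_B1 > delta)       # True: and this event has probability greater than delta

Both checks print True, matching the calculation above: the privacy loss exceeds ϵ on an event whose probability is strictly larger than δ, so B cannot be (ϵ, δ)-pDP.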