# autostep_locally_adaptive_involutive_mcmc__9861ff9b.pdf

Auto Step: Locally adaptive involutive MCMC

Tiange Liu 1 Nikola Surjanovic 1 Miguel Biron-Lattes 1 Alexandre Bouchard-Cˆot e 1 Trevor Campbell 1

Abstract Many common Markov chain Monte Carlo (MCMC) kernels can be formulated using a deterministic involutive proposal with a step size parameter. Selecting an appropriate step size is often a challenging task in practice; and for complex multiscale targets, there may not be one choice of step size that works well globally. In this work, we address this problem with a novel class of involutive MCMC methods Auto Step MCMC that selects an appropriate step size at each iteration adapted to the local geometry of the target distribution. We prove that under mild conditions Auto Step MCMC is π-invariant, irreducible, and aperiodic, and obtain bounds on expected energy jump distance and cost per iteration. Empirical results examine the robustness and efficacy of our proposed step size selection procedure, and show that Auto Step MCMC is competitive with stateof-the-art methods in terms of effective sample size per unit cost on a range of challenging target distributions.

1. Introduction

Markov chain Monte Carlo (MCMC) (Metropolis et al., 1953; Hastings, 1970) is an effective tool for approximating integrals arising in Bayesian inference problems. The performance of MCMC is often sensitive to the choice of tuning parameters in the Markov kernel. In particular, methods that propose a new state followed by an accept/reject step e.g., random-walk Metropolis Hastings (RWMH) (Hastings, 1970), the Metropolis-adjusted Langevin algorithm (MALA) (Rossky et al., 1978), and Hamiltonian Monte Carlo (HMC) (Duane et al., 1987; Neal, 1996) often involve a scalar step size parameter θ 0 that governs the distance of the proposed next state from the current state. Too large a choice of θ results in distant proposals that are

1Department of Statistics, University of British Columbia, Canada. Correspondence to: Trevor Campbell <trevor@stat.ubc.ca>.

Proceedings of the 42 nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

often rejected, while too small a choice leads to nearby proposals that do not explore the state space quickly; either case results in slow convergence of the chain. For certain multiscale targets (e.g. Bayesian posteriors with scale priors (Polson & Scott, 2012)) there may not even be a single good choice of step size throughout the whole state space.

Existing methods for selecting step size parameters fall generally into three categories: adaptive MCMC, discrepancy minimization, and locally-adaptive kernels. Adaptive MCMC algorithms (Haario et al., 2001; Atchad e, 2006; Andrieu & Thoms, 2008; Marshall & Roberts, 2012) tune the proposal distribution using previous draws from the chain, often targeting a particular acceptance rate derived from high-dimensional asymptotics (Roberts et al., 1997; Roberts & Rosenthal, 1998). Obtaining theoretical guarantees on estimates produced by adaptive MCMC targeting a distribution π is technically difficult and often requires strict conditions on the adaptation process, such as increasingly infrequent adaptation (Chimisov et al., 2018). Discrepancy minimization (Neklyudov et al., 2018; Coullon et al., 2023) involves tuning using a divergence between the empirical distribution of draws and the target π, which requires multiple MCMC runs to estimate the divergence for each candidate step size. Both approaches also identify only a single step size value, which may not be appropriate for the whole state space.

Locally-adaptive kernels, in contrast, select a value for the step size at each iteration based on the current state. Because the step size depends on the current state, these kernels can adapt to the local shape of the target π; and because they depend only on the current state, they are Markovian and one can use standard tools to prove π-invariance. There are many approaches to locally adaptive step size selection in the literature. Mixture kernels with state-dependent weights (Maire & Vandekerkhove, 2022) and delayed-rejection (Tierney & Mira, 1999; Green & Mira, 2001) are both general approaches, but each requires a prespecified maximum number of step sizes to consider at each iteration. There are also numerous methods specific to certain samplers, e.g., HMC and MALA (Girolami & Calderhead, 2011; Nishimura & Dunson, 2016; Kleppe, 2016; Modi et al., 2024; Biron Lattes et al., 2024; Turok et al., 2024), RWMH (see Livingstone, 2021), or slice sampling (Neal, 2003). Of these, the method most related to the present work is auto MALA

Auto Step: Locally adaptive involutive MCMC

(Biron-Lattes et al., 2024), which chooses a step size in MALA using a doubling/halving procedure that targets a Metropolis Hastings acceptance ratio in a randomized range (a, b) [0, 1]. While the method was shown to be πinvariant, it was crucially not shown to be either irreducible or aperiodic. In practice, Auto MALA indeed can get stuck in certain states, especially when the initialization point is far from the mass of the target distribution, as is commonly the case in Bayesian inference; to address this, Auto MALA requires inexact unadjusted steps (see Algorithm 3 in Biron Lattes et al. (2024)).

In this work, we develop a novel method, Auto Step MCMC, for locally-adaptive step size selection in the broad class of involutive MCMC methods (Tierney, 1998; Andrieu et al., 2020; Neklyudov et al., 2020). We show that these Markov kernels are π-invariant, irreducible, and aperiodic under mild assumptions on the target distribution, thereby substantially generalizing and improving upon Auto MALA. We further provide bounds on energy jump distances and expected cost per iteration, demonstrating the robustness of the method to the setting of an initial step size parameter. Empirical results confirm the theory and demonstrate that Auto Step is stable, reliable, and competitive with other adaptive methods. Proofs of all theoretical results are provided in Appendix A.

Note that concurrent work on Gibbs self-tuning (Bou-Rabee et al., 2024b;a) introduces the same general technique for locally-adaptive involutive MCMC, proves π-invariance, and develops specific adaptation schemes for HMC and NUTS (Neal, 2011; Hoffman & Gelman, 2014). In this work, we provide an analysis of π-invariance, irreducibility, aperiodicity, and expected cost and performance, with a specific focus on involutions parametrized by a step size.

2. Background

Let π be a given target probability distribution on an open subset X Rd. With a slight abuse of notation, we assume that π admits a density π(x) with respect to the Lebesgue measure on Rd, and that we can evaluate a function γ(x) π(x) pointwise so that

π(x) = γ(x) R γ(x)dx,

where R γ(x)dx is the unknown normalizing constant.

Involutive Markov Chain Monte Carlo (Tierney, 1998; Andrieu et al., 2020; Neklyudov et al., 2020) is an MCMC method that uses involutions, i.e., functions f where f 1 = f, to generate new proposals. While there are many possible variations of involutive MCMC, in this work we use the following formulation. Fix a distribution m on an open subset Z Rp with density m(z) with respect to the

Lebesgue measure, and a family of differentiable involutions fθ : X Z X Z parametrized by θ Θ. Then, starting from a state xt, we draw zt m and the proposal

x t+1, z t+1 = fθ(xt, zt). (1)

We set the next state to xt+1 = x t+1 with probability

min{1, exp(ℓ(xt, zt, θ))}, (2)

ℓ(xt, zt, θ) = log π(x t+1)m(z t+1) π(xt)m(zt) | fθ(xt, zt)| ,

and otherwise set it to xt+1 = xt. The sequence xt is a Markov chain and has stationary distribution π if fθ is continuously differentiable (Tierney, 1998, Thm. 2).

Choosing different families of involutions {fθ} and auxiliary distributions m yields different MCMC algorithms. For example, random walk Metropolis Hastings (RWMH) (Hastings, 1970) with step size θ > 0 and mass matrix M is obtained by setting

fθ(x, z) = (x + θM 1z, z) m = N(0, M). (3)

The Metropolis-adjusted Langevin algorithm (MALA) (Rossky et al., 1978) with step size θ > 0 and mass matrix M is obtained by setting

fθ(x, z) = (x , z ) m = N(0, M),

where x , z are computed via the leapfrog map

x x + θM 1z1/2

2 log π(x ).

Finally, Hamiltonian Monte Carlo (HMC) (Duane et al., 1987; Neal, 1996) with step size θ > 0, mass matrix M, and path length L is obtained by setting

fθ(x, z) = (x , z ) m = N(0, M),

where x , z are computed via L leapfrogs.

Many involutive MCMC methods including the above three examples have a positive scalar tuning parameter θ > 0 that can be interpreted as a form of step size : larger values result in more distant proposals, while smaller values result in nearby proposals. Too large a choice of θ results in many rejected proposals, while too small a choice results in proposals that are accepted but explore the state space slowly. Furthermore, there may not be a single choice of θ that applies globally, e.g., in the case of multiscale targets (Polson & Scott, 2012). This work resolves this problem by selecting an appropriate θ at each iteration depending on the local behaviour of the augmented target π m.

Auto Step: Locally adaptive involutive MCMC

Algorithm 1 One iteration of Auto Step MCMC Require: Initial x with π(x) > 0, target π, auxiliary distribution m, step size distribution η, involutions {fθ}θ Θ 1: z m auxiliary refreshment 2: (a, b) Unif( ) soft acceptance bounds 3: θ η(dθ | x, z, a, b) refresh tuning parameter 4: s (x, z, a, b, θ) augmented state 5: s f(s) involutive proposal 6: U Unif[0, 1]

7: if U min n 1, π(s )

π(s) J(s) o then

8: return x accept 9: else 10: return x reject 11: end if

3. Auto Step MCMC

In this section, we develop Auto Step MCMC, a family of modified involutive MCMC methods that automatically select appropriate tuning parameter values at each iteration. The key technique in developing Auto Step MCMC is to formulate the sampler on an augmented space that includes the tuning parameter θ Θ as well as other auxiliary quantities. For a given family of continuously differentiable involutions {fθ : θ Θ} on X Z, define the augmented space S as

where X Z is the original augmented space for involutive MCMC, := a, b (0, 1)2 : a < b is a set of acceptance ratio thresholds (a, b), and Θ is the set of tuning parameters θ for the involutions. We assume Θ is a standard Borel space, such that S is standard Borel as well. Let f : S S denote the augmented involution fθ: for a point s = (x, z, a, b, θ) S, define

f(s) = (fθ(x, z), a, b, θ) J(s) = | fθ(x, z)|.

Note that f is itself an involution on S. We then define the augmented target density

π(s) = 2π(x) m(z) 1 (a, b) η(θ | x, z, a, b),

where we assume that π(x) and m(z) are with respect to the Lebesgue measure on X Z, that both m and η admit i.i.d. draws, and that there exists a σ-finite measure dθ on Θ such that for all x X, z Z and (a, b) , η( | x, z, a, b) is a density with respect to dθ, but otherwise may depend arbitrarily on x, z, a, b. The X-marginal of π is the target of interest, π.

Given this setup, starting from xt X, Auto Step MCMC (Algorithm 1) consists of the following steps:

1. Auxiliary refreshment: Draw

zt m and (at, bt) Unif( ).

2. Tuning parameter refreshment: Draw

θt η(dθ | xt, zt, at, bt).

3. Proposal: Set st = (xt, zt, at, bt, θt) and

s t+1 = f(st) = (x t+1, z t+1, a t+1, b t+1, θ t+1).

4. Accept: Set xt+1 = x t+1 with probability

min 1, π(s t+1) π(st) J(st) ,

and otherwise set xt+1 = xt.

We can recover standard involutive MCMC by setting η(dθ|x, z, a, b) = δθ0(dθ) for some fixed θ0 Θ. Further, choosing different involution families {fθ : θ Θ} and auxiliary distributions m on Z recovers variants of common algorithms, e.g., RWMH, MALA, and HMC. The major improvement is that the chain may draw the tuning parameter θ Θ automatically at each step in a manner that depends on the current state (xt, zt, at, bt). The key design choice, then, is to select a conditional tuning refreshment distribution η that yields values of θ that are well-adapted to the local shape of the target π. In this work, we focus on the design of the conditional tuning refreshment distribution η in the case where θ is a step size parameter (with Θ = R+). However, the Auto Step MCMC method described previously has the correct stationary distribution for more general parameter spaces Θ (see Proposition 4.2). Note also that the above approach of sampling a tuning parameter θ from a conditional distribution η given the augmented state was also introduced in concurrent work on Gibbs self-tuning (Bou-Rabee et al., 2024b;a), although the design of η in Section 3.1 differs.

3.1. Step size selection

We now focus on the design of the tuning refreshment distribution η when θ is a step size parameter with Θ = R+. Intuitively, a good choice of θ should yield a proposal for which the acceptance ratio exp(ℓ(x, z, θ)) of the original involutive method is not too close to either 0 (θ is too large) or 1 (θ is too small). Critically, this should also be true for exp( ℓ(x, z, θ)), which is the acceptance ratio in the reverse direction

ℓ(fθ(x, z), θ) = ℓ(x, z, θ).

To avoid setting arbitrary fixed bounds on ℓ, we use random a, b as thresholds and ensure that |ℓ| roughly tries to fall in the range (| log(b)|, | log(a)|). More precisely, given a fixed initial step size θ0 > 0, we propose setting the step size θ to

θ = θ0 2µ(x,z,a,b),

Auto Step: Locally adaptive involutive MCMC

Algorithm 2 Step size selector µ Require: state x, z, a, b, initial step size θ0.

1: θ θ0 2: ℓ ℓ(x, z, θ) 3: v 1{|ℓ| < | log b|} 1{|ℓ| > | log a|} 4: j = 0 number of doublings/halvings 5: if v = 0 then 6: return j 7: end if 8: while true do 9: j j + v 10: θ θ0 2j

11: ℓ ℓ(x, z, θ) 12: if v = 1 and |ℓ| | log b| then 13: return j 1 Require final halving 14: else if v = 1 and |ℓ| | log a| then 15: return j 16: end if 17: end while

where µ is the step size selector function

µ(x, z, a, b) = ( min{j Z+ : |ℓ(x, z, θ02j)| | log b|} 1, |ℓ0| < | log b| max{j Z : |ℓ(x, z, θ02j)| | log a|}, |ℓ0| > | log a| 0, otherwise,

and ℓ0 = ℓ(x, z, θ0). Therefore η(dθ|x, z, a, b) is the Dirac measure at µ(x, z, a, b), which has a density with respect to the counting measure on {θ0 2j : j Z}.

The pseudocode for computing µ(x, z, a, b) is given in Algorithm 2. If the initial step size θ0 yields an acceptable |ℓ0|, the function simply returns j = 0. If the initial step size is too large (|ℓ0| > | log a|), j is decreased until |ℓ0| | log a|. And if the initial step size is too small (|ℓ0| < | log b|), j is increased until |ℓ0| > | log b|, and then finally decreased by 1 to avoid poor proposals. Note that this function does not guarantee that | log b| |ℓ| | log a| precisely, but finds a good trade-off by approximately targeting that range while avoiding the need for expensive methods to find values exactly within the bounds.

The step size refreshment was inspired by that of auto MALA (Biron-Lattes et al., 2024), but has two important differences. First, we use symmetric thresholds that check | log b| |ℓ| | log a|, instead of checking log b ℓ log a. This is crucial for ensuring irreducibility of the method (see Section 4), and avoids the sampler getting stuck in the tails or near the mode of the target (see Fig. 1 for empirical results to this effect). Second, we include the step size θ as an augmentation of our state variable, which substantially simplifies theoretical analyses (compare the proof of Proposition 4.2 in Appendix A with the proof of Theorem 3.4 in (Biron-Lattes et al., 2024)).

Algorithm 3 Round-based Auto Step MCMC Require: Initial x0, number of rounds R, target π, auxiliary distribution m, step size distribution η, involutions {fθ}θ Θ.

1: θ0 1, c M I 2: m N(0, Id) 3: for r in 1, 2, ..., R do 4: T 2r Number of iterations 5: η Dirac(θ0 2µ(x,z,a,b))(dθ) 6: for t in 1, 2, ..., T do 7: ξ 1

3Beta(1, 1) Random mixing of the preconditioner

8: M 1/2 i,i ξ c M 1/2 i,i + (1 ξ) 9: m N(0, M)

see definition of µt in Section 3.2 10: xt, µt Auto Step(xt 1, π, m, η, {fθ}θ Θ) 11: end for 12: θ0 θ0 2median(µ1,...,µT )

14: c M diag h Var[x(j) t ]T t=1 id

15: end for 16: return {xt}T t=1

3.2. Round-based tuning

The Auto Step MCMC method has one free parameter: the initial step size θ0. While the method is insensitive to θ0 in that its performance per unit cost shrinks slowly at a rate of roughly O(| log θ0|) (see Figs. 3 and 4 and Corollary 4.10 for empirical and theoretical evidence to that effect), it is still helpful to tune this parameter to minimize the number of doubling/halving steps. Furthermore, many involutive MCMC methods e.g., RWMH, MALA, and HMC have a preconditioner, or mass matrix M that needs to be tuned.

In this work, we use a round-based procedure to tune θ0 and M (Algorithm 3). Each round corresponds to a block of iterations during which parameters are held constant. We use θ0 = 1 and M = I for the first round. At the end of each round we update θ0 θ0 2bµ, where bµ is the empirical median of the selected log step sizes µt = µ(xt, zt, at, bt) from the current round. We also set c M to the diagonal of the inverse sample covariance matrix, which is then mixed with the identity as a regularizer to form M in each iteration (Line 8 in Algorithm 3).

4. Theoretical Analysis

The marginal sequence xt on X of Auto Step MCMC is itself a Markov chain because each step redraws zt, at, bt, θt independently of their previous value conditioned on xt. In this section we establish various properties of the X-

Auto Step: Locally adaptive involutive MCMC

marginal Markov chain.

4.1. Invariance

First, we show that Auto Step MCMC is π-invariant on the augmented space S, and hence π-invariant on X. The result is a straightforward application of Tierney (1998, Theorem 2) on the augmented space S. Note that while this work focuses on step size parameters θ R+, Proposition 4.2 below holds for general parameter spaces Θ and tuning refreshment distributions η(dθ|x, z, a, b).

Assumption 4.1. For each θ Θ, fθ is a continuously differentiable involution.

Proposition 4.2. Under Assumption 4.1, Auto Step MCMC is π-invariant, and hence the X-marginal is π-invariant.

4.2. Irreducibility and aperiodicity

Next, we establish that the X-marginal of Auto Step MCMC is π-irreducible and aperiodic (see Roberts & Rosenthal (2004)): intuitively, the chain has a positive probability of eventually visiting any measurable A X with π(A) > 0, and it does not visit various sets in a repeating pattern. We will demonstrate π-irreducibility and aperiodicity simultaneously by showing that the X-component of the chain can reach any measurable set A X in a single step with positive probability (one-step irreducibility). The first assumption needed is that for any fixed θ Θ, the X-marginal kernel Pθ(x, ) of the original involutive MCMC algorithm given by Eqs. (1) and (2) can do so as well.

Assumption 4.3. For all x X, θ Θ, and A X such that π(A) > 0, the X-marginal kernel Pθ of involutive MCMC (Eqs. (1) and (2)) satisfies Pθ(x, A) > 0.

The second assumption needed is that there is a non-null set of parameters θ Θ that can be selected and result in an accepted move from any point x, z X Z in the original augmented space of involutive MCMC. We encode this using the positivity of the function

γ(x, z, θ) = Z min{η(θ | x, z, a, b), η(θ | fθ(x, z), a, b)}1 (d(a, b)).

Assumption 4.4. There exists a B Θ such that R

B dθ > 0 and for all x X, m-a.e. z Z, and θ B,

γ(x, z, θ) > 0.

These assumptions yield the desired result, which holds for general parameter spaces Θ and distributions η.

Proposition 4.5. If both Assumptions 4.3 and 4.4 hold, then Auto Step MCMC is one-step irreducible, and hence irreducible and aperiodic.

We now apply Proposition 4.5 to the case where θ is a step size parameter and we use η from Section 3.1. In this setting, Assumption 4.4 simplifies substantially.

Corollary 4.6. Suppose Θ = (0, ), Assumption 4.3 holds, and we use η from Section 3.1. Then, Auto Step MCMC is irreducible and aperiodic if for all x X and m-a.e. z Z, |ℓ(x, z, θ0)| / {0, }.

We show in Lemmas A.1 and A.2 that Assumption 4.3 holds for both RWMH and MALA under weak conditions, and hence the irreducibility and aperiodicity of Auto Step RWMH and MALA follows from |ℓ(x, z, θ0)| / {0, }.

4.3. Step size selector termination

We now establish that under mild conditions, the step size selector function µ can be computed in finite time. For starting state s = (x, z, a, b) and initial step size θ0 > 0, let τ(s, θ0) 1 be the number of iterations of the while loop in Algorithm 2. The key condition, Assumption 4.7, is satisfied intuitively when the density π m is continuous and the involution fθ becomes the identity as θ 0 and grows without bound as θ .

Assumption 4.7. For π m-a.e. (x, z) X Z,

lim θ 0+ |ℓ(x, z, θ)| = 0 lim θ |ℓ(x, z, θ)| = .

Proposition 4.8. Let θ0 > 0 and suppose Assumption 4.7 holds. Then τ(s, θ0) < , π-a.s.

Note that while Proposition 4.8 guarantees that the Markov chain can be simulated in finite time in practice, the longrun computational cost of the step size adaptation depends on the expected number of doubling/halving iterations in each step, Eτ(s, θ0) for s π. Proposition 4.9 bounds this expectation in terms of ℓ(x, z, 2tθ0) for t Z.

Proposition 4.9. For s = (x, z, a, b, θ) π and all θ0 > 0,

Eτ(s, θ0) E

t=0 e 2|ℓ(x,z,2tθ0)|+ 1 e |ℓ(x,z,2 tθ0)| 2 .

The first and second terms in the sum in Proposition 4.9 capture the expected number of doubling and halving steps, respectively. While more detailed bounds on Eτ(s, θ0) require problem-specific analysis that depends on fθ, π, and m, both terms will typically decay quickly in t. Corollary 4.10 provides an example of a more detailed result based on Proposition 4.9 in a representative setting when using Auto Step with random walk Metropolis Hastings.

Corollary 4.10. Let log π(x) be strongly concave with Lipschitz gradients, and set fθ as in Eq. (3) with M = I. Then

Eτ(s, θ0) = O(|log θ0|) as θ0 0 or θ0 ,

with a dimension-independent leading constant.

Auto Step: Locally adaptive involutive MCMC

Corollary 4.10 shows that the expected number of doublings/halvings i.e., the long-run average evaluations of fθ and ℓper iteration is very robust to the initial step size parameter θ0. This translates to a robust performance per unit cost: in the setting in Corollary 4.10, we expect Auto Step RWMH to exhibit an average jump distance per unit cost to scale like O | log θ0| 1 for small/large θ0. Contrast this to the significantly worse O(exp( | log θ0|)) decay for fixed step size RWMH. This difference in behaviour is confirmed empirically in Fig. 4.

4.4. Energy jump distance

For s = (x, z, a, b, θ) π, s = f(s) = (x , z , a , b , θ ), and U Unif[0, 1], define the energy jump distance

D = |ℓ(x, z, θ)|1 U π(s )

π(s) J(s) ,

which we use to encode the change in log(π(x)m(z)) after one iteration of Auto Step MCMC. Proposition 4.11 shows that any involutive MCMC method both traditional and Auto Step methods have an expected energy jump distance bounded above by a simple expression via the tuning parameter proposal density ratio η,

η = ess sup x,z,a,b,θ

η(θ | fθ(x, z), a, b)

η(θ | x, z, a, b) (under π).

Proposition 4.11. Under Assumption 4.1,

ED 2η max e 1, η log η .

In particular, for traditional involutive MCMC with a fixed parameter θ = θ0, or for Auto Step MCMC with σ2 = 0, or Auto Step MCMC with a fixed-width uniform distribution for η, we have that η 1, so

ED 2e 1 0.736.

The step size selector presented in Section 3.1 is a computationally efficient method to make |ℓ(x, z, θ)| fall roughly in the range (| log b|, | log a|), where a, b Unif( ). Therefore, as a heuristic, we expect

0.5 = E| log b| E|ℓ(x, z, θ)| E| log a| = 1.5,

with departures from exactness arising due to the discrete doubling/halving procedure (as opposed to an exact root finder). In other words, the step size selector in this work creates proposals with mean energy jump distance roughly targeting the maximum 0.736 for a broad class of involutive MCMC methods.

Figure 1: Comparison of the symmetric (this work, blue) versus asymmetric ((Biron-Lattes et al., 2024), orange) step size criteria in Algorithm 2, in terms of the move acceptance probability of Auto Step RWMH as a function of the state norm x . Note that the asymmetric criterion yields very low acceptance probabilities for states near the mode (left side of the plot) and in the tails (right side of the plot).

5. Experiments

In this section we present an empirical evaluation of two Auto Step MCMC variants: RWMH and MALA. We first use synthetic targets with varying tail behaviour to examine the effect of the symmetric termination criterion in Algorithm 2 versus the asymmetric criterion from Biron-Lattes et al. (2024), the efficacy of our proposed tuning procedure for θ0, and the robustness of the performance of our proposed method versus the initial step size θ0. We then investigate the performance of Auto Step RWMH and Auto Step MALA in comparison to previous adaptive methods.

Throughout, we measure the efficiency of each sampler in terms of effective sample size (ESS) (Flegal et al., 2008) per second. However, we found empirically that standard ESS estimates (Gelman et al., 2013, p. 286-287), (Galin L Jones & Neath, 2006, p. 1539-1541) did not accurately characterize sampler performance because they do not incorporate knowledge of the target distribution directly. Therefore we instead use a diagnostic (KSESS) outlined in Appendix D that involves the maximum difference between the empirical CDF and target CDF across all dimensions, where the target CDF is approximated using a long run of parallel tempering with Pigeons.jl (Surjanovic et al., 2023). KSESS is used here solely for research comparisons and is not recommended for practical data analysis, as it relies on knowledge of the target or expensive gold-standard samples. The code for reproducing the main experimental results is available at https: //github.com/Julia-Tempering/Auto Step.

5.1. Symmetric step size selection criterion

We first assess the benefit of using the symmetric step size criterion in Algorithm 2 versus the asymmetric cri-

Auto Step: Locally adaptive involutive MCMC

(a) Values of θ0

(b) Cost per iteration

Figure 2: The effect of tuning θ0, showing traces of θ0 (Fig. 2a) and cost per iteration (Fig. 2b) versus tuning round when θ0 is initialized in {10 7, . . . , 107}. Despite the wide range of initializations, the tuned values and cost per iteration stabilize quickly and reliably.

terion from Biron-Lattes et al. (2024) on three synthetic targets with varying tail behavior: N(0, 1), Laplace(0, 1), and Cauchy(0, 1). For each x {10 5, 10 4, . . . , 102}, we simulated 107 draws from the target and renormalized them to the specified value. Then, for each draw, we simulated one step of Auto Step RWMH/MALA (θ0 = 1), and recorded the acceptance probability.

The average of these acceptance probabilities is shown in Fig. 1 for Auto Step RWMH, showing that the asymmetric step size selector can get stuck for extended periods near the mode of the target (small x ) and in the tails (large x ). This behavior particularly problematic in practice for Bayesian inference, which is often initialized in the tails. In contrast, the proposed symmetric step size selector is more robust, exhibiting an acceptance probability that is greater than 10% for the entire range of norms from 10 5 to 102. This result aligns with our theoretical results (Proposition 4.5 and Corollary 4.6) on the irreducibility of Auto Step with the symmetric step size selection criterion.

5.2. Tuning the initial step size θ0

Next, we assess the stability and efficacy of the round-based tuning procedure for θ0 in Auto Step RWMH/MALA. For

each of the same three synthetic targets as in Section 5.1, we initialize the state x N(0, 202), θ0 {10 7, . . . , 107}, and run the Markov chain for R = 20 doubling rounds ( 2 106 steps total), tuning θ0 per Algorithm 3 after each round. For each trace, we track the value of θ0 as it is tuned, as well as the per iteration cost, measured by the number of evaluations of ℓin each round. Note that fixed step size MCMC methods call ℓonce per iteration, whereas Auto Step calls ℓonce plus an additional time for each doubling or halving during step size tuning.

Fig. 2 displays the results from this experiment: Fig. 2a shows the tuned values of θ0 as a function of doubling round, while Fig. 2b shows the corresponding average cost per iteration during each round. Both figures together indicate that the proposed tuning procedure is highly effective: despite the initialization of θ0 spanning 14 orders of magnitude, the tuning remains stable and converges to a reasonable value of θ0 1, while the cost per iteration is quickly minimized and remains stable throughout the rounds.

5.3. Robustness to initial step size θ0

Next we examine the robustness of Auto Step RWMH/MALA to the setting of θ0 compared with fixed-step-size RWMH/MALA. In this experiment we do not tune θ0, and fix it at values θ0 {10 7, . . . , 107}. For each of the three targets from Section 5.1, and each fixed θ0, we initialized each sampler at a target draw, and collected various metrics over 106 MCMC steps.

The results are presented in Figs. 3 and 4. The main comparison is in Fig. 3, which illustrates the difference in KSESS per unit cost for fixed step size vs. Auto Step methods, where one unit cost corresponds to one evaluation of ℓ. For these 1D unimodal targets, standard fixed step size methods perform well when well-tuned. But for poor step size choices, performance decays quickly at a rate of roughly O(exp( | log θ0|)). In contrast, Auto Step methods incur a penalty for adaptivity, but the performance is far more robust to θ0 and decays like O(| log θ0| 1), which aligns with our theory from Corollary 4.10. This main comparison is supported by additional results in Fig. 4. Fig. 4a demonstrates that Auto Step methods empirically provide a high energy jump distance per iteration across all values of θ0, which is valuable especially in the context of annealing/tempering methods that depend on the mixing of the energy statistic (Surjanovic et al., 2024). Fig. 4b shows that the acceptance probability of Auto Step robustly remains bounded away from 0 and 1 across all values of θ0. Finally Fig. 4c confirms the O(| log θ0|) scaling of cost per iteration predicted by Corollary 4.10.

Auto Step: Locally adaptive involutive MCMC

Figure 3: KSESS per unit cost for Auto Step (blue) and fixed-step (orange) RWMH (Fig. 3a) and MALA (Fig. 3b).

(a) Energy Jump Distance per Iteration

(b) Acceptance Probability

(c) Cost per Iteration

Figure 4: Energy jump distance per iteration (Fig. 4a), acceptance probability (Fig. 4b), and cost per iteration (Fig. 4c) for Auto Step and fixed step RWMH versus initial step size θ0.

Figure 5: min KSESS per second for Auto Step and stateof-the-art samplers across five benchmarked models.

5.4. Comparison with other adaptive methods

We compare the performance of Auto Step RWMH/MALA against five state-of-the-art adaptive samplers: NUTS (Neal, 2011), the hit-and-run slice sampler (Neal, 2003; B elisle et al., 1993), adaptive RWMH (Vihola, 2012), adaptive MALA (Xu et al., 2020), and delayed rejection HMC (Modi

et al., 2024). For Auto Step RWMH/MALA, we use roundbased adaptive tuning to adjust the initial step size and diagonal preconditioner. Our benchmarks include two synthetic distributions Neal s funnel in 2 and 100 dimensions and three real Bayesian posteriors: a three-parameter linear regression model for yearly temperatures at Kilpisj arvi, Finland (Bales et al., 2019), an orbit-fitting problem (Thompson et al., 2024), and an m RNA transfection model (Ballnus et al., 2017).

Fig. 5 presents the results in terms of minimum KSESS across dimensions per second. This figure demonstrates that Auto Step methods consistently achieve reasonable performance across all five benchmarks, particularly on the challenging funnel targets. Auto Step RWMH shows competitive performance, generally within the range of the other methods. Auto Step MALA shows slightly lower performance in some settings but exhibits better scaling to higher dimensions, as seen in the funnel100 results. Overall, both Auto Step variants demonstrate robust efficiency across diverse model geometries.

Auto Step: Locally adaptive involutive MCMC

6. Conclusion

In this paper we presented Auto Step MCMC, a locallyadaptive step size selection method for involutive MCMC. We proved that Auto Step MCMC kernels are π-invariant, irreducible, and aperiodic under mild conditions. We also provided bounds on the mean energy jump distance and expected cost per iteration. We demonstrated empirically that Auto Step MCMC is stable, reliable, and competitive with other adaptive methods. One promising direction for future work is a rigorous theoretical analysis of the sampling efficiency of Auto Step MCMC in terms of expected squared jump distance and asymptotic variance.

Acknowledgements

ABC and TC acknowledge the support of an NSERC Discovery Grant. TL acknowledges the support of the UBC Advanced Machine Learning Training Network. NS acknowledges the support of a Vanier Canada Graduate Scholarship. We additionally acknowledge use of the ARC Sockeye computing platform from the University of British Columbia.

Impact Statement

This work presents a new MCMC algorithm; the societal consequences of MCMC need not be discussed in this paper.

Andrieu, C. and Thoms, J. A tutorial on adaptive MCMC. Statistics and Computing, 18(4):343 373, 2008.

Andrieu, C., Lee, A., and Livingstone, S. A general perspective on the Metropolis Hastings kernel. ar Xiv:2012.14881, 2020.

Atchad e, Y. F. An adaptive version for the Metropolis adjusted Langevin algorithm with a truncated drift. Methodology and Computing in Applied Probability, 8(2):235 254, 2006.

Bales, B., Pourzanjani, A., Vehtari, A., and Petzold, L. Selecting the metric in hamiltonian monte carlo. ar Xiv preprint ar Xiv:1905.11916, 2019.

Ballnus, B., Hug, S., Hatz, K., G orlitz, L., Hasenauer, J., and Theis, F. J. Comprehensive benchmarking of Markov chain Monte Carlo methods for dynamical systems. BMC Systems Biology, 2017.

B elisle, C., Romeijn, E., and Smith, R. Hit-and-run algorithms for generating multivariate distributions. Mathematics of Operations Research, 18(2):255 266, 1993.

Biron-Lattes, M., Surjanovic, N., Syed, S., Campbell, T., and Bouchard-Cˆot e, A. auto MALA: Locally adaptive

Metropolis-adjusted Langevin algorithm. In International Conference on Artificial Intelligence and Statistics, 2024.

Bou-Rabee, N., Carpenter, B., Kleppe, T. S., and Marsden, M. Incorporating local step-size adaptivity into the No-UTurn Sampler using Gibbs self tuning. ar Xiv: 2408.08259, 2024a.

Bou-Rabee, N., Carpenter, B., and Marsden, M. GIST: Gibbs self-tuning for locally adaptive Hamiltonian Monte Carlo. ar Xiv:2404.15253, 2024b.

Chimisov, C., Latuszynski, K., and Roberts, G. Air Markov chain Monte Carlo. ar Xiv:1801.09309, 2018.

Coullon, J., South, L., and Nemeth, C. Efficient and generalizable tuning strategies for stochastic gradient MCMC. Statistics and Computing, 33(3):66, 2023.

Duane, S., Kennedy, A., Pendleton, B. J., and Roweth, D. Hybrid Monte Carlo. Physics Letters B, 195(2):216 222, 1987.

Flegal, J. M., Haran, M., and Jones, G. L. Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science, 23(2):250 260, 2008.

Galin L Jones, Murali Haran, B. S. C. and Neath, R. Fixedwidth output analysis for Markov chain Monte Carlo. Journal of the American Statistical Association, 101(476): 1537 1547, 2006.

Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D. Bayesian Data Analysis. CRC Press, 3rd edition, 2013.

Geyer, C. J. Practical Markov Chain Monte Carlo. Statistical Science, 7(4):473 483, 1992.

Girolami, M. and Calderhead, B. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2):123 214, 2011.

Green, P. and Mira, A. Delayed rejection in reversible jump Metropolis Hastings. Biometrika, 88(4):1035 1053, 2001.

Haario, H., Saksman, E., and Tamminen, J. An adaptive Metropolis algorithm. Bernoulli, 7(2):223 242, 2001.

Hastings, W. K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1): 97 109, 1970.

Hoffman, M. D. and Gelman, A. The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. The Journal of Machine Learning Research, 15 (1):1593 1623, 2014.

Auto Step: Locally adaptive involutive MCMC

Kleppe, T. S. Adaptive step size selection for Hessian-based manifold Langevin samplers. Scandinavian Journal of Statistics, 43(3):788 805, 2016.

Kolmogorov, A. Sulla determinazione empirica di una legge di distributione. Giornale dell Istituto Italiano degli Attuari, 4:83 91, 1933.

Livingstone, S. Geometric ergodicity of the random walk Metropolis with position-dependent proposal covariance. Mathematics, 9(4), 2021.

Maire, F. and Vandekerkhove, P. Markov kernels local aggregation for noise vanishing distribution sampling. SIAM Journal on Mathematics of Data Science, 4(4): 1293 1319, 2022.

Marsaglia, G., Tsang, W. W., and Wang, J. Evaluating Kolmogorov s distribution. Journal of Statistical Software, 8 (18):1 4, 2003.

Marshall, T. and Roberts, G. An adaptive approach to Langevin MCMC. Statistics and Computing, 22:1041 1057, 09 2012.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087 1092, 1953.

Modi, C., Barnett, A., and Carpenter, B. Delayed rejection Hamiltonian Monte Carlo for sampling multiscale distributions. Bayesian Analysis, 2024.

Neal, R. MCMC using Hamiltonian dynamics. In Brooks, S., Gelman, A., Jones, G., and Meng, X.-L. (eds.), Handbook of Markov chain Monte Carlo, chapter 5. CRC Press, 2011.

Neal, R. M. Bayesian Learning for Neural Networks. Springer New York, 1 edition, 1996.

Neal, R. M. Slice sampling. The Annals of Statistics, 31(3): 705 767, 2003.

Neklyudov, K., Shvechikov, P., and Vetrov, D. Metropolis Hastings view on variational inference and adversarial training. ar Xiv:1810.07151, 2018.

Neklyudov, K., Welling, M., Egorov, E., and Vetrov, D. Involutive MCMC: A unifying framework. In International Conference on Machine Learning, 2020.

Nishimura, A. and Dunson, D. Variable length trajectory compressible hybrid Monte Carlo. ar Xiv:1604.00889, 2016.

Polson, N. G. and Scott, J. G. On the half-Cauchy prior for a global scale parameter. Bayesian Analysis, 7(4):887 902, 2012.

Roberts, G. and Rosenthal, J. General state space Markov chains and MCMC algorithms. Probability Surveys, 1: 20 71, 2004.

Roberts, G., Gelman, A., and Gilks, W. Weak convergence and optimal scaling of random walk Metropolis algorithms. Annals of Applied Probability, 7(1):110 120, 1997.

Roberts, G. O. and Rosenthal, J. S. Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 60(1):255 268, 1998.

Rossky, P. J., Doll, J. D., and Friedman, H. L. Brownian dynamics as smart Monte Carlo simulation. The Journal of Chemical Physics, 69(10):4628 4633, 1978.

Surjanovic, N., Biron-Lattes, M., Tiede, P., Syed, S., Campbell, T., and Bouchard-Cˆot e, A. Pigeons.jl: Distributed sampling from intractable distributions. ar Xiv:2308.09769, 2023.

Surjanovic, N., Syed, S., Bouchard-Cˆot e, A., and Campbell, T. Uniform ergodicity of parallel tempering with efficient local exploration. ar Xiv:2405.11384, 2024.

Thompson, W., Lawrence, J., Blakely, D., Marois, C., Wang, J., Giordano, M., Brandt, T., Johnstone, D., Ruffio, J.-B., Ammons, S., Crotts, K., Do O, C., Gonzales, E., and Rice, M. Octofitter: fast, flexible, and accurate orbit modelling to detect exoplanets. The Astronomical Journal, 166(164): 1 20, 2024.

Tierney, L. A note on Metropolis Hastings kernels for general state spaces. The Annals of Statistics, 8(1):1 9, 1998.

Tierney, L. and Mira, A. Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine, 18 (17-18):2507 2515, 1999.

Turok, G., Modi, C., and Carpenter, B. Sampling from multiscale densities with delayed rejection generalized Hamiltonian Monte Carlo. ar Xiv:2406.02741, 2024.

Vihola, M. Robust adaptive Metropolis algorithm with coerced acceptance rate. Statistics and Computing, 22(5): 997 1008, 2012.

Xu, K., Ge, H., Tebbutt, W., Tarek, M., Trapp, M., and Ghahramani, Z. Advanced HMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms. In Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference, 2020.

Auto Step: Locally adaptive involutive MCMC

Proof of Proposition 4.2. The auxiliary refreshment and tuning parameter refreshment steps in Auto Step MCMC (Steps 1. and 2.) resample (z, a, b, θ) jointly from their conditional distribution given x under π. This move is well-known to be π-invariant, and so it remains only to show that the Metropolis-corrected involutive proposal (Steps 3. and 4.) is π-invariant. The kernel for the proposal on the augmented space S is

Q(s, ds ) = δf(s)(ds ),

and the acceptance probability α : S2 R+ is given by

α(s, s ) = min 1, π(s )

π(s) J(s) .

In the notation of Tierney (1998, Theorem 2), define the measure µ(ds, ds ) = π(ds)δf(s)(ds ); because f is an involution, we have that the symmetric set R and density ratio r : R R+ are given by

R = {(s, s ) S2 : f(s) = s , π(s) > 0, π(s ) > 0}, r(s, s ) =

π(ds)δf(s)(ds )

π(ds )δf(s )(ds).

Note that condition (i) of Tierney (1998, Theorem 2) holds by definition of R and α. Suppose for the moment that r(s, s ) = π(s)

π(s )J(s ); then condition (ii) and hence π-invariance holds because

α(s, s )r(s, s ) = min 1, π(s )

π(s) J(s) π(s)

π(s )J(s ) = min π(s)

π(s )J(s ), J(s )J(s) = α(s , s),

which follows because J(s)J(s ) = 1 µ-a.e. on R. To demonstrate that r(s, s ) has the required form, consider a test function g : S2 R: Z g(s, s )π(ds)δf(s)(ds ) = Z g(s, f(s))π(ds)

= Z g((x, z, a, b, θ), (fθ(x, z), a, b, θ))π(x, z, a, b, θ)dxdzd(a, b)dθ.

Since fθ is a continuously differentiable involution, we can transform variables x , z = fθ(x, z) by including a Jacobian term J(s) = | fθ(x, z)| in the integrand and by noting x, z = fθ(x , z ):

= Z g((fθ(x , z ), a, b, θ), (x , z , a, b, θ))π(fθ(x , z ), a, b, θ)| fθ(x , z )|dx dz d(a, b)dθ

= Z g(f(s ), s )π(f(s ))J(s )ds

= Z g(f(s ), s )π(f(s ))

π(s ) J(s )π(ds )

= Z g(s, s ) π(s)

π(s )J(s )π(ds )δf(s )(ds).

Examining the first and last integral expressions, the density ratio has the form

π(ds)δf(s)(ds )

π(ds )δf(s )(ds) = π(s)

π(s )J(s ).

Proof of Proposition 4.5. Let K(x, ) denote the Markov kernel for the X-marginal process of Auto Step MCMC. Since for u, v 0, min{1, uv} min{1, u} min{1, v}, we have that for s = (x, z, a, b, θ),

min 1, π(f(s))

π(s) J(s) min n 1, eℓ(x,z,θ)o min 1, η(θ | fθ(x, z), a, b)

η(θ | x, z, a, b)

Auto Step: Locally adaptive involutive MCMC

K(x, A) Z 1[fθ(x, z) A Z] min n 1, eℓ(x,z,θ)o min 1, η(θ | fθ(x, z), a, b)

η(θ | x, z, a, b)

η(dθ | x, z, a, b)m(dz)1 (d(a, b))

= Z 1[fθ(x, z) A Z] min n 1, eℓ(x,z,θ)o γ(x, z, θ)m(dz)dθ,

γ(x, z, θ) = Z min{η(θ | x, z, a, b), η(θ | fθ(x, z), a, b)}1 (d(a, b)).

By Assumption 4.4, for all x X, m-a.e. z Z, and for all θ B where R

B dθ > 0, γ(x, z, θ) > 0. Therefore Z 1[fθ(x, z) A Z] min n 1, eℓ(x,z,θ)o γ(x, z, θ)m(dz)1[θ B]dθ > 0

Z 1[fθ(x, z) A Z] min n 1, eℓ(x,z,θ)o m(dz)1[θ B]dθ > 0.

The proof concludes by noting that Z 1[fθ(x, z) A Z] min n 1, eℓ(x,z,θ)o m(dz)1[θ B]dθ = Z Pθ(x, A)1[θ B]dθ,

where Pθ(x, A) is the one-step probability of transitioning into A from x for the original involutive chain with parameter θ, and then by applying Assumption 4.3.

Proof of Corollary 4.6. The proof involves verifying Assumption 4.4. Note that dθ is the counting measure on {θ = θ0 2j : j Z}. Consider setting B = {θ0}. Assumption 4.4 holds if for all x X and m-a.e. z Z, Z 1[µ(x, z, a, b) = θ0 = µ(fθ0(x, z), a, b)]1 (d(a, b)) > 0.

That is, if there is a nonzero probability of choosing the default parameter θ0 at any point (x, z). Note that

µ(x, z, a, b) = θ0 = µ(fθ0(x, z), a, b)

log(a) < ℓ(x, z, θ0) < log(b) or log(a) < ℓ(x, z, θ0) < log(b).

By assumption, for all x X and m-a.e. z Z, we have ℓ(x, z, θ0) / { , 0, }. If ℓ(x, z, θ0) > 0, then when a < exp( ℓ(x, z, θ0)) < b we have the condition hold. This has positive measure under 1 (d(a, b)). If ℓ(x, z, θ0) < 0, then when a < exp(ℓ(x, z, θ0)) < b, the above condition holds. This set also has positive measure.

Lemma A.1. For the Auto Step RWMH kernel with any fixed θ > 0, x X, and A X with π(A) > 0, we have Pθ(x, A) > 0, provided that π(x) > 0 for all x X.

Proof of Lemma A.1. Fix θ > 0, A X with π(A) > 0 and x X. Because π λ, we have λ(A) > 0. By translation properties of the Lebesgue measure, for any x Rd, λ(A) = λ(A x) > 0, where A x = { x x : x A}. Here, (x , z ) = fθ(x, z) = (x + z, z) and so | fθ(x, z)| = 1. Also, since z m where m = N(0, I), we have

ℓ(x, z, θ) = log π(x )

Pθ(x, A) Z 1[z A x] min 1, π(x + z)

m(z) λ(dz) > 0,

because for all x, z we have λ(A x) > 0, m(z) > 0, and min{1, π(x + z)/π(x)} > 0.

Auto Step: Locally adaptive involutive MCMC

Lemma A.2. For the Auto Step MALA kernel with any fixed θ > 0, x X, differentiable π, positive definite M, and A X with π(A) > 0, we have Pθ(x, A) > 0, provided that π(x) > 0 for all x X.

Proof of Lemma A.2. Fix θ > 0, A X with π(A) > 0 and x X. Because π λ, we have λ(A) > 0. As in (Biron-Lattes et al., 2024), we combine the updates on (x, z) into one step, so that fθ(x, z) = (x (θ), z (θ)), where

x (θ) = x + θM 1z + θ2

2 M 1 log γ(x), z (θ) = z + θ

2 log γ(x) + θ

2 log γ(x (θ)) .

Now, x (θ) A if

z Ax := M( x x)

2 log γ(x) : x A .

By translation and scaling properties of the Lebesgue measure, for any x Rd, λ(Ax) > 0. It is a standard result that the leapfrog integrator satisfies | fθ(x, z)| = 1. We have

ℓ(x, z, θ) = log π(x )m(z )

Pθ(x, A) Z 1[z Ax] min 1, π(x )m(z )

m(z) λ(dz) > 0.

because for all x, z we have λ(Ax) > 0, m(z) > 0, and the acceptance ratio is positive.

Proof of Proposition 4.8. We generalize the step size termination proof of Theorem 3.3 by Biron-Lattes et al. (2024). Consider first the case where v = 1. Since for π m-a.e. x, z, limθ 0+ |ℓ(x, z, θ)| = 0, and | log a| > 0 almost surely, there exists a θ > 0 such that 0 < θ < θ , |ℓ(x, z, θ)| < | log a|. Therefore there exists a j < 0 such that 2jθ0 < θ and the while loop terminates. Next consider the case where v = 1. Since for π m-a.e. x, z, limθ |ℓ(x, z, θ)| = , and | log b| < almost surely, there exists a θ > 0 such that θ > θ , |ℓ(x, z, θ)| > | log b|. Therefore there exists a j > 0 such that 2jθ0 > θ and the while loop terminates.

Proof of Proposition 4.9.

Eτ(s, θ0) =

t=0 P(τ(s, θ0) > t)

t=0 P max 0 j t |ℓ(x, z, 2jθ0)| < | log b| min t j 0 |ℓ(x, z, 2jθ0)| > | log a|

t=0 P max 0 j t |ℓ(x, z, 2jθ0)| < | log b| + P min t j 0 |ℓ(x, z, 2jθ0)| > | log a|

t=0 P |ℓ(x, z, 2tθ0)| < | log b| + P |ℓ(x, z, 2 tθ0)| > | log a|

t=0 P |ℓ(x, z, 2tθ0)| < log b + P |ℓ(x, z, 2 tθ0)| > log a

t=0 P e |ℓ(x,z,2tθ0)| > b + P e |ℓ(x,z,2 tθ0)| < a .

Since a, b are uniform on 0 a < b 1,

P(a > x) = (1 x)2, and P(b < x) = x2,

Auto Step: Locally adaptive involutive MCMC

t=0 Ee 2|ℓ(x,z,2tθ0)| + E 1 e |ℓ(x,z,2 tθ0)| 2 , (x, z) π m.

Fubini s theorem completes the proof.

Proof of Corollary 4.10. The involution for RWMH with M = I is given by

fθ(x, z) = (x + θz, z), m = N(0, I).

ℓ(x, z, θ) = log π(x + θz), m( z)

= log π(x + θz) log π(x).

Since log π is L-Lipschitz smooth,

|ℓ(x, z, θ)| θ| log π(x)T z| + 1

1 e |ℓ(x,z,2 tθ0)| 2

1 e 2 tθ0| log π(x)T z| 1

2 Lθ2 04 t z 2 2

t=0 1 2e 2 tθ0| log π(x)T z| 1

2 L4 tθ2 0 z 2 + e 2 2 tθ0| log π(x)T z| L4 tθ2 0 z 2.

Furthermore, note that (1 e x)2 min(x2, 1) for x 0. Therefore,

1 e |ℓ(x,z,2 tθ0)| 2

t=0 min ℓ(x, z, 2 tθ0)2, 1

2 tθ0| log π(x)T z| + 1

2L4 tθ2 0 z 2 2 , 1

4 t θ0| log π(x)T z| + 1

2Lθ2 0 z 2 2 , 1

Let t0 = max{0, (3/2) log θ0| log π(x)T z| + 1

2Lθ2 0 z 2 } . Then,

1 e |ℓ(x,z,2 tθ0)| 2

t=t0 4 t θ0| log π(x)T z| + 1

2Lθ2 0 z 2 2

t=0 4 t4 t0 θ0| log π(x)T z| + 1

2Lθ2 0 z 2 2

t=0 4 t t0 + 4/3.

Next, since log π is U-strongly log-concave,

ℓ(x, z, θ) θ log π(x)T z 1

Auto Step: Locally adaptive involutive MCMC

If log π(x)T z 0, then

|ℓ(x, z, θ)| θ| log π(x)T z| + 1

2Uθ2 z 2 = 1

2Uθ2 z 2 θ log π(x)T z.

On the other hand, if log π(x)T z > 0, then

|ℓ(x, z, θ)|

( 0 θ 2 log π(x)T z

1 2Uθ2 z 2 θ log π(x)T z θ > 2 log π(x)T z

Taken together, for θ 0,

|ℓ(x, z, θ)| 1 θ > 2 log π(x)T z

2Uθ2 z 2 θ log π(x)T z

1 θ > 2 log π(x)T z + 1

2Uθ2 z 2 θ log π(x)T z .

t=0 e 2|ℓ(x,z,2tθ0)|

t=0 e 2 2tθ01 2tθ0> 2 log π(x)T z+1

2 U2tθ0 z 2 log π(x)T z)

t=0 e 2 2tθ01 2t/2θ0>1, 2tθ0> 2 log π(x)T z+1

2 U2tθ0 z 2 log π(x)T z).

Let t 0 = max{0, 3 log θ0, (3/2) log 2 log π(x)T z+1

Uθ0 z 2 } , where log of a negative number is taken to be . Then,

t=0 e 2|ℓ(x,z,2tθ0)|

t=t 0 e 2 2tθ01[...]( 1

2 U2tθ0 z 2 log π(x)T z)

t=t 0 e 2 2t/2

t=0 e 2(t+1)/2

0 e 2t/2 dt

= t 0 + 2 log 2

1 t 1e t dt

Combining the above two bounds, as well as the bounds x x + 1 for x > 0, u T v u v for vectors u, v, and log π(x) L x for an L-Lipschitz smooth function (without loss of generality, assume π(x) reaches its maximum and has log π(x) = 0 at x = 0):

Eτ(s, θ0) Et0 + Et 0 + 4

5 + 3 log 1 + θ 1 0 + 3

2E log 1 + θ0L x z + 1

2Lθ2 0 z 2 + 3

2E log 1 + 2L x z + 1

Jensen s inequality yields

Eτ(s, θ0) 5 + 3 log 1 + θ 1 0 + 3

2 log 1 + θ0LE x E z + 1

2Lθ2 0E z 2 + 3

2E log 1 + 2L z E x + 1

Auto Step: Locally adaptive involutive MCMC

Since z N(0, I) in Rd,

2 (d + 1)1/2, and E z 2 = d,

where the bound on E z follows from Gautschi s inequality. Since x has a strongly log-concave and Lipschitz smooth density,

R elog π(x) x dx R elog π(x) dx (2πU 1)d/2 R 1 (2πU 1)d/2 e 1

2 U x 2 x dx

(2πL 1)d/2 R 1 (2πL 1)d/2 e 1

2 L x 2 dx = U (d+1)/2E z

L d/2 Ld/2(d + 1)1/2

U (d+1)/2 .

Finally, for any a, b > 0,

E log 1 + a z + 1

= E1[ z < 1] log 1 + a z + 1

+ E1[ z 1] log 1 + a z + 1

E1[ z < 1] log 1 + a + 1

+ E1[ z 1] log 1 + a z + 1

E1[ z < 1] log 1 + a + 1

+ log 1 + a E z + 1

0 P 1[ z < 1] log 1 + a + 1

> t dt + log 1 + a E z + 1

0 P z 2 < min 1, a + 1 b(et 1)

dt + log 1 + a E z + 1

1 2d/2Γ(d/2)

Z min n 1, a+1 b(et 1)

0 sd/2 1e s/2ds dt + log 1 + a E z + 1

1 2d/2Γ(d/2)

Z min n 1, a+1 b(et 1)

0 sd/2 1ds dt + log 1 + a(d + 1)1/2 + 1

2 d2d/2Γ(d/2)

min 1, a + 1 b(et 1)

d/2 dt + log 1 + a(d + 1)1/2 + 1

= 2 d2d/2Γ(d/2)

Z log(1+ a+1

a + 1 b(et 1)

+ log 1 + a(d + 1)1/2 + 1

= 2 d2d/2Γ(d/2)

log 1 + a + 1

b ) (et 1) d/2dt

+ log 1 + a(d + 1)1/2 + 1

= 2 d2d/2Γ(d/2)

log 1 + a + 1

b ) e sd/2 es

Auto Step: Locally adaptive involutive MCMC

+ log 1 + a(d + 1)1/2 + 1

2 d2d/2Γ(d/2)

log 1 + a + 1

b ) e sd/2ds

+ log 1 + a(d + 1)1/2 + 1

= 2 d2d/2Γ(d/2)

log 1 + a + 1

+ log 1 + a(d + 1)1/2 + 1

= 2 d2d/2Γ(d/2)

log 1 + a + 1

+ log 1 + a(d + 1)1/2 + 1

Therefore, the final bound is

Eτ(s, θ0) 5 + 3 log 1 + θ 1 0 + 3

2 log 1 + θ0LLd/2(d + 1)1/2

U (d+1)/2 (d + 1)1/2 + 1

1 + 2L z Ld/2(d+1)1/2

U (d+1)/2 + 1 Uθ0 z 2

5 + 3 log 1 + θ 1 0 + 3

2 log 1 + θ0LLd/2(d + 1)

U (d+1)/2 + 1

2 2 d2d/2Γ(d/2)

1 + 2L Ld/2(d+1)1/2

U (d+1)/2 + 1 Uθ0

1 + 2L Ld/2(d+1)

U (d+1)/2 + 1

5 + 3 log 1 + θ 1 0 + 6 2d/2 + 3

2 log 1 + θ0LLd/2(d + 1)

U (d+1)/2 + 1

2d/2 + 1 log

1 + 2L Ld/2(d+1)

U (d+1)/2 + 1

The result follows by inspection of the above bound.

Proof of Proposition 4.11. The expected jump distance is

ED = Z |ℓ(x, z, θ)| min 1, eℓ(x,z,θ) η(θ|fθ(x, z), a, b)

η(θ|x, z, a, b)

π(dx)m(dz)(21 (d(a, b)))η(dθ | x, z, a, b).

Let A = {x, z, a, b, θ : exp(ℓ(x, z, θ)) η(θ|x, z, a, b)/η(θ|fθ(x, z), a, b)}, and define A<, A , A> similarly by replacing the inequality in the definition. We begin by splitting the integral into regions A , A>:

A> |ℓ(x, z, θ)|π(dx)m(dz)(21 (d(a, b)))η(dθ | x, z, a, b)

A |ℓ(x, z, θ)|eℓ(x,z,θ)η(θ|fθ(x, z), a, b)π(dx)m(dz)(21 (d(a, b)))dθ.

Next we perform the transformation of variables x , z = fθ(x, z) on the A> integral. Recall that ℓ(x, z, θ) satisfies

ℓ(fθ(x, z), θ) = ℓ(x, z, θ),

Auto Step: Locally adaptive involutive MCMC

and note that this transformation yields the integration region A<, such that

A< |ℓ(x, z, θ)|eℓ(x,z,θ)η(θ|fθ(x, z), a, b)π(dx)m(dz)(21 (d(a, b)))dθ

A |ℓ(x, z, θ)|eℓ(x,z,θ)η(θ|fθ(x, z), a, b)π(dx)m(dz)(21 (d(a, b)))dθ.

The upper bound follows by combining the two terms and bounding the η ratio:

A |ℓ(x, z, θ)|eℓ(x,z,θ) η(θ|fθ(x, z), a, b)

η(θ|x, z, a, b) η(dθ|x, z, a, b)π(dx)m(dz)(21 (d(a, b)))

2η sup y log η |y|eyπ(A )

2η sup y log η |y|ey

= 2η max e 1, η log η .

Auto Step: Locally adaptive involutive MCMC

Model Dimension α m RNA 5 5.767 orbital 12 5.389 kilpisjarvi 3 5.9561 funnel2 2 5.960 funnel100 100 72.551

Table 1: The scaling constant α for each model, as mentioned in Section 5.

B. Additional Experimental Details and Results

B.1. Synthetic and real data models

We present the synthetic and real data models used in Section 5. In the following, N(µ, σ2) represents a normal distribution with mean µ and variance σ2. We use N(µ, σ2; b L, b U) to denote a truncated normal distribution with bounds (b L, b U).

The Neal s funnel with d dimensions (d 2) and scale parameter τ > 0, denoted by funnel(d, τ), is given by

X1 N(0, 9), X2, . . . , Xd | X1 = x1 iid N(0, (exp(x1/τ))2).

For the 2-dimensional funnel we used τ = 0.6, and for the 100-dimensional funnel we used τ = 6.

The m RNA model with N observations, the time t RN, and the observed outcomes y RN is given by

log10(t0) Uniform( 2, 1)

log10(k0) Uniform( 5, 5)

log10(β) Uniform( 5, 5)

log10(δ) Uniform( 5, 5)

log10(σ) Uniform( 2, 2)

k0 exp( β (ti t0)) exp( δ (ti t0))

δ β , if δ = β

k0 (ti t0), if δ = β

, i = 1, . . . , N

yi | µi, σ N(µi, σ2), i = 1, . . . , N.

The kilpisjarvi model with the predictors x RN, the observed responses y RN, and additional parameters µα, µβ, σα, σβ, is given by

α N(µα, σ2 α)

β N(µβ, σ2 β)

σ N(0, 1; 0, )

yi | α, β, σ, xi N(α + βxi, σ2), i = 1, . . . , N.

In our experiments we used µα = 9.313, µβ = 0, σα = 100, and σβ = 0.0333.

The orbital model is a model of the multiple system Gliese 229 available in Octofitter.jl (Thompson et al., 2024) at

https://github.com/sefffal/Orbit Posterior DB/blob/main/models/astrom-GL229A.jl.

B.2. Estimation of scaling factors

We measure the cost of each method by Nℓ+ αNg, where Nℓand Ng are the number of evaluations of log π( ) and log π( ), respectively. We use α as a target-dependent scaling factor to balance the cost of gradient and density evaluations.

Auto Step: Locally adaptive involutive MCMC

Figure 6: The same comparison as 1, except for the MALA involution instead of the RWMH involution.

To determine α, we ran three separate Markov chains using Pigeons to obtain draws, benchmarked the time required for gradient computations, and benchmarked the time for log density evaluations. The scaling factor α is then obtained as the ratio of the total time spent on gradient evaluations to the total time spent on log density evaluations. The chains are long enough such that the estimate error (in absolute difference) is within 2%. The α values used in all experiments are presented in Table 1.

There seems to be a positive correlation between α and the dimension of the problems. This correlation is likely due to the auto-differentiation system used to compute gradients. While auto-differentiation avoids the need for manual differentiation, it appears to incur a computational cost that increases with dimensionality.

B.3. Algorithm Tuning Parameters

We provide the detailed tuning parameters for the samplers compared in Section 5.4.

Adaptive MALA: Step size θ0 = 0.1.

Delayed Rejection HMC: Step size θ0 = 0.1, number of subsequent proposals k = 2, and step size divisor a = 5.0.

NUTS: Maximum tree depth Lmax = 5 and target acceptance ratio αtarget = 0.65.

B.4. Additional results

Additional results for the experiments in Sections 5.3 and 5.1 are presented in Figs. 6 to 8. In particular, Fig. 6 shows the same comparison of symmetric and asymmetric step size selectors as in Fig. 1 in the main text, except for the MALA involution. The conclusion here is slightly different for MALA than for RWMH; both asymmetric and symmetric step size selection yields reasonable acceptance probabilities near the mode, while both have decaying acceptance probabilities in the tails. However, despite the decay, the symmetric criterion still yields a substantial increase over the asymmetric criterion, and should still be preferred. Fig. 7 shows the same additional metrics as Fig. 4 when comparing fixed step sizes vs Auto Step (energy jump distance per iteration, acceptance probability, and cost per iteration) but for the MALA involution. The conclusions for MALA drawn here are precisely the same as those for RWMH. Finally, Fig. 8 shows the expected jump distance for fixed and Auto Step RWMH and MALA; these plots show the expected decay of at least e | log θ0| for fixed step size methods, and | log θ0| 1 for Auto Step methods. It is worth pointing out that the expected jump distance for fixed step size RWMH on very heavy tailed targets (like the Cauchy) can be quite high, but indicates a few very large jumps rather than good mixing behaviour.

We also present additional results in Fig. 9 for the experiments in Section 5.4. The min ESS metrics were computed using the method described by Geyer (1992) in Section 3.3, which we found to be more reliable than the default settings in MCMCChains.

C. Reference pair plots

Figs. 10 to 14 display pair plots of the reference draws for each of the experiments in Section 5.4. Each set of reference draws contains between 106 107 draws. These draws were used for computing the KSESS estimates.

Auto Step: Locally adaptive involutive MCMC

(a) Energy Jump Distance per Iteration

(b) Acceptance Probability

(c) Cost per Iteration

Figure 7: The same metrics as presented in Fig. 4, except for the MALA involution instead of the RWMH involution.

Figure 8: The jump distance per iteration for Auto Step and fixed step RWMH (Fig. 8a) and MALA (Fig. 8b) versus initial step size θ0.

(a) min KSESS per unitcost

(b) min ESS per second

(c) min ESS per unit cost

Figure 9: The same comparisons as 5, except that we present min KSESS per unit cost (Fig. 9a), min ESS per second (Fig. 9b), and min ESS per unit cost (Fig. 9c), instead of min KSESS per second.

D. KSESS diagnostic

Suppose we obtain a collection of i.i.d. draws (xt)T t=1 from a target distribution with known CDF F(x) (or given a sufficiently large set of draws (ext) e T t=1, e T T such that the empirical CDF of (ext) e T t=1 can be used in place of F). For any T N, define

D((xt)T t=1) = sup x |FT (x) F(x)|,

Auto Step: Locally adaptive involutive MCMC

where FT (x) is the empirical distribution of (xt)T t=1. It is known that

T D((xt)T t=1) d K as T where K follows the Kolmogorov distribution (Kolmogorov, 1933) (see also Marsaglia et al. (2003)),

k=1 e (2k 1)2π2

E[K] = log 2 rπ

Var[K] = π2

12 π(log 2)2

Heuristically, given a sample of size T = NB, we have that as N, B ,

B D (xt)(τ+1)B t=τB+1 E[K],

and hence the sample size T is approximately

T KSESS1 := T

1 N PN 1 τ=0

B D (xt)(τ+1)B t=τB+1

If instead of exact Monte Carlo draws from the target, we obtain draws from a (potentially slowly mixing) Markov chain, we expect the D statistic to follow a convergence law roughly of the form

αT D((xt)T t=1) d K, where αT is the effective sample size. In this case,

1 N PN 1 τ=0

αB D (xt)(τ+1)B t=τB+1

as desired. However, this doesn t measure severe failures well; note that the minimum possible value of KSESS1 is T log 2p π

2/B, which occurs when D = 1 for each batch (i.e., when the sampler is working as poorly as possible). Hence we also consider the metric

2 D((xt)T t=1)

which characterizes severe failure, but has too high variance to be useful when the sampler is functioning well. Hence the final KSESS metric we use is

KSESS = KSESS2 KSESS2 T log 2p π

2/B KSESS1 otherwise

In our experiments we set N = 40.

Auto Step: Locally adaptive involutive MCMC

Figure 10: Reference samples for funnel2.

Auto Step: Locally adaptive involutive MCMC

Figure 11: First 5 dimensions of the reference samples for funnel100

Auto Step: Locally adaptive involutive MCMC

Figure 12: Reference samples for kilpisjarvi.

Auto Step: Locally adaptive involutive MCMC

Figure 13: Reference samples for m RNA.

Auto Step: Locally adaptive involutive MCMC

Figure 14: Reference samples for orbital.