# Infinite-Horizon Distributionally Robust Regret-Optimal Control

Taylan Kargin*¹, Joudi Hajar*¹, Vikrant Malik*¹, Babak Hassibi¹ (*Equal contribution, ¹California Institute of Technology)

Abstract. We study the infinite-horizon distributionally robust (DR) control of linear systems with quadratic costs, where disturbances have an unknown, possibly time-correlated distribution within a Wasserstein-2 ambiguity set. We aim to minimize the worst-case expected regret, i.e., the excess cost of a causal policy compared to a non-causal one with access to future disturbances. Although the optimal policy lacks a finite-order state-space realization (i.e., it is non-rational), it can be characterized by a finite-dimensional parameter. Leveraging this, we develop an efficient frequency-domain algorithm to compute the optimal control policy, and we present a convex optimization method to construct a near-optimal state-space controller that approximates the optimal non-rational controller in the H∞-norm. This approach avoids solving the computationally expensive semi-definite program (SDP), whose size scales with the time horizon, required in the finite-horizon setting.

1. Introduction

Addressing uncertainty is a core challenge in decision-making. Control systems inherently encounter various uncertainties, such as external disturbances, measurement errors, model disparities, and temporal variations in dynamics (van der Grinten, 1968; Doyle, 1985). Neglecting these uncertainties in policy design can result in considerable performance decline and may lead to unsafe and unintended behavior (Samuelson & Yang, 2017). Traditionally, the challenge of uncertainty in control systems has been approached through either stochastic or robust control frameworks (Kalman, 1960; Zames, 1981; Doyle et al., 1988). Stochastic control (e.g., Linear Quadratic Gaussian (LQG) or H2-control) aims to minimize an expected cost, assuming disturbances follow a
(Correspondence to: Taylan Kargin. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).)

known probability distribution (Hassibi et al., 1999). However, in practical scenarios, the true distribution is often estimated from sampled data, introducing vulnerability to inaccurate models. On the other hand, robust control minimizes the worst-case cost across potential disturbance realizations, such as those with bounded energy or power (H∞-control) (Zhou et al., 1996). While this ensures robustness, it can be overly conservative. Two recent approaches have emerged to tackle this challenge.

Regret-Optimal (RO) Control. Introduced by (Sabag et al., 2021; Goel & Hassibi, 2023), this framework offers a promising strategy for tackling both stochastic and adversarial uncertainties. It defines regret as the performance difference between a causal control policy and a clairvoyant, non-causal one with perfect knowledge of future disturbances. In the full-information setting, RO controllers minimize the worst-case regret across all bounded-energy disturbances (Sabag et al., 2021; Goel & Hassibi, 2023). The infinite-horizon RO controller also takes on a state-space form, making it conducive to efficient real-time computation (Sabag et al., 2021). Extensions of this framework have been investigated in various settings, including measurement-feedback control (Goel & Hassibi, 2021a; Hajar et al., 2023b), dynamic environments (Goel & Hassibi, 2021b), safety-critical control (Martin et al., 2022; Didier et al., 2022), filtering (Sabag & Hassibi, 2022; Goel & Hassibi, 2023), and distributed control (Martinelli et al., 2023). While these controllers effectively emulate the performance of non-causal controllers in worst-case disturbance scenarios, they may exhibit excessive conservatism when dealing with stochastic ones.

Distributionally Robust (DR) Control.
In contrast to traditional approaches such as H2, H∞, and RO control, which focus on a single distribution or worst-case disturbance realization, the DR framework addresses uncertainty in disturbances by considering ambiguity sets, i.e., sets of plausible probability distributions (Yang, 2020; Kim & Yang, 2023; Hakobyan & Yang, 2022; Taskesen et al., 2023; Aolaritei et al., 2023a;b; Brouillon et al., 2023). This methodology aims to design controllers with robust performance across all probability distributions within a given ambiguity set. The size of the ambiguity set controls the desired robustness against distributional uncertainty, ensuring that the resulting controller is not excessively conservative. The controller's performance is highly sensitive to the chosen metric for quantifying distributional shifts. Common choices include the total variation (TV) distance (Tzortzis et al., 2015; 2016), the Kullback-Leibler (KL) divergence (Falconi et al., 2022; Liu et al., 2023), and the Wasserstein-2 (W2) distance (Taha et al., 2023; Taskesen et al., 2023; Aolaritei et al., 2023a; Hajar et al., 2023a; Brouillon et al., 2023; Kargin et al., 2024). The controllers derived from KL-ambiguity sets (Petersen et al., 2000; Falconi et al., 2022) have been linked to the well-known risk-sensitive controller (Jacobson, 1973; Speyer et al., 1974; Whittle, 1981), which minimizes an exponential cost (see (Hansen & Sargent, 2008) and the references therein). However, distributions in a KL-ambiguity set are restricted to be absolutely continuous with respect to the nominal distribution (Hu & Hong, 2012), significantly limiting its expressiveness.
In contrast, the W2-distance, which quantifies the minimal cost of transporting mass between two probability distributions, induces a Riemannian structure on the space of distributions (Villani, 2009) and allows for ambiguity sets containing distributions with both discrete and continuous support. Thanks to this versatility and the rich geometric framework, it has found widespread adoption across various fields, including machine learning (Arjovsky et al., 2017), computer vision (Liu et al., 2020; Ozaydin et al., 2024), estimation and filtering (Shafieezadeh Abadeh et al., 2018; Lotidis et al., 2023; Prabhat & Bhattacharya, 2024), data compression (Blau & Michaeli, 2019; Lei et al., 2021; Malik et al., 2024), and robust optimization (Zhao & Guan, 2018; Kuhn et al., 2019; Gao & Kleywegt, 2022; Blanchet et al., 2023). Moreover, the W2-distance has emerged as a theoretically appealing statistical distance for DR linear-quadratic control problems (Taha et al., 2023) due to its compatibility with quadratic objectives and the resulting tractability of the associated optimization problems (Gao & Kleywegt, 2022).

1.1. Contributions

This paper explores the framework of Wasserstein-2 distributionally robust regret-optimal (DR-RO) control of linear dynamical systems in the infinite-horizon setting. Initially introduced by (Taha et al., 2023) for the full-information setting, DR-RO control was later adapted to the partially observable case by (Hajar et al., 2023a). Similarly, (Taskesen et al., 2023) derived a DR controller for the partially observed linear-quadratic-Gaussian (LQG) problem, assuming time-independent disturbances. These prior works, focusing on the finite-horizon setting, are hampered by the requirement to solve a semi-definite program (SDP) whose complexity scales with the time horizon, prohibiting their applicability for large horizons.
Our work addresses this limitation by considering the infinite-horizon setting, where the probability distribution of the disturbances over the entire time horizon is assumed to lie in a W2-ball of a specified radius centered at a given nominal distribution. We seek a linear time-invariant (LTI) controller that minimizes the worst-case expected regret for distributions adversarially chosen within the W2-ambiguity set. Our contributions are summarized as follows.

1. Stabilizing time-invariant controller. As opposed to the finite-horizon controllers derived in (Hakobyan & Yang, 2022; Taskesen et al., 2023; Aolaritei et al., 2023b; Taha et al., 2023; Hajar et al., 2023a), the controllers obtained in the infinite-horizon setting stabilize the underlying dynamics (Corollary 3.4).

2. Robustness against non-iid disturbances. In contrast to several prior works that assume time-independence of disturbances (Yang, 2020; Kim & Yang, 2023; Hakobyan & Yang, 2022; Taskesen et al., 2023; Zhong & Zhu, 2023; Aolaritei et al., 2023a;b), our approach does not impose such assumptions, thereby ensuring that the resulting controllers are robust against time-correlated disturbances.

3. Characterization of the optimal controller. We cast the DR-RO control problem as a max-min optimization and derive the worst-case distribution and the optimal controller using KKT conditions (Theorem 3.2). While the resulting controller is non-rational, lacking a finite-order state-space realization (Corollary 4.3), we show it admits a finite-dimensional parametric form (Theorem 4.2).

4. Efficient computation of the optimal controller. Utilizing the finite-dimensional parametrization, we propose an efficient algorithm based on the Frank-Wolfe method to compute the optimal non-rational DR-RO controller in the frequency domain with arbitrary fidelity (Algorithm 1).

5. Near-optimal state-space controller.
We introduce a novel convex program that finds the best rational approximation of any given order to the non-rational controller in the H∞-norm (Theorem 5.5). Our approach thus enables efficient real-time implementation using a near-optimal state-space controller (Lemma 5.7).

Notations: The letters N, Z, R, and C denote the sets of natural numbers, integers, real numbers, and complex numbers, respectively. T denotes the complex unit circle. For z ∈ C, |z| is its magnitude and z̄ is its conjugate. S^n_+ denotes the set of positive semidefinite (psd) matrices of size n × n. Bare calligraphic letters (K, M, etc.) are reserved for operators. I is the identity operator with a suitable block size. For an operator M, its adjoint is M^*. For a matrix A, its transpose is A^T and its Hermitian conjugate is A^*. For psd operators/matrices, ⪰ denotes the Loewner order. For a psd operator M, both √M and M^{1/2} denote the psd square root. {M}_+ and {M}_− denote the causal and strictly anti-causal parts of an operator M, respectively. M(z) denotes the z-domain transfer function of a Toeplitz operator M. tr(·) denotes the trace of operators and matrices. ‖·‖ is the usual Euclidean norm. ‖·‖_∞ and ‖·‖_2 are the H∞ (operator) and H2 (Frobenius) norms, respectively. Probability distributions are denoted by P. P_p(R^d) denotes the set of distributions over R^d with finite p-th moment. E denotes the expectation. The Wasserstein-2 distance between distributions P1, P2 ∈ P_2(R^d) is denoted by W2(P1, P2), defined as

W2(P1, P2) := ( inf E ‖w1 − w2‖² )^{1/2},    (1)

where the infimum is over all joint distributions of (w1, w2) with marginals w1 ∼ P1 and w2 ∼ P2.

2. Preliminaries

2.1. Linear-Quadratic Control

Consider a discrete-time, linear time-invariant (LTI) dynamical system expressed as a state-space model:

x_{t+1} = A x_t + B_u u_t + B_w w_t,   s_t = C x_t.    (2)

Here, x_t ∈ R^{d_x} is the state, s_t ∈ R^{d_s} is the regulated output, u_t ∈ R^{d_u} is the control input, and w_t ∈ R^{d_w} is the exogenous disturbance at time t.
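For concreteness, the recursion (2) is straightforward to simulate; a minimal sketch, where the one-state system matrices and the zero policy at the end are made-up placeholders for illustration only:

```python
import numpy as np

def simulate(A, Bu, Bw, C, policy, w, x0):
    """Roll out x_{t+1} = A x_t + Bu u_t + Bw w_t, s_t = C x_t.

    The policy u_t = policy(t, w) receives the full disturbance array for
    convenience; a causal policy must only look at entries up to time t."""
    x, s_traj, u_traj = x0, [], []
    for t in range(len(w)):
        u = policy(t, w)
        s_traj.append(C @ x)          # regulated output uses the current state
        u_traj.append(u)
        x = A @ x + Bu @ u + Bw @ w[t]
    return np.array(s_traj), np.array(u_traj)

# Placeholder 1-state system driven by a constant disturbance, zero control.
A, Bu, Bw, C = map(np.atleast_2d, (0.9, 1.0, 1.0, 1.0))
w = np.ones((5, 1))
s, u = simulate(A, Bu, Bw, C, lambda t, w: np.zeros(1), w, np.zeros(1))
print(s.ravel())   # geometric accumulation: 0, 1, 1.9, 2.71, 3.439
```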
The state-space parameters (A, B_u, B_w, C) are known, with (A, B_u) stabilizable, (A, B_w) controllable, and (A, C) observable. The disturbances are generated by an unknown stochastic process. We focus on the infinite-horizon setting, where the time index spans from the infinite past to the infinite future, taking values in Z.¹ Defining the doubly-infinite column vectors of the regulated output s := (s_t)_{t∈Z}, control input u := (u_t)_{t∈Z}, and disturbance process w := (w_t)_{t∈Z} trajectories, we can express the temporal interactions between these variables globally by representing the dynamics (2) as a causal linear input/output model:

s = F u + G w,    (3)

where F and G are strictly causal (i.e., strictly lower-triangular), doubly-infinite d_s × d_u- and d_s × d_w-block Toeplitz operators, respectively. These operators describe the influence of the control input and the disturbances on the regulated output through convolution with the impulse response of the dynamical system (2), and they are completely determined by the model parameters (A, B_u, B_w, C).

Control Policy. We restrict our attention to the full-information setting, where the control input u_t at time t ∈ Z has access to the past disturbances (w_s)_{s=−∞}^{t}. In particular, we consider linear time-invariant (LTI) disturbance-feedback control² (DFC) policies that map the disturbances to the control input via a causal convolution sum:

u_t = Σ_{s=−∞}^{t} K̂_{t−s} w_s,   for all t ∈ Z.    (4)

¹The doubly-infinite horizon is chosen for simplicity in derivations, but the results are extendable to a semi-infinite horizon.
²The Youla parametrization enables the conversion between a DFC controller and a state-feedback controller (Youla et al., 1976).

The sequence {K̂_t}_{t=0}^{∞} of d_u × d_w matrices is known as the Markov parameters of the controller.
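A truncated version of the convolution (4) can be sketched directly; the two-tap scalar policy below is a made-up example (the paper's policy sums over the full past, and the exact information pattern determines whether the current disturbance enters):

```python
import numpy as np

def dfc_control(K_markov, w_hist):
    """Truncated disturbance-feedback control u_t = sum_s K_s w_{t-s}.

    K_markov: list of du x dw Markov parameters [K_0, K_1, ...].
    w_hist:   observed disturbances [w_t, w_{t-1}, ...], most recent first;
              zip() truncates to the shorter of the two sequences.
    """
    u = np.zeros(K_markov[0].shape[0])
    for K_s, w in zip(K_markov, w_hist):
        u += K_s @ w
    return u

# Two-tap scalar policy: u_t = 0.5 w_t + 0.25 w_{t-1}.
K = [np.array([[0.5]]), np.array([[0.25]])]
print(dfc_control(K, [np.array([1.0]), np.array([2.0])]))   # 0.5*1 + 0.25*2 = 1.0
```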
Similar to the causal linear model in (3), the controller equation in (4) can be expressed globally as u = K w, where K is a bounded, strictly causal, d_u × d_w-block Toeplitz operator whose lower block-diagonal entries are the Markov parameters. The set of causal DFC policies is denoted by K.

Cost. At each time step, the control inputs and disturbances incur a quadratic instantaneous cost s_t^T s_t + u_t^T R u_t, where R ≻ 0. Without loss of generality, we take R = I by redefining B_u R^{−1/2} → B_u and R^{1/2} u_t → u_t. Defining the truncated sequences s_T := (s_t)_{t=0}^{T−1} and u_T := (u_t)_{t=0}^{T−1}, the cumulative cost over a horizon of T ∈ N is simply

cost_T(u, w) := ‖s_T‖² + ‖u_T‖².    (5)

2.2. The Regret-Optimal Control Framework

We aim to design controllers that reduce the regret against the best offline sequence of control inputs selected in hindsight. For a horizon T, the cumulative regret is

Regret_T(u, w) := cost_T(u, w) − min_{u′} cost_T(u′, w).    (6)

We highlight that the minimization on the right-hand side is over all control input sequences, including non-causal (offline) ones. The regret-optimal (RO) control framework, introduced by (Sabag et al., 2021), aims to craft a causal, time-invariant controller K ∈ K that minimizes the steady-state worst-case regret across all bounded-energy disturbances. This can be formally cast as

γ_RO := inf_{K∈K} limsup_{T→∞} (1/T) sup_{‖w_T‖²≤1} Regret_T(Kw, w).    (7)

In the full-information setting, the best sequence of control inputs selected in hindsight is given by u° = K° w, where

K° := −(I + F^*F)^{−1} F^* G    (8)

is the optimal non-causal policy (Hassibi et al., 1999). Since a non-causal controller lacks a physical realization, the optimal RO controller K_RO represents the best causal policy, attaining performance levels akin to those of the optimal non-causal policy K°, which enjoys complete access to the disturbance trajectory in advance. Exploiting the time-invariance of the dynamics (2) and of the controller K ∈ K, Sabag et al.
(2021) demonstrate the equivalence of (7) to the following:

inf_{K∈K} sup_{‖w‖≤1} w^* R_K w = inf_{K∈K} ‖R_K‖,    (9)

where ‖w‖ is the ℓ2-norm and R_K, which we call the regret operator, is given by

R_K := (K − K°)^* (I + F^*F)(K − K°).    (10)

The resulting controller closely mirrors the non-causal controller's performance under the worst-case disturbance sequence, but it may be conservative for stochastic disturbances.

2.3. Distributionally Robust Regret-Optimal Control

This paper investigates distributionally robust regret-optimal control, seeking a causal controller that minimizes the worst-case expected regret within a Wasserstein-2 (W2) ambiguity set of disturbance probability distributions. The W2-ambiguity set W_T(P°, r_T) for horizon T is defined as a W2-ball of radius r_T > 0 centered at a nominal distribution P°_T ∈ P(R^{T d_w}), namely:

W_T(P°, r_T) := { P ∈ P(R^{T d_w}) | W2(P, P°_T) ≤ r_T }.    (11)

In contrast to (7), which addresses the worst-case regret across all bounded-energy disturbances, our focus is the worst-case expected regret across all distributions within the W2-ambiguity set, as defined by Taha et al. (2023):

Reg_T(K, r_T) := sup_{P_T ∈ W_T(P°_T, r_T)} E_{P_T}[ Regret_T(Kw, w) ],

where E_{P_T} denotes the expectation with w_T ∼ P_T. In the infinite-horizon case, this cumulative quantity diverges to infinity. Therefore, we focus on the steady-state worst-case expected regret, as defined by (Kargin et al., 2024):

Definition 2.1. The steady-state worst-case expected regret suffered by a policy K ∈ K is given by the ergodic limit of the cumulative worst-case expected regret, i.e.,

Reg_∞(K, r) := limsup_{T→∞} (1/T) Reg_T(K, r_T).    (12)

To ensure that the limit in (12) is well-defined, the asymptotic behavior of the ambiguity set must be specified. For this purpose, we make the following assumption.

Assumption 2.2.
The nominal disturbance process w° := (w°_t)_{t∈Z} forms a zero-mean weakly stationary random process with auto-covariance operator M° := (M̂°_{t−s})_{t,s∈Z}, i.e., E_{P°}[w°_t (w°_s)^T] = M̂°_{t−s}. Moreover, the radius of the ambiguity set for horizon T scales as r_T = r √T for some r > 0.

The choice r_T ∝ √T aligns with the fact that the W2-distance between two random vectors of length T, each sampled from a different iid process, scales proportionally to √T. While the limit (12) is well-defined under Assumption 2.2, it can still be infinite depending on the chosen controller K. Notably, a finite value of (12) implies closed-loop stability. In Problem 2.3, we formally state the infinite-horizon Wasserstein-2 DR-RO problem.

Problem 2.3 (Distributionally Robust Regret-Optimal Control). Find a causal LTI controller K ∈ K that minimizes the steady-state worst-case expected regret (12), i.e.,

inf_{K∈K} Reg_∞(K, r) = inf_{K∈K} limsup_{T→∞} (1/T) Reg_T(K, r_T).    (13)

In Section 3, we provide an equivalent max-min optimization formulation of Problem 2.3.

3. A Saddle-Point Problem

This section presents a tractable convex reformulation of the infinite-horizon DR-RO problem. Concretely, Theorem 3.1 introduces an equivalent single-variable variational characterization of the steady-state worst-case expected regret (12) incurred by a fixed time-invariant controller. Exploiting this, we show in Theorem 3.2 that Problem 2.3 reduces, via duality, to a convex program over positive-definite operators. Moreover, we characterize the optimal controller and the worst-case distribution via KKT conditions. All proofs of the subsequent theorems are deferred to the Appendix.

Two major challenges arise in solving Problem 2.3: the ergodic limit in (12) and the causality constraint in (13). First, the ergodic-limit definition of the worst-case expected regret for a fixed policy K ∈ K requires successively solving optimization problems of ever-increasing dimension.
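The √T scaling in Assumption 2.2 can be checked numerically in the Gaussian case (illustrative only; the assumption does not require Gaussianity): for zero-mean Gaussian processes, the W2 distance between length-T marginals is the Bures distance between their T × T auto-covariance matrices, and the ratio W2/√T settles to a constant. A sketch comparing a white process with an AR(1)-correlated one (the correlation coefficient 0.5 is a made-up choice):

```python
import numpy as np
from scipy.linalg import sqrtm, toeplitz

def bures_w2(S1, S2):
    """W2 distance between zero-mean Gaussians N(0, S1) and N(0, S2):
    sqrt(tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2}))."""
    root = sqrtm(sqrtm(S1) @ S2 @ sqrtm(S1)).real
    return np.sqrt(max(np.trace(S1 + S2 - 2 * root).real, 0.0))

# The ratio W2 / sqrt(T) is roughly constant across horizons, so a radius
# r_T = r * sqrt(T) covers a comparable distributional shift at every T.
ratios = []
for T in (20, 40, 80):
    S_ar1 = toeplitz(0.5 ** np.arange(T))   # AR(1) auto-covariance, rho = 0.5
    ratios.append(bures_w2(np.eye(T), S_ar1) / np.sqrt(T))
    print(T, ratios[-1])
```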
To address this challenge, we leverage the asymptotic convergence properties of Toeplitz matrices and derive an equivalent formulation of (12) as an optimization problem over a single decision variable, as in Kargin et al. (2024). Similar to the time-domain derivations of the H2 and risk-sensitive controllers in the infinite horizon, the resulting formulation involves the Toeplitz operator R_K. This result is presented formally in the following theorem.

Theorem 3.1 (A variational formula for Reg_∞ (Kargin et al., 2024, Thm. 5)). Under Assumption 2.2, the steady-state worst-case expected regret Reg_∞(K, r) incurred by a causal policy K ∈ K is equivalent to

inf_{γ≥0, γI≻R_K} γ tr( ((I − γ^{−1}R_K)^{−1} − I) M° ) + γ r²,    (14)

which takes a finite value whenever R_K is bounded. Additionally, the worst-case disturbance is obtained as w⋆ = (I − γ⋆^{−1} R_K)^{−1} w°, where γ⋆ is the optimal solution of (14), satisfying tr( ((I − γ⋆^{−1} R_K)^{−1} − I)² M° ) = r².

Notice that the optimization in (14) closely mirrors the finite-horizon version presented by Taha et al. (2023, Thm. 2), with the key difference being the substitution of finite-horizon matrices with Toeplitz operators. The second challenge is the causality constraint on the controller. When the causality constraint is lifted, the non-causal policy K° achieves zero worst-case expected regret, since R_{K°} vanishes and so does the worst-case regret by Theorem 3.1. While this example illustrates the triviality of the non-causal DR-RO problem, the minimization of the worst-case expected regret objective (14) over causal policies is, in general, not a tractable problem.
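In finite dimensions (matrices standing in for Toeplitz operators, with M° = I), the variational formula of Theorem 3.1 can be evaluated by bisecting for γ; a sketch under those assumptions, working on the eigenvalues of a psd stand-in for R_K:

```python
import numpy as np

def worst_case_regret(R, r):
    """Evaluate the finite-dimensional analogue of (14) with M0 = I:
    bisect for gamma > lam_max(R) solving tr(((I - R/g)^{-1} - I)^2) = r^2,
    then return g * tr((I - R/g)^{-1} - I) + g * r^2. R must be symmetric psd.
    """
    lam = np.linalg.eigvalsh(R)
    if lam.max() <= 0:
        return 0.0
    phi = lambda g: np.sum((1.0 / (1.0 - lam / g) - 1.0) ** 2) - r**2
    lo = lam.max() * (1 + 1e-12)
    hi = lo + 1.0
    while phi(hi) > 0:            # grow the bracket; phi decreases to -r^2
        hi *= 2.0
    for _ in range(200):          # bisection on the monotone phi
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > 0 else (lo, mid)
    g = 0.5 * (lo + hi)
    return g * np.sum(1.0 / (1.0 - lam / g) - 1.0) + g * r**2

# Scalar sanity check against the closed form lam * (1 + r)^2.
print(worst_case_regret(np.array([[0.5]]), r=0.1))   # 0.5 * 1.1^2 = 0.605
```

In the scalar case the γ-equation gives 1/(1 − λ/γ) = 1 + r, so the value reduces to λ(1 + r)², which the bisection reproduces.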
Leveraging Fenchel duality of the objective in (14), we address the causality constraint by reformulating Problem 2.3 as a concave-convex saddle-point problem in Theorem 3.2, so that the well-known Wiener-Hopf technique (Wiener & Hopf, 1931; Kailath et al., 2000) can be used to obtain the optimal DR-RO controller (see Lemma D.2 for details). To this end, let Δ^*Δ = I + F^*F be the canonical spectral factorization³, where both Δ and its inverse Δ^{−1} are causal operators. We also introduce the Bures-Wasserstein (BW) distance between positive-definite (pd) operators, defined as

BW(M1, M2)² := tr( M1 + M2 − 2 (M1^{1/2} M2 M1^{1/2})^{1/2} ),

where M1, M2 ≻ 0 have finite trace (Bhatia et al., 2017).

Theorem 3.2 (A saddle-point problem for DR-RO). Under Assumption 2.2, Problem 2.3 reduces to a feasible concave-convex saddle-point problem given as

sup_{M⪰0} inf_{K∈K} tr(R_K M)   s.t. BW(M, M°) ≤ r.    (15)

Letting K_{H2} := Δ^{−1} {Δ K°}_+ be the H2 controller, the unique saddle point (K⋆, M⋆) of (15) satisfies:

K⋆ = K_{H2} + Δ^{−1} { {Δ K°}_− L⋆ }_+ L⋆^{−1},    (16a)
M⋆ = (I − γ⋆^{−1} R_{K⋆})^{−1} M° (I − γ⋆^{−1} R_{K⋆})^{−1},    (16b)

where L⋆ L⋆^* = M⋆ is the canonical spectral factorization with causal and unique⁴ L⋆ and L⋆^{−1}, and γ⋆ > 0 uniquely satisfies tr( ((I − γ⋆^{−1} R_{K⋆})^{−1} − I)² M° ) = r².

This result demonstrates that the optimal DR-RO controller augments the H2 controller with an additional correction term that accounts for the time correlations in the worst-case disturbance w⋆, which are encapsulated by the auto-covariance operator M⋆.

Remark 3.3. As r → ∞, the optimal γ⋆ approaches the lower bound γ_RO = inf_{K∈K} ‖R_K‖, and K⋆ recovers the regret-optimal (RO) controller. Conversely, as r → 0, the ambiguity set collapses to the nominal model as γ⋆ → ∞, and K⋆ recovers the H2 controller when M° = I. Thus, adjusting r allows the DR-RO controller to interpolate between the RO and H2 controllers.

We conclude this section by asserting the closed-loop stability of (2) under the optimal DR-RO controller K⋆.
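Both Δ and L⋆ above are canonical (causal, causally invertible) spectral factors. For scalar spectral densities, such a factor can be computed on a frequency grid with a DFT-based cepstral method; the sketch below follows the classical Kolmogorov/Wiener recipe, which is a close relative of, but not identical to, the DFT-based algorithm of Rino (1970) used later in the paper:

```python
import numpy as np

def spectral_factor(M_grid):
    """Minimum-phase spectral factor of a scalar density sampled on an N-point
    uniform grid of the unit circle: returns L on the grid with |L|^2 = M and
    only nonnegative-lag cepstral taps. Assumes N even and M real, symmetric,
    and strictly positive on the grid; accuracy improves rapidly with N.
    """
    N = len(M_grid)
    c = np.fft.ifft(np.log(M_grid)).real       # cepstrum of log M
    h = np.zeros(N)
    h[0] = 0.5 * c[0]                          # halve the zero-lag tap
    h[1 : N // 2] = c[1 : N // 2]              # keep the strictly causal taps
    h[N // 2] = 0.5 * c[N // 2]                # split the Nyquist tap
    return np.exp(np.fft.fft(h))

# Factor M(z) = |1 + 0.5 z^{-1}|^2; the method recovers L(z) = 1 + 0.5 z^{-1}.
N = 256
w = 2 * np.pi * np.arange(N) / N
M = np.abs(1 + 0.5 * np.exp(-1j * w)) ** 2
L = spectral_factor(M)
print(np.max(np.abs(np.abs(L) ** 2 - M)))      # factorization error (tiny)
```

Exponentiating only the causal half of the cepstrum is what makes the factor minimum-phase, mirroring the causality requirement on L⋆ and Δ.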
This stability is a direct consequence of the saddle-point problem (15) attaining a finite optimal value.

Corollary 3.4. K⋆ stabilizes the closed-loop system.

4. An Efficient Algorithm

In this section, we introduce a numerical method to compute the saddle point (K⋆, M⋆) of the max-min problem (15). While both K⋆ and M⋆ are non-rational, i.e., do not admit a finite-order state-space realization, Theorem 4.2 states that M⋆ possesses a finite-dimensional parametric form in the frequency domain. Exploiting this fact, we conceive Algorithm 1, a procedure based on the Frank-Wolfe method, to compute the optimal M⋆ in the frequency domain. Furthermore, we devise a novel approach to approximate the non-rational M⋆ in the H∞-norm by positive rational functions, from which a near-optimal state-space DR-RO controller can be computed using (16a). We defer the discussion of the rational approximation method to Section 5.

³Analogous to the Cholesky factorization of finite matrices.
⁴See the note in Appendix B.4 about the uniqueness of L⋆.

To enhance the clarity of our approach, we assume for the remainder of this paper that the nominal disturbances are uncorrelated, i.e., M° = I. Additionally, we utilize the frequency-domain representation of Toeplitz operators as transfer functions, denoting M as M(z), L as L(z), K as K(z), and similarly for other operators, where z ∈ C.

4.1. Finite-Dimensional Parametrization of M⋆

We first obtain an equivalent condition for the optimality of M⋆.

Lemma 4.1. Define the anti-causal operator T := {Δ K°}_−. The optimality condition in (16) takes the equivalent form

L⋆^* L⋆ = ( ( I + ( I + 4 γ⋆^{−1} {T L⋆}_−^* {T L⋆}_− )^{1/2} ) / 2 )²,    (17)

where γ⋆ > 0 is such that BW(L⋆ L⋆^*, I) = r.

Denoting N⋆ := L⋆^* L⋆, there exists a one-to-one mapping between M⋆ = L⋆ L⋆^* and N⋆ due to the uniqueness of the spectral factorization. Consequently, we refer interchangeably to both N⋆ and M⋆ as the optimal solution. The following theorem characterizes the optimal N⋆ in the frequency domain, implying a finite-dimensional parametrization.

Theorem 4.2.
Denoting by T(z) = C(z^{−1}I − A)^{−1} B the transfer function of the anti-causal operator T = {Δ K°}_−, let f : (γ, Γ) ∈ R × R^{d_x × d_w} ↦ N return a pd operator whose transfer function z ∈ T ↦ N(z) takes the form

N(z) = ( ( I + ( I + 4 γ^{−1} Γ^* (z^{−1}I − A)^{−*} C^* C (z^{−1}I − A)^{−1} Γ )^{1/2} ) / 2 )²,

where (A, B, C) are obtained from the state-space parameters of the system (2) (see Appendix E). Then, the optimal solution N⋆ = L⋆^* L⋆ in (17) satisfies N⋆ = f(γ⋆, Γ⋆), where

Γ⋆ = (1/2π) ∫_0^{2π} (I − e^{jω} A)^{−1} B L⋆(e^{jω}) dω,    (18)

and γ⋆ > 0 is such that BW(L⋆ L⋆^*, I) = r.

Notice in Theorem 4.2 that N⋆(z) involves the square root of a rational term. In general, the square root does not preserve rationality. We thus obtain Corollary 4.3.

Corollary 4.3. The optimal DR-RO controller K⋆(z) and N⋆(z) are non-rational. Thus, K⋆(z) does not admit a finite-dimensional state-space form.

Given the non-rationality of the controller K⋆(z), Kargin et al. (2024) propose a fixed-point algorithm exploiting the finite parametrization of the controller. In the next section, we propose an alternative, efficient optimization algorithm which, in contrast to the fixed-point algorithm, has a provable convergence guarantee to the saddle point (K⋆, N⋆).

4.2. An Iterative Optimization in the Frequency Domain

Although the problem is concave, its infinite-dimensional nature complicates the direct application of standard optimization tools. To address this challenge, we employ frequency-domain analysis via transfer functions, allowing for the adaptation of standard optimization techniques. Specifically, we utilize a variant of the Frank-Wolfe method (Frank & Wolfe, 1956; Jaggi, 2013). Our approach is versatile and can be extended to other methods, such as projected gradient descent (Goldstein, 1964) and the fixed-point method of (Kargin et al., 2024). Furthermore, the convergence of our method to the saddle point (K⋆, N⋆) can be established using standard tools from optimization. Detailed pseudocode is provided in Algorithm 1 in Appendix F.1.
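In the scalar-disturbance case, the transfer function of Theorem 4.2 can be evaluated pointwise on the unit circle; a sketch in which the particular values of (A, C, Γ, γ) are made-up placeholders rather than the converged output of Algorithm 1:

```python
import numpy as np

def N_of_z(z, A, C, Gamma, gamma):
    """Evaluate the optimal spectrum of Theorem 4.2 at a point z on the unit
    circle (scalar disturbance, M0 = I):
    N(z) = ((1 + sqrt(1 + (4/gamma) |C (z^{-1}I - A)^{-1} Gamma|^2)) / 2)^2."""
    g = C @ np.linalg.solve(np.eye(A.shape[0]) / z - A, Gamma)
    return float(((1.0 + np.sqrt(1.0 + (4.0 / gamma) * abs(g) ** 2)) / 2.0) ** 2)

# Placeholder parameters; A must be stable for the resolvent to be well-posed.
A = np.array([[0.5]]); C = np.array([1.0]); Gamma = np.array([0.3]); gamma = 1.0
print(N_of_z(1.0 + 0j, A, C, Gamma, gamma))   # ((1 + sqrt(2.44)) / 2)^2, about 1.641
```

Because the square root in N(z) is taken of a quantity no smaller than one, the resulting spectrum dominates the nominal M° = I at every frequency.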
Frank-Wolfe: We define the following function and its (Gateaux) gradient (Danskin, 1966):

Φ(M) := inf_{K∈K} tr(R_K M),    (19)
∇Φ(M) = L^{−*} {Δ K° L}_−^* {Δ K° L}_− L^{−1},    (20)

where L L^* = M is the spectral factorization. Rather than directly solving the optimization (15), the Frank-Wolfe method solves a linearized subproblem at each step. Namely, given the k-th iterate M_k, the next iterate M_{k+1} is obtained via

M̃_k = argmax_{M⪰0, BW(M,I)≤r} tr( ∇Φ(M_k) M ),    (21a)
M_{k+1} = (1 − η_k) M_k + η_k M̃_k,    (21b)

where η_k ∈ [0, 1] is a step size, commonly set to η_k = 2/(k+2) (Jaggi, 2013). Letting R_k := ∇Φ(M_k) be the gradient as in (20), the Frank-Wolfe updates can be expressed equivalently in terms of spectral densities as

M̃_k(z) = (I − γ_k^{−1} R_k(z))^{−2},    (22)
M_{k+1}(z) = (1 − η_k) M_k(z) + η_k M̃_k(z),   z ∈ T,    (23)

where γ_k > 0 solves tr( ((I − γ_k^{−1} R_k)^{−1} − I)² ) = r². See Appendix F.2 for a closed-form expression of R_k(z).

Discretization: Instead of the continuous unit circle T, we use its uniform discretization with N points, T_N := {e^{j2πn/N} | n = 0, …, N−1}. Updating M_{k+1}(z) at a frequency z using the gradient R_k(z) at the same z requires M_k(z′) at all frequencies z′ ∈ T due to the spectral factorization; thus, M_{k+1}(z) depends on M_k(z′) across the entire circle. This can be addressed by finer discretization.

Spectral Factorization: For the non-rational spectral densities M_k(z), we can only use an approximate factorization (Sayed & Kailath, 2001). Consequently, we use the DFT-based algorithm from Rino (1970), which efficiently factorizes scalar densities (i.e., d_w = 1), with errors diminishing rapidly as N increases. Matrix-valued spectral densities can be factorized using various other algorithms (Wilson, 1972; Ephremidze, 2010). See Appendix F.3 for pseudocode.

Bisection: We use the bisection method to find the γ_k > 0 that solves tr( ((I − γ_k^{−1} R_k)^{−1} − I)² ) = r² in the Frank-Wolfe update (22). See Appendix F.4 for pseudocode.

Remark 4.4.
The gradient R_k(z) requires the computation of the finite-dimensional parameter via (18), which can be performed using N-point trapezoidal integration. See Appendix F.2 for details. We conclude this section with the following convergence result, due to (Jaggi, 2013; Lacoste-Julien & Jaggi, 2014).

Theorem 4.5 (Convergence of M_k). There exist constants δ_N > 0, depending on the discretization N, and κ > 0, depending only on the state-space parameters (2) and r, such that, for large enough N, the iterates in (21) satisfy

Φ(M⋆) − Φ(M_k) ≤ (2κ / (k + 2)) (1 + δ_N).    (24)

5. Rational Approximation

The preceding section established that the optimal solution N⋆ is non-rational and lacks a state-space representation. Nevertheless, Algorithm 1 introduced in Section 4.2 can effectively approximate it in the frequency domain. Indeed, after convergence, the algorithm returns the optimal finite parameter Γ⋆, which can be used to compute N⋆(z) at any arbitrary frequency using Theorem 4.2, and thus K⋆(z) (see Algorithm 1 in Appendix F.1). However, a state-space controller must be devised for any practical real-time implementation.

This section introduces an efficient method to obtain state-space controllers approximating the non-rational optimal controller. Instead of directly approximating the controller itself, our method first approximates the power spectrum N⋆(z) of the worst-case disturbance by positive rational functions, minimizing the H∞-norm of the approximation error. While problems involving rational function approximation generally do not admit a convex formulation, we show in Theorem 5.5 that approximating positive power spectra by a ratio of positive fixed-order polynomials can be cast as a convex feasibility problem. After finding a rational approximation of N⋆(z), we compute a state-space controller according to (16a). For the sake of simplicity, we focus on scalar disturbances, i.e., d_w = 1.

5.1.
State-Space Models from Rational Power Spectra

As established in Theorem 3.2, the optimal controller K⋆ is derived from the positive operator N⋆ = L⋆^* L⋆ using the Wiener-Hopf technique. Specifically, we have K⋆ = K_{H2} + Δ^{−1} { {Δ K°}_− L⋆ }_+ L⋆^{−1}. Since other controllers of interest, including H2, H∞, and RO, can all be formulated this way, we focus on obtaining approximations to positive power spectra. It is worth noting that a positive and symmetric rational approximation N̂(z) of order m ∈ N can be represented as a ratio N̂(z) = P(z)/Q(z) of two positive symmetric polynomials P(z) = p_0 + Σ_{k=1}^{m} p_k (z^k + z^{−k}) and Q(z) = q_0 + Σ_{k=1}^{m} q_k (z^k + z^{−k}). When such P(z), Q(z) exist, we can obtain a rational spectral factorization of N̂(z) from spectral factorizations of P(z) and Q(z).

Finally, we end this section by stating an exact characterization of positive trigonometric polynomials. While verifying the positivity condition for general functions may pose challenges, the convex cone of positive symmetric trigonometric polynomials, T_{m,+}, possesses a characterization through a linear matrix inequality (LMI), as outlined below.

Lemma 5.1 (Trace parametrization of T_{m,+} (Dumitrescu, 2017, Thm. 2.3)). For k ∈ [−m, m], let Θ_k ∈ R^{(m+1)×(m+1)} be the primitive Toeplitz matrix with ones on the k-th diagonal and zeros everywhere else. Then, P(z) = p_0 + Σ_{k=1}^{m} p_k (z^k + z^{−k}) > 0 if and only if there exists a real positive definite matrix P ∈ S^{m+1}_+ such that

p_k = tr(P Θ_k),   k = 0, …, m.    (25)

According to Lemma 5.1, any positive trigonometric polynomial of order at most m can be expressed (non-uniquely) as P(z) = Σ_{k=−m}^{m} tr(P Θ_k) z^{−k} = tr(P Θ(z)), where Θ(z) := Σ_{k=−m}^{m} Θ_k z^{−k}.

5.2. Rational Approximation using the H∞-norm

In this context, we present a novel and efficient approach for deriving rational approximations of non-rational power spectra.
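Before proceeding, the trace parametrization of Lemma 5.1 is easy to verify numerically: tr(PΘ_k) simply sums the k-th diagonal of the Gram matrix P, and on the unit circle tr(PΘ(z)) coincides with the quadratic form v(z)^* P v(z), v(z) = (1, z, …, z^m), which is positive whenever P is positive definite. A sketch with a randomly generated Gram matrix:

```python
import numpy as np

def trig_poly_from_gram(P_mat, z):
    """Evaluate P(z) = tr(P Theta(z)) = sum_k tr(P Theta_k) z^{-k} for a
    symmetric Gram matrix P (Lemma 5.1). tr(P Theta_k) is the sum of the
    k-th diagonal of P; on |z| = 1 and for symmetric P this equals
    v(z)^* P v(z) with v(z) = (1, z, ..., z^m)."""
    m = P_mat.shape[0] - 1
    return sum(np.trace(P_mat, offset=k) * z ** (-k) for k in range(-m, m + 1))

# Any positive definite Gram matrix yields a trigonometric polynomial that is
# strictly positive on the unit circle.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
P_mat = A @ A.T + 0.1 * np.eye(4)
zs = np.exp(1j * np.linspace(0, 2 * np.pi, 50))
print(all(trig_poly_from_gram(P_mat, z).real > 0 for z in zs))   # True
```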
Our method bears similarities to the flexible uniform rational approximation approach described in (Sharon et al., 2021), which approximates a function in rational form while imposing positivity of the denominator as a constraint. Our method uses the $H_\infty$-norm as the criterion for the approximation error. First, consider the following problem:

Problem 5.2 (Rational approximation via $H_\infty$-norm minimization). Given a positive spectrum $N$, find the best rational approximation of order at most $m \in \mathbb{N}$ with respect to the $H_\infty$-norm, i.e.,
$$\inf_{P,\,Q \in T_{m,+}} \;\|P/Q - N\|_{H_\infty} \quad \text{s.t.}\quad \operatorname{tr}(Q) = 1. \tag{26}$$

Note that the constraint $\operatorname{tr}(Q) = 1$, equivalent to $q_0 = 1$, eliminates redundancy in the problem since the fraction $P/Q$ is scale-invariant. While the objective function in Equation (26) is convex with respect to $P$ and $Q$ individually, it is not jointly convex in $(P, Q)$. In this form, Problem 5.2 is not amenable to standard convex optimization tools.

Figure 1: Variation of $N^\star$ with $r$ and the performance of the DR-RO controller versus the $H_2$, $H_\infty$, and RO controllers: (a) the frequency-domain representation of $N^\star$ for $r = 0.01, 1, 3$ for system [AC15]; (b) the worst-case expected regret of different controllers for the system [AC15].

To circumvent this issue, we instead consider the sublevel sets of the objective function in Equation (26).

Definition 5.3. For a given approximation bound $\epsilon > 0$, the $\epsilon$-sublevel set of Problem 5.2 is defined as
$$S_\epsilon := \{(P, Q) \mid \|P/Q - N\|_{H_\infty} \le \epsilon,\ \operatorname{tr}(Q) = 1\}.$$

By applying the definition of the $H_\infty$-norm, we have
$$\|P/Q - N\|_{H_\infty} = \max_{z \in \mathbb{T}} |P(z)/Q(z) - N(z)| \le \epsilon \iff \begin{cases} P(z) - (N(z)+\epsilon)\,Q(z) \le 0, \\ P(z) - (N(z)-\epsilon)\,Q(z) \ge 0, \end{cases} \tag{27}$$
where the last set of inequalities holds for all $z \in \mathbb{T}$. Notice that the inequalities in Equation (27) and the positivity constraints on $P$ and $Q$ are jointly affine in $(P, Q)$. Moreover, the equation $\operatorname{tr}(Q) = 1$ is an affine equality constraint. Therefore, we have the following claim.

Lemma 5.4. The set $S_\epsilon$ is jointly convex in $(P, Q)$.
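The equivalence in (27) can be sanity-checked numerically. The sketch below (an illustrative check with arbitrary coefficient choices, not taken from the paper) evaluates both characterizations on a frequency grid for fixed positive trigonometric polynomials $P$, $Q$ and a smooth stand-in target spectrum $N(\omega) = e^{\cos\omega}$.

```python
import numpy as np

def trig_poly(coeffs, thetas):
    """Evaluate p_0 + 2 * sum_k p_k cos(k*theta) for symmetric coefficients."""
    return coeffs[0] + 2 * sum(c * np.cos(k * thetas)
                               for k, c in enumerate(coeffs[1:], start=1))

thetas = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
Pv = trig_poly([2.0, 0.3], thetas)   # an illustrative positive P(z)
Qv = trig_poly([1.0, 0.2], thetas)   # an illustrative positive Q(z), q0 = 1
Nv = np.exp(np.cos(thetas))          # smooth stand-in for a non-rational N

eps_star = np.max(np.abs(Pv / Qv - Nv))   # the exact sup-norm error on the grid
checks = []
for eps in (0.9 * eps_star, 1.1 * eps_star):
    sup_ok = np.max(np.abs(Pv / Qv - Nv)) <= eps
    aff_ok = bool(np.all(Pv - (Nv + eps) * Qv <= 0)
                  and np.all(Pv - (Nv - eps) * Qv >= 0))
    checks.append(sup_ok == aff_ok)       # the two tests must agree
```

The key point is that multiplying through by the positive denominator $Q$ turns the nonlinear sup-norm condition into two constraints that are affine in the coefficients of $P$ and $Q$, which is exactly what makes $S_\epsilon$ convex.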
Unlike its non-convex optimization counterpart in Problem 5.2, a membership oracle for the convex set $S_\epsilon$ offers a means to obtain accurate rational approximations of non-rational functions. According to Lemma 5.1, the positive trigonometric polynomials $(P, Q) \in S_\epsilon$ can be parameterized by positive semidefinite matrices $P$ and $Q$. This allows the equality constraint $\operatorname{tr}(Q) = 1$ and the affine inequalities in (27) to be expressed as linear matrix inequalities (LMIs) in terms of $P$ and $Q$. The resulting theorem characterizes the $\epsilon$-sublevel sets.

Theorem 5.5 (Feasibility of $S_\epsilon$). Let $\epsilon > 0$ be a given accuracy level, and let $m \in \mathbb{N}$ be a fixed order. The trigonometric polynomials $P$ and $Q$ of order $m$ belong to the $\epsilon$-sublevel set, $(P, Q) \in S_\epsilon$, if and only if there exist $P, Q \in \mathbb{S}^{m+1}_{+}$ such that $\operatorname{tr}(Q) = 1$ and, for all $z \in \mathbb{T}$,
$$1)\ \operatorname{tr}(P\,\Theta(z)) - (N(z)+\epsilon)\operatorname{tr}(Q\,\Theta(z)) \le 0, \tag{28}$$
$$2)\ \operatorname{tr}(P\,\Theta(z)) - (N(z)-\epsilon)\operatorname{tr}(Q\,\Theta(z)) \ge 0. \tag{29}$$

Figure 2: The control costs of different DR controllers under (a) white noise and (b) the worst-case disturbance for DR-RO in the infinite horizon, for system [AC15]. The finite-horizon controllers are re-applied every s = 30 steps. The infinite-horizon DR-RO controller achieves the lowest average cost compared to the finite-horizon controllers.

The sole limitation of this approach arises from the fact that, for a non-rational $N(z)$, the set of infinitely many inequalities in (27) cannot be precisely characterized by a finite number of constraints, as is done in the trace parametrization of positive polynomials. To overcome this challenge, one can impose the inequalities in (27) only on a finite set of frequencies, such as $\mathbb{T}_N = \{e^{j2\pi n/N} \mid n = 0, \dots, N-1\}$ for $N \ge m$. While this introduces an approximation, the method's accuracy can be enhanced arbitrarily by increasing the number of frequency samples.
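To show the structure of the discretized constraints, the sketch below solves a grid-relaxed simplification of Problem 5.2 with scipy's LP solver: the coefficients $(p_k, q_k)$ are treated directly as variables, positivity of $P$ and $Q$ is enforced only on the grid rather than through the LMI of Lemma 5.1, and the target spectrum $e^{\cos\omega}$ is an arbitrary stand-in for a non-rational $N$. This is not the paper's SDP-based oracle, only an illustration of the affine constraints in (27) on $\mathbb{T}_N$.

```python
import numpy as np
from scipy.optimize import linprog

m, Ngrid, eps, delta = 3, 64, 0.05, 1e-3
omegas = 2 * np.pi * np.arange(Ngrid) / Ngrid
Nspec = np.exp(np.cos(omegas))                  # stand-in non-rational spectrum

# Variables x = (p_0..p_m, q_1..q_m); q_0 = 1 fixes the scale (tr(Q) = 1).
Pbasis = np.column_stack([np.ones(Ngrid)] +
                         [2 * np.cos(k * omegas) for k in range(1, m + 1)])
Qbasis = Pbasis[:, 1:]                          # basis for the q_1..q_m terms

A_ub, b_ub = [], []
for sign, shift in ((+1, +eps), (-1, -eps)):
    # sign * (P - (N + shift) Q) <= 0 at every grid frequency, cf. (27);
    # the constant term from q_0 = 1 moves to the right-hand side.
    A_ub.append(np.hstack([sign * Pbasis,
                           -sign * (Nspec + shift)[:, None] * Qbasis]))
    b_ub.append(sign * (Nspec + shift))
A_ub.append(np.hstack([np.zeros_like(Pbasis), -Qbasis]))   # Q >= delta on grid
b_ub.append((1 - delta) * np.ones(Ngrid))
A_ub.append(np.hstack([-Pbasis, np.zeros_like(Qbasis)]))   # P >= delta on grid
b_ub.append(-delta * np.ones(Ngrid))

res = linprog(c=np.zeros(2 * m + 1), A_ub=np.vstack(A_ub), b_ub=np.hstack(b_ub),
              bounds=[(None, None)] * (2 * m + 1), method="highs")

p_hat, q_hat = res.x[:m + 1], res.x[m + 1:]
err = np.max(np.abs(Pbasis @ p_hat / (1 + Qbasis @ q_hat) - Nspec))
```

Feasibility (status 0) certifies membership in the grid-relaxed $\epsilon$-sublevel set; in the full method the same inequalities are imposed through the trace parametrization so that positivity holds on all of $\mathbb{T}$, not just the samples.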
By taking this approach, the problem of rational function approximation can be reformulated as a convex feasibility problem involving LMIs and a finite number of affine (in)equality constraints. It is crucial to note that our algorithm can be used in the following two modes, which highlight its adaptability to two distinct use cases.

Figure 3: The control costs of different DR controllers under (a) the worst-case disturbance for DR-RO in the finite horizon and (b) the worst-case disturbance for DR-LQR in the finite horizon, for system [AC15]. The finite-horizon controllers are re-applied every s = 30 steps. Despite being designed to minimize the cost under these specific disturbances, the finite-horizon DR controllers are outperformed by the infinite-horizon DR-RO controller.

1. Best precision for a given degree. By adjusting the parameter $\epsilon$, which signifies our tolerance for deviations from $N(e^{j\omega})$, we can refine the approximation's accuracy. This mode is particularly valuable when we need to find the best possible polynomial representation of $N(e^{j\omega})$ for a given degree.

2. Lowest degree for a given precision. In contrast, we can ask for the lowest-degree polynomial that achieves a given precision level $\epsilon$. This mode is advantageous when the priority is to minimize computational overhead, or when a simpler polynomial approximation suffices, as long as the approximation remains within acceptable accuracy bounds.

5.3. Obtaining State-Space Controllers

Given a polynomial z-spectrum, we require its spectral factorization to obtain the state-space controller that approximates the DR-RO controller. The following lemma introduces a simple way to obtain such a factorization.

Lemma 5.6 (Canonical factor of polynomial z-spectra (Sayed & Kailath, 2001, Lem. 1)).
Consider a Laurent polynomial of degree $m$, $P(z) = \sum_{k=-m}^{m} p_k z^{-k}$, with $p_k = p_{-k} \in \mathbb{R}$, such that $P(z) > 0$. Then, there exists a canonical factor $L(z) = \sum_{k=0}^{m} \ell_k z^{-k}$ such that $P(z) = |L(z)|^2$ and $L(z)$ has all of its roots inside the unit circle.

Using Lemma 5.6, we can compute spectral factors by factorizing the symmetric positive polynomials and multiplying together all the factors with stable roots. Consequently, this rational spectral factor enables the derivation of a rational controller, denoted $K(z)$ (refer to Section 5.3). Now we present the DR-RO controller in state-space form.

Lemma 5.7. Let $L(z)$ be the rational factor of the spectral factorization $N(z) = L^*(z)L(z) = P(z)/Q(z)$ of a degree-$m$ rational approximation $P(z)/Q(z)$. The controller obtained from $L(z)$ using (16), i.e.,
$$K(z) = K_{H_2}(z) + \Delta(z)^{-1}\left\{\{\Delta(z)K^{\circ}(z)\}_{-}\,L(z)\right\}_{+}L(z)^{-1},$$
is rational and can be realized as a state-space controller as follows:
$$e(t+1) = \tilde{F}e(t) + \tilde{G}w(t), \qquad u(t) = \tilde{H}e(t) + \tilde{J}w(t), \tag{30}$$
where $e(t)$ is the controller state and $(\tilde{F}, \tilde{G}, \tilde{H}, \tilde{J})$ are determined from $(A, B_u, B_w)$ and $L(z)$.

6. Numerical Experiments

In this section, we present the performance of the DR-RO controller compared to the $H_2$, $H_\infty$, regret-optimal, and other finite-horizon DR controllers. We present frequency-domain and time-domain evaluations, and we showcase the performance of the rational approximation method. We employ benchmark models such as [REA4], [AC15], and [HE3] from (Leibfritz & Lipinski, 2003). In the frequency-domain simulations, results for [REA4] and [HE3] are presented. In the time-domain simulations, results for the aircraft model [AC15] are presented, with additional simulations provided in Appendix I. [REA4] is a chemical reactor model and [HE3] is a helicopter model, with 8 states each; [AC15] is an aircraft model with 4 states. We perform all experiments using MATLAB on an Apple M1 processor with 8 GB of RAM.
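Before turning to the results, the root-based factorization behind Lemma 5.6 can be sketched numerically (illustrative numpy code with an arbitrary positive example polynomial, not the paper's implementation): the roots of $z^m P(z)$ come in reciprocal pairs, so keeping the roots inside the unit circle and rescaling yields the canonical factor.

```python
import numpy as np

def canonical_factor(p):
    """Canonical spectral factor of P(z) = p_0 + sum_{k>=1} p_k (z^k + z^-k),
    given p = [p_0, ..., p_m] with P > 0 on the unit circle. Returns ell with
    L(z) = sum_k ell_k z^-k and |L(e^{j t})|^2 = P(e^{j t})."""
    coeffs = np.concatenate([p[::-1], p[1:]])   # z^m P(z), highest power first
    roots = np.roots(coeffs)
    inside = roots[np.abs(roots) < 1]           # the m roots inside the circle
    ell = np.real(np.poly(inside))              # monic candidate factor
    scale = np.sqrt(p[0] + 2 * np.sum(p[1:])) / abs(np.sum(ell))  # match P(1)
    return scale * ell

p = np.array([6.0, 2.0, 1.0])                   # illustrative positive spectrum
ell = canonical_factor(p)

theta = np.linspace(0, 2 * np.pi, 256, endpoint=False)
Pvals = p[0] + 2 * sum(p[k] * np.cos(k * theta) for k in range(1, len(p)))
Lvals = sum(ell[k] * np.exp(-1j * k * theta) for k in range(len(ell)))
```

Because $|L_{\text{monic}}(e^{j\theta})|^2$ and $P(e^{j\theta})$ share the same roots, their ratio is a positive constant, so a single scaling (fixed here at $z = 1$) makes $|L|^2 = P$ at every frequency.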
We specify the nominal distribution as a Gaussian with zero mean and identity covariance.

6.1. Frequency Domain Evaluations

We investigate the behavior of the DR-RO controller and its rational approximation for various values of the radius $r$. To show the behavior of the worst-case disturbance, we plot its power spectrum $N^\star(e^{j\omega})$ for three different values of the radius $r$ for the [AC15] system in Figure 1a. As can be seen, for $r = 0.01$ the worst-case disturbance is almost white, since that is the case for the nominal disturbance. As $r$ increases, the time correlation of the worst-case disturbance increases, and the power spectrum becomes peaky.

For the [AC15] system, the worst-case expected regret, as outlined in (2.1), of the DR-RO, $H_2$, $H_\infty$, and RO controllers is depicted in Figure 1b. We observe that for smaller $r$, the DR-RO controller performs close to the $H_2$ controller. However, as $r$ increases, its worst-case regret approaches the regret achieved by the RO controller. Throughout the variation in $r$, the DR-RO controller achieves the lowest worst-case expected regret among all the mentioned controllers.

To implement the DR-RO controller in practice, we need a rational controller. We find the rational approximation of $N^\star(e^{j\omega})$ as $P(e^{j\omega})/Q(e^{j\omega})$ using the method of Section 5.2 for [AC15] and degrees $m = 1, 2, 3$. The performance of the resulting rational controllers is compared to the non-rational DR-RO controller in Table 1. As can be seen, the rational approximations of order at least 2 achieve an expected regret that closely matches that of the non-rational controller for all values of $r$.

Table 1: The worst-case expected regret of the non-rational DR-RO controller, compared to the rational controllers RA(1), RA(2), and RA(3), obtained from degree 1, 2, and 3 rational approximations to $N^\star(e^{j\omega})$.

           r=0.01    r=1        r=1.5     r=2       r=3
  DR-RO    59.16     302.08     488.57    718.20    1307.12
  RA(1)    60.49     33394.74   4475.70   9351.89   2376.77
  RA(2)    59.58     303.33     491.75    723.96    1318.98
  RA(3)    59.57     302.41     489.49    719.72    1309.85
6.2. Time Domain Evaluations

We compare the time-domain performance of the infinite-horizon DR-RO controller to its finite-horizon counterparts, namely DR-RO and DR-LQR, as outlined in (Yan et al., 2023). The latter controllers are computed through an SDP whose dimension scales with the time horizon. We plot the average LQR cost over 210 time steps, aggregated over 1000 independent trials. Figure 2a illustrates the performance of the DR controllers under white Gaussian noise, while Figures 2b, 3a, and 3b demonstrate responses to the worst-case noise scenarios dictated by each of the controllers, using $r = 1.5$. For computational efficiency, the finite-horizon controllers operate over a horizon of only $s = 30$ steps and are re-applied every $s$ steps. Their worst-case disturbances in Figures 3a and 3b are also generated every $s$ steps, resulting in disturbances that are correlated only within each block of $s$ steps. Our findings highlight the infinite-horizon DR-RO controller's superior performance in all four scenarios. Note that extending the horizon of the SDP to come closer to the infinite-horizon performance is extremely computationally inefficient. These results underscore the advantages of using the infinite-horizon DR-RO controller.

7. Future Work

Our work presents a complete framework for solving the DR control problem in the full-information setting. Future generalizations would address our limitations. One is to extend the rational approximation method from single-input to multi-input systems. Another is to extend the results to partially observable systems where the state is not directly accessible. Finally, it would be useful to incorporate adaptation as the controller learns disturbance statistics through observations.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.
References

Aolaritei, L., Fochesato, M., Lygeros, J., and Dörfler, F. Wasserstein tube MPC with exact uncertainty propagation. In 2023 62nd IEEE Conference on Decision and Control (CDC), pp. 2036–2041, 2023a. doi: 10.1109/CDC49753.2023.10383526.

Aolaritei, L., Lanzetti, N., and Dörfler, F. Capture, propagate, and control distributional uncertainty. In 2023 62nd IEEE Conference on Decision and Control (CDC), pp. 3081–3086, 2023b. doi: 10.1109/CDC49753.2023.10383424.

Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein generative adversarial networks. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 214–223. PMLR, 06–11 Aug 2017. URL https://proceedings.mlr.press/v70/arjovsky17a.html.

Bhatia, R., Jain, T., and Lim, Y. On the Bures-Wasserstein distance between positive definite matrices, December 2017. URL http://arxiv.org/abs/1712.01504. arXiv:1712.01504 [math].

Blanchet, J., Kuhn, D., Li, J., and Taskesen, B. Unifying Distributionally Robust Optimization via Optimal Transport Theory, August 2023. URL http://arxiv.org/abs/2308.05414. arXiv:2308.05414 [math, stat].

Blau, Y. and Michaeli, T. Rethinking lossy compression: The rate-distortion-perception tradeoff. In International Conference on Machine Learning, pp. 675–685, Long Beach, California, USA, July 2019. PMLR.

Brouillon, J.-S., Martin, A., Lygeros, J., Dörfler, F., and Trecate, G. F. Distributionally robust infinite-horizon control: from a pool of samples to the design of dependable controllers, 2023.

Danskin, J. M. The theory of max-min, with applications. SIAM Journal on Applied Mathematics, 14(4):641–664, 1966. ISSN 0036-1399. URL http://www.jstor.org/stable/2946123.

Didier, A., Sieber, J., and Zeilinger, M. N. A system-level approach to regret optimal control. IEEE Control Systems Letters, 6:2792–2797, 2022.

Doyle, J., Glover, K., Khargonekar, P., and Francis, B.
State-space solutions to standard H2 and H-infinity control problems. In 1988 American Control Conference, pp. 1691–1696, Atlanta, GA, USA, June 1988. IEEE. doi: 10.23919/ACC.1988.4789992. URL https://ieeexplore.ieee.org/document/4789992/.

Doyle, J. C. Structured uncertainty in control system design. In 1985 24th IEEE Conference on Decision and Control, pp. 260–265, 1985. doi: 10.1109/CDC.1985.268842.

Dumitrescu, B. Positive Trigonometric Polynomials and Signal Processing Applications. Signals and Communication Technology. Springer International Publishing, Cham, 2017. ISBN 978-3-319-53687-3, 978-3-319-53688-0. doi: 10.1007/978-3-319-53688-0. URL http://link.springer.com/10.1007/978-3-319-53688-0.

Ephremidze, L. An Elementary Proof of the Polynomial Matrix Spectral Factorization Theorem, November 2010. URL http://arxiv.org/abs/1011.3777. arXiv:1011.3777 [math].

Ephremidze, L., Saied, F., and Spitkovsky, I. M. On the Algorithmization of Janashia-Lagvilava Matrix Spectral Factorization Method. IEEE Transactions on Information Theory, 64(2):728–737, February 2018. ISSN 0018-9448, 1557-9654. doi: 10.1109/TIT.2017.2772877. URL http://ieeexplore.ieee.org/document/8105834/.

Falconi, L., Ferrante, A., and Zorzi, M. A new perspective on robust performance for LQG control problems. In 2022 IEEE 61st Conference on Decision and Control (CDC), pp. 3003–3008, Cancun, Mexico, December 2022. IEEE. ISBN 978-1-66546-761-2. doi: 10.1109/CDC51059.2022.9992567. URL https://ieeexplore.ieee.org/document/9992567/.

Frank, M. and Wolfe, P. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95–110, March 1956. ISSN 0028-1441, 1931-9193. doi: 10.1002/nav.3800030109. URL https://onlinelibrary.wiley.com/doi/10.1002/nav.3800030109.

Gao, R. and Kleywegt, A. J. Distributionally Robust Stochastic Optimization with Wasserstein Distance, April 2022. URL http://arxiv.org/abs/1604.02199. arXiv:1604.02199 [math].

Goel, G. and Hassibi, B.
Regret-optimal measurement-feedback control. In Learning for Dynamics and Control, pp. 1270–1280. PMLR, 2021a.

Goel, G. and Hassibi, B. Regret-optimal control in dynamic environments, February 2021b. URL http://arxiv.org/abs/2010.10473. arXiv:2010.10473 [cs, eess, math].

Goel, G. and Hassibi, B. Regret-optimal estimation and control. IEEE Transactions on Automatic Control, 68(5):3041–3053, 2023.

Goldstein, A. A. Convex programming in Hilbert space. Bulletin of the American Mathematical Society, 70(5):709–710, 1964.

Hajar, J., Kargin, T., and Hassibi, B. Wasserstein Distributionally Robust Regret-Optimal Control under Partial Observability. In 2023 59th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1–6, Monticello, IL, USA, September 2023a. IEEE. ISBN 9798350328141. doi: 10.1109/Allerton58177.2023.10313386. URL https://ieeexplore.ieee.org/document/10313386/.

Hajar, J., Sabag, O., and Hassibi, B. Regret Optimal Control under Partial Observability, November 2023b. URL http://arxiv.org/abs/2311.06433. arXiv:2311.06433 [cs, eess, math].

Hakobyan, A. and Yang, I. Wasserstein distributionally robust control of partially observable linear systems: Tractable approximation and performance guarantee. In 2022 IEEE 61st Conference on Decision and Control (CDC), pp. 4800–4807. IEEE, 2022.

Hansen, L. and Sargent, T. Robustness. Princeton University Press, 2008. ISBN 978-0-691-114422. URL https://books.google.com/books?id=7wwV27TvR8EC.

Hassibi, B., Sayed, A. H., and Kailath, T. Indefinite Quadratic Estimation and Control. Society for Industrial and Applied Mathematics, 1999. doi: 10.1137/1.9781611970760. URL https://epubs.siam.org/doi/abs/10.1137/1.9781611970760.

Hu, Z. and Hong, L. J. Kullback-Leibler divergence constrained distributionally robust optimization, November 2012. URL https://optimization-online.org/2012/11/3677/.

Jacobson, D.
Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games. IEEE Transactions on Automatic Control, 18(2):124–131, April 1973. ISSN 0018-9286. doi: 10.1109/TAC.1973.1100265. URL http://ieeexplore.ieee.org/document/1100265/.

Jaggi, M. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. In Dasgupta, S. and McAllester, D. (eds.), Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pp. 427–435, Atlanta, Georgia, USA, June 2013. PMLR. URL https://proceedings.mlr.press/v28/jaggi13.html.

Kailath, T., Sayed, A. H., and Hassibi, B. Linear Estimation. Prentice Hall Information and System Sciences Series. Prentice Hall, Upper Saddle River, N.J., 2000. ISBN 978-0-13-022464-4.

Kalman, R. E. A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering, 82(1):35–45, March 1960. ISSN 0021-9223. doi: 10.1115/1.3662552.

Kargin, T., Hajar, J., Malik, V., and Hassibi, B. Wasserstein distributionally robust regret-optimal control over infinite-horizon. In Abate, A., Margellos, K., and Papachristodoulou, A. (eds.), Proceedings of The 6th Annual Conference on Learning for Dynamics and Control, volume 242 of Proceedings of Machine Learning Research. PMLR, 15–17 July 2024.

Kim, K. and Yang, I. Distributional robustness in minimax linear quadratic control with Wasserstein distance. SIAM Journal on Control and Optimization, 61(2):458–483, 2023. doi: 10.1137/22M1494105. URL https://doi.org/10.1137/22M1494105.

Kuhn, D., Esfahani, P. M., Nguyen, V. A., and Shafieezadeh-Abadeh, S. Wasserstein distributionally robust optimization: Theory and applications in machine learning. In Operations Research & Management Science in the Age of Analytics, pp. 130–166. Informs, 2019.

Lacoste-Julien, S. and Jaggi, M. An Affine Invariant Linear Convergence Analysis for Frank-Wolfe Algorithms, January 2014.
URL http://arxiv.org/abs/1312.7864. arXiv:1312.7864 [math].

Lei, E., Hassani, H., and Bidokhti, S. S. Out-of-distribution robustness in deep learning compression. arXiv preprint arXiv:2110.07007, October 2021.

Leibfritz, F. and Lipinski, W. Description of the benchmark examples in COMPleib 1.0. Dept. Math, Univ. Trier, Germany, 32, 2003.

Liu, R., Shi, G., and Tokekar, P. Data-driven distributionally robust optimal control with state-dependent noise. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 9986–9991, 2023. doi: 10.1109/IROS55552.2023.10342392.

Liu, Y., Zhu, L., Yamada, M., and Yang, Y. Semantic Correspondence as an Optimal Transport Problem. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4462–4471, Seattle, WA, USA, June 2020. IEEE. ISBN 978-1-72817-168-5. doi: 10.1109/CVPR42600.2020.00452. URL https://ieeexplore.ieee.org/document/9156442/.

Lotidis, K., Bambos, N., Blanchet, J., and Li, J. Wasserstein Distributionally Robust Linear-Quadratic Estimation under Martingale Constraints. In Ruiz, F., Dy, J., and van de Meent, J.-W. (eds.), Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning Research, pp. 8629–8644. PMLR, April 2023. URL https://proceedings.mlr.press/v206/lotidis23a.html.

Malik, V., Kargin, T., Kostina, V., and Hassibi, B. A Distributionally Robust Approach to Shannon Limits using the Wasserstein Distance, May 2024. URL http://arxiv.org/abs/2405.06528. arXiv:2405.06528 [cs, math].

Martin, A., Furieri, L., Dörfler, F., Lygeros, J., and Ferrari-Trecate, G. Safe control with minimal regret. In Learning for Dynamics and Control Conference, pp. 726–738. PMLR, 2022.

Martinelli, D., Martin, A., Ferrari-Trecate, G., and Furieri, L.
Closing the Gap to Quadratic Invariance: a Regret Minimization Approach to Optimal Distributed Control, November 2023. URL http://arxiv.org/abs/2311.02068. arXiv:2311.02068 [cs, eess].

Ozaydin, B., Zhang, T., Bhattacharjee, D., Süsstrunk, S., and Salzmann, M. OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation, April 2024. URL http://arxiv.org/abs/2403.06546. arXiv:2403.06546 [cs].

Petersen, I., James, M., and Dupuis, P. Minimax optimal control of stochastic uncertain systems with relative entropy constraints. IEEE Transactions on Automatic Control, 45(3):398–412, March 2000. ISSN 0018-9286. doi: 10.1109/9.847720. URL http://ieeexplore.ieee.org/document/847720/.

Prabhat, H. and Bhattacharya, R. Optimal State Estimation in the Presence of Non-Gaussian Uncertainty via Wasserstein Distance Minimization, March 2024. URL http://arxiv.org/abs/2403.13828. arXiv:2403.13828 [cs, eess, math, stat].

Rino, C. Factorization of spectra by discrete Fourier transforms (Corresp.). IEEE Transactions on Information Theory, 16(4):484–485, July 1970. ISSN 0018-9448. doi: 10.1109/TIT.1970.1054502. URL http://ieeexplore.ieee.org/document/1054502/.

Sabag, O. and Hassibi, B. Regret-Optimal Filtering for Prediction and Estimation. IEEE Transactions on Signal Processing, 70:5012–5024, 2022. ISSN 1053-587X, 1941-0476. doi: 10.1109/TSP.2022.3212153. URL https://ieeexplore.ieee.org/document/9911672/.

Sabag, O., Goel, G., Lale, S., and Hassibi, B. Regret-optimal controller for the full-information problem. In 2021 American Control Conference (ACC), pp. 4777–4782. IEEE, 2021.

Sabag, O., Goel, G., Lale, S., and Hassibi, B. Regret-optimal LQR control, 2023.

Samuelson, S. and Yang, I. Data-driven distributionally robust control of energy storage to manage wind power fluctuations. In 2017 IEEE Conference on Control Technology and Applications (CCTA), pp. 199–204, 2017. doi: 10.1109/CCTA.2017.8062463.

Sayed, A. H. and Kailath, T.
A survey of spectral factorization methods. Numerical Linear Algebra with Applications, 8(6-7):467–496, September 2001. ISSN 1070-5325, 1099-1506. doi: 10.1002/nla.250. URL https://onlinelibrary.wiley.com/doi/10.1002/nla.250.

Shafieezadeh-Abadeh, S., Nguyen, V. A., Kuhn, D., and Mohajerin Esfahani, P. M. Wasserstein distributionally robust Kalman filtering. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/file/15212f24321aa2c3dc8e9acf820f3c15-Paper.pdf.

Sharon, N., Peiris, V., Sukhorukova, N., and Ugon, J. Flexible rational approximation and its application for matrix functions. arXiv preprint arXiv:2108.09357, 2021.

Speyer, J., Deyst, J., and Jacobson, D. Optimization of stochastic linear systems with additive measurement and process noise using exponential performance criteria. IEEE Transactions on Automatic Control, 19(4):358–366, August 1974. ISSN 0018-9286. doi: 10.1109/TAC.1974.1100606. URL http://ieeexplore.ieee.org/document/1100606/.

Taha, F. A., Yan, S., and Bitar, E. A Distributionally Robust Approach to Regret Optimal Control using the Wasserstein Distance. In 2023 62nd IEEE Conference on Decision and Control (CDC), pp. 2768–2775, Singapore, Singapore, December 2023. IEEE. ISBN 9798350301243. doi: 10.1109/CDC49753.2023.10384311. URL https://ieeexplore.ieee.org/document/10384311/.

Taskesen, B., Iancu, D., Kocyigit, C., and Kuhn, D. Distributionally robust linear quadratic control. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (eds.), Advances in Neural Information Processing Systems, volume 36, pp. 18613–18632. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/file/3b7a66b2d1258e892c89f485b8f896e0-Paper-Conference.pdf.
Tzortzis, I., Charalambous, C. D., and Charalambous, T. Dynamic programming subject to total variation distance ambiguity. SIAM Journal on Control and Optimization, 53(4):2040–2075, 2015. doi: 10.1137/140955707. URL https://doi.org/10.1137/140955707.

Tzortzis, I., Charalambous, C. D., Charalambous, T., Kourtellaris, C. K., and Hadjicostis, C. N. Robust Linear Quadratic Regulator for uncertain systems. In 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 1515–1520, Las Vegas, NV, USA, December 2016. IEEE. ISBN 978-1-5090-1837-6. doi: 10.1109/CDC.2016.7798481. URL http://ieeexplore.ieee.org/document/7798481/.

van der Grinten, P. M. E. M. Uncertainty in measurement and control. Statistica Neerlandica, 22(1):43–63, 1968. doi: https://doi.org/10.1111/j.1467-9574.1960.tb00617.x.

Villani, C. Optimal Transport: Old and New. Number 338 in Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009. ISBN 978-3-540-71049-3. OCLC: ocn244421231.

Whittle, P. Risk-sensitive linear/quadratic/Gaussian control. Advances in Applied Probability, 13(4):764–777, December 1981. ISSN 0001-8678, 1475-6064. doi: 10.2307/1426972. URL https://www.cambridge.org/core/product/identifier/S0001867800036508/type/journal_article.

Wiener, N. and Hopf, E. Über eine Klasse singulärer Integralgleichungen. Sitzungsberichte der Preussischen Akademie der Wissenschaften, Physikalisch-mathematische Klasse. Akad. d. Wiss., 1931.

Wilson, G. T. The Factorization of Matricial Spectral Densities. SIAM Journal on Applied Mathematics, 23(4):420–426, 1972. ISSN 0036-1399. doi: 10.1137/0123044. URL http://www.jstor.org/stable/2100089. Publisher: Society for Industrial and Applied Mathematics.

Yan, S., Taha, F. A., and Bitar, E. A distributionally robust approach to regret optimal control using the Wasserstein distance, 2023.

Yang, I. Wasserstein distributionally robust stochastic control: A data-driven approach. IEEE Transactions on Automatic Control, 66(8):3863–3870, 2020.
Youla, D., Jabr, H., and Bongiorno, J. Modern Wiener-Hopf design of optimal controllers. Part II: The multivariable case. IEEE Transactions on Automatic Control, 21(3):319–338, June 1976. ISSN 0018-9286. doi: 10.1109/TAC.1976.1101223. URL http://ieeexplore.ieee.org/document/1101223/.

Zames, G. Feedback and optimal sensitivity: Model reference transformations, multiplicative seminorms, and approximate inverses. IEEE Transactions on Automatic Control, 26(2):301–320, April 1981. ISSN 0018-9286. doi: 10.1109/TAC.1981.1102603. URL http://ieeexplore.ieee.org/document/1102603/.

Zhao, C. and Guan, Y. Data-driven risk-averse stochastic optimization with Wasserstein metric. Operations Research Letters, 46(2):262–267, March 2018. ISSN 0167-6377. doi: 10.1016/j.orl.2018.01.011. URL https://linkinghub.elsevier.com/retrieve/pii/S0167637718300506.

Zhong, Z. and Zhu, J.-J. Nonlinear Wasserstein distributionally robust optimal control. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023. URL https://openreview.net/forum?id=4ZhPzV5boz.

Zhou, K., Doyle, J. C., and Glover, K. Robust and Optimal Control. Prentice Hall, Upper Saddle River, N.J., 1996. ISBN 978-0-13-456567-5.

A. Organization of the Appendix

This appendix is organized into several sections. First, Appendix B provides notations, definitions, and remarks about the problem formulation and the uniqueness of the spectral factorization. Next, Appendix C and Appendix D contain proofs of the duality and optimality theorems in Section 3. Subsequently, Appendix E is dedicated to proofs of lemmas and theorems related to the efficient algorithm discussed in Section 4, and Appendix F describes the pseudo-code of the algorithm. Further, Appendix G contains the proof of the state-space representation of the controller presented in Section 5. Finally, additional simulation results are presented in Appendix I.

B. Notations, Definitions and Remarks
B.1. Notations

In the paper, we use the notations in Table 2 for brevity.

B.2. Explicit Form of the Finite-Horizon State-Space Model

Consider the restriction of the infinite-horizon dynamics in (3) to the finite horizon as
$$s_T = F_T u_T + G_T w_T. \tag{31}$$
The causal linear measurement model for the finite-horizon case in (31) can be stated explicitly as follows:
$$
\begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ \vdots \\ s_T \end{bmatrix}
=
\underbrace{\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 \\
B_u & 0 & 0 & \cdots & 0 \\
A B_u & B_u & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A^{T-1} B_u & A^{T-2} B_u & A^{T-3} B_u & \cdots & 0
\end{bmatrix}}_{F_T}
\underbrace{\begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}}_{u_T}
+
\underbrace{\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 \\
B_w & 0 & 0 & \cdots & 0 \\
A B_w & B_w & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A^{T-1} B_w & A^{T-2} B_w & A^{T-3} B_w & \cdots & 0
\end{bmatrix}}_{G_T}
\underbrace{\begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ \vdots \\ w_T \end{bmatrix}}_{w_T}
\tag{32}
$$

B.3. A Note about Robustness to Disturbances vs. Robustness to Model Uncertainties

In our approach, we consider distributional robustness against disturbances, which provides flexibility, adaptability, and dynamic responses to unforeseen events. Although we do not explicitly address model uncertainties, these uncertainties can be effectively lumped together as disturbances, a technique known as uncertainty/disturbance lumping. This approach is particularly effective when the model uncertainties are relatively small. By treating parameter uncertainties as disturbances, we simplify the system analysis and ensure that the controller is robust not only to known uncertainties but also to unexpected variations and modeling errors.

B.4. A Note about the Uniqueness of the Spectral Factor L

In Theorem 3.2, given that $L$ is the causal and causally invertible spectral factor of $M = LL^*$, it is unique up to a unitary transformation of its block-elements from the right. Fixing the choice of the unitary transformation in the spectral factorization (e.g., positive-definite factors at infinity (Ephremidze et al., 2018)) results in a unique $L$.
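The strictly lower block-triangular structure in (32) can be checked against a direct simulation of the recursion. The sketch below (illustrative numpy code with random system matrices and, for brevity, $u \equiv 0$) builds $G_T$ and compares $G_T w$ with the states produced by iterating $x_{t+1} = A x_t + B_w w_t$ from $x_0 = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 3, 6                                # state dimension and horizon
A = 0.5 * rng.standard_normal((n, n))
Bw = rng.standard_normal((n, 1))

# Strictly lower block-triangular G_T from (32): block (t, k) = A^(t-1-k) Bw
# for k < t, and zero otherwise (scalar disturbance, d_w = 1).
GT = np.zeros((n * (T + 1), T + 1))
for t in range(T + 1):
    for k in range(t):
        GT[n * t:n * (t + 1), k:k + 1] = np.linalg.matrix_power(A, t - 1 - k) @ Bw

w = rng.standard_normal((T + 1, 1))
stacked = GT @ w                           # [s_0; ...; s_T] per (32), with u = 0

# Direct simulation of x_{t+1} = A x_t + Bw w_t from x_0 = 0.
x = np.zeros((n, 1))
sim = []
for t in range(T + 1):
    sim.append(x.copy())
    x = A @ x + Bw * w[t, 0]
sim = np.vstack(sim)
```

The two stacked vectors coincide, confirming that the operator form is just the unrolled recursion.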
Table 2: Notation table.

  Symbol          Description
  x_t             State at time t
  s_t             Regulated output at time t
  u_t             Control input at time t
  w_t             Exogenous disturbance at time t
  A               State transition matrix
  B_u             Control input matrix
  B_w             Disturbance input matrix
  C               Regulated output matrix
  R               Control input cost matrix
  F_T             Finite-horizon operator for control input
  G_T             Finite-horizon operator for disturbance
  F               Infinite-horizon operator for control input
  G               Infinite-horizon operator for disturbance
  ||.||           Euclidean norm
  ||.||_{H_2}     H_2 (Frobenius) norm
  ||.||_{H_inf}   H_infinity (operator) norm
  E               Expectation
  K               Set of causal (online) and time-invariant DFC policies
  K_T             Set of causal DFC policies over a horizon T
  R_K             Regret operator
  M               Auto-covariance operator for disturbances
  L               Unique, causal and causally invertible spectral factor of M = LL*
  N               The unique positive definite operator equal to L*L
  W_2             Wasserstein-2 metric
  S               Symmetric positive polynomial matrix
  tr(.)           Trace of a Toeplitz operator
  {.}_+           Causal part of an operator
  {.}_-           Strictly anti-causal part of an operator
  sqrt(M), M^{1/2}  Symmetric positive square root of an operator or matrix
  S_+^n           The set of positive semidefinite matrices
  T_{m,+}         The set of positive trigonometric polynomials of degree m

C. Proof of Duality Theorems

C.1. Proof of Strong Duality

The strong duality follows from the proof of Theorem 5 in (Kargin et al., 2024). For completeness, we provide the following proof sketch. We prove Theorem 5 (Strong Duality) in multiple steps. First, we provide a finite-horizon counterpart of the strong duality result from (Taha et al., 2023). Then, we reformulate the objective functions of both the finite-horizon and infinite-horizon dual problems using normalized spectral measures. We demonstrate the pointwise convergence of the finite-horizon dual objective function to the infinite-horizon objective by analyzing the limiting behavior of the spectrum of Toeplitz matrices.
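The limiting spectral behavior of Toeplitz matrices used here is captured by Szegő's limit theorem: for a Toeplitz matrix $T_n$ with symbol $f$, the averaged spectral statistics $\frac{1}{n}\sum_i g(\lambda_i(T_n))$ converge to $\frac{1}{2\pi}\int_0^{2\pi} g(f(\theta))\,d\theta$. The sketch below is a scalar, illustrative check (symbol $f(\theta) = 2 + 2\cos\theta$ and $g(\lambda) = \lambda^2$ are arbitrary choices, not the paper's block setting).

```python
import numpy as np

n = 500
# Tridiagonal Toeplitz matrix with symbol f(theta) = 2 + 2*cos(theta).
Tn = 2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)

lam = np.linalg.eigvalsh(Tn)
avg = np.mean(lam ** 2)                    # (1/n) * sum g(lambda_i), g(x) = x^2

# Szego limit: (1/2 pi) * integral of (2 + 2 cos(theta))^2 d theta = 6.
theta = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
limit = np.mean((2 + 2 * np.cos(theta)) ** 2)
```

As $n$ grows, the gap between the averaged eigenvalue statistic and the symbol integral shrinks like $O(1/n)$, which is the mechanism behind the pointwise convergence of $F_T(\gamma)$ to $F(\gamma)$.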
Finally, we show that the infinite-horizon dual problem attains a finite value and that the limits of the optimal values (and solutions) of the finite-horizon dual problem coincide with those of the infinite-horizon dual problem.

Step 1: Finite-Horizon Duality. For finite-horizon problems, we state a strong duality result adapted from (Taha et al., 2023). The average worst-case expected regret for a finite-horizon controller K_T is shown to be equivalent to a dual convex problem. Furthermore, the worst-case disturbance can be identified from the nominal disturbance.

Step 2: Normalized Spectral Measure. We introduce an alternative formulation for the finite-horizon and infinite-horizon problems using normalized spectral measures. The normalized spectral measure of the matrix C_{K,T} is defined, and a function f(γ, λ) is used to express the objective functions. The finite-horizon objective function F_T(γ) and the infinite-horizon objective function F(γ) are then formulated in terms of these spectral measures.

Step 3: Asymptotic Equivalence of Objective Functions. We show that the finite-horizon matrix C_{K,T} and its finite-dimensional block Toeplitz counterpart are asymptotically equivalent. Using Szegő's limit theorems for block Toeplitz matrices, we demonstrate that the normalized spectral measures converge pointwise. This leads to the convergence of the finite-horizon objective function F_T(γ) to the infinite-horizon objective function F(γ).

Step 4: Asymptotic Equivalence of the Infimum and the Solution. Finally, we use Γ-convergence to show that the infima and the optimal solutions of the finite-horizon problems converge to those of the infinite-horizon problem. This establishes that lim_{T→∞} inf_{γ≥0} F_T(γ) = inf_{γ≥0} F(γ) and lim_{T→∞} γ_{⋆,T} = γ_⋆.

D. Proof of Optimality Theorems

D.1. Proof of Theorem 3.1

This result is proven in detail in Kargin et al. (2024, Appendix A1). Due to its length, we provide only a brief sketch here.
Interested readers can refer to Kargin et al. (2024) for the complete proof.

D.2. Proof of Theorem 3.2

The proof involves four main steps.

Reformulation using Lemma D.1: Using Lemma D.1, we reformulate the original optimization problem. This lemma expresses the convex mapping X ↦ tr(X^{-1}) via Fenchel duality, which transforms the objective function into a form involving a supremum over a positive semi-definite operator M, in which the only term depending on K is tr(R_K M).

Application of the Wiener-Hopf Technique: We then apply Lemma D.2, which provides a method to approximate a non-causal controller by a causal one, minimizing the cost tr(R_K M). The optimal causal controller K is derived using the Wiener-Hopf technique.

Karush-Kuhn-Tucker (KKT) Conditions: We then find the conditions on the optimal M. This involves simplifying the objective function and finding the optimal M_γ and K_γ for a fixed level γ.

Final Reformulation and Duality: We further simplify the problem and apply strong duality to achieve the final form. The optimal K_⋆ is then derived from the Wiener-Hopf technique, with γ_⋆ and M_⋆ obtained through duality arguments.

Before proceeding with the proof, we first state two useful lemmas.

Lemma D.1. The convex mapping X ↦ tr(X^{-1}) for X ≻ 0 can be expressed via Fenchel duality as

sup_{M⪰0} −tr(XM) + 2 tr(√M) = { tr(X^{-1}), if X ≻ 0; +∞, otherwise. } (33)

Proof. Observe that the objective −tr(XM) + 2 tr(√M) is concave in M, and the expression on the right-hand side can be obtained by solving for the optimal M, namely M = X^{-2} when X ≻ 0. When X ⊁ 0, i.e., X has a negative eigenvalue, the term −tr(XM) can be made arbitrarily large (and tr(√M) arbitrarily large) by choosing an appropriate M.

The following lemma will be useful in the proof of Theorem 3.2.

Lemma D.2 (Wiener-Hopf Technique (Kailath et al., 2000)).
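Lemma D.1 can be sanity-checked numerically: for X ≻ 0, the first-order condition M^{-1/2} = X gives the maximizer M = X^{-2}, with optimal value tr(X^{-1}). The following sketch is our own NumPy illustration of this (the function name `fenchel_objective` is ours):

```python
import numpy as np

def fenchel_objective(X, M):
    """Objective of (33): -tr(X M) + 2 tr(sqrt(M)), for symmetric M ⪰ 0."""
    w, V = np.linalg.eigh(M)
    sqrtM = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T
    return -np.trace(X @ M) + 2.0 * np.trace(sqrtM)

rng = np.random.default_rng(1)
G = rng.standard_normal((4, 4))
X = G @ G.T + 4.0 * np.eye(4)          # X ≻ 0
Xinv = np.linalg.inv(X)
M_star = Xinv @ Xinv                   # maximizer M* = X^{-2}
value = fenchel_objective(X, M_star)   # equals tr(X^{-1})
```

By concavity, any other feasible M (for example M = I) attains a value no larger than tr(X^{-1}).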
Consider the problem of approximating a non-causal controller K∘ by a causal controller K, such that K minimizes the cost tr(R_K M), i.e.,

inf_{K∈𝒦} tr(R_K M), (34)

where M ⪰ 0, R_K = (K − K∘)^* Δ^* Δ (K − K∘), and K∘ is the non-causal controller that makes the objective defined above zero. Then, the solution to this problem is given by

K = Δ^{-1} {Δ K∘ L}_+ L^{-1}, (35)

where L is the unique causal and causally invertible spectral factor of M such that M = LL^* and {·}_+ denotes the causal part of an operator. Alternatively, the controller can be written as

K = K_{H2} + Δ^{-1} {{Δ K∘}_- L}_+ L^{-1}, (36)

where K_{H2} := Δ^{-1}{Δ K∘}_+.

Proof. Let L be the unique causal and causally invertible spectral factor of M, i.e., M = LL^*. Then, using the cyclic property of the trace, the objective can be written as

inf_{K∈𝒦} tr( Δ(K − K∘) M (K − K∘)^* Δ^* ) = inf_{K∈𝒦} tr( (ΔK − ΔK∘) LL^* (ΔK − ΔK∘)^* ) (37)
= inf_{K∈𝒦} tr( (ΔKL − ΔK∘L)(ΔKL − ΔK∘L)^* ) (38)
= inf_{K∈𝒦} ‖ΔKL − ΔK∘L‖²_F, (39)

where ‖A‖²_F denotes the square of the Frobenius norm. Since Δ, K, and L are causal, and ΔK∘L can be broken into causal and strictly anti-causal parts, it is evident that the minimizing controller is the one that makes the term ΔKL − ΔK∘L strictly anti-causal, cancelling the causal part of ΔK∘L. This means that the optimal controller satisfies

ΔKL = {ΔK∘L}_+. (40)

Also, since L^{-1} and Δ^{-1} are causal, the optimal causal controller is given by (35). Finally, using the fact that ΔK∘ = {ΔK∘}_+ + {ΔK∘}_- and simplifying, we get (36).

Proof of Theorem 3.2. We first simplify our optimization problem (13) using Lemma D.1. We then find the conditions on the optimal optimization variables using the Karush-Kuhn-Tucker (KKT) conditions. Using Lemma D.1, we can write

inf_{K∈𝒦, γI ≻ R_K} tr( (I − γ^{-1}R_K)^{-1} M∘ )
= inf_{K∈𝒦} sup_{M⪰0} −tr( (I − γ^{-1}R_K) M ) + 2 tr( (M∘^{1/2} M M∘^{1/2})^{1/2} )
= sup_{M⪰0} −tr(M) + 2 tr( (M∘^{1/2} M M∘^{1/2})^{1/2} ) + inf_{K∈𝒦} γ^{-1} tr(R_K M).

Fixing γ > 0, we focus on the reduced subproblem of (14),

sup_{M⪰0} −γ tr(M) − γ tr(M∘) + 2γ tr( (M∘^{1/2} M M∘^{1/2})^{1/2} ) + inf_{K∈𝒦} tr(R_K M).
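The causal/anti-causal splitting {·}_+ and {·}_- that drives the Wiener-Hopf argument is easy to carry out numerically on unit-circle samples. A minimal sketch for scalar transfer functions (our own illustration; the helper name `causal_part` is hypothetical, and it assumes the Laurent coefficients decay well within N/2 taps):

```python
import numpy as np

def causal_part(H_samples):
    """Causal part {H}_+ from samples H(e^{j 2π n/N}), n = 0..N-1.

    ifft recovers the Laurent coefficients h_t of z^{-t}; the strictly
    anti-causal coefficients (positive powers of z) alias to the upper
    half of the index range and are zeroed out."""
    N = len(H_samples)
    h = np.fft.ifft(H_samples)
    h[N // 2:] = 0.0          # drop strictly anti-causal (z^{+k}) terms
    return np.fft.fft(h)
```

For example, with H(z) = 2 + 3z^{-1} + 5z, the routine returns samples of 2 + 3z^{-1}, and {H}_- = H − {H}_+ recovers the strictly anti-causal part 5z.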
(41)

Using the definition of the Bures-Wasserstein distance, we can reformulate (41) as

sup_{M⪰0} inf_{K∈𝒦} tr(R_K M) − γ BW(M, M∘)² =: sup_{M⪰0} Φ(M). (42)

Thus, the original formulation in (14) can be expressed as

inf_{γ≥0} sup_{M⪰0} inf_{K∈𝒦} tr(R_K M) + γ( r² − BW(M, M∘)² ). (43)

Note that the objective above is affine in γ ≥ 0 and strictly concave in M. Moreover, primal and dual feasibility hold, enabling the exchange of inf_{γ≥0} and sup_{M⪰0}, resulting in

sup_{M⪰0} inf_{K∈𝒦} inf_{γ≥0} tr(R_K M) + γ( r² − BW(M, M∘)² ), (44)

where the inner minimization over γ reduces the problem to its constrained version in Equation (15). Finally, the form of the optimal K follows from the Wiener-Hopf technique in Lemma D.2, and the optimal γ and M can be obtained using the strong duality result in (C). To see the optimal form of M_⋆, consider the gradient of Φ(M) in (42) with respect to M and set it to 0. Using Danskin's theorem (Danskin, 1966), we have

M_⋆^{-1/2}( M_⋆^{1/2} M∘ M_⋆^{1/2} )^{1/2} M_⋆^{-1/2} − I + γ^{-1} R_{K_⋆} = 0. (45)

Taking the inverse on both sides, we get

M_⋆^{1/2}( M_⋆^{1/2} M∘ M_⋆^{1/2} )^{-1/2} M_⋆^{1/2} = ( I − γ^{-1} R_{K_⋆} )^{-1}. (46)

We can now obtain two equations: first, by right-multiplying by M_⋆^{1/2}, and second, by left-multiplying by M_⋆^{1/2}. On multiplying these two equations together and simplifying, we get (16b).

E. Proofs related to the Efficient Algorithm in Section 4

E.1. Proof of Lemma 4.1

With T := {ΔK∘}_-, the optimality conditions in (16) take the equivalent form:

i. M_⋆ = ( I − γ_⋆^{-1} R_{K_⋆} )^{-2}, (47a)
ii. R_{K_⋆} = L_⋆^{-*} {T L_⋆}_- {T L_⋆}_-^* L_⋆^{-1}, (47b)
iii. tr[ ( (I − γ_⋆^{-1} R_{K_⋆})^{-1} − I )² ] = r². (47c)

Using the spectral factorization M_⋆ = L_⋆ L_⋆^*, the conditions i. and ii. can be equivalently re-expressed as

i. ( L_⋆ L_⋆^* )^{-1/2} = I − γ_⋆^{-1} R_{K_⋆}, (48)
ii. R_{K_⋆} = L_⋆^{-*} {T L_⋆}_- {T L_⋆}_-^* L_⋆^{-1}. (49)

By plugging ii. into i., we get

0 = I − ( L_⋆ L_⋆^* )^{-1/2} − γ_⋆^{-1} L_⋆^{-*} {T L_⋆}_- {T L_⋆}_-^* L_⋆^{-1}. (50)

Multiplying by L_⋆^* from the left and by L_⋆ from the right, we get

0 = L_⋆^* L_⋆ − ( L_⋆^* L_⋆ )^{1/2} − γ_⋆^{-1} {T L_⋆}_- {T L_⋆}_-^*,

where we used the identity L_⋆^* ( L_⋆ L_⋆^* )^{-1/2} L_⋆ = ( L_⋆^* L_⋆ )^{1/2}.
Letting N_⋆ = L_⋆^* L_⋆, this expression can be solved for N_⋆, yielding the following implicit equation:

N_⋆ = L_⋆^* L_⋆ = (1/4) ( I + ( I + 4γ_⋆^{-1} {T L_⋆}_- {T L_⋆}_-^* )^{1/2} )²,

implying thus (17), with γ_⋆ > 0 satisfying tr( ((I − γ_⋆^{-1} R_{K_⋆})^{-1} − I)² ) = r² (or equivalently, BW(L_⋆^* L_⋆, I) = r).

E.2. Note on the Frequency-Domain Representation of Toeplitz Operators

We start this section of the appendix by justifying our choice of working out our results in the frequency domain. Let V = [V_{ij}]_{i,j=−∞}^{∞} be a doubly infinite block matrix, i.e., a Toeplitz operator, which represents a discrete, linear, time-invariant system (i.e., V_{ij} = V_{i−j}), and which maps a sequence of inputs to a sequence of outputs. In this case of a time-invariant system, the representation of the operator in the z-domain (or the so-called bilateral z-transform) is

V(z) = Σ_{i=−∞}^{∞} V_i z^{−i}, (52)

defined for the regions of the complex plane where the above series converges absolutely, known as the ROC (region of convergence). V(z) is also known as the transfer matrix. The causality of V can be readily characterized in terms of V(z). Indeed, we have the following: V is causal if and only if V(z) is analytic in the exterior of some circle, |z| > α > 0. Likewise, V is anti-causal if and only if V(z) is analytic in the interior of some circle, |z| < α, with α > 0. Moreover, V is strictly causal (anti-causal) if and only if it is causal (anti-causal) and V(∞) = 0 (V(0) = 0). We also define the trace of a Toeplitz operator M as follows:

tr(M) := (1/2π) ∫_0^{2π} Tr( M(e^{jω}) ) dω. (53)

In the coming sections, we use the frequency-domain counterparts of our Toeplitz operators (such as F, G, M, ...) by setting z = e^{jω} for ω ∈ [0, 2π).

E.3. Frequency-Domain Characterization of the Optimal Solution of Problem 2.3

We present the frequency-domain formulation of the saddle point (K_⋆, M_⋆) derived in Theorem 3.2 to reveal the structure of the solution.
We first introduce the following useful results. Denoting by M_⋆(z) and R_{K_⋆}(z) the transfer functions corresponding to the optimal M_⋆ and R_{K_⋆}, respectively, the optimality conditions in (16) and (17) take the equivalent forms:

i. M_⋆(z) = ( I − γ_⋆^{-1} R_{K_⋆}(z) )^{-2}, (54a)
ii. R_{K_⋆}(z) = L_⋆(z)^{-*} {T L_⋆}_-(z) {T L_⋆}_-(z)^* L_⋆(z)^{-1}, (54b)
iii. tr[ ( (I − γ_⋆^{-1} R_{K_⋆}(z))^{-1} − I )² ] = r², (54c)
iv. N_⋆(z) = L_⋆(z)^* L_⋆(z) = (1/4) ( I + ( I + 4γ_⋆^{-1} {T L_⋆}_-(z) {T L_⋆}_-(z)^* )^{1/2} )², (54d)

where

L_⋆(z) = Σ_{t=0}^{∞} \hat{L}_{⋆,t} z^{−t} (55)

is the transfer function corresponding to the causal canonical factor L_⋆, and T = {ΔK∘}_- is the strictly anti-causal operator whose transfer function T(z) is found from the following:

Lemma E.1 (Adapted from Lemma 4 in (Sabag et al., 2021)). The transfer function Δ(z)K∘(z) can be written as the sum of a causal and a strictly anti-causal transfer function:

Δ(z)K∘(z) = T(z) + U(z), (56)
T(z) = {ΔK∘}_-(z) = C̄( z^{−1}I − Ā )^{−1} B̄, (57)
U(z) = {ΔK∘}_+(z) = Δ(z) K_{H2}(z) = C̄ P ( Ā(zI − Ā)^{−1} + I ) B_w, (58)

where K_{H2}(z) = Δ^{-1}{ΔK∘}_+(z), and K_{lqr} := (I + B_u^* P B_u)^{-1} B_u^* P A, with P ⪰ 0 the unique stabilizing solution to the LQR Riccati equation

P = Q + A^* P A − A^* P B_u (I + B_u^* P B_u)^{-1} B_u^* P A, Q = C^* C,

A_k = A − B_u K_{lqr}, and Ā = A_k^*, B̄ = A_k^* P B_w, C̄ = (I + B_u^* P B_u)^{-*/2} B_u^*. (59)

Notice that given the causal L(z) and strictly anti-causal T(z), the strictly anti-causal part {T(z)L(z)}_- has a state-space representation, shown in the following lemma.

Lemma E.2. Let L be a causal operator. The strictly anti-causal operator {T L}_- possesses a state-space representation as follows:

{T L}_-(z) = C̄( z^{−1}I − Ā )^{−1} Γ, (60)

Γ := (1/2π) ∫_0^{2π} ( I − e^{jω} Ā )^{−1} B̄ L(e^{jω}) dω. (61)

Proof. Let L(z) = Σ_{t=0}^{∞} \hat{L}_t z^{−t} be the transfer function of L. Using equations (56), (57), and (58), S(z) := {ΔK∘L}_-(z) can be written as:

S(z) = {TL}_-(z) + {UL}_-(z) (62)
=(a) {TL}_-(z) = { C̄( z^{−1}I − Ā )^{−1} B̄ L(z) }_- (63)
=(b) { C̄ ( Σ_{t=0}^{∞} z^{t+1} Ā^t ) B̄ ( Σ_{m=0}^{∞} \hat{L}_m z^{−m} ) }_- (64)
=(c) C̄ ( Σ_{t=0}^{∞} z^{t+1} Ā^t ) ( Σ_{m=0}^{∞} Ā^m B̄ \hat{L}_m ) (65)
=(d) C̄( z^{−1}I − Ā )^{−1} Γ. (66)

Here, (a) holds because both U(z) and L(z) are causal, so the strictly anti-causal part of U(z)L(z) is zero.
(b) holds as we apply the Neumann series expansion of (z^{−1}I − Ā)^{−1} and replace L(z) by its expansion (55). (c) holds as we take the strictly anti-causal part of expression (64), keeping only the strictly positive powers of z. (d) completes the result by using the Neumann series again, defining Γ := Σ_{t=0}^{∞} Ā^t B̄ \hat{L}_t, and leveraging Parseval's theorem to conclude the integral form of the finite parameter

Γ = (1/2π) ∫_0^{2π} ( I − e^{jω} Ā )^{−1} B̄ L(e^{jω}) dω. (67)

Proof of Theorem 4.2: Using Lemma E.2 and plugging (60) into (54d), the frequency-domain optimality equation (54d) can be reformulated explicitly as follows:

N_⋆(z) = L_⋆(z)^* L_⋆(z) = (1/4) ( I + ( I + 4γ_⋆^{-1} Γ_⋆^*( z^{−1}I − Ā )^{−*} C̄^* C̄ ( z^{−1}I − Ā )^{−1} Γ_⋆ )^{1/2} )², (68)

where Γ_⋆ is as in (67), and γ_⋆ > 0 satisfies tr( ((I − γ_⋆^{-1} R_{K_⋆})^{-1} − I)² ) = r² (or equivalently, BW(L_⋆^* L_⋆, I) = r), which gives the desired result.

Proof of Corollary 4.3: Notice that the right-hand side of (68) involves the positive-definite square root of the rational term Γ_⋆^*( z^{−1}I − Ā )^{−*} C̄^* C̄ ( z^{−1}I − Ā )^{−1} Γ_⋆. The square root does not preserve rationality in general, implying the desired result.

E.4. Proof of Theorem 4.5

Before proceeding with the proof, we state the following useful lemma.

Lemma E.3. For a positive-definite Toeplitz operator M ≻ 0 with tr(M) < ∞ and tr(log(M)) > −∞, let M ↦ Φ(M) be the mapping defined as

Φ(M) := inf_{K∈𝒦} tr( R_K M ). (69)

Denote by M = LL^* and Δ^*Δ = I + F^*F the canonical spectral factorizations, where L, Δ, as well as their inverses L^{-1}, Δ^{-1}, are causal operators. The following statements hold:

i. The function Φ can be written in closed form as
Φ(M) = tr( {ΔK∘L}_- {ΔK∘L}_-^* ). (70)

ii. The gradient of Φ has the following closed form:
∇Φ(M) = R_K = L^{-*} {ΔK∘L}_- {ΔK∘L}_-^* L^{-1}. (71)

iii. The function Φ is concave, positively homogeneous, and
Φ(M) = tr( M ∇Φ(M) ). (72)

Proof of Theorem 4.5. Our proof of convergence follows closely the proof technique used in (Jaggi, 2013).
In particular, since the unit circle is discretized and the computation of the gradients is approximate, the linear subproblem is solved up to an approximation error δ_N, which depends on the problem parameters and the discretization level N. Namely, for a large enough N, we have

tr( ∇Φ(M_k) \tilde{M}_{k+1} ) ≥ sup_{M∈Ω_r} tr( ∇Φ(M_k) M ) − δ_N, (73)

Ω_r := { M ⪰ 0 | tr( M − 2√M + I ) ≤ r² }. (74)

Therefore, using Theorem 1 of (Jaggi, 2013), we obtain

Φ(M_⋆) − Φ(M_k) ≤ (2κ / (k + 2)) (1 + δ_N), (75)

where κ > 0 is the so-called curvature constant associated with the problem, defined as follows:

κ := sup_{M, \tilde{M} ∈ Ω_r, η∈[0,1], M′ = M + η(\tilde{M} − M)} (2/η²) [ −Φ(M′) + Φ(M) + tr( ∇Φ(M)(M′ − M) ) ] (76)
= sup_{M, \tilde{M} ∈ Ω_r, η∈[0,1], M′ = M + η(\tilde{M} − M)} (2/η²) ( tr( M′ ∇Φ(M) ) − Φ(M′) ) (77)
= sup_{M, \tilde{M} ∈ Ω_r, η∈[0,1], M′ = M + η(\tilde{M} − M)} inf_{K∈𝒦} (2/η²) tr( M′ ( ∇Φ(M) − R_K ) ), (78)

where the last two equalities follow from Lemma E.3.

F. Algorithms

F.1. Pseudocode for the Frequency-Domain Iterative Optimization Method Solving (15)

The pseudocode for the frequency-domain iterative optimization method is presented in Algorithm 1.

F.2. Additional Discussion on the Computation of Gradients

By the Wiener-Hopf technique discussed in Lemma D.2, the gradient R_k = ∇Φ(M_k) can be obtained as

R_k(z) = L_k(z)^{-*} {ΔK∘L_k}_-(z) {ΔK∘L_k}_-(z)^* L_k(z)^{-1}, (79)

where L_k L_k^* = M_k is the unique spectral factorization. Furthermore, using (66) and (67), we can reformulate the gradient R_k(z) more explicitly as

R_k(z) = L_k(z)^{-*} Γ_k^*( z^{−1}I − Ā )^{−*} C̄^* C̄ ( z^{−1}I − Ā )^{−1} Γ_k L_k(z)^{-1}, (80)

where Γ_k = (1/2π) ∫_0^{2π} ( I − e^{jω} Ā )^{−1} B̄ L_k(e^{jω}) dω as in (67). Here, the spectral factor L_k(z) is obtained for z ∈ T_N by Appendix F.3. Similarly, the parameter Γ_k can be computed numerically using the trapezoid rule over the discrete domain T_N, i.e.,

Γ_k ≈ (1/N) Σ_{z∈T_N} ( I − z Ā )^{−1} B̄ L_k(z). (81)

The gradient R_k(z) can thus be efficiently computed for z ∈ T_N.
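The only system-level quantity the gradient needs beyond the spectral factor is the finite parameter Γ_k of (67), approximated by the quadrature (81). A minimal sketch of this computation (our own illustration; `gamma_parameter`, `Abar`, and `Bbar` are hypothetical names standing in for Ā and B̄, and L is taken scalar-valued as in the paper's factorization routine):

```python
import numpy as np

def gamma_parameter(Abar, Bbar, L_samples):
    """Approximate Γ = (1/2π) ∫ (I - e^{jω} Ā)^{-1} B̄ L(e^{jω}) dω
    by a uniform quadrature over T_N, as in (81).

    L_samples[n] = L(e^{j 2π n/N}) for a scalar spectral factor L."""
    n = Abar.shape[0]
    N = len(L_samples)
    G = np.zeros((n, Bbar.shape[1]), dtype=complex)
    for k, Lk in enumerate(L_samples):
        z = np.exp(2j * np.pi * k / N)
        G += np.linalg.solve(np.eye(n) - z * Abar, Bbar) * Lk
    return G / N
```

For a stable Ā and L(z) = Σ_t \hat{L}_t z^{-t}, the quadrature converges rapidly to Γ = Σ_t Ā^t B̄ \hat{L}_t, e.g., for L(z) = 1 + 0.5 z^{-1} it returns B̄ + 0.5 Ā B̄ up to an exponentially small aliasing error.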
Algorithm 1 Frequency-domain iterative optimization method solving (15)
1: Input: Radius r > 0, state-space model (A, B_u, B_w), discretizations N > 0 and Ñ > 0, tolerance ϵ > 0
2: Compute (Ā, B̄, C̄) from (A, B_u, B_w) using (59)
3: Generate frequency samples T_N := {e^{j2πn/N} | n = 0, ..., N−1}
4: Initialize M_0(z) ← I for z ∈ T_N, and k ← 0
5: repeat
6: Set the step size η_k ← 2/(k+2)
7: Compute the spectral factor L_k(z) ← Spectral_Factor(M_k) (see Appendix F.3)
8: Compute the parameter Γ_k ← (1/N) Σ_{z∈T_N} (I − z Ā)^{−1} B̄ L_k(z) (see Appendix F.2)
9: Compute the gradient for z ∈ T_N (see Appendix F.2):
R_k(z) ← L_k(z)^{-*} {ΔK∘L_k}_-(z) {ΔK∘L_k}_-(z)^* L_k(z)^{-1}
10: Solve the linear subproblem (21a) via bisection (see Appendix F.4):
\tilde{M}_k(z) ← (I − γ_k^{-1} R_k(z))^{-2} for z ∈ T_N, with γ_k found through bisection
11: Set M_{k+1}(z) ← (1 − η_k) M_k(z) + η_k \tilde{M}_k(z) for z ∈ T_N
12: Increment k ← k + 1
13: until ‖M_{k+1} − M_k‖ / ‖M_k‖ ≤ ϵ
14: Compute N_k(z) = (1/4)( I + ( I + 4γ_k^{-1} Γ_k^*(z^{−1}I − Ā)^{−*} C̄^* C̄ (z^{−1}I − Ā)^{−1} Γ_k )^{1/2} )² for z ∈ T_Ñ := {e^{j2πn/Ñ} | n = 0, ..., Ñ−1}
15: Compute K(z) ← Rational_Approximate(N_k(z)) (see Appendix F.5)

F.3. Implementation of Spectral Factorization

To perform the spectral factorization of an irrational function M(z), we use a spectral factorization method via the discrete Fourier transform, which returns samples of the spectral factor on the unit circle. First, we compute Λ(z) for z ∈ T_N, defined to be the logarithm of M(z); then we take the inverse discrete Fourier transform λ_k, k = 0, ..., N−1, of Λ(z), from which we compute the spectral factor as

L(z_n) = exp( λ_0/2 + Σ_{k=1}^{N/2−1} λ_k z_n^{−k} ), for n = 0, ..., N−1,

where z_n = e^{j2πn/N}. The method is efficient without requiring rational spectra, and the associated error term, featuring a purely imaginary logarithm, rapidly diminishes with an increased number of samples. It is worth noting that this method is explicitly designed for scalar functions.

Algorithm 2 Spectral_Factor: Spectral Factorization via DFT
1: Input: Scalar positive spectrum M(z) > 0 on T_N := {e^{j2πn/N} | n = 0, . . .
, N−1}
2: Output: Causal spectral factor L(z) of M(z) > 0 on T_N
3: Compute the cepstrum Λ(z) ← log(M(z)) for z ∈ T_N
4: Compute the inverse DFT λ_k ← IDFT(Λ(z)) for k = 0, ..., N−1
5: Compute the spectral factor L(z_n) ← exp( λ_0/2 + Σ_{k=1}^{N/2−1} λ_k z_n^{−k} ), z_n = e^{j2πn/N}, n = 0, ..., N−1

F.4. Implementation of the Bisection Method

To find the optimal parameter γ_k that solves tr( ((I − γ_k^{-1} R_k)^{-1} − I)² ) = r² in the Frank-Wolfe update (22), we use a bisection algorithm. The pseudocode for the bisection algorithm can be found in Algorithm 3. We start off with two guesses of γ, i.e., (γ_left, γ_right), with the assumption that the optimal γ lies between the two values (without loss of generality).

Algorithm 3 Bisection Algorithm
1: Input: h(γ), γ_right, γ_left
2: Compute the gradient at γ_right: h′(γ)|_{γ_right}
3: while |γ_right − γ_left| > ϵ do
4: Calculate the midpoint γ_mid between γ_left and γ_right
5: Compute the gradient at γ_mid: h′(γ)|_{γ_mid}
6: if h′(γ)|_{γ_mid} = 0 then
7: return γ_mid
8: else if h′(γ)|_{γ_mid} > 0 then
9: Update γ_right to γ_mid
10: else
11: Update γ_left to γ_mid
12: end if
13: end while
14: return the average of γ_left and γ_right

F.5. Implementation of Rational Approximation

We present the pseudocode of the rational approximation in Algorithm 4.

Algorithm 4 Rational Approximation
1: Input: Scalar positive spectrum N(z) > 0 on T_N := {e^{j2πn/N} | n = 0, ..., N−1}, and a small positive scalar ϵ
2: Output: Causal controller K(z) on T_N
3: Get P(z), Q(z) by solving the convex optimization in (26) for a fixed degree m, given N(z), i.e.:
min_{p_0,...,p_m ∈ R, q_0,...,q_m ∈ R, ε ≥ 0} ε (82)
s.t. q_0 = 1, P(z), Q(z) > 0,
P(z) − (N(z) + ε) Q(z) ≤ 0, P(z) − (N(z) − ε) Q(z) ≥ 0, ∀z ∈ T_N
4: Get the rational spectral factors S_P(z), S_Q(z) of P(z), Q(z) using the canonical factorization method in (Sayed & Kailath, 2001)
5: Get L_r(z), the rational spectral factor of N(z), as S_P(z)/S_Q(z)
6: Get K(z) from the formulation in (30), (101)

G. Proof of the State-Space Representation of the Controller

G.1.
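Algorithm 2 amounts to a few lines of NumPy. The sketch below is our own illustration of the cepstral (Kolmogorov-type) method; it assumes a scalar spectrum whose log-coefficients decay well within N/2 taps (the helper name `spectral_factor_dft` is ours):

```python
import numpy as np

def spectral_factor_dft(M_samples):
    """Spectral_Factor (Algorithm 2): given scalar M(z) > 0 sampled on T_N,
    return samples of a causal, minimum-phase L with |L(z)|^2 = M(z)."""
    N = len(M_samples)
    lam = np.fft.ifft(np.log(M_samples))    # cepstral coefficients λ_k (step 4)
    lam_plus = np.zeros(N, dtype=complex)   # keep λ_0/2 and the causal half
    lam_plus[0] = lam[0] / 2.0
    lam_plus[1:N // 2] = lam[1:N // 2]
    # step 5: L(z_n) = exp(λ_0/2 + Σ_{k=1}^{N/2-1} λ_k z_n^{-k})
    return np.exp(np.fft.fft(lam_plus))
```

For example, sampling M(z) = |1 + 0.5 z^{-1}|² on a fine grid recovers (samples of) the minimum-phase factor L(z) = 1 + 0.5 z^{-1}.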
Proof of Lemma 5.7

Let the spectral factor L(z) in rational form be given as

L(z) = ( I + Ĉ( zI − Â )^{−1} B̂ ) D^{1/2}, (83)

with its inverse given by

L^{−1}(z) = D^{−1/2} ( I − Ĉ( zI − (Â − B̂Ĉ) )^{−1} B̂ ), (84)

and its operator form denoted by L.

We write the DR-RO controller K(z) as a sum of causal functions:

K(z) = Δ^{−1}(z) {ΔK∘L}_+(z) L^{−1}(z) (85)
= Δ^{−1}(z) ( {ΔK∘}_+(z) L(z) + {{ΔK∘}_- L}_+(z) ) L^{−1}(z) (86)
= Δ^{−1}(z) {ΔK∘}_+(z) + Δ^{−1}(z) {{ΔK∘}_- L}_+(z) L^{−1}(z). (87)

From Lemma 4 in (Sabag et al., 2023), we have

{ΔK∘}_-(z) = R̃ B_u^* ( z^{−1}I − A_k^* )^{−1} A_k^* P B_w, (88)

where the LQR controller is defined as K_{lqr} = (I + B_u^* P B_u)^{−1} B_u^* P A and the closed-loop matrix A_k = A − B_u K_{lqr}, with P ⪰ 0 the unique stabilizing solution to the LQR Riccati equation P = Q + A^* P A − A^* P B_u (I + B_u^* P B_u)^{−1} B_u^* P A, Q = C^* C, and with R̃ = (I + B_u^* P B_u)^{−*/2}.

Multiplying equation (88) with L and taking its causal part, we get

{{ΔK∘}_- L}_+(z) = { R̃ B_u^* ( z^{−1}I − A_k^* )^{−1} A_k^* P B_w Ĉ( zI − Â )^{−1} B̂ D^{1/2} + R̃ B_u^* ( z^{−1}I − A_k^* )^{−1} A_k^* P B_w D^{1/2} }_+. (89)

Given that the term R̃ B_u^* ( z^{−1}I − A_k^* )^{−1} A_k^* P B_w D^{1/2} is strictly anti-causal, and considering the matrix U which solves the Lyapunov equation

A_k^* P B_w Ĉ + A_k^* U Â = U,

we get {{ΔK∘}_- L}_+(z) as:

{{ΔK∘}_- L}_+(z) = { R̃ B_u^* ( ( z^{−1}I − A_k^* )^{−1} A_k^* U + U Â( zI − Â )^{−1} + U ) B̂ D^{1/2} }_+ (90)
= R̃ B_u^* U ( Â( zI − Â )^{−1} + I ) B̂ D^{1/2} (91)
= z R̃ B_u^* U ( zI − Â )^{−1} B̂ D^{1/2}. (92)

Now, multiplying equation (92) by the inverse of L in (84), we get

{{ΔK∘}_- L}_+(z) L^{−1}(z) = z R̃ B_u^* U ( zI − Â )^{−1} B̂ ( I + Ĉ( zI − Â )^{−1} B̂ )^{−1} (93)
= z R̃ B_u^* U ( zI − Â )^{−1} ( I + B̂Ĉ( zI − Â )^{−1} )^{−1} B̂ (94)
= z R̃ B_u^* U ( zI − Â_k )^{−1} B̂ (95)
= R̃ B_u^* U ( I + ( zI − Â_k )^{−1} Â_k ) B̂, (96)

where Â_k = Â − B̂Ĉ. The inverse of Δ is given by Δ^{−1}(z) = ( I − K_{lqr}( zI − A_k )^{−1} B_u ) R̃^*, and we know from Lemma 4 in (Sabag et al., 2023) that {ΔK∘}_+(z) = R̃ B_u^* P A ( zI − A )^{−1} B_w + R̃ B_u^* P B_w.
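The matrix U above solves a discrete-time Sylvester (Stein) equation, U = A_k^* U Â + A_k^* P B_w Ĉ, which for moderate state dimensions can be solved directly by vectorization. A minimal sketch (our own illustration; `solve_stein` is a hypothetical helper, and F, G, Q stand in for A_k^*, Â, and A_k^* P B_w Ĉ):

```python
import numpy as np

def solve_stein(F, G, Q):
    """Solve U = F U G + Q via vec(U) = (I - Gᵀ ⊗ F)^{-1} vec(Q),
    using the column-major identity vec(F U G) = (Gᵀ ⊗ F) vec(U)."""
    n, m = Q.shape
    A = np.eye(n * m) - np.kron(G.T, F)
    u = np.linalg.solve(A, Q.flatten(order='F'))
    return u.reshape((n, m), order='F')
```

A unique solution exists whenever the spectral radii satisfy ρ(F)ρ(G) < 1, which holds in the setting of the proof since A_k and Â are stable.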
Then we can get the two terms of equation (87):

Δ^{−1}(z) {ΔK∘}_+(z) = −K_{lqr}( zI − A_k )^{−1} ( B_w − B_u R̃^* R̃ B_u^* P B_w ) − R̃^* R̃ B_u^* P B_w, (97)

Δ^{−1}(z) {{ΔK∘}_- L}_+(z) L^{−1}(z) = ( I − K_{lqr}( zI − A_k )^{−1} B_u ) R̃^* R̃ B_u^* U ( zI − Â_k )^{−1} Â_k B̂ (98)
− K_{lqr}( zI − A_k )^{−1} B_u R̃^* R̃ B_u^* U B̂ (99)
+ R̃^* R̃ B_u^* U B̂. (100)

Finally, summing equations (97) and (98)-(100), we get the controller K(z) in its rational form:

K(z) = \begin{pmatrix} -K_{lqr} & \tilde{R}^*\tilde{R} B_u^* U \end{pmatrix}
\left( zI - \begin{pmatrix} A_k & B_u \tilde{R}^*\tilde{R} B_u^* U \\ 0 & \hat{A}_k \end{pmatrix} \right)^{-1}
\underbrace{\begin{pmatrix} B_w - B_u \tilde{R}^*\tilde{R} B_u^* ( P B_w + U \hat{B} ) \\ \hat{A}_k \hat{B} \end{pmatrix}}_{\tilde{G}}
- \underbrace{\tilde{R}^*\tilde{R} B_u^* ( P B_w + U \hat{B} )}_{\tilde{J}},

which can be explicitly rewritten as in (30).

H. SDP Formulation for the Finite Horizon from (Yan et al., 2023)

In this section, we state the SDP formulation of the finite-horizon DR-RO control problem for a fixed horizon T > 0 presented in (Yan et al., 2023), which is the main controller we compare against, to showcase the value of the infinite-horizon setting. This result highlights the triviality of non-causal estimation as opposed to causal estimation. In Theorem H.2, we demonstrate that the finite-horizon DR-RO problem reduces to an SDP.

Problem H.1 (Distributionally Robust Regret-Optimal (DR-RO) Control in the Finite Horizon). Find a causal and time-invariant controller K_T ∈ 𝒦_T that minimizes the worst-case expected regret in the finite horizon (2.3), i.e.,

inf_{K_T ∈ 𝒦_T} R(K_T, r). (102)

Theorem H.2 (Adapted from (Yan et al., 2023). An SDP formulation for finite-horizon DR-RO). Let the horizon T > 0 be fixed, and given the non-causal controller K_{∘,T} := (I_T + F_T^* F_T)^{−1} F_T^* G_T, Problem H.1 reduces to the following SDP:

inf_{K_T ∈ 𝒦_T, γ ≥ 0, X_T ⪰ 0} γ( r²T − tr(I_T) ) + tr(X_T)
s.t. \begin{pmatrix} X_T & γI_T & 0 \\ γI_T & γI_T & (K_T − K_{∘,T})^* \\ 0 & K_T − K_{∘,T} & (I_T + F_T^* F_T)^{−1} \end{pmatrix} ⪰ 0.

Moreover, the worst-case disturbance w_T^⋆ can be identified from the nominal disturbance w_{∘,T} as

w_T^⋆ = ( I_T − γ_⋆^{−1} T_{K_T^⋆}^* T_{K_T^⋆} )^{−1} w_{∘,T},

where γ_⋆ > 0 and K_T^⋆ are the optimal solutions. Note that the scaling of the SDP in Theorem H.2 with the time horizon is prohibitive for many time-critical real-world applications.
Therefore, we compare our infinite-horizon controller to the finite-horizon one in the simulation sections 6 and I.

I. Additional Simulations

I.1. Note on Comparison with Other Methods in the Literature

As our work is the first to explore infinite-horizon distributionally robust control, our comparative experiments are constrained by the existing literature on finite-horizon distributionally robust control. Since the closest work to ours is that of (Taha et al., 2023), our numerical experiments primarily compare with their finite-horizon version, which utilizes an SDP formulation. Unfortunately, the framework in (Taskesen et al., 2023) only allows for time-independent disturbances. While this approach is valuable for partially observed systems, it is widely acknowledged that the optimal distributionally robust controller for fully observed systems remains the same as the standard LQR controller as long as the disturbances are independent (though not necessarily identical) (Hassibi et al., 1999). Therefore, in our setup, the results from Taskesen et al. simply reduce to the optimal LQR controller. This observation has also been noted in Taskesen et al.

While in the main text we simulated under the worst-case distributions corresponding to each controller being compared, we include in this section of the appendix other systems under the worst-case distributions, and also under other disturbance realizations (namely, sinusoidal disturbances and disturbances drawn from uniform distributions).

I.2. Additional Time-Domain and Frequency-Domain Simulations

Time-domain simulations: We repeat the same experiment of Section 6 for two more systems, [REA4] and [HE3] (Leibfritz & Lipinski, 2003). [REA4] is a SISO system with 8 states and a stable A matrix, while [HE3] has 4 states and an unstable A matrix. The results are shown in Figures 4 and 5.
Similarly to our previous discussion, the infinite-horizon DR-RO controller achieves good performance across all systems, achieving the lowest cost under all considered noise scenarios.

Figure 4: The control costs of the different DR controllers: (I) DR-RO in infinite horizon, (II) DR-RO in finite horizon, and (III) DR-LQR in finite horizon, under different disturbances for system [REA4] (Leibfritz & Lipinski, 2003). (a) is white noise, while (b), (c), and (d) are the worst-case disturbances for each of the controllers, for r = 1.5. The finite-horizon controllers are re-applied every s = 30 steps. For all disturbances, the infinite-horizon DR-RO controller achieves the lowest average cost, even in cases (c) and (d), where the finite-horizon DR controllers are designed to minimize the cost.

In Figures 6 and 7, we show the performance of the different DR controllers: (I) DR-RO in infinite horizon, (II) DR-RO in finite horizon, and (III) DR-LQR in finite horizon, under uniform noise and sinusoidal noise, respectively, for different systems. Note that a distributionally robust controller is guaranteed to perform better than other controllers under its own worst-case distribution, but has no guarantee of performance under other disturbances. Under uniform and sinusoidal noise, our infinite-horizon DR-RO controller performs better than the finite-horizon DR-LQR for systems [REA4] and [AC15], but worse than the finite-horizon DR-LQR and on par with the finite-horizon DR-RO for system [HE3].

Frequency-domain simulations: We show in Figure 8 the frequency-domain representation of the squared norm of the DR-RO controller and its approximation for [AC15] and [HE3], demonstrating that lower-order approximations of m(e^{jω}) provide good estimates.
Figure 5: The control costs of the different DR controllers: (I) DR-RO in infinite horizon, (II) DR-RO in finite horizon, and (III) DR-LQR in finite horizon, under different disturbances for system [HE3] (Leibfritz & Lipinski, 2003). (a) is white noise, while (b), (c), and (d) are the worst-case disturbances for each of the controllers, for r = 1.5. The finite-horizon controllers are re-applied every s = 30 steps. For all disturbances, the infinite-horizon DR-RO controller achieves the lowest average cost, even in cases (c) and (d), where the finite-horizon DR controllers are designed to minimize the cost.

Figure 6: The control costs of the different DR controllers (DR-RO in infinite horizon, DR-RO in finite horizon, and DR-LQR in finite horizon) under uniform noise distributions (amplitude = 2) for systems (a) [REA4], (b) [HE3], and (c) [AC15], for r = 1.5.

Figure 7: The control costs of the different DR controllers (DR-RO in infinite horizon, DR-RO in finite horizon, and DR-LQR in finite horizon) under sinusoidal noise distributions (frequency = 1, phase = π/4, amplitude = 2) for different systems, for r = 1.5.

Figure 8: The frequency-domain representation of the squared norm of the DR-RO controller K(e^{jω}) and its approximation for [AC15] (8a) and [HE3] (8b). Figures 8a and 8b reaffirm our conclusion that lower-order approximations of m(e^{jω}) still yield good estimates. Figure 8c represents the worst-case expected regret of the H2, H∞, and RO controllers.