# Asynchronous Distributed Gaussian Process Regression

Zewen Yang*, Xiaobing Dai*, Sandra Hirche
Technical University of Munich
zewen.yang@tum.de, xiaobing.dai@tum.de, hirche@tum.de

**Abstract.** In this paper, we address a practical distributed Bayesian learning problem with asynchronous measurements and predictions due to diverse computational conditions. To this end, asynchronous distributed Gaussian process (Async DGP) regression is proposed, which is the first effective online distributed Gaussian process (GP) approach to improve prediction accuracy in real-time learning tasks. By leveraging a devised evaluation criterion and established prediction error bounds, Async DGP distinguishes the contribution of each model when ensembling predictions with an aggregation strategy. Furthermore, we extend its utility to dynamical systems by introducing a learning-based control law, ensuring guaranteed control performance in safety-critical applications. Additionally, a networked online learning simulation platform for distributed GPs, named online GP gym (GPgym), is introduced for testing the learning and control performance on dynamical systems. Numerical simulations within GPgym on regression tasks with real-world data sets and dynamical control scenarios demonstrate the effectiveness and applicability of Async DGP.

## Introduction

Distributed learning, employing parallelized training and cooperative learning within distributed systems, holds promise for enhancing the efficiency of machine learning in both training and prediction. Its applications extend across diverse domains, including but not limited to multi-agent systems (Yan et al. 2020), edge computing (Chen and Ran 2019), and networked Internet of Things devices (Park and Saad 2019).
Specifically, for the safe operation of systems in complex and dynamic environments, real-time predictions and prompt model updates are required. However, implementing online learning within a distributed framework is confronted with the inherent heterogeneity of computational nodes, leading to asynchronous predictions (see Figure 1). In other words, variations in computation speed among nodes result in distinct computational times, while discrepancies in data volume contribute to different processing times. Even in scenarios where the computational nodes are homogeneous, non-simultaneous measurement and data reception exhibit asynchrony. Addressing this issue is imperative for the seamless use of distributed learning algorithms in real-world applications, especially in contexts where timely and accurate predictions are essential for the safe real-time operation of dynamical systems¹.

*These authors contributed equally. Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: Asynchronous distributed predictions. The temporal sequence of the learning process of each computational node $i$ is characterized by discrete phases: a green block representing the time frame for receiving and allocating the data set $\mathcal{D}_i$ at its own time $t_i^k$ with $k \in \mathbb{N}$, and a blue block denoting the computational duration of the prediction $\hat{f}_i$ of the true function $f$. During the intervening gray block, only previous predictions are available.
¹Extended related work, proofs, and results are available in the complementary document on arXiv (Yang, Dai, and Hirche 2024).

The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25)

## Related Work

To achieve online learning in distributed systems, we consider Gaussian process (GP) regression, a non-parametric supervised learning technique increasingly utilized in safety-critical applications for its high expressive capability and probabilistically guaranteed bounded prediction errors (Hashimoto et al. 2022). As the computational complexity of updates and predictions in GP regression escalates with the number of training points, an effective mitigation is distributed computing, inducing distributed Gaussian processes (DGPs) (Deisenroth and Ng 2015), where a central node aggregates the predictions from distributed nodes (experts), each holding a sub-dataset. One such approach is the mixture of experts (MOE) method (Tresp 2000b; Yuan and Neubauer 2008; Masoudnia and Ebrahimpour 2014), where each model's prediction receives a constant weight. Additionally, the product of experts (POE) family (Ng and Deisenroth 2014; Cao and Fleet 2015; Cohen et al. 2020), which determines aggregation weights using the posterior variance, and the Bayesian committee machine (BCM) family (Tresp 2000a; Deisenroth and Ng 2015; Liu et al. 2018), which additionally considers the prior variance, have been proposed. Another avenue involves fusion methods utilizing dynamic average consensus (Lederer et al. 2023). However, this approach requires solving differential equations and interacting with the computational nodes, leading to additional computational delay. Despite these efforts, the prevailing issue of asynchronous DGP predictions remains unaddressed, posing inevitable challenges due to inherent variations and noise across diverse models. Peng et al.
(2017) propose an offline asynchronous distributed variational GP method; however, the delay introduced by the variational approximation means this method exacerbates the asynchrony problem rather than resolving it. The studies (Nguyen et al. 2022; Egelé et al. 2023) propose asynchronous optimization methods for large-scale hyperparameter optimization, considering asynchronous communication between nodes. However, these approaches require pre-partitioning the entire dataset, limiting their applicability in online learning scenarios where data arrives sequentially. While agent-based systems integrating DGPs show promise in cooperative learning (Yang et al. 2021; Dai et al. 2024a,b; Yang et al. 2024a,b), achieving simultaneous predictions without delay remains a significant challenge due to communication and computation time (Dai et al. 2023). Therefore, no algorithm has been designed explicitly to alleviate the problems associated with asynchronous predictions and delayed measurements in online learning with GPs.

## Contribution

In addressing this practical problem posed by the nature of asynchronous online inference, we propose a multi-model learning methodology employing asynchronous distributed Gaussian processes (Async DGP), which answers two key questions.

Question 1: Are the previous temporal predictions useless?

Question 2: How can the prior model of a GP be applied in an asynchronous prediction scenario?

Moreover, we introduce a quantifiable criterion for assessing predictions from distinct GP models. In addition, a comprehensive analysis is presented establishing the prediction error bounds for aggregated predictions in asynchronous learning. Then, based on the introduced distributed learning framework, we formulate a learning-based protocol designed for the control of dynamical systems. Through quantitative evaluation of the learning process, we ensure that the control performance is guaranteed within a deterministic ultimate bound.
Additionally, we introduce a MATLAB simulation platform called online GP gym (GPgym) to verify online learning algorithms while accounting for delays in both predictions and observations. Furthermore, we conduct real-world benchmarks, achieving significant improvements in prediction accuracy over state-of-the-art DGP approaches on regression tasks. Lastly, a tracking control task for a dynamical system demonstrates the applicability of Async DGP in safety-critical applications.

## Problem Statement

This paper investigates a distributed system comprising $M \in \mathbb{N}_{>0}$ computational nodes. The primary objective is to cooperatively infer an unknown function $f: \mathbb{R}^n \to \mathbb{R}$ using its estimate $\hat{f}$. To accomplish this goal, each node within the distributed system is equipped with a GP model for predictions with a data set $\mathcal{D}_i = \{ (x^{(\iota)}, y^{(\iota)}) \}_{\iota = 1, \dots, N_i}$ with $N_i = |\mathcal{D}_i|$, where $i = 1, \dots, M$. A Gaussian process induces a Gaussian distribution over an unknown function $f$, defined by a prior mean $m: \mathbb{R}^n \to \mathbb{R}$ and a kernel $\kappa: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_{>0}$, denoted as $f \sim \mathcal{GP}(m, \kappa)$ (Rasmussen and Williams 2006). Given a data set $\mathcal{D}$, the prediction of $f$ at a query point $x$ is derived using Bayesian theory, resulting in $f(x) \sim \mathcal{N}(\mu(x|\mathcal{D}), \sigma^2(x|\mathcal{D}))$. The posterior mean $\mu(x|\mathcal{D})$ and posterior variance $\sigma^2(x|\mathcal{D})$ are expressed as
$$\mu(x|\mathcal{D}) = m(x) + k_\mathcal{D}^T(x) (K_\mathcal{D} + \sigma_n^2 I_N)^{-1} (y_\mathcal{D} - m_\mathcal{D}), \quad (1)$$
$$\sigma^2(x|\mathcal{D}) = \kappa(x, x) - k_\mathcal{D}^T(x) (K_\mathcal{D} + \sigma_n^2 I_N)^{-1} k_\mathcal{D}(x), \quad (2)$$
where $k_\mathcal{D}(x) = [\kappa(x, x^{(1)}), \dots, \kappa(x, x^{(|\mathcal{D}|)})]^T$, $K_\mathcal{D} = [\kappa(x^{(i)}, x^{(j)})]_{i,j = 1, \dots, |\mathcal{D}|}$, $m_\mathcal{D} = [m(x^{(1)}), \dots, m(x^{(|\mathcal{D}|)})]^T$, and $y_\mathcal{D} = [y^{(1)}, \dots, y^{(|\mathcal{D}|)}]^T$. Without loss of generality, each data pair $(x^{(\iota)}, y^{(\iota)})$ satisfies the following assumption.

Assumption 1. Each data pair $(x^{(\iota)}, y^{(\iota)})$ in the data set $\mathcal{D} = \{ (x^{(\iota)}, y^{(\iota)}) \}_{\iota = 1, \dots, |\mathcal{D}|}$ is sampled from the unknown $f$ with noise, i.e., $y^{(\iota)} = f(x^{(\iota)}) + w^{(\iota)}$. The noise $w^{(\iota)}$ is bounded by $|w^{(\iota)}| \le \sigma_n$ for $\iota = 1, \dots, |\mathcal{D}|$ and $\sigma_n \in \mathbb{R}_{\ge 0}$.

In practical settings, noisy data is commonly assumed, but it is necessary for the noise to be bounded in order to derive the error bounds.
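As a concrete illustration of the posterior equations (1) and (2), the following Python sketch computes the exact GP posterior at a query point; the function and kernel names are our own and not part of the GPgym code base:

```python
import numpy as np

def gp_posterior(X, y, x_query, kernel, sigma_n, m=lambda x: 0.0):
    """Exact GP posterior at a query point, following (1)-(2):
    mu = m(x) + k^T (K + sigma_n^2 I)^{-1} (y - m_D),
    sigma^2 = kappa(x, x) - k^T (K + sigma_n^2 I)^{-1} k."""
    N = len(X)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    k = np.array([kernel(x_query, xi) for xi in X])
    m_D = np.array([m(xi) for xi in X])
    A = K + sigma_n ** 2 * np.eye(N)
    mu = m(x_query) + k @ np.linalg.solve(A, y - m_D)
    var = kernel(x_query, x_query) - k @ np.linalg.solve(A, k)
    return mu, var

# squared-exponential kernel with unit hyper-parameters for illustration
se = lambda a, b: np.exp(-0.5 * np.sum((np.atleast_1d(a) - np.atleast_1d(b)) ** 2))
```

Near a training point with small noise, the posterior mean approaches the observed target and the posterior variance shrinks toward zero.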
Considering real-world scenarios where delayed measurement-data processing or computation-time consumption must be factored in, the distributed system can only obtain asynchronous predictions. Consequently, the estimated function $\hat{f}$ at the current time $t \in \mathbb{R}_{\ge 0}$ aggregates the individual node prediction functions $\hat{f}_i$, formulated as
$$\hat{f}(x(t)) = \sum_{i=1}^{M} \hat{f}_i\big( x(t_i^k), \mathcal{D}_i(t_i^k) \big), \quad (3)$$
$$k := \arg\max \{ s \in \mathbb{N} : t_i^s \le t \}, \quad (4)$$
where $t_i^k$ indicates the time at which computational node $i$ computes the $k$-th prediction of its GP model. The time interval $\Delta t_i^k$ for obtaining the prediction $\hat{f}_i(x(t_i^k), \mathcal{D}_i(t_i^k))$ spans from $t_i^k$ to $t_i^{k+1}$, encompassing both data processing time and prediction computation time, as shown in Figure 1. Therefore, the discrete time $t_i^{k+1}$ is defined by $t_i^{k+1} = t_i^k + \Delta t_i^k$ with the initial time $t_i^0 = 0$ for $i = 1, \dots, M$. The data processing time involves various components, including but not limited to data transmission and storage. Moreover, the computation of an individual prediction involves calculating the posterior mean $\mu(x(t_i^k)|\mathcal{D}_i(t_i^k))$ and posterior variance $\sigma^2(x(t_i^k)|\mathcal{D}_i(t_i^k))$, which are only available after time $t_i^{k+1}$. However, existing distributed learning approaches often simplify this asynchronous prediction by aggregating the latest available results from each GP model. The individual prediction function is
$$\hat{f}_i\big( x(t_i^k), \mathcal{D}_i(t_i^k) \big) = \omega_i(x(t_i^k)) \mu_i(x(t_i^k)), \quad (5)$$
where $\mu_i(\cdot)$ is the simplified notation for $\mu(\cdot|\mathcal{D}_i(t_i^k))$, and $\omega_i(x(t_i^k))$ is the aggregation weight determining the contribution of the posterior mean from the $i$-th GP model. For instance, in the MOE approach, $\omega_i(x(t_i^k)) = 1/M$, considering that no model has information related to the weights. In addition, the specific formulation of the aggregation weights for both the POE family and the BCM family is
$$\omega_i(x(t_i^k)) = \rho_i \sigma_i^{-2}(x(t_i^k)) \big/ \tilde{\sigma}^{-2}(x(t_i^k)), \quad (6)$$
where $\sigma_i(\cdot)$ is defined as $\sigma(\cdot|\mathcal{D}_i(t_i^k))$ for notational simplicity.
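The weight rule (6), together with the POE- and BCM-style normalizations discussed next, can be sketched in a few lines of Python; the helper name `aggregation_weights` is illustrative only:

```python
import numpy as np

def aggregation_weights(variances, rho, prior_var=None):
    """Aggregation weights omega_i = rho_i sigma_i^{-2} / sigma_agg^{-2}, as in (6).
    With prior_var=None this is the (g)POE normalization; passing the prior
    variance adds the (r)BCM correction term (1 - sum_s rho_s) * sigma_prior^{-2}."""
    precision = np.asarray(rho, float) / np.asarray(variances, float)
    agg_precision = precision.sum()
    if prior_var is not None:  # BCM family
        agg_precision += (1.0 - np.sum(rho)) / prior_var
    return precision / agg_precision

# MOE baseline for comparison: uniform weights 1/M regardless of variance
moe_weights = lambda M: np.full(M, 1.0 / M)
```

POE weights sum to one, so more confident experts (lower posterior variance) dominate; with the BCM correction, part of the weight mass is implicitly reserved for the prior.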
The weighting factor $\rho_i$ is defined as the difference of information entropy between the posterior and the prior (Cao and Fleet 2015), but is open to other options as long as $\rho_i \ge 0$. In POE family approaches, $\tilde{\sigma}$ is formulated as
$$\tilde{\sigma}^{-2}(x(t_i^k)) = \sum_{s=1}^{M} \rho_s \sigma_s^{-2}(x(t_i^k)),$$
and in the BCM family it is defined by
$$\tilde{\sigma}^{-2}(x(t_i^k)) = \sum_{s=1}^{M} \rho_s \sigma_s^{-2}(x(t_i^k)) + \Big( 1 - \sum_{s=1}^{M} \rho_s \Big) \sigma^{-2},$$
where $\sigma^2$ is the prior variance of the unknown function $f$. Note that the asynchronous aggregation problem (5), where $x(t_i^k)$ and $x(t_j^k)$ may not be identical for $i, j = 1, \dots, M$ and $t \in \mathbb{R}_{\ge 0}$, cannot be trivially addressed by conventional aggregation strategies such as MOE, POE, and BCM, since these methods only deal with aggregation at the same input $x$. Moreover, the lack of prediction error bounds for these methods with different inputs $x$ poses a significant challenge, especially in the context of real-time learning for safety-critical applications. Therefore, in the following section, we propose Async DGP, which not only accounts for the significance of previous predictions but also balances the contribution of the prior mean. Additionally, we analyze the prediction error bounds considering the asynchronous effects.

## Async DGP and GPgym Platform

To tackle the inherent challenges posed by asynchrony, we initiate our approach by introducing a criterion for evaluating prediction performance. This metric serves as a critical foundation for the subsequent development of the Async DGP approach. Furthermore, we provide a rigorous analysis of the prediction error bounds associated with our proposed approach. Subsequently, we present GPgym, which facilitates testing of distributed real-time learning performance and verification of safety-critical applications.

### Performance Criterion for Predictions

Before designing the prediction performance criterion, we propose prediction criteria for facilitating the design of Async DGP in the following lemmas.

Lemma 1 (Hashimoto et al. (2022)).
Suppose the function $f(\cdot)$ belongs to a reproducing kernel Hilbert space (RKHS) corresponding to $\kappa$ with bounded RKHS norm $\|f\|_\kappa = \sqrt{\langle f, f \rangle_\kappa} \le \Gamma$, $\Gamma \in \mathbb{R}_{\ge 0}$, in the compact input domain $\mathbb{X}$. Then, the prediction error using a data set $\mathcal{D}$ satisfying Assumption 1 is bounded by
$$|f(x) - \mu(x|\mathcal{D})| \le \beta \sigma(x|\mathcal{D}), \quad \forall x \in \mathbb{X}, \quad (7)$$
where $\beta = \sqrt{\Gamma^2 - y_\mathcal{D}^T (K_\mathcal{D} + \sigma_n^2 I_N)^{-1} y_\mathcal{D} + |\mathcal{D}|}$.

Based on the above lemma providing the prediction error criterion $\beta\sigma(x|\mathcal{D})$, we now propose the performance criterion $\eta$ for asynchronous predictions as follows.

Lemma 2. Assume the kernel $\kappa$ is Lipschitz continuous with Lipschitz constant $L_\kappa \in \mathbb{R}_{\ge 0}$ w.r.t. a defined distance $d: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_{\ge 0}$, i.e.,
$$|\kappa(x, x') - \kappa(x, x'')| \le L_\kappa d(x', x'') \quad (8)$$
for $x, x', x'' \in \mathbb{R}^n$. Suppose the function $f(\cdot)$ belongs to an RKHS corresponding to $\kappa$ with RKHS norm bounded by $\Gamma \in \mathbb{R}_{\ge 0}$ in the compact input domain $\mathbb{X}$. Then, the asynchronous prediction error between $f(x(t))$ and $\mu_i(x(t_i^k))$ using the data set $\mathcal{D}_i$ satisfying Assumption 1 is bounded by
$$|f(x(t)) - \mu_i(x(t_i^k))| \le \eta_i^k(t), \quad (9)$$
where $\eta_i^k(t) = L_f d(x(t_i^k), x(t)) + \beta \sigma_i(x(t_i^k))$ with $L_f = 2 L_\kappa \Gamma$.

This lemma shows that the prediction performance criterion $\eta_i^k(t)$ comprises two components: the previous posterior variance $\sigma_i(\cdot)$, signifying the confidence associated with the prediction, and $d(\cdot, \cdot)$, reflecting the bias against the input point. Since $\eta_i^k(t)$ is monotonically increasing w.r.t. both $\sigma_i(\cdot)$ and $d(\cdot, \cdot)$, predictions characterized by higher accuracy and proximity to the query point result in a diminished prediction error. It is worth mentioning that the deterministic prediction error bounds presented in Lemmas 1 and 2 are variants derived from prior works, such as (Maddalena, Scharnhorst, and Jones 2021; Hashimoto et al. 2022). Additionally, probabilistic bounds can also be derived, as demonstrated in (Srinivas et al. 2012; Whitehouse, Ramdas, and Wu 2024).
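The quantities $\beta$ and $\eta_i^k(t)$ from Lemmas 1 and 2 can be evaluated numerically; a minimal sketch, assuming the Euclidean distance for $d$ (function names are our own):

```python
import numpy as np

def beta(Gamma, y_D, K_D, sigma_n):
    """beta = sqrt(Gamma^2 - y_D^T (K_D + sigma_n^2 I)^{-1} y_D + |D|), as in Lemma 1."""
    N = len(y_D)
    A = K_D + sigma_n ** 2 * np.eye(N)
    return np.sqrt(Gamma ** 2 - y_D @ np.linalg.solve(A, y_D) + N)

def eta(x_query, x_pred, sigma_pred, L_kappa, Gamma, beta_val):
    """Criterion eta_i^k(t) = L_f d(x(t_i^k), x(t)) + beta sigma_i(x(t_i^k))
    from Lemma 2, with L_f = 2 L_kappa Gamma and d the Euclidean distance."""
    L_f = 2.0 * L_kappa * Gamma
    d = np.linalg.norm(np.atleast_1d(x_query) - np.atleast_1d(x_pred))
    return L_f * d + beta_val * sigma_pred
```

The criterion grows with both the distance from the query point and the stored posterior standard deviation, matching the monotonicity noted above.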
This indicates the versatility and extendibility of our method, as it can readily incorporate probabilistic bounds. Considering (8), the selection of the distance function $d$ is pivotal, reflecting the divergence in the kernel space $\kappa$ corresponding to the state space. In the absence of prior knowledge, a commonly employed kernel is the squared exponential (SE) kernel, defined as
$$\kappa_{\mathrm{SE}}(x, x') = \sigma_f^2 \exp\Big( -\frac{1}{2\sigma_l^2} (x - x')^T (x - x') \Big),$$
where $\sigma_f \in \mathbb{R}_{>0}$ and $\sigma_l \in \mathbb{R}_{>0}$ are hyper-parameters.

Lemma 3. Consider the SE kernel with its distance function defined as $d_{\mathrm{SE}}(x, x') = \|x - x'\|$, $x, x' \in \mathbb{R}^n$. Then the corresponding Lipschitz constant is $L_{\kappa,\mathrm{SE}} = \sigma_f^2 \exp(-0.5)/\sigma_l$.

In order to have an independent length scale for each dimension, the automatic relevance determination (ARD) SE kernel can be considered, defined as
$$\kappa_{\mathrm{ARD\text{-}SE}}(x, x') = \sigma_f^2 \exp\Big( -\frac{1}{2} (x - x')^T \Sigma_L^{-2} (x - x') \Big),$$
where $\Sigma_L = \mathrm{diag}(l_1, \dots, l_n)$ with $l_i \in \mathbb{R}_{>0}$ for $i = 1, \dots, n$, associated with each dimension of the state space and facilitating dimension-wise distances. This choice of kernels and their parameterization captures the inherent relationships within the data, while the ARD-SE kernel provides additional flexibility in addressing varying length scales across dimensions.

### Answers to Questions 1 and 2: Design of Async DGP

By quantifying the prediction performance through the aggregation criteria in Lemmas 1 and 2, we are able to utilize previous predictions from the GP models, which answers Question 1. That is to say, the earlier temporal predictions are valuable based on the state rather than the time. In order to leverage the previous prediction results, we define the collected prediction information set
$$\mathcal{I}(t) = \big\{ \big( x(t_i^k), \mu_i(x(t_i^k)), \sigma_i(x(t_i^k)) \big) \,\big|\, t_i^k \le t, \, k \in \mathbb{N} \big\}. \quad (10)$$
With the given information set $\mathcal{I}(t)$, we present our asynchronous aggregation approach, incorporating the posterior predictions and prior means of the DGP models, as
$$\hat{f}(x(t)) = \sum_{i=1}^{M} \sum_{k=0}^{k_i(t)} \omega_k^i(t) \mu_i(x(t_i^k)) + \omega_m(t) m(x(t)), \quad (11)$$
with $k_i(t) = \sup\{ k \in \mathbb{N} \mid t_i^k \le t \}$, which addresses Question 2. The core of Async DGP is to filter out inferior predictions in the information set $\mathcal{I}(t)$ according to the proposed prediction criteria. With the proposed performance criterion $\eta$, the aggregation weights $\omega_k^i(t)$ for the posterior means are designed as
$$\omega_k^i(t) = \omega^2(t) \rho_k^i(t) \big( \eta_i^k(t) \big)^{-2}, \quad (12)$$
$$\rho_k^i(t) = \max\big\{ \log\big( \beta\sigma_f / \eta_i^k(t) \big), 0 \big\}, \quad (13)$$
$$\omega^{-2}(t) = \sum_{i=1}^{M} \sum_{k=0}^{k_i(t)} \rho_k^i(t) \big( \eta_i^k(t) \big)^{-2} + \Big( 1 - \sum_{i=1}^{M} \sum_{k=0}^{k_i(t)} \rho_k^i(t) \Big) \big( \beta\sigma_f \big)^{-2}. \quad (14)$$
Here, the weighting factor $\rho_k^i(\cdot): \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ functions as a selection criterion. Specifically, when the ratio between the weighted prior standard deviation $\beta\sigma_f$ and the proposed criterion $\eta_i^k(t)$ falls below 1, i.e., the information-entropy discrepancy $\log(\beta\sigma_f/\eta_i^k(t))$ is non-positive, the posterior mean is excluded from selection. Therefore, to take account of the prior mean, the weighting factor $\omega_m(t)$ is designed as
$$\omega_m(t) = \omega^2(t) \Big( 1 - \sum_{i=1}^{M} \sum_{k=0}^{k_i(t)} \rho_k^i(t) \Big) \big( \beta\sigma_f \big)^{-2}. \quad (15)$$
Notably, the result of the prior mean at time $t$ in (11) is unaffected by delays, because it is excluded from the learning and training processes. Particularly when prior knowledge of the unknown function is lacking, the prior mean is set to zero, indicating that its result is readily available without latency. Moreover, the prior mean can be regarded as the posterior mean with an empty data set. Thus, with the results in Lemma 1 and the expression of $\sigma_i(\cdot)$ in (2), the weighted prior variance $\beta\sigma_f$ is used to evaluate the error bound between the true function and the prior mean function, i.e.,
$$|f(x(t)) - m(x(t))| \le \beta\sigma_f. \quad (16)$$
With this accuracy evaluation of the prior mean, the variable $\rho_k^i(t)$ is set such that only the predictions $\mu_i(x(t_i^k))$ with better performance than $m(x(t))$ are aggregated, with $\rho_k^i(t) > 0$.

Remark 1.
Our aggregation method (11) reveals that the most useful prediction is the one closest to the query state $x(t)$, rather than the most temporally recent prediction at time $t$, assuming both have the same prediction error bound from the GP. Once the prediction for $x(t_i^k)$ is completed, the result is promptly sent and stored in the collected prediction information set. Moreover, there is no need to recalculate the posterior mean at the state point $x(t_i^k)$, which can be reused for future calculations.

Remark 2. Notably, there is no upper bound on $\rho_k^i(t)$; in particular, $\rho_k^i(t) \to \infty$ as $\eta_i^k(t) \to 0$ in (13), so the second term in (14) can be negative. However, since only predictions with $\eta_i^k(t) \le \beta\sigma_f$ have non-zero $\rho_k^i(t)$, it holds that
$$\sum_{i=1}^{M} \sum_{k=0}^{k_i(t)} \rho_k^i(t) \big( \eta_i^k(t) \big)^{-2} \ge \sum_{i=1}^{M} \sum_{k=0}^{k_i(t)} \rho_k^i(t) \big( \beta\sigma_f \big)^{-2}, \quad (17)$$
such that a valid $\omega(t)$ with $\omega^{-2}(t) \ge (\beta\sigma_f)^{-2}$ exists. Moreover, it is direct to see that $\omega(t) \in (0, \beta\sigma_f]$.

Since the information set $\mathcal{I}(t)$ expands over time, the associated data storage and computational requirements of the aggregation operation also grow proportionally. To circumvent these challenges and keep the information set finite in practice, a well-defined constant $\bar{I} \in \mathbb{N}_{>0}$ is introduced, such that $|\mathcal{I}(t)| \le \bar{I}$ holds for all $t \in \mathbb{R}_{\ge 0}$ under a memory management strategy. As Async DGP only uses the prediction results, the aggregation algorithm takes only $\mathcal{O}(1)$ sum and add operations. However, the computational complexity of producing these results depends on the local GP model.
For instance, utilizing the state-of-the-art LoG-GP online learning approach, the update of a LoG-GP model requires $\mathcal{O}_p(\log(N))$ and the mean and variance predictions require $\mathcal{O}_p(\log^2(N))$. Specifically, given the structure of $\rho_k^i(t)$, which permits only predictions with $\eta_i^k(t) \le \beta\sigma_f$ to participate in the aggregation, a heuristic algorithm is designed as in Algorithm 1.

Algorithm 1: Async DGP
1: while prediction not finished do
2:   Get current state $x(t)$ and send it to all GP experts
3:   Part I: Pre-processing
4:   Calculate $\eta_i^k(t)$ for each element in $\mathcal{I}(t)$
5:   Delete the elements with $\eta_i^k(t) > \beta\sigma_f$ from $\mathcal{I}(t)$
6:   Part II: New Prediction Reception
7:   for GP expert $i = 1$ to $M$ do
8:     $\mathcal{P}_i^k \leftarrow$ prediction result received from expert $i$
9:     if $\mathcal{P}_i^k = \emptyset$ then
10:      continue with next $i$
11:    end if
12:    Calculate $\eta_i^k(t)$ for the newly received prediction
13:    if $\eta_i^k(t) \le \beta\sigma_f$ AND $|\mathcal{I}(t)| < \bar{I}$ then
14:      $\mathcal{I}(t) \leftarrow \{\mathcal{I}(t), \mathcal{P}_i^k\}$
15:    else if $\eta_i^k(t) \le \beta\sigma_f$ AND $|\mathcal{I}(t)| = \bar{I}$ then
16:      $\mathcal{I}(t) \leftarrow$ the $\bar{I}$ predictions with lowest $\eta_i^k(t)$
17:    end if
18:  end for
19:  Part III: Asynchronous Aggregation
20:  $\hat{f}(x(t)) \leftarrow$ aggregation using (11)
21: end while

This algorithm comprises three main parts: pre-processing, new prediction reception, and asynchronous aggregation. In the pre-processing part, the condition $\eta_i^k(t) \le \beta\sigma_f$ filters each element in $\mathcal{I}(t)$, i.e., elements violating this condition are removed from $\mathcal{I}(t)$. In the prediction reception part, the centralized node checks whether any new prediction is available from expert $i$. According to the definition of $\mathcal{I}(t)$ in (10), each GP expert transmits $\mathcal{P}_i^k = \{\mu_i(x(t_i^k)), \sigma_i(x(t_i^k)), x(t_i^k)\}$ to the centralized node. After receiving the new non-empty predictions, the condition $\eta_i^k(t) \le \beta\sigma_f$ is checked, and at most $\bar{I}$ predictions, counting both the new one and the old predictions, are kept in $\mathcal{I}(t)$. In the aggregation part, the prediction $\hat{f}(x(t))$ is generated using (11). Therefore, if no element remains in $\mathcal{I}(t)$, the prior mean $m(x(t))$ is used, i.e., $\hat{f}(x(t)) = m(x(t))$.
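One pass of Algorithm 1 (Parts I–III) can be condensed into a single function; the sketch below is our own simplified Python illustration rather than the GPgym code, with `eta_fn` abstracting the criterion of Lemma 2:

```python
import numpy as np

def async_dgp_predict(x_t, info_set, eta_fn, m, beta_sigma_f, I_max):
    """One aggregation step of Algorithm 1 (simplified sketch). info_set holds
    tuples (x_pred, mu, sigma); eta_fn(x_t, x_pred, sigma) > 0 evaluates the
    criterion eta_i^k(t); beta_sigma_f is the prior-error bound beta*sigma_f."""
    # Part I: drop elements whose criterion exceeds the prior bound
    kept = [(p, eta_fn(x_t, p[0], p[2])) for p in info_set]
    kept = [(p, e) for p, e in kept if e <= beta_sigma_f]
    # keep only the I_max predictions with the lowest criterion values
    kept.sort(key=lambda pe: pe[1])
    kept = kept[:I_max]
    info_set[:] = [p for p, _ in kept]
    if not kept:                          # no useful prediction: fall back to prior
        return m(x_t)
    # Part III: aggregation following (11)-(15)
    rho = np.array([max(np.log(beta_sigma_f / e), 0.0) for _, e in kept])
    eta2 = np.array([e ** 2 for _, e in kept])
    inv_w2 = np.sum(rho / eta2) + (1.0 - rho.sum()) / beta_sigma_f ** 2
    mus = np.array([p[1] for p, _ in kept])
    omega = rho / eta2 / inv_w2           # posterior-mean weights (12)
    omega_m = (1.0 - rho.sum()) / beta_sigma_f ** 2 / inv_w2  # prior weight (15)
    return float(omega @ mus + omega_m * m(x_t))
```

By Remark 2, `inv_w2` stays at least as large as `beta_sigma_f ** -2`, so the normalization is always valid even when `omega_m` turns negative.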
### Prediction Performance Guarantee

The proposed asynchronous aggregation strategy (11), combined with the information set management in Algorithm 1, inherits the prediction error bound of exact GP regression from Lemma 1, as shown in the following.

Theorem 1. Consider $M$ GP models generating predictions with different data sets $\mathcal{D}_i$, $i = 1, \dots, M$, satisfying Assumption 1. Choose a Lipschitz kernel $\kappa$, and assume $f$ has the bounded RKHS norm $\Gamma$ corresponding to $\kappa$. The predictions are aggregated according to (11) with the information set management following Algorithm 1. Then, the prediction error is bounded by
$$|f(x(t)) - \hat{f}(x(t))| \le \omega(t), \quad \forall t \in \mathbb{R}_{\ge 0}. \quad (18)$$

Figure 2: The framework of GPgym.

This theorem shows that the prediction error bound is monotonically increasing w.r.t. $\eta_i^k(t)$, considering the definition in (14). This indicates that aggregating more accurate predictions, i.e., with small prediction error $\sigma_i(\cdot)$ and close distance $d(\cdot, \cdot)$, benefits the aggregated prediction performance. To guarantee a prediction error bound that is independent of time, we propose the following corollary.

Corollary 1. Consider $M$ GP models generating predictions with different data sets $\mathcal{D}_i$, $i = 1, \dots, M$, satisfying Assumption 1. Choose a Lipschitz kernel $\kappa$, and assume $f$ has the bounded RKHS norm $\Gamma$ corresponding to $\kappa$. The predictions are aggregated according to (11) with the information set management following Algorithm 1. Then, the prediction error is bounded by
$$|f(x(t)) - \hat{f}(x(t))| \le \beta\sigma_f, \quad \forall t \in \mathbb{R}_{\ge 0}. \quad (19)$$

The proposed Async DGP and the established prediction error bounds can be effectively leveraged in safety-critical applications, such as safe control tasks. To validate the algorithm and its implementation, it is essential to develop a real application, which is elucidated in the subsequent section.
### GPgym

To analyze the learning performance and safety-critical applications of dynamical systems characterized by asynchronous phenomena, we develop an online learning platform, namely GPgym², whose framework is illustrated in Figure 2. This platform serves as a tool for validating the effectiveness of a designed algorithm and facilitating robust testing under asynchronous conditions. Furthermore, GPgym provides a means to test the maximum capability of designed algorithms for distributed GPs and to iteratively improve them, ensuring their effectiveness in real-world scenarios. The GP model module (green block in Figure 2) functions as the distributed component within the GPgym framework. This module is responsible for processing and learning from data received from the central server (blue block) via User Datagram Protocol (UDP) communication. After the prediction is calculated, the GP model transmits its results back to the central server. The central server employs the Async DGP algorithm to process data originating from the plant (yellow block), which constitutes the dynamical system. The control policy is designed to govern the system's behavior, while the system states are utilized as training data. This framework establishes a feedback loop that continuously updates the control input in response to the evolving dynamics of the system. Moreover, the data required for this process can be obtained through sensors integrated into the system. This structured flow of information within the GPgym platform illustrates the seamless interaction between the GP model, the central server, and the plant.

²The GPgym platform, including data set, code, and instructions, is provided in (Dai and Yang 2024).

## Safe Online Learning-based Control

We illustrate the incorporation of Async DGP into a learning-based control framework, showcasing its effective employment in dynamical systems.
Details regarding the controller design and implementation algorithm are provided, followed by a rigorous analysis of the control performance using Async DGP.

### Control Law Design with Async DGP

We demonstrate the application of Async DGP in learning-based control, focusing on the dynamical system depicted as the plant (refer to Figure 2), wherein data from the system is continuously collected during closed-loop control through sensors. Specifically, we consider a nonlinear control-affine system governed by
$$\dot{x}_1 = x_2, \quad \dot{x}_2 = x_3, \quad \dots, \quad \dot{x}_n = f(x) + u, \quad (20)$$
where the system state is denoted as $x = [x_1, \dots, x_n]^T \in \mathbb{R}^n$ and the single control input as $u \in \mathbb{R}$. While the system structure is known, the function $f$, encompassing factors such as environmental uncertainties and unmodeled components, is assumed to be unknown. Notably, the high-order form (20) encompasses a wide range of dynamical systems, such as robotic manipulators (Spong, Hutchinson, and Vidyasagar 2020), underwater vehicles (Fossen 2011), and chemical processes (Subramanian 2021). The objective is to devise a control policy steering the system towards a predefined time-dependent reference $x_r(t)$, under the assumption that $x_r(t)$ is $n$ times continuously differentiable with all its derivatives bounded. Formally, the tracking error, denoted as $e = [e_1, \dots, e_n]^T = x - x_d$ with $x_d = [x_r, \dot{x}_r, \dots, \frac{d^{n-1}}{dt^{n-1}} x_r]^T$, should be minimized and bounded to ensure guaranteed control performance and stability. In other words, the control goal is that the tracking error converges to a neighborhood of zero. Employing the proposed Async DGP, we design the control law as
$$u = \frac{d^n}{dt^n} x_r(t) - \hat{f}(x(t)) - \sum_{i=1}^{n} \lambda_i \Big( x_i - \frac{d^{i-1}}{dt^{i-1}} x_r(t) \Big), \quad (21)$$
where the control gains $\lambda_1, \dots, \lambda_n \in \mathbb{R}_{>0}$ are chosen such that all eigenvalues of the matrix
$$A = \begin{bmatrix} 0_{(n-1) \times 1} & I_{n-1} \\ -\lambda_1 & -[\lambda_2, \dots, \lambda_n] \end{bmatrix} \quad (22)$$
are negative real numbers. Notably, the controllability induced by the structure in (20) ensures the existence of $\lambda_1, \dots, \lambda_n$ for any desired eigenvalues of $A$.
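Under the stated assumptions, the control law (21) is direct to implement; the sketch below (function names are our own) uses a sinusoidal reference of the form employed later in the simulations:

```python
import numpy as np

def tracking_control(x, t, f_hat, x_ref, lam):
    """Control law (21) for the chain-of-integrators system (20):
    u = d^n/dt^n x_r(t) - f_hat(x) - sum_i lam_i (x_i - d^{i-1}/dt^{i-1} x_r(t)).
    x_ref(t, j) returns the j-th time derivative of the reference."""
    n = len(x)
    u = x_ref(t, n) - f_hat(x)
    for i in range(n):
        u -= lam[i] * (x[i] - x_ref(t, i))
    return u

def make_sin_ref(a, w):
    """Reference x_r(t) = a*sin(w*t) with all derivatives available in closed form."""
    def x_ref(t, j):
        # j-th derivative of a*sin(w*t) cycles sin -> cos -> -sin -> -cos
        return a * w ** j * np.sin(w * t + j * np.pi / 2)
    return x_ref
```

When the state already sits on the reference trajectory and the compensation is exact, the feedback terms vanish and the feedforward alone drives the system.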
### Control Performance Analysis

In order to analyze stability, we first establish the error dynamics, written as
$$\dot{e}(t) = A e(t) + b \big( f(x(t)) - \hat{f}(x(t)) \big), \quad (23)$$
where $b = [0_{1 \times (n-1)}, 1]^T$. Building upon the error dynamics (23), we rigorously demonstrate the existence of an ultimate upper bound (Khalil 2002) for the norm of the tracking error $\|e\|$. This upper bound is contingent upon both the control gains and the efficacy of the learning process, as shown in the following theorem.

Theorem 2. Consider a system (20) controlled by (21) for tracking tasks. The compensation $\hat{f}(x(t))$ is obtained by the asynchronous aggregation (11) with Algorithm 1, using GP experts satisfying Assumption 1 and a Lipschitz continuous kernel. The tracking error is ultimately bounded by
$$\lim_{t \to \infty} \|e(t)\| \le \|Q\| \|Q^{-1}\| |\bar{\Lambda}|^{-1} \bar{\omega}, \quad (24)$$
where $\bar{\Lambda}$ denotes the maximal eigenvalue of $A$, $Q = [v_1, \dots, v_n]$ with $v_i$, $i = 1, \dots, n$, the eigenvectors of $A$, and $\bar{\omega} = \max_{t \in \mathbb{R}_{\ge 0}} \omega(t)$.

It is evident that the bound on the tracking error is directly proportional to the learning error bound $\bar{\omega}$ and inversely proportional to the control gains, reflected by $|\bar{\Lambda}|$. Therefore, this theorem signifies the feasibility of determining adaptive control gains $\lambda_i$ that result in a diminished tracking error, which is crucial in safety-critical applications.

## Simulation Results

In the following subsections, we demonstrate the effectiveness of the proposed Async DGP through its application to regression tasks on real-world datasets, alongside a control task for a common dynamical system. Note that each GP model receives streaming data after its prediction process is finished, leading to variations in their training data and predictions. However, the real-time streaming data comes from the same dataset. Comparative analyses are conducted, evaluating the performance of Async DGP against various state-of-the-art DGP methods.
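The ultimate bound (24) from Theorem 2 above can be evaluated numerically from the gains and the learning error bound alone; the following sketch builds $A$ as in (22) and is an illustration, not the paper's code:

```python
import numpy as np

def tracking_error_bound(lam, omega_bar):
    """Ultimate bound (24): ||Q|| ||Q^{-1}|| |Lambda_bar|^{-1} omega_bar, where
    A is the companion matrix of the gains lam as in (22), Q stacks its
    eigenvectors, and Lambda_bar is its maximal (slowest) eigenvalue."""
    n = len(lam)
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)        # upper block [0_{(n-1)x1}, I_{n-1}]
    A[-1, :] = -np.asarray(lam, float)  # last row [-lambda_1, ..., -lambda_n]
    eigvals, Q = np.linalg.eig(A)
    lam_bar = eigvals[np.argmax(eigvals.real)]
    cond = np.linalg.norm(Q, 2) * np.linalg.norm(np.linalg.inv(Q), 2)
    return cond * omega_bar / abs(lam_bar)
```

As stated after Theorem 2, the bound scales linearly with the learning error bound and shrinks as the slowest eigenvalue of $A$ moves further into the left half-plane.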
### Regression Benchmark

The regression performance is evaluated on three datasets: KIN40K (8-dimensional input, 10K data points), SARCOS (21-dimensional input, 44484 data points), and PUMADYN32NM (32-dimensional input, 7168 data points). We consider a distributed system with 4 GP models, where each model is built using the state-of-the-art online GP algorithm LoG-GP (Lederer et al. 2021) and connected through UDP via Wi-Fi. The standardized mean squared errors (SMSE) for regression, with the information set threshold $\bar{I}$ set to 4 and 10, respectively, are shown in Figure 3.

Figure 3: Regression performance on the 3 datasets (compared methods: Async DGP, rBCM, gPOE, BCM, POE, and MOE; panels show each dataset for $\bar{I} = 4$ and $\bar{I} = 10$).

In general, Async DGP exhibits superior performance compared to MOE, POE, and BCM, with the exception observed on the KIN40K dataset for $\bar{I} = 4$ before 3.5k iterations. However, Async DGP demonstrates enhanced performance throughout all iterations when $\bar{I} = 10$, suggesting that enlarging the information set improves prediction accuracy. Similarly, on the PUMADYN32NM dataset, all methods improve when $\bar{I} = 10$. On the SARCOS dataset, enlarging the information set does not yield a corresponding enhancement in prediction performance, making Async DGP with $\bar{I} = 4$ the better option.
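The SMSE metric reported in Figure 3 normalizes the mean squared error by the variance of the targets, so a value of 1 corresponds to simply predicting the target mean; a minimal sketch:

```python
import numpy as np

def smse(y_true, y_pred):
    """Standardized mean squared error: MSE divided by the variance of the
    targets, so smse = 1 for the constant mean predictor and 0 for a
    perfect prediction."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)
```

Values below 1 therefore indicate that a method extracts structure beyond the marginal distribution of the targets.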
### Control Task

Employing the proposed learning-based control law (21) with a DGP composed of 4 models and $\bar{I} = 20$, we evaluate the control performance of the dynamical system described by $\dot{x}_1 = x_2$, $\dot{x}_2 = f(x) + u$, where
$$f(x) = \frac{1 + x_1 x_2}{10} + \frac{\cos(x_2)}{2} - \frac{1}{10}\sin(5 x_1) + \frac{1}{2(1 + \exp(-x_2))}.$$
Furthermore, the desired reference is chosen as $x_r(t) = a_r \sin(w_r t)$ with coefficients $a_r, w_r \in \mathbb{R}$. The control gains are set to $\lambda_1 = 2$ and $\lambda_2 = 10$. The minimum norm values of the tracking errors and prediction errors are shown in Figure 4. Moreover, with $\Gamma = 1$ and a maximal data size of 100, the GP error bound is 35.5 and the tracking error bound is 4.9, both of which hold in the simulation results. Notably, both the mean and median values of Async DGP consistently outperform the alternative approaches, which show similar and consistently inferior performance.

Figure 4: Violin plots of the minimum values over 100 Monte-Carlo simulations for the tracking error (top) and the prediction error $|f(x) - \hat{f}(x)|$ (bottom), comparing Async DGP, rBCM, gPOE, BCM, POE, and MOE.

## Conclusion

In this paper, we introduce the Async DGP algorithm, designed to enhance prediction accuracy in real-time distributed learning with GPs under asynchronous scenarios. The superior performance of Async DGP in regression tasks reveals the inadequacy of aggregating only the latest available predictions. The incorporation of a guaranteed prediction error bound establishes the viability of employing Async DGP in safety-critical applications, as demonstrated through the control task of dynamical systems on the proposed GPgym platform.

## Acknowledgments

This work has been financially supported by the Federal Ministry of Education and Research of Germany in the programme of "Souverän. Digital. Vernetzt."
under joint project 6G-life with project identification number 16KISK002, and by the European Research Council (ERC) Consolidator Grant Safe data-driven control for human-centric systems (CO-MAN) under grant agreement number 864686. References Cao, Y.; and Fleet, D. J. 2015. Generalized Product of Experts for Automatic and Principled Fusion of Gaussian Process Predictions. Chen, J.; and Ran, X. 2019. Deep Learning With Edge Computing: A Review. Proceedings of the IEEE, 107(8): 1655–1674. Cohen, S.; Mbuvha, R.; Marwala, T.; and Deisenroth, M. 2020. Healing Products of Gaussian Process Experts. In III, H. D.; and Singh, A., eds., Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, 2068–2077. PMLR. Dai, X.; Lederer, A.; Yang, Z.; and Hirche, S. 2023. Can Learning Deteriorate Control? Analyzing Computational Delays in Gaussian Process-Based Event-Triggered Online Learning. In Learning for Dynamics and Control Conference, 445–457. PMLR. Dai, X.; and Yang, Z. 2024. GPgym: A Remote Service Platform with Gaussian Process Regression for Online Learning. arXiv:2412.13276. Dai, X.; Yang, Z.; Xu, M.; Zhang, S.; Liu, F.; Hattab, G.; and Hirche, S. 2024a. Decentralized event-triggered online learning for safe consensus control of multi-agent systems with Gaussian process regression. European Journal of Control, 80: 101058. Dai, X.; Yang, Z.; Zhang, S.; Zhai, D.-H.; Xia, Y.; and Hirche, S. 2024b. Cooperative Online Learning for Multiagent System Control via Gaussian Processes With Event-Triggered Mechanism. IEEE Transactions on Neural Networks and Learning Systems, 1–15. Deisenroth, M.; and Ng, J. W. 2015. Distributed Gaussian Processes. In Bach, F.; and Blei, D., eds., Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, 1481–1490. Lille, France: PMLR. Egelé, R.; Guyon, I.; Vishwanath, V.; and Balaprakash, P. 2023. 
Asynchronous Decentralized Bayesian Optimization for Large Scale Hyperparameter Optimization. In 2023 IEEE 19th International Conference on e-Science (eScience), 1–10. Fossen, T. I. 2011. Handbook of marine craft hydrodynamics and motion control. John Wiley & Sons. Hashimoto, K.; Saoud, A.; Kishida, M.; Ushio, T.; and Dimarogonas, D. V. 2022. Learning-based Symbolic Abstractions for Nonlinear Control Systems. Automatica, 146: 110646. Khalil, H. K. 2002. Nonlinear systems. Prentice Hall, New York, NY. Lederer, A.; Conejo, A. J. O.; Maier, K. A.; Xiao, W.; Umlauft, J.; and Hirche, S. 2021. Gaussian process-based real-time learning for safety critical applications. In International Conference on Machine Learning, 6055–6064. PMLR. Lederer, A.; Yang, Z.; Jiao, J.; and Hirche, S. 2023. Cooperative Control of Uncertain Multiagent Systems via Distributed Gaussian Processes. IEEE Transactions on Automatic Control, 68(5): 3091–3098. Liu, H.; Cai, J.; Wang, Y.; and Ong, Y. S. 2018. Generalized robust Bayesian committee machine for large-scale Gaussian process regression. In International Conference on Machine Learning, 3131–3140. PMLR. Maddalena, E. T.; Scharnhorst, P.; and Jones, C. N. 2021. Deterministic error bounds for kernel-based learning techniques under bounded noise. Automatica, 134: 109896. Masoudnia, S.; and Ebrahimpour, R. 2014. Mixture of Experts: A Literature Survey. Artificial Intelligence Review, 42(2): 275–293. Ng, J. W.; and Deisenroth, M. P. 2014. Hierarchical Mixture-of-Experts Model for Large-Scale Gaussian Process Regression. arXiv:1412.3078. Nguyen, J.; Malik, K.; Zhan, H.; Yousefpour, A.; Rabbat, M.; Malek, M.; and Huba, D. 2022. Federated Learning with Buffered Asynchronous Aggregation. In Camps-Valls, G.; Ruiz, F. J. R.; and Valera, I., eds., Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, 3581–3607. PMLR. Park, T.; and Saad, W. 2019. 
Distributed learning for low latency machine type communication in a massive Internet of Things. IEEE Internet of Things Journal, 6(3): 5562–5576. Peng, H.; Zhe, S.; Zhang, X.; and Qi, Y. 2017. Asynchronous distributed variational Gaussian process for regression. In International Conference on Machine Learning, 2788–2797. PMLR. Rasmussen, C. E.; and Williams, C. K. I. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, MA: MIT Press. ISBN 978-0-262-18253-9. Spong, M. W.; Hutchinson, S.; and Vidyasagar, M. 2020. Robot modeling and control. John Wiley & Sons. Srinivas, N.; Krause, A.; Kakade, S. M.; and Seeger, M. W. 2012. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. IEEE Transactions on Information Theory, 58(5): 3250–3265. Subramanian, G. 2021. Process control, intensification, and digitalisation in continuous biomanufacturing. John Wiley & Sons. Tresp, V. 2000a. A Bayesian committee machine. Neural Computation, 12(11): 2719–2741. Tresp, V. 2000b. Mixtures of Gaussian Processes. In Leen, T.; Dietterich, T.; and Tresp, V., eds., Advances in Neural Information Processing Systems, volume 13. MIT Press. Whitehouse, J.; Ramdas, A.; and Wu, S. Z. 2024. On the sublinear regret of GP-UCB. Advances in Neural Information Processing Systems, 36. Yan, Z.; Yang, Z.; Pan, X.; Zhou, J.; and Wu, D. 2020. Virtual leader based path tracking control for Multi-UUV considering sampled-data delays and packet losses. Ocean Engineering, 216: 108065. Yang, Z.; Dai, X.; Dubey, A.; Hirche, S.; and Hattab, G. 2024a. Whom to Trust? Elective Learning for Distributed Gaussian Process Regression. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS '24, 2020–2028. Yang, Z.; Dai, X.; and Hirche, S. 2024. Asynchronous Distributed Gaussian Process Regression for Online Learning and Dynamical Systems: Complementary Document. arXiv:2412.11950. 
Yang, Z.; Dong, S.; Lederer, A.; Dai, X.; Chen, S.; Sosnowski, S.; Hattab, G.; and Hirche, S. 2024b. Cooperative Learning with Gaussian Processes for Euler-Lagrange Systems Tracking Control Under Switching Topologies. In 2024 American Control Conference (ACC), 560–567. Yang, Z.; Sosnowski, S.; Liu, Q.; Jiao, J.; Lederer, A.; and Hirche, S. 2021. Distributed Learning Consensus Control for Unknown Nonlinear Multi-Agent Systems Based on Gaussian Processes. In 2021 60th IEEE Conference on Decision and Control (CDC), 4406–4411. IEEE. Yuan, C.; and Neubauer, C. 2008. Variational mixture of Gaussian process experts. Advances in Neural Information Processing Systems, 21.