# Neural Stochastic Control

Jingdong Zhang¹  Qunxi Zhu²,*  Wei Lin¹,²,³,⁴,⁵

¹ School of Mathematical Sciences, SCMS, SCAM, and CCSB, Fudan University
² Research Institute of Intelligent Complex Systems and MOE Frontiers Center for Brain Science, Fudan University
³ Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University
⁴ State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University
⁵ Shanghai Artificial Intelligence Laboratory
{zhangjd20,qxzhu16,wlin}@fudan.edu.cn
* To whom correspondence should be addressed: Q.Z. and W.L., https://faculty.fudan.edu.cn/wlin/zh_CN/index.htm.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Abstract

Control problems are challenging because they arise from real-world systems in which stochasticity and randomness are ubiquitous. This naturally and urgently calls for efficient neural control policies that stabilize not only deterministic equations but stochastic systems as well. To meet this paramount call, we propose two types of controllers: the exponential stabilizer (ES), based on the stochastic Lyapunov theory, and the asymptotic stabilizer (AS), based on the stochastic asymptotic stability theory. The ES renders the controlled system exponentially convergent but requires a long computational time; conversely, the AS trains much faster but assures only the asymptotic (not exponential) attractiveness of the control targets. The two stochastic controllers are thus complementary in applications. We also rigorously analyze the linear controller and the proposed neural stochastic controllers in terms of convergence time and energy cost, and compare them numerically on these two indices. More significantly, we use several representative physical systems to illustrate the usefulness of the proposed controllers in stabilizing dynamical systems.

1 Introduction

In the field of controlling dynamical systems, one of the major missions is to find efficient control policies for stabilizing ordinary differential equations (ODEs) to targeted equilibria. Policies for stabilizing linear or polynomial dynamical systems have been fully developed using the standard Lyapunov stability theory, e.g., the linear quadratic regulator (LQR) (Khalil, 2002) and sum-of-squares (SOS) polynomials obtained through semi-definite programming (SDP) (Parrilo, 2000). For stabilizing more general nonlinear dynamical systems, linearization around the targeted states is often utilized, so the existing control policies are effective in the vicinity of those states (Sastry & Isidori, 1989) but likely lose efficacy in regions far away from them. Moreover, in real applications, the explicit forms of the controlled nonlinear systems are often partially or completely unknown, which makes it very difficult to design controllers directly from the Lyapunov stability theory. To overcome these difficulties, designing controllers by training neural networks (NNs) has become one of the mainstream approaches in the control community (Polycarpou, 1996). Recent outstanding developments using NNs include enlarging the safe region (Richards et al., 2018), learning stable dynamics (Takeishi & Kawahara, 2021), and constructing the Lyapunov function and the control function simultaneously (Chang et al., 2019).
In Kolter & Manek (2019), a projected NN was constructed to directly learn a stable dynamical system that fits the observed time-series data well, but the focus was not on learning a control policy to stabilize the original dynamics. All these existing developments are formulated only for deterministic systems and are not directly applicable to dynamical systems described by stochastic differential equations (SDEs); this requires us to incorporate stochasticity appropriately into the use of neural controls for different types of dynamical systems.

Figure 1: Sketches of the two frameworks of the neural stochastic controller (the exponential stabilizer versus the asymptotic stabilizer, contrasted in computational cost and convergence rate; panel labels: ICNN or quadratic form, initial state, equilibrium, original system, controlled system). Both the ES and the AS find the control function $u$ with a fully connected feedforward NN (FNN).

The stability theory for stochastic systems has been systematically developed over the past several decades. Representative contributions in the literature include the Lyapunov-like stability theory for SDEs (Mao, 2007), the stabilization of unstable states of ODEs using noise perturbations alone (Mao, 1994b), and the stability induced by randomly switching structures (Guo et al., 2018). Generally, for an SDE governed by $dx = f(x)\,dt + g(x)\,dB_t$, a control policy $u = (u_f, u_g)$ is introduced, transforming the original equation into the controlled system $dx = [f(x) + u_f(x)]\,dt + [g(x) + u_g(x)]\,dB_t$. Appropriate control policies can steer the controlled system to equilibria that are unstable in the original SDE. Traditional control methods focus on designing the deterministic control $u_f$ and regard noise as a detrimental component. Innovatively, we treat noise as a beneficial component and design the stochastic control $u_g$ to achieve stabilization.

In this article, we articulate two frameworks of neural stochastic control that complement each other in terms of convergence rate and NN training time. Additionally, we analytically estimate the convergence time and the energy cost of the classic linear control and of the proposed neural stochastic control, and compare them numerically. We further extend our frameworks to the model-free case using existing data-reconstruction methods. The major contributions of this article are multifold:
- designing two frameworks of neural stochastic control, viz. the ES and the AS, and presenting their advantages in stochastic control;
- providing theoretical estimates of the convergence time and the energy cost for the ES/AS and the classic linear control;
- computing the convergence time and the energy cost of particular stochastic neural controls;
- demonstrating the efficacy of the proposed stochastic neural control on important control problems arising in representative physical systems.
Our code is available at https://github.com/jingddong-zhang/Neural-Stochastic-Control.

1.1 Related Works

Lyapunov Method in Machine Learning. The recent work of Chang et al. (2019) proposed an NN framework for learning the Lyapunov function and a linear control function simultaneously to stabilize ODEs. In comparison, we select several specific types of NNs that have the characteristic properties of a Lyapunov function. For instance, we use the input convex neural network (ICNN)
(Amos et al., 2017), constructing a positive-definite convex function as a neural Lyapunov function (Kolter & Manek, 2019; Takeishi & Kawahara, 2021), and we construct the NN in a quadratic form (Richards et al., 2018; Gallieri et al., 2019) for linear or sublinear systems, where the SDP method is often used to find an SOS-type Lyapunov function (Henrion & Garulli, 2005; Jarvis-Wloszek et al., 2003; Parrilo, 2000).

Stochastic Stability Theory of SDEs. Stochastic stability theory for SDEs has been systematically and fruitfully developed over the past several decades (Kushner, 1967; Arnold, 2007; Mao, 1991, 1994a). The positive effects of stochasticity have also been cultivated in the control field (Mao et al., 2002; Deng et al., 2008; Caraballo et al., 2003; Appleby et al., 2008; Mao et al., 2007). These results motivate us to develop purely stochastic neural control for stabilizing different sorts of dynamical systems in this article. Further stochastic stability theory for various classes of systems can be found in Appleby et al. (2006); Appleby (2003); Caraballo & Robinson (2004); Wang & Zhu (2017).

2 Preliminaries

To begin with, we consider an SDE written in the general form
$$ dx(t) = F(x(t))\,dt + G(x(t))\,dB_t, \quad t \ge 0, \quad x(0) = x_0 \in \mathbb{R}^d, \tag{1} $$
where $F : \mathbb{R}^d \to \mathbb{R}^d$ is the drift function, $G : \mathbb{R}^d \to \mathbb{R}^{d\times r}$ is the diffusion function with $\mathbb{R}^{d\times r}$ the space of $d\times r$ matrices with real entries, and $B_t \in \mathbb{R}^r$ is an $r$-dimensional ($r$-D) Brownian motion. Without loss of generality, we set $F(0) = 0$ and $G(0) = 0$, so that $x(t) \equiv 0$ is the zero solution of Eq. (1).

Notations. Denote by $\|\cdot\|$ the $L^2$-norm of a vector in $\mathbb{R}^d$, and by $|\cdot|$ the absolute value of a scalar or the modulus of a complex number. For $A = (a_{ij})$, a matrix of dimension $d\times r$, denote by $\|A\|_F^2 = \sum_{i=1}^{d}\sum_{j=1}^{r} a_{ij}^2$ the squared Frobenius norm.

Assumption 2.1 (Locally Lipschitzian Continuity) For every integer $n \ge 1$, there is a number $K_n > 0$ such that $\|F(x) - F(y)\| \le K_n\|x - y\|$ and $\|G(x) - G(y)\|_F \le K_n\|x - y\|$ for any $x, y \in \mathbb{R}^d$ with $\|x\| \vee \|y\| \le n$.

Definition 2.1 (Derivative Operator) Define the differential operator $\mathcal{L}$ associated with Eq. (1) by
$$ \mathcal{L}V(x) = \nabla V(x)^\top F(x) + \frac{1}{2}\sum_{i,j=1}^{d} [G(x)G(x)^\top]_{ij}\, \frac{\partial^2 V(x)}{\partial x_i \,\partial x_j}. $$

Definition 2.2 (Exponential Stability) The zero solution of Eq. (1) is said to be almost surely exponentially stable if $\limsup_{t\to\infty} \frac{1}{t}\log\|x(t; x_0)\| < 0$ a.s. for all $x_0 \in \mathbb{R}^d$. Here and throughout, a.s. abbreviates "almost surely".

Then, the following Lyapunov stability theorem will be used in the establishment of our main results.

Theorem 2.2 (Mao, 2007) Suppose that Assumption 2.1 holds. Suppose further that there exist a function $V \in C^2(\mathbb{R}^d; \mathbb{R}_+)$ with $V(0) = 0$ and constants $p > 0$, $c_1 > 0$, $c_2 \in \mathbb{R}$, and $c_3 \ge 0$ such that (i) $c_1\|x\|^p \le V(x)$, (ii) $\mathcal{L}V(x) \le c_2 V(x)$, and (iii) $|\nabla V(x)^\top G(x)|^2 \ge c_3 V^2(x)$ for all $x \ne 0$ and $t \ge 0$. Then,
$$ \limsup_{t\to\infty} \frac{1}{t}\log\|x(t; t_0, x_0)\| \le -\frac{c_3 - 2c_2}{2p} \quad \text{a.s.} \tag{2} $$
In particular, if $c_3 - 2c_2 > 0$, the zero solution of Eq. (1) is almost surely exponentially stable.

The following asymptotic stability theorem will also be used in the establishment of our main results.

Theorem 2.3 (Appleby et al., 2008) Suppose that Assumption 2.1 holds. Suppose further that $\min_{\|x\| = M}\|x^\top G(x)\| > 0$ for any $M > 0$ and that there exists a number $\alpha \in (0, 1)$ such that
$$ \|x\|^2\big(2\langle x, F(x)\rangle + \|G(x)\|_F^2\big) - (2 - \alpha)\,\|x^\top G(x)\|^2 \le 0, \quad \forall\, x \in \mathbb{R}^d. \tag{3} $$
Then, the unique global solution of Eq. (1) satisfies $\lim_{t\to\infty} x(t, x_0) = 0$ a.s., and we call this property asymptotic attractiveness.

3 Designing Stable Stochastic Controllers

Here, we assume that the zero solution of the following SDE
$$ dx = f(x)\,dt + g(x)\,dB_t \tag{4} $$
is unstable.
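To make Definition 2.1 concrete, the operator $\mathcal{L}V$, which the ES loss below evaluates on sampled states, can be computed with automatic differentiation. The following is a minimal PyTorch sketch; it is not taken from the paper's repository, and the function names and the toy linear SDE are our own illustrative choices.

```python
import torch

def generator_LV(V, F, G, x):
    """Differential operator of Definition 2.1 at a point x in R^d:
    LV(x) = grad V(x)^T F(x) + 0.5 * trace(G(x) G(x)^T Hess V(x))."""
    x = x.clone().requires_grad_(True)
    grad_V, = torch.autograd.grad(V(x), x)
    hess_V = torch.autograd.functional.hessian(V, x)   # (d, d)
    Gx = G(x)                                          # (d, r)
    return grad_V @ F(x) + 0.5 * torch.trace(Gx @ Gx.T @ hess_V)

# Toy check on a linear SDE dx = A x dt + 0.5 x dB_t with V(x) = ||x||^2.
A = torch.tensor([[0.0, 1.0], [-2.0, -1.0]])
V = lambda x: (x ** 2).sum()
F = lambda x: A @ x
G = lambda x: 0.5 * x.reshape(-1, 1)                   # r = 1 noise channel
print(generator_LV(V, F, G, torch.tensor([1.0, -1.0])))
```

For $V(x) = \|x\|^2$ this reduces to $\mathcal{L}V(x) = 2\langle x, F(x)\rangle + \|G(x)\|_F^2$, which is exactly the combination appearing in condition (3).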
Note that, for any nontrivial targeted equilibrium $x^*$, the direct transformation $y = x - x^*$ makes the zero solution an equilibrium of the transformed system; thus, our mission is to stabilize the zero solution only. As such, we use NNs to design a control $u : \mathbb{R}^d \to \mathbb{R}^{d\times r}$ with $u(0) = 0$ and apply it to Eq. (4) as
$$ dx = f(x)\,dt + [g(x) + u(x)]\,dB_t. \tag{5} $$
Since $u$ is integrated against $dB_t$ in the controlled system (5), we regard it as a stochastic controller. In what follows, the two frameworks of neural stochastic control, the exponential stabilizer (ES) and the asymptotic stabilizer (AS), are articulated in Sections 3.1 and 3.2, respectively. Both control policies are depicted intuitively in Figure 1.

3.1 Exponential Stabilizer

Once we find a Lyapunov function $V$ and a neural controller $u$ such that the controlled system (5) meets all the conditions assumed in Theorem 2.2, the equilibrium $0$ is exponentially stabilized. To this end, we first provide two different types of functions for constructing $V$, which are complementary in applications; we then design the explicit forms of the control function and the loss function.

ICNN V Function. We use the ICNN (Amos et al., 2017) to represent the candidate Lyapunov function $V$; this guarantees that $V$ is convex with respect to the input $x$. To further make $V$ a true Lyapunov function, we use the form
$$ z_1 = \sigma_0(W_0 x + b_0), \quad z_{i+1} = \sigma_i(U_i z_i + W_i x + b_i), \quad i = 1, \dots, k-1, \quad g(x) \triangleq z_k, $$
$$ V(x) = \sigma_{k+1}\big(g(F(x)) - g(F(0))\big) + \varepsilon\|x\|^2, \tag{6} $$
as introduced in Manek & Kolter (2020). Here, the $W_i$ and $b_i$ are real-valued weights, the $U_i$ are positive weights, the $\sigma_i$ are convex, monotonically non-decreasing activation functions of the $i$-th layer, $\varepsilon$ is a small positive constant, and $F$ is a continuously differentiable and invertible function. In our framework, we require $V \in C^2(\mathbb{R}^d; \mathbb{R}_+)$ according to Definition 2.1; however, each activation function $\sigma_i \equiv \sigma$ in Manek & Kolter (2020) is only $C^1$. Thus, we modify the original activation as
$$ \sigma(x) = \begin{cases} 0, & x \le 0, \\ (2dx^3 - x^4)/(2d^3), & 0 < x \le d, \\ x - d/2, & \text{otherwise}, \end{cases} \tag{7} $$
which not only approximates the typical ReLU activation but is also continuously differentiable up to the second order (see Figure 2).

Figure 2: The smoothed ReLU $\sigma(\cdot)$.

Quadratic V Function. For any $x \in \mathbb{R}^d$, let $V_\theta(x) \in \mathbb{R}^d$ be a multilayered feedforward NN of the input $x$ with $\tanh(\cdot)$ activation functions, where $\theta$ is the parameter vector. To meet the smoothness condition of Definition 2.1, we cannot use the non-smooth ReLU as the activation function. We then use the candidate Lyapunov function
$$ V(x) = x^\top\big(\varepsilon I + V_\theta(x)\,V_\theta(x)^\top\big)\,x, \tag{8} $$
introduced in Gallieri et al. (2019), where $\varepsilon$ is a small positive constant.

Control Function. We introduce a multilayer feedforward NN (FNN), denoted by $\mathrm{NN}(x) \in \mathbb{R}^r$, to design the controller $u$. Since we require $u(0) = 0$, we set $u(x) \triangleq \mathrm{NN}(x) - \mathrm{NN}(0)$ or $u(x) \triangleq \mathrm{diag}(x)\,\mathrm{NN}(x)$ with $r = d$, where $\mathrm{diag}(x)$ is the diagonal matrix whose $i$-th diagonal element is $x_i$.
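The constructions (6)-(8) translate directly into code. The sketch below, in PyTorch, is our own reading of Eqs. (6)-(7) rather than the authors' released implementation: we take $F$ to be the identity (an invertible choice), keep a single ICNN hidden layer ($k = 2$), enforce positivity of $U_1$ and of the output weights by a softplus reparameterization, and realize $u(0) = 0$ via $u(x) = \mathrm{diag}(x)\,\mathrm{NN}(x)$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as Fn

class SmoothReLU(nn.Module):
    """C^2 smoothed ReLU of Eq. (7) with knee width d > 0."""
    def __init__(self, d=0.1):
        super().__init__()
        self.d = d

    def forward(self, x):
        cubic = (2 * self.d * x ** 3 - x ** 4) / (2 * self.d ** 3)
        return torch.where(x <= 0, torch.zeros_like(x),
                           torch.where(x <= self.d, cubic, x - self.d / 2))

class ICNNLyapunov(nn.Module):
    """Candidate Lyapunov function of Eq. (6) with F = identity and k = 2:
    V(x) = sigma(g(x) - g(0)) + eps * ||x||^2, with g input-convex in x."""
    def __init__(self, dim, hidden=64, eps=1e-3):
        super().__init__()
        self.W0, self.W1 = nn.Linear(dim, hidden), nn.Linear(dim, hidden)
        self.U1_raw = nn.Parameter(0.1 * torch.randn(hidden, hidden))
        self.w_out_raw = nn.Parameter(0.1 * torch.randn(hidden))
        self.act, self.eps = SmoothReLU(), eps

    def g(self, x):
        z1 = self.act(self.W0(x))
        z2 = self.act(z1 @ Fn.softplus(self.U1_raw).T + self.W1(x))
        # Nonnegative combination weights preserve convexity of g in x.
        return z2 @ Fn.softplus(self.w_out_raw)

    def forward(self, x):
        return self.act(self.g(x) - self.g(torch.zeros_like(x))) \
               + self.eps * (x ** 2).sum(-1)

class StochasticController(nn.Module):
    """u(x) = diag(x) * NN(x), so u(0) = 0 (here r = d)."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, x):
        return x * self.net(x)
```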
Remark 3.1 As reported in Chang et al. (2019), a single-layer NN without bias terms in its arguments, which degenerates to a linear control, can suffice to stabilize many deterministic systems. However, this is NOT always the case for highly nonlinear systems, let alone SDEs. The following proposition, illustrated in Figure 3, provides an example in which neither the classic linear controller nor the stochastic linear controller can stabilize the unstable equilibrium of a particular SDE.

Proposition 3.2 Consider the following 1-D SDE:
$$ dx(t) = x(t)\log|x(t)|\,dt + u(x(t))\,dB_t, \tag{9} $$
with the zero solution $x^* = 0$. Then, for $u(x) = kx$ with any $k$ and $x_0 \ne 0$, $x^* = 0$ is neither exponentially stable nor globally asymptotically attractive almost surely. For $u(x) = 2x^2$, $x^* = 0$ is globally asymptotically attractive. For $u(x) \equiv 0$, the deterministic system cannot be stabilized by any classic linear controller.

The proof of this proposition is included in Appendix A.1.

Figure 3: Trajectories of system (9) under (a) $u(x) = kx$, for $(k, x_0) \in \{(1, 50), (2, 100), (3, 150)\}$, and (b) $u(x) = 2x^2$, for $x_0 \in \{50, 100, 150\}$.

Loss Function. When the learning procedure updates the parameters of the NNs so that the constructed $V$ and $u$, together with the coefficient functions $f$ and $g_u \triangleq g + u$ of the controlled system (5), meet all the conditions assumed in Theorem 2.2, the exponential stability of the controlled system is assured. We therefore need a suitable loss function to evaluate how far those conditions are from being satisfied. First, from the construction of $V$, it follows that $V(x) \ge \varepsilon\|x\|^2$ for all $x \in \mathbb{R}^d$, so Condition (i) of Theorem 2.2 holds automatically. Conditions (ii)-(iii), together with $c_3 - 2c_2 > 0$, equivalently become
$$ \inf_{x \ne 0} \frac{(\nabla V(x)^\top g_u(x))^2}{V(x)^2} \ge b\,\sup_{x \ne 0} \frac{\mathcal{L}V(x)}{V(x)}, \quad b > 2. \tag{10} $$
These conditions further imply the pointwise conditions
$$ \frac{(\nabla V(x)^\top g_u(x))^2}{V(x)^2} - b\,\frac{\mathcal{L}V(x)}{V(x)} \ge 0, \quad b > 2, \; x \ne 0. \tag{11} $$
With these reduced conditions, we design the ES loss function for the controlled system (5) as follows.

Definition 3.1 (ES loss) Consider a candidate Lyapunov function $V$ and a controller $u$ for the controlled system (5). The ES loss is defined as
$$ L_{\mu,b,\varepsilon}(\theta, u) = \mathbb{E}_{x\sim\mu}\left[\max\left(0,\; b\,\frac{\mathcal{L}V(x)}{V(x)} - \frac{(\nabla V(x)^\top g_u(x))^2}{V(x)^2}\right)\right], $$
where the state variable $x$ obeys the distribution $\mu$. In practice, we consider the following empirical loss function:
$$ L_{N,b,\varepsilon}(\theta, u) = \frac{1}{N}\sum_{i=1}^{N} \max\left(0,\; b\,\frac{\mathcal{L}V(x_i)}{V(x_i)} - \frac{(\nabla V(x_i)^\top g_u(x_i))^2}{V(x_i)^2}\right), \tag{12} $$
where $\{x_i\}_{i=1}^{N}$ are sampled from the distribution $\mu = \mu(\Omega)$ and $\Omega$ is some closed domain in $\mathbb{R}^d$. For convenience, we summarize the developed framework in Algorithm 1. Here, $b$ is a hyperparameter that can be adjusted as required by the specific problem.

Remark 3.3 In Section 5, we show numerically that the conditions reduced in (11) are sufficiently effective for designing the ES loss; it is actually not necessary to design the loss function using the conditions in (10).
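For a single noise channel ($r = 1$), the empirical ES loss (12) can be assembled with two backward passes: one for $\nabla V$ and one Hessian-vector product that yields the diffusion part of $\mathcal{L}V$ exactly. The sketch below is a hypothetical batched implementation under these assumptions, not the authors' code.

```python
import torch

def es_loss(V, u, f, g, x_batch, b=2.5):
    """Empirical ES loss of Eq. (12): the batch mean of
    max(0, b * LV(x)/V(x) - (grad V(x) . g_u(x))^2 / V(x)^2),
    where g_u = g + u and, for r = 1,
    LV = grad V . f + 0.5 * g_u^T Hess(V) g_u.  Theory requires b > 2."""
    x = x_batch.clone().requires_grad_(True)
    v = V(x)                                                        # (N,)
    grad_v = torch.autograd.grad(v.sum(), x, create_graph=True)[0]  # (N, d)
    g_u = g(x) + u(x)                                               # (N, d)
    # Hessian-vector product: d/dx [grad V . w] = Hess(V) w for constant w.
    hvp = torch.autograd.grad((grad_v * g_u.detach()).sum(), x,
                              create_graph=True)[0]                 # (N, d)
    LV = (grad_v * f(x)).sum(-1) + 0.5 * (g_u * hvp).sum(-1)
    cross = (grad_v * g_u).sum(-1)                                  # grad V . g_u
    return torch.relu(b * LV / v - cross ** 2 / v ** 2).mean()
```

Minimizing this loss jointly over the parameters of $V$ (e.g., the ICNNLyapunov sketch above) and of $u$ pushes the sampled states toward satisfying (11) with the chosen $b > 2$.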
Now, for controlling any nonlinear ODE or SDE, we design the ES according to Algorithm 1. As such, the ES framework can stabilize not only unstable equilibria (constant states) of the given systems but also unstable oscillators, e.g., limit cycles, because the solution corresponding to the oscillator can be regarded as a zero solution of the controlled system after appropriate transformations are implemented. Another point needs attention. In the construction of $V$ in (6), the $L^2$-regularization term $\varepsilon\|x\|^2$ guarantees the positive definiteness of $V$. However, in applications of the Lyapunov stability theory, $\|x\|^2$ is not always a suitable candidate for the Lyapunov function; this may restrict the generalizability of our framework and calls for necessary adjustments. The following example illustrates this point.

Example 3.4 Consider the 2-D SDE
$$ dx_1(t) = x_2(t)\,dt, \qquad dx_2(t) = [-2x_1(t) - x_2(t)]\,dt + x_1(t)\,dB_t. $$
In Appendix A.2, the zero solution of this system is validated to be almost surely exponentially stable; however, $k\|x\|^2$ for any $k \in \mathbb{R}$ cannot serve as a useful auxiliary function to identify the exact stability of the zero solution.

To be candid, the current framework takes a long time to train and construct the neural Lyapunov function. In the next subsection, we thus establish an alternative control framework that reduces the training time.

3.2 Asymptotic Stabilizer

Here, in light of Theorem 2.3, we establish the second framework, the AS, for stabilizing the unstable equilibrium of system (5). This framework only renders the equilibrium almost surely asymptotically attractive. Its control function is designed in the same way as in the ES framework, whereas its loss function is designed differently; in particular, no Lyapunov function needs to be learned.

Definition 3.2 (AS loss) Using the notations of Definition 3.1, the loss function for the controlled system (5) with the controller $u$ is defined as
$$ L_{\mu,\alpha}(u) = \mathbb{E}_{x\sim\mu}\left[\max\left(0,\; (\alpha - 2)\,\|x^\top g_u(x)\|^2 + \|x\|^2\big(2\langle x, f(x)\rangle + \|g_u(x)\|_F^2\big)\right)\right]. $$
Akin to Definition 3.1, we set the empirical loss function as
$$ L_{N,\alpha}(u) = \frac{1}{N}\sum_{i=1}^{N} \max\left(0,\; (\alpha - 2)\,\|x_i^\top g_u(x_i)\|^2 + \|x_i\|^2\big(2\langle x_i, f(x_i)\rangle + \|g_u(x_i)\|_F^2\big)\right). \tag{13} $$
Here, $\alpha$ is an adjustable parameter related to the convergence time and the energy cost of the controller $u$; we show the influence of different choices of $\alpha$ in Appendix A.8. For convenience, we summarize the AS framework in Algorithm 2.
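Because no Lyapunov network appears in (13), the AS loss needs only forward evaluations of $f$, $g$, and $u$. A minimal sketch for a single noise channel (our assumption, mirroring the ES sketch above) reads:

```python
import torch

def as_loss(u, f, g, x_batch, alpha=0.5):
    """Empirical AS loss of Eq. (13): the batch mean of
    max(0, (alpha - 2) * (x . g_u(x))^2
           + ||x||^2 * (2 <x, f(x)> + ||g_u(x)||_F^2)),
    with g_u = g + u and r = 1, so x^T g_u(x) is a scalar per sample."""
    x = x_batch
    g_u = g(x) + u(x)                         # (N, d)
    xTg = (x * g_u).sum(-1)                   # x . g_u(x)
    penalty = (alpha - 2) * xTg ** 2 \
        + (x ** 2).sum(-1) * (2 * (x * f(x)).sum(-1) + (g_u ** 2).sum(-1))
    return torch.relu(penalty).mean()
```

No autograd pass through a Lyapunov function is required here, which is the computational reason the AS trains markedly faster than the ES (cf. Table 1 below).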
4 Convergence Time and Energy Cost

The convergence time and the energy cost are crucial factors in measuring the quality of a controller (Yan et al., 2012; Li et al., 2017; Sun et al., 2017). In this section, we provide a comparative study between the traditional stochastic linear control and the ES/AS, the neural stochastic control articulated above. To this end, we first present a theorem estimating the convergence time and the energy cost of a stochastic linear control applied to a general SDE.

Theorem 4.1 Consider the SDE with a stochastic linear controller:
$$ dx = f(x)\,dt + u(x)\,dB_t, \quad x(0) = x_0 \in \mathbb{R}^d, \tag{14} $$
where $\langle x, f(x)\rangle \le L\|x\|^2$ and $u(x) = kx$ with $|k| > \sqrt{2L}$. Then, for $\epsilon < \|x_0\|$, we have
$$ \mathbb{E}[\tau_\epsilon] \le T_\epsilon = \frac{2\log(\|x_0\|/\epsilon)}{k^2 - 2L}, \qquad E(\tau_\epsilon, T_\epsilon) \le k^2\|x_0\|^2 \exp\!\left(\frac{2(k^2 + 2L)\log(\|x_0\|/\epsilon)}{k^2 - 2L}\right), $$
where, for a sufficiently small $\epsilon > 0$, we denote the stopping time by $\tau_\epsilon \triangleq \inf\{t > 0 : \|x(t)\| = \epsilon\}$ and the energy cost by
$$ E(\tau_\epsilon, T_\epsilon) \triangleq \mathbb{E}\int_0^{\tau_\epsilon \wedge T_\epsilon} \|u\|^2\,dt = \mathbb{E}\int_0^{T_\epsilon} \|u\|^2\,\mathbf{1}_{\{t < \tau_\epsilon\}}\,dt. $$

The proof of this theorem is provided in Appendix A.3.1. We further consider the case of an NN controller $u(x)$. In general, $u(x)$ is Lipschitz continuous with Lipschitz constant $k_u$ under suitable activation functions such as ReLU (Fazlyab et al., 2019; Pauli et al., 2021). We then have the following upper-bound estimates of the convergence time and the energy cost for the ES and the AS, whose proofs are provided in Appendices A.3.2 and A.3.3.

Theorem 4.2 (Estimation for ES) For an ES controller $u(x)$ in (14) with $\langle x, f(x)\rangle \le L\|x\|^2$ and $\epsilon < \|x_0\|$, under the same notations and conditions as in Theorem 2.2 with $c_3 - 2c_2 > 0$, we have
$$ \mathbb{E}[\tau_\epsilon] \le T_\epsilon = \frac{2\log\big(V(x_0)/(c_1\epsilon^p)\big)}{c_3 - 2c_2}, \qquad E(\tau_\epsilon, T_\epsilon) \le k_u^2\|x_0\|^2 \exp\!\left(\frac{2(k_u^2 + 2L)\log\big(V(x_0)/(c_1\epsilon^p)\big)}{c_3 - 2c_2}\right). $$

Theorem 4.3 (Estimation for AS) For (14) with $\langle x, f(x)\rangle \le L\|x\|^2$ and $\epsilon < \|x_0\|$, under the same notations and conditions as in Theorem 2.3, if the left-hand side of (3) further satisfies
$$ \max_{\|x\| \ge \epsilon} \|x\|^{\alpha - 4}\Big(\|x\|^2\big(2\langle x, f(x)\rangle + \|u(x)\|_F^2\big) - (2 - \alpha)\|x^\top u(x)\|^2\Big) = \delta_\epsilon < 0, $$
then, for an NN controller $u(x)$ with Lipschitz constant $k_u$, we have
$$ \mathbb{E}[\tau_\epsilon] \le T_\epsilon = \frac{2\big(\|x_0\|^\alpha - \epsilon^\alpha\big)}{-\alpha\,\delta_\epsilon}, \qquad E(\tau_\epsilon, T_\epsilon) \le k_u^2\|x_0\|^2 \exp\!\left(\frac{2(k_u^2 + 2L)\big(\|x_0\|^\alpha - \epsilon^\alpha\big)}{-\alpha\,\delta_\epsilon}\right). $$

Based on these theoretical results for the ES and the AS, we can further analyze the effects of the hyperparameters $b$ and $\alpha$ and of the NN structure on the convergence time and the energy cost of the control process. Interestingly, the monotonicity of $T_\epsilon$ with respect to $\alpha$ for the AS changes with the relative magnitudes of $\|x_0\|$ and $\epsilon$, which inspires us to select a suitable $\alpha$ for the specific problem; we leave more discussion to Appendix A.3.4.

Now, we numerically compare the performances of the linear controller $u(x) = kx$ and the AS in terms of the convergence time and the energy cost when applied to system (14) with a specific configuration (see Figure 4). We find numerically that $u(x) = kx$ efficiently stabilizes the equilibrium for $k > k_c = 5.6$. Without loss of generality, we fix $k = 6.0$ and compare the corresponding performances. As clearly shown in Figure 4, the AS outperforms $u(x) = kx$ from both perspectives, the convergence speed and the energy cost. In the simulations, the energy cost $E(\tau_\epsilon, T_\epsilon)$ defined above is computed over a finite-time duration as $E(\tau_\epsilon, T \wedge T_\epsilon)$, where $T < \infty$ is selected to be appropriately large. We leave more results of this comparison study to Appendix A.5.5.

Figure 4: Performance on system (14) with $f(x) = x\log(1 + x)$: (a) $u(x) = kx$ for $k = 0.2j$, $j = 1, \dots, 50$, plotting $\log(1 + x(1))$ against $k$; (b) the linear controller with $k = 6.0$, $E(\tau_{0.1}, 1) = 38{,}388$; (c) the AS control, with $E(\tau_{0.1}, 1) = 1438$.

5 Experiments

In this section, we demonstrate the efficacy of the above-articulated frameworks of stochastic neural control, the ES and the AS, on several representative physical systems, and we compare the two frameworks, highlighting their respective advantages and weaknesses. The detailed configurations of these experiments are included in Appendix A.5; additional illustrative experiments are included in Appendix A.6.

5.1 Harmonic Linear Oscillator

First, consider the harmonic linear oscillator $\ddot{y} + 2\beta\dot{y} + w^2 y = 0$, where $w$ is the natural frequency and $\beta > 0$ is the damping coefficient representing the strength of the external force on the vibrator (Dekker, 1981). Although this system is exponentially stable, the system with stochastic perturbations, $\ddot{y} + (2\beta + \xi_2)\dot{y} + (w^2 + \xi_1)y = 0$, becomes unstable even if $\mathbb{E}[\xi_1(t)] = \mathbb{E}[\xi_2(t)] = 0$ (Arnold et al., 1983).
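The destabilizing effect of such zero-mean perturbations is easy to reproduce numerically. Below is a small Euler-Maruyama simulation; interpreting $\xi_i\,dt$ as $\sigma_i\,dB_i$ is our Itô reading of the perturbed oscillator, using the parameter values reported in this subsection.

```python
import numpy as np

def simulate(x0, beta=0.5, w2=1.0, s1=3.0, s2=2.15, T=4.0, dt=1e-3, seed=0):
    """Euler-Maruyama scheme for the perturbed harmonic oscillator,
    written as the 2-D SDE (x1 = y, x2 = y'):
        dx1 = x2 dt,
        dx2 = -(w2 x1 + 2 beta x2) dt - s1 x1 dB1 - s2 x2 dB2."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(int(T / dt)):
        dB = rng.normal(0.0, np.sqrt(dt), size=2)
        x = x + np.array([x[1], -(w2 * x[0] + 2 * beta * x[1])]) * dt \
              + np.array([0.0, -(s1 * x[0] * dB[0] + s2 * x[1] * dB[1])])
    return x

print(np.linalg.norm(simulate([1.0, 0.0])))  # typically far from 0: unstable
```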
Now, we apply the nonlinear ES(+ICNN), the linear ES(+Quadratic), and the nonlinear AS, respectively, to stabilize this unstable dynamics (see Appendix A.5.1) with $w^2 = 1$, $\beta = 0.5$, $\xi_1 = 3$, and $\xi_2 = 2.15$; the results are shown in Figure 5. Indeed, we find that the two nonlinear stochastic neural controls are more robust than the linear control, and that the ES(+ICNN), rather than the AS, makes the controlled system more stable.

Figure 5: Dynamics of the original and the controlled systems under ES+ICNN, ES+Quad, and AS. The solid lines are obtained by averaging 20 sampled trajectories, while the shaded areas stand for the variance regions.

In Table 1, we report the training time (Tt) for the loss function to converge to 0, the number of iterations (Ni), the distance (Di) between the trajectory and the targeted equilibrium at time $T = 4$, and the convergence time (Ct) at which this distance first falls below 0.05. The results are obtained by averaging the corresponding quantities over 20 randomly sampled trajectories; the detailed training configurations are shown in Appendix A.5.1.

Table 1: Performance on the harmonic linear oscillator.

| Method | Tt | Ni | Di | Ct |
|---|---|---|---|---|
| ES(+ICNN) | 276.385 s | 121 | 1e-9 | 0.459 |
| ES(+Quadratic) | 78.071 s | 107 | 0.049 | 3.683 |
| AS | 4.839 s | 184 | 0.027 | 2.027 |

We further compare our newly proposed ES(+ICNN) with HDSCLF (Sarkar et al., 2020), BALSA (Fan et al., 2020), and the classic LQR controller in controlling the harmonic linear oscillator. Both HDSCLF and BALSA are based on a quadratic program (QP) and seek the control policy dynamically for each state during the control process; by contrast, our learned control policy is used directly in the control process, and hence our method is more efficient in practical control problems. The results, shown in Figure 6, indicate that our learned control outperforms all the others (see more details in Appendix A.5.1).

Figure 6: Comparison with existing methods (ES+ICNN, HDSCLF, BALSA, and LQR).

5.2 Stuart-Landau Equations

In this subsection, we show that our frameworks are beneficial to realizing the control and the synchronization of complex networks. To this end, we consider the single Stuart-Landau oscillator governed by the complex-valued ODE
$$ \dot{Z} = (\beta + i\gamma + \mu|Z|^2)Z, \quad Z \in \mathbb{C}. \tag{15} $$
This equation is a paradigmatic model undergoing the so-called Andronov-Hopf bifurcation (Kuznetsov, 2013): the stability of the equilibrium changes and a limit cycle emerges as the parameter passes some critical value. In what follows, we consider two cases based on system (15).

Case 1. We set $\beta = -25$, $\gamma = 1$, and $\mu = 1$, so that system (15) has a stable equilibrium $\rho = 0$ and an unstable limit cycle $\rho = 5$, where $Z = x + iy = \rho e^{i\theta}$. The AS successfully steers the dynamics to the unstable limit cycle, as shown in Figure 7.

Figure 7: The trajectories (left column) and the phase orbits (right column) of system (15), initiated from 30 randomly selected initial states, without control (upper panels) and with AS control (lower panels). The initial values inside (resp., outside) the limit cycle $\rho = 5$ are indicated by the blue (resp., purple) pentagrams.
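Case 1 can be checked by hand in polar coordinates: writing $Z = \rho e^{i\theta}$ in (15) decouples the radius as $\dot{\rho} = \rho(\beta + \mu\rho^2)$. The snippet below verifies the sign pattern around $\rho = 5$ under our reading of the Case 1 parameters ($\beta = -25$, $\mu = 1$; the sign of $\beta$ is inferred from the stated stability of $\rho = 0$).

```python
def rho_dot(rho, beta=-25.0, mu=1.0):
    """Radial part of the Stuart-Landau oscillator (15) for Z = rho*exp(i*theta):
    drho/dt = rho * (beta + mu * rho**2); theta decouples as dtheta/dt = gamma."""
    return rho * (beta + mu * rho ** 2)

for rho in (4.9, 5.0, 5.1):
    print(rho, rho_dot(rho))  # negative / zero / positive: rho = 5 is unstable
```

Radii just inside the cycle decay toward the stable origin while radii just outside grow, which is precisely why a controller is needed to render $\rho = 5$ attractive.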
Case 2. Next, we consider the synchronization problem of coupled Stuart-Landau equations. Successful deterministic methods have been systematically developed for realizing synchronization, including the adaptive control with time delay (Selivanov et al., 2012) and the open-loop temporal network controller (Zhang & Strogatz, 2021). These methods depend largely on the technique of linearization in the vicinity of the synchronization manifold. Here, we show how to apply our framework to achieving synchronization in the coupled system. We set the corresponding Laplacian matrix $L = (L_{jk})_{n\times n}$ to satisfy $\sum_{k=1}^{n} L_{jk} = 0$, which guarantees that the synchronization manifold is an invariant manifold of the coupled system (Pecora & Carroll, 1998). Specifically, we select $n = 20$, $\sigma = 0.01$, $c_1 = 1.8$, $c_2 = 4$, and $L_{jk} = \delta_{jk} - \frac{1}{n}$, where $\delta_{jk}$ is the Kronecker delta. Then, we apply the AS to this system and realize the stabilization of the synchronization manifold, as shown in Figure 8.

Figure 8: The dynamics of the first (resp., second) component of the coupled oscillators are shown in the left (resp., middle) column; the dynamics in the phase space are shown in the right column.

5.3 Data-Driven Pinning Control for Cell Fate Dynamics

Indeed, our frameworks can be extended to a model-free version by combining them with existing data-reconstruction methods. Concretely, we show that our framework can be combined with neural ODEs (NODEs) (Chen et al., 2018) to learn the control policy from time-series data for the cell fate system (Sun et al., 2017; Laslo et al., 2006), which describes the interaction between two suppressors during cellular differentiation for neutrophil and macrophage cell fate choices. The system $\dot{x} = f(x)$, $x = (x_1, \dots, x_6)$, has three steady states, $P_{1,2,3}$, where $P_{2,3}$ correspond to different cell fates and are stable, while $P_1$ represents a critical expression level connecting the two fates and is unstable. The network structure of this 6-D system is a treemap in which one root node, $x_1$, stabilizes itself under the original dynamics. Hence, we choose the root node $x_2$ with maximal out-degree and add pinning control on it to stabilize the system to the unstable state $P_1$. The original trajectory, which converges to $P_2$ (left), and the controlled trajectory, which converges to $P_1$ (right), are shown in Figure 17. The original trajectory is used to train the NODE to reconstruct the vector field $\hat{f}$; samples of $\hat{f}$ are then used as training data to learn our stochastic pinning control. Experimental details are provided in Appendix A.5.6.

Figure 9: Pinning control for cell fate dynamics (state variables $x_0, \dots, x_5$).

In addition to the controlled systems above, we include other illustrative examples, viz. the controlled inverted pendulum, reservoir computing, and the controlled Lorenz system, in Appendices A.5 and A.6.

6 Conclusion and Future Works

In this article, we have proposed two frameworks of neural stochastic control for stabilizing different types of dynamical systems, including SDEs. We have shown that the neural stochastic control outperforms the classic stochastic linear control in both convergence time and energy cost for typical systems. More importantly, using several representative physical systems, we have demonstrated the advantages of our frameworks and exposed some of the weaknesses that may emerge in real applications. We also present some limitations of the proposed frameworks in Appendix A.9.
Moreover, we suggest several directions for further investigation: (i) acceleration of the training process of the ES, (ii) the basin stability of the neural stochastic control (Menck et al., 2013), (iii) the trade-off between the deterministic controller $u_f$ and the stochastic controller $u_g$, both realized by NNs, (iv) safe learning in robotic control with small disturbances (Berkenkamp et al., 2017), and (v) the design of purely data-driven stochastic neural control.

7 Acknowledgments

We thank the anonymous reviewers for their valuable and constructive comments that helped us improve the work. Q.Z. is supported by the Shanghai Postdoctoral Excellence Program (No. 2021091) and by the STCSM (Nos. 21511100200 and 22ZR1407300). W.L. is supported by the National Natural Science Foundation of China (No. 11925103) and by the STCSM (Nos. 19511101404, 22JC1402500, 22JC1401402, and 2021SHZDZX0103).

References

Amos, B., Xu, L., and Kolter, J. Z. Input convex neural networks. In International Conference on Machine Learning, pp. 146-155. PMLR, 2017.

Anderson, C. W. Learning to control an inverted pendulum using neural networks. IEEE Control Systems Magazine, 9(3):31-37, 1989.

Appleby, J. A. Stabilisation of functional differential equations by noise. Systems & Control Letters, 2003.

Appleby, J. A., Mao, X., and Rodkina, A. On stochastic stabilization of difference equations. Discrete & Continuous Dynamical Systems, 15(3):843, 2006.

Appleby, J. A., Mao, X., and Rodkina, A. Stabilization and destabilization of nonlinear differential equations by noise. IEEE Transactions on Automatic Control, 53(3):683-691, 2008.

Arnold, L. Stochastic Differential Equations: Theory and Applications. 2007.

Arnold, L., Crauel, H., and Wihstutz, V. Stabilization of linear systems by noise. SIAM Journal on Control and Optimization, 21(3):451-461, 1983.

Berkenkamp, F., Turchetta, M., Schoellig, A. P., and Krause, A. Safe model-based reinforcement learning with stability guarantees. arXiv preprint arXiv:1705.08551, 2017.

Caraballo, T. and Robinson, J. C. Stabilisation of linear PDEs by Stratonovich noise. Systems & Control Letters, 53(1):41-50, 2004.

Caraballo, T., Garrido-Atienza, M. J., and Real, J. Stochastic stabilization of differential systems with general decay rate. Systems & Control Letters, 48(5):397-406, 2003.

Chang, Y.-C., Roohi, N., and Gao, S. Neural Lyapunov control. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 3245-3254, 2019.

Chen, R. T., Rubanova, Y., Bettencourt, J., and Duvenaud, D. Neural ordinary differential equations. arXiv preprint arXiv:1806.07366, 2018.

Dekker, H. Classical and quantum mechanics of the damped harmonic oscillator. Physics Reports, 80(1):1-110, 1981.

Deng, F., Luo, Q., Mao, X., and Pang, S. Noise suppresses or expresses exponential growth. Systems & Control Letters, 57(3):262-270, 2008.

Fan, D. D., Nguyen, J., Thakker, R., Alatur, N., Agha-mohammadi, A.-a., and Theodorou, E. A. Bayesian learning-based adaptive control for safety critical systems. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 4093-4099. IEEE, 2020.

Fazlyab, M., Robey, A., Hassani, H., Morari, M., and Pappas, G. Efficient and accurate estimation of Lipschitz constants for deep neural networks. Advances in Neural Information Processing Systems, 32, 2019.

Gallieri, M., Salehian, S. S. M., Toklu, N. E., Quaglino, A., Masci, J., Koutník, J., and Gomez, F.
Safe interactive model-based learning. arXiv preprint arXiv:1911.06556, 2019.

Guo, Y., Lin, W., and Chen, G. Stability of switched systems on randomly switching durations with random interaction matrices. IEEE Transactions on Automatic Control, 63(1):21-36, 2018.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.

Henrion, D. and Garulli, A. Positive Polynomials in Control, volume 312. Springer Science & Business Media, 2005.

Huang, S.-J. and Huang, C.-L. Control of an inverted pendulum using grey prediction model. IEEE Transactions on Industry Applications, 36(2):452-458, 2000.

Jaeger, H. The echo state approach to analysing and training recurrent neural networks, with an erratum note. Bonn, Germany: German National Research Center for Information Technology, GMD Technical Report, 148(34):13, 2001.

Jaeger, H. Echo state network. Scholarpedia, 2(9):2330, 2007.

Jaeger, H. and Haas, H. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667):78-80, 2004.

Jarvis-Wloszek, Z., Feeley, R., Tan, W., Sun, K., and Packard, A. Some controls applications of sum of squares programming. In 42nd IEEE International Conference on Decision and Control (IEEE Cat. No. 03CH37475), volume 5, pp. 4676-4681. IEEE, 2003.

Khalil, H. K. Nonlinear Systems, third edition. Prentice Hall, 2002.

Kolter, J. Z. and Manek, G. Learning stable deep dynamics models. Advances in Neural Information Processing Systems, 32:11128-11136, 2019.

Kushner, H. J. Stochastic Stability and Control. Technical report, Brown University, Providence, RI, 1967.

Kuznetsov, Y. A. Elements of Applied Bifurcation Theory, volume 112. Springer Science & Business Media, 2013.

Laslo, P., Spooner, C. J., Warmflash, A., Lancki, D. W., Lee, H.-J., Sciammas, R., Gantner, B. N., Dinner, A. R., and Singh, H. Multilineage transcriptional priming and determination of alternate hematopoietic cell fates. Cell, 126(4):755-766, 2006.

Leong, Y. P., Horowitz, M. B., and Burdick, J. W. Linearly solvable stochastic control Lyapunov functions. SIAM Journal on Control and Optimization, 54(6):3106-3125, 2016.

Li, A., Cornelius, S. P., Liu, Y.-Y., Wang, L., and Barabási, A.-L. The fundamental advantages of temporal networks. Science, 358(6366):1042-1046, 2017.

Liptser, R. and Shiryayev, A. N. Theory of Martingales, volume 49. Springer Science & Business Media, 2012.

Lukoševičius, M. and Jaeger, H. Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3):127-149, 2009.

Ma, H., Ho, D. W., Lai, Y.-C., and Lin, W. Detection meeting control: Unstable steady states in high-dimensional nonlinear dynamical systems. Physical Review E, 92(4):042902, 2015.

Manek, G. and Kolter, J. Z. Learning stable deep dynamics models. arXiv preprint arXiv:2001.06116, 2020.

Mao, X. Stability of Stochastic Differential Equations with Respect to Semimartingales. Longman, 1991.

Mao, X. Exponential Stability of Stochastic Differential Equations. Marcel Dekker, 1994a.

Mao, X. Stochastic stabilization and destabilization. Systems & Control Letters, 23(4):279-290, 1994b.

Mao, X. Stochastic Differential Equations and Applications. Elsevier, 2007.

Mao, X., Marion, G., and Renshaw, E. Environmental Brownian noise suppresses explosions in population dynamics. Stochastic Processes and Their Applications, 97(1):95-110, 2002.

Mao, X., Yin, G. G., and Yuan, C.
Stabilization and destabilization of hybrid systems of stochastic differential equations. Automatica, 43(2):264-273, 2007.

Menck, P. J., Heitzig, J., Marwan, N., and Kurths, J. How basin stability complements the linear-stability paradigm. Nature Physics, 9(2):89-92, 2013.

Parrilo, P. A. Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. California Institute of Technology, 2000.

Pauli, P., Koch, A., Berberich, J., Kohler, P., and Allgöwer, F. Training robust neural networks using Lipschitz bounds. IEEE Control Systems Letters, 6:121-126, 2021.

Pecora, L. M. and Carroll, T. L. Master stability functions for synchronized coupled systems. Physical Review Letters, 80(10):2109, 1998.

Polycarpou, M. Stable adaptive neural control scheme for nonlinear systems. IEEE Transactions on Automatic Control, 41(3):447-451, 1996.

Richards, S. M., Berkenkamp, F., and Krause, A. The Lyapunov neural network: Adaptive stability certification for safe learning of dynamical systems. In Conference on Robot Learning, pp. 466-476. PMLR, 2018.

Sarkar, M., Ghose, D., and Theodorou, E. A. High-relative degree stochastic control Lyapunov and barrier functions. arXiv preprint arXiv:2004.03856, 2020.

Sastry, S. and Isidori, A. Adaptive control of linearizable systems. IEEE Transactions on Automatic Control, 34(11):1123-1131, 1989.

Selivanov, A. A., Lehnert, J., Dahms, T., Hövel, P., Fradkov, A. L., and Schöll, E. Adaptive synchronization in delay-coupled networks of Stuart-Landau oscillators. Physical Review E, 85(1):016201, 2012.

Sparrow, C. The Lorenz Equations: Bifurcations, Chaos, and Strange Attractors, volume 41. Springer Science & Business Media, 2012.

Sun, Y.-Z., Leng, S.-Y., Lai, Y.-C., Grebogi, C., and Lin, W. Closed-loop control of complex networks: A trade-off between time and energy. Physical Review Letters, 119(19):198301, 2017.

Takeishi, N. and Kawahara, Y. Learning dynamics models with stable invariant sets. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 9782-9790, 2021.

Tan, W. and Packard, A. Searching for control Lyapunov functions using sums of squares programming. sibi, 1(1), 2004.

Wang, B. and Zhu, Q. Stability analysis of Markov switched stochastic differential equations with both stable and unstable subsystems. Systems & Control Letters, 105:55-61, 2017.

Xie, C., Wu, Y., Maaten, L. v. d., Yuille, A. L., and He, K. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 501-509, 2019.

Yan, G., Ren, J., Lai, Y.-C., Lai, C.-H., and Li, B. Controlling complex networks: How much energy is needed? Physical Review Letters, 108(21):218703, 2012.

Yan, H., Du, J., Tan, V. Y., and Feng, J. On robustness of neural ordinary differential equations. arXiv preprint arXiv:1910.05513, 2019.

Yang, S.-K., Chen, C.-L., and Yau, H.-T. Control of chaos in Lorenz system. Chaos, Solitons & Fractals, 13(4):767-780, 2002.

Yang, T., Yang, L.-B., and Yang, C.-M. Impulsive control of Lorenz system. Physica D: Nonlinear Phenomena, 110(1-2):18-24, 1997.

Zhang, Y. and Strogatz, S. H. Designing temporal networks that synchronize under resource constraints. Nature Communications, 12(1):1-8, 2021.

Zhu, Q., Ma, H., and Lin, W. Detecting unstable periodic orbits based only on time series: When adaptive delayed feedback control meets reservoir computing.
Chaos: An Interdisciplinary Journal of Nonlinear Science, 29(9):093125, 2019.

Checklist

1. For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]
(b) Did you describe the limitations of your work? [Yes] See Section 6.
(c) Did you discuss any potential negative societal impacts of your work? [N/A]
(d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
(a) Did you state the full set of assumptions of all theoretical results? [Yes]
(b) Did you include complete proofs of all theoretical results? [Yes] See the supplementary material.
3. If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the supplementary material.
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See the supplementary material.
(c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [Yes] See Section 5.
(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See the supplementary material.
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
(a) If your work uses existing assets, did you cite the creators? [Yes]
(b) Did you mention the license of the assets? [N/A] The repos we used do not mention a license.
(c) Did you include any new assets either in the supplemental material or as a URL? [Yes]
(d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [No]
(e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]
5. If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]