# parameter_inference_with_bifurcation_diagrams__29426c65.pdf Parameter Inference with Bifurcation Diagrams Gregory Szep King s College London London, WC2R 2LS, UK gregory.szep@kcl.ac.uk Attila Csikász-Nagy Pázmány Péter Catholic University Budapest, 1083, Hungary csikasznagy@gmail.com Neil Dalchau Microsoft Research Cambridge Cambridge, CB1 2FB, UK ndalchau@gmail.com Estimation of parameters in differential equation models can be achieved by applying learning algorithms to quantitative time-series data. However, sometimes it is only possible to measure qualitative changes of a system in response to a controlled condition. In dynamical systems theory, such change points are known as bifurcations and lie on a function of the controlled condition called the bifurcation diagram. In this work, we propose a gradient-based approach for inferring the parameters of differential equations that produce a user-specified bifurcation diagram. The cost function contains an error term that is minimal when the model bifurcations match the specified targets and a bifurcation measure which has gradients that push optimisers towards bifurcating parameter regimes. The gradients can be computed without the need to differentiate through the operations of the solver that was used to compute the diagram. We demonstrate parameter inference with minimal models which explore the space of saddle-node and pitchfork diagrams and the genetic toggle switch from synthetic biology. Furthermore, the cost landscape allows us to organise models in terms of topological and geometric equivalence. 1 Introduction Inverse problems [1] arise in biology and engineering in settings when the model is not fully known and the desire is to match model behaviour to a given set of observations. This helps systematically guide both model and experimental design. While we would like to understand the quantitative details of a system, often only qualitative changes in response to varying experimental conditions can be robustly measured across independent studies [2, 3]. For example, several studies are likely to agree that the human immune system activates above a threshold concentration of a pathogen and deactivates at a lower threshold concentration, but may disagree on the exact quantities of the thresholds or the magnitudes of the immune response. Bifurcation theory provides us a framework for studying these transitions in a manner that is independent of quantitative details [4]. The emerging picture suggests that identification of the qualitative behaviour the bifurcation diagram should precede any attempt at inferring other properties of a system [5]. Inferring the parameters of a model directly from a bifurcation diagram is difficult because it is not obvious how multiple parameters in concert control the existence and position of a bifurcation. It could even be impossible for the model to bifurcate in the manner desired. For models with a sufficiently small number of parameters, finding specific bifurcation diagrams is typically done by 35th Conference on Neural Information Processing Systems (Neur IPS 2021). hand [6]. Several approaches exist to place bifurcations to desired locations once a manifold is present [7 9] yet typically resort to sampling techniques to search for them in the first place [10, 11]. It is always possible to design bespoke goodness of fit measures to find specific model behaviours, for example using the period and phase of limit cycle oscillations [12]. However, this approach does not generalise across a wider set of qualitative behaviours. Progress has been made in cases where model structure and stability conditions are used to refine the search space [13, 14] yet the resulting objectives are still not explicit in the bifurcation targets and also not differentiable. In the emerging field of scientific machine learning [15 17], parameters of structured mechanistic models are favoured over flexible models in larger parameter spaces. A scalable method for navigating the space of bifurcation diagrams would enable design of differential equations with high-level qualitative constraints. Furthermore one could begin organising models according to qualitatively distinct behaviours. Back-propagation through differential equation solvers has been a breakthrough over the past couple of years [18, 19] that enabled scalable parameter inference for differential equations from trajectory data. Although one could use trajectory data to create the aforementioned qualitative constraints [20, 21] this would entail over-constraining information originating from the kinetics and dynamical transients of the model. Furthermore, such data usually does not contain sufficient information about dynamical transients in order to identify kinetic parameters. Techniques for back-propagating through implicit equation solvers have also been developed [22, 23] although to the best of the authors knowledge have not been applied to bifurcation diagrams at the time of writing this paper. The problem of inferring differential equation parameters against a user-specified bifurcation diagram decomposes into two parts: searching for bifurcating regimes and matching the locations of bifurcation points to desired values. Matching bifurcation locations is a supervised problem where the data are expressed as bifurcations points [8, 11]. Searching for bifurcations is an unsupervised problem because when bifurcations are not present, there is no distance defined between data and prediction [10]. Therefore only properties of the model can be used to start the search. We propose an approach for performing both tasks in an end-to-end fashion. The bifurcation diagram encodes high-level qualitative information defined by state space structures, rather than kinetics. We apply the strategy of implicit layers [22, 23] to calculate gradients. To compute the diagram we use a predictor-corrector method called deflated continuation [24, 25] developed for partial differential equations. We find that the cost function landscape contains basins that not only allow us to synthesise models with a desired bifurcation diagram but also allow us to organise models in terms of topological and geometric equivalence. We discuss the relevance of this in model selection. In summary, our paper has the following main contributions: An end-to-end differentiable method for locating bifurcations in parameter space and then matching their dependency on a control condition to user-specified locations Implementation of the method as a Julia package Bifurcation Inference.jl Leveraging the cost landscape for a novel way of organising differential equation models in terms of geometric and topological equivalence 1.1 Preliminaries Suppose we collected observations along a scalar control condition p R and conclude that there are specific values of p for which there are qualitative changes in system behaviour. Let D be the set of those values and let us hypothesise that these transitions occur due to bifurcations in the dynamics that drive the underlying mechanism. Let us model the mechanism with a parametrised set of differential equations for states u RN with a vector function Fθ in a parameter space θ RM. For the purposes of introducing this work, we will consider the simplest class of bifurcations known as co-dimension one bifurcations not including limit cycles. Therefore D should contain conditions for which we hypothesise changes in multi-stable behaviour. Let the equations be t = Fθ(u, p) where Fθ : RN+1 RN (1) In the context of the differential equations, and not considering limit cycles for now, we show that a static non-degenerate bifurcation can be defined by a set of conditions on the determinant of the Jacobian Fθ u . The determinant of the Jacobian quantifies the rate at which trajectories in a local patch of state-space u RN converge or diverge. Let s R parametrise the curves that trace out the bifurcation diagram. Any location on the curve u(s) and p(s) must satisfy the steady-state of equations (1). Directional derivatives d ds along the diagram require the calculation of a vector that is tangent to the diagram (see Supplementary A). The determinant approaching zero along the diagram means that the dynamics of the system are slowing down, which is an important indicator for the onset of a transition between qualitative behaviours. Furthermore, the slowing down must necessarily be followed by a breakdown of stability; for this to be true it is sufficient but not necessary to require that the determinant cross zero with a finite slope, meaning that its directional derivative along the diagram d u is not zero. This is the non-degeneracy condition. The set of predicted values for the control condition P(θ) R at which bifurcations occur are defined as P(θ) := p | u : Fθ(u, p) = 0, Fθ A proof of how the conditions (2) are necessary and sufficient for static non-degenerate bifurcations is detailed in Supplementary B. The most common bifurcations between steady states, not including limit cycles, are saddle-nodes and pitchforks [26]. Saddle-node bifurcations, which often appear in pairs (Figure 1A) are defined by stable and unstable fixed points meeting and disappearing. Pitchfork bifurcations occur where a single steady state splits into two stable and one unstable steady state (Figure 1B shows an imperfect pitchfork; a perfect pitchfork arises when θ1 = 0). To illustrate these bifurcations, we define minimal models (Figure 1) that span the space of saddle-node and pitchforks, where indeed zero crossings in the determinant with a finite slope define the set of prediction P(θ). The location of these crossings in general may not match the targets D. Figure 1: Illustration of bifurcation diagrams for minimal models of bifurcations. A. Saddle-node bifurcations arise for Fθ(u, p) = p + θ1u + θ2u3 when θ = ( 5 2, 1). B. Pitchfork bifurcations arise for Fθ(u, p) = θ1 + pu + θ2u3 when θ = ( 1 2, 1). Targets are illustrated by light yellow vertical lines. Bifurcation curves are shown as solid blue and red lines, with lighter shades indicating the determinant crossing zero at locations P(θ) giving rise to unstable solutions. For a given set of parameters θ one could compute the set of predicted bifurcations P(θ) using parameter continuation methods [25, 24]. Our goal is to find optimal parameters θ that match predictions P(θ ) to specified targets D. We must design a suitable cost function L so that θ := argminθL(θ|D) (3) The optimal θ is not expected to always be unique, but is in general a manifold representing the space of qualitatively equivalent models. Ideally, the cost function L should reward θ for which the number of predicted bifurcations is equal to the number of targets, |P(θ)| = |D|. This is especially important in the case where there are no predictions |P(θ)| = 0. 2 Proposed Method 2.1 Cost Function To identify parameter sets that give rise to bifurcation diagrams with specified bifurcation points, we propose a cost function that comprises two terms. The role of the error term is simply to reward predicted bifurcations to coincide with the specified target locations. This of course relies on such bifurcations existing. The role of the eigenvalue term is to encourage an optimiser to move towards parameter regimes that do exhibit bifurcations. 2.1.1 Error term: matching bifurcations to target locations In order for predicted bifurcations p(θ) P(θ) to match targets p D we need to evaluate an error term |p(θ) p |. A naive approach might take an average over the norms for all prediction-target pairs. However this gives rise to unwanted cross-terms and the possibility of multiple predictions matching the same target without any penalty for unmatched targets. Therefore, we choose a geometric mean over the predictions and an arithmetic mean over targets: E(θ, D) = 1 |D| p(θ) P(θ) |p(θ) p | The error term is only zero when each target is matched by at least one prediction and allows for cases where the number of predictions is greater than or equal to the number of targets |P| |D|. An alternative approach, which undesirably introduces more hyper-parameters, would be to let each prediction P(θ) represent the centroid of a mixture distribution and use expectation-maximisation to match the centroids to targets D. 2.1.2 Eigenvalue term: encouraging bifurcations Figure 2: Bifurcation measure ϕθ(s) and determinant Fθ u along the arclength s of two different bifurcation curves demonstrating how maximising the measure along the curve maintains the existing bifurcation marked by a circle, while encouraging new bifurcations marked by stars. We can see from Figure 1 and definition (2) that predictions p(θ) can be identified by looking for points along the curve where the determinant crosses zero Fθ u = 0 with a finite slope d ds Fθ u = 0. Using these quantities we can define a positive semi-definite measure ϕθ(s) of zero crossings in the determinant along a curve parametrised by s which we define as The bifurcation measure ϕθ(s) is maximal at bifurcations and has finite gradients in nonbifurcating regimes (Figure 2). More specifically, the measure ϕθ(s) is one at bifurcation points and goes to zero an odd number of times between bifurcations. This is because Fθ u must eventually turn around in order to return back to zero, resulting in the directional derivative d u going to zero. Hence the measure ϕθ(s) goes to zero for each turning point (see Figure 2). On the other hand, as the determinant Fθ u diverges, we approach regimes far away from any bifurcations and hence ϕθ(s) 0. Since we would still like to have non-zero gradients with respect to θ in these regimes we designed the measure to go to zero sufficiently slowly. While the calculation of the determinant is straightforward, its directional derivative requires a tangent vector to the bifurcation curve. Fortunately the tangent vector Tθ(s) at the solution u(s), p(s) anywhere along the curve s can be calculated as the nullspace of the rectangular N (N + 1) Jacobian Fθ (u, p) Fθ(u(s),p(s))=0 Tθ(s) = 0 (6) This equation guarantees that the tangent vector Tθ(s) is orthogonal to all hyper-planes defined by the components of Fθ. In this setting the dimension of the nullspace is always known, and therefore can reliably be calculated using QR factorisation methods [27]. Equipped with a measure that quantifies the appearance of bifurcations along a bifurcation arc we can define the total measure for a bifurcation diagram as Fθ(u,p)=0ϕθ(s) ds Fθ(u,p)=0 ds . (7) Here we denote R Fθ(u,p)=0 ds as the sum of the line integrals in (u, p) RN+1 defined by the level set Fθ(u, p) = 0 with s being an arbitrary parametrisation of the curves. The total measure Ψ(θ) is normalised such that Ψ(θ) 1 in the regimes where the controlled condition region p is densely packed with bifurcations. The total measure Ψ(θ) is added to the error term as if it were a likelihood. This defines the cost function as L(θ|D) := |P| |D| log Ψ(θ) + E(θ, D), (8) The pre-factor |D| |P| in the eigenvalue term ensures that the gradients are always pushing optimisers towards a state where |D| = |P|. This can be seen as a step-wise annealing of the eigenvalue term until the desired state is reached. 2.2 Differentiating the cost function To make use of gradient-based optimisers to locate desired bifurcation diagrams, we show here how to differentiate the cost function. First, we note that while individual bifurcations p(θ) depend smoothly on θ, the total number of predictions |P| does not have gradient contributions with respect to θ. Therefore, we can safely drop the dependency in the prediction counter and now proceed in taking gradients with respect to θ knowing that the only dependencies we need to track are for individual bifurcations p(θ) within the definition the error term (4) and the total measure (7). Therefore, θ = |P| |D| λ Ψ θ Ψ(θ) 1 + 1 |D||P| p(θ) |p(θ) p | p θ (p(θ) p ) 1 (9) In a similar vein to back-propagation through neural differential equations [18] we would like to be able to calculate the gradient L θ without having to differentiate through the operations of the solver that finds the bifurcation diagram Fθ(u, p) = 0 and the bifurcation locations p(θ). To calculate the gradient of the measure Ψ θ we need to differentiate line integrals that depend on θ. Fortunately this can be done by the application of the generalised Leibniz integral rule, details of which can be found in Supplementary C. The gradient of the bifurcation points p θ is found by application of the implicit function theorem to a vector function Gθ : RN+1 RN+1 whose components represent the two constraints Fθ(u, p) = 0 and Fθ u = 0. By following a similar strategy to that used by implicit layers [22] we yield an (N + 1) M Jacobian representing a deformation field [28] for each θ direction. The gradient we are looking for becomes p θ = ˆp Gθ (u, p) Gθ(u,p)=0 where Gθ(u, p) := Fθ(u, p) Fθ Here ˆp is a unit vector in (u, p) RN+1 that picks out the deformations along the p-direction. If we wanted to place the bifurcation at target steady state u as well as target control condition p we would use the full (N + 1) M deformation matrix. Calculation of this matrix involves inverting an (N + 1) (N + 1) Jacobian Gθ (u,p). Instead of explicitly inverting the Jacobian the corresponding system of linear equations is solved. The determinant of this Jacobian goes to zero in the degenerate case where d ds Fθ u = 0, further justifying our choice of measure Ψ(θ) which discourages the degenerate case. The cost function is piece-wise smooth and differentiable with undefined gradients only in parameter contours where the number of predictions |P| changes; this is when Ψ(θ) is undefined and the inverse of Gθ (u,p) does not exist. Given a set of solutions to Fθ(u, p) = 0 and locations p(θ) the gradient L θ can be evaluated using automatic differentiation methods [29 31] without needing to back-propagate through the solver that obtained the level set Fθ(u, p) = 0 in the forward pass. 3 Experiments & Results In this section, we apply the method first to minimal examples that can produce saddle-node and pitchfork bifurcations (both N = 1, M = 2), and then a slightly more complex model (N = 2, M = 5) that has multiple parametric regimes producing saddle-node bifurcations. We also demonstrate our method on a model of greater complexity, to convince the reader that the method can be used on more realistic examples with practical significance. In Supplementary D we demonstrate the identification of saddle-node bifurcations and damped oscillations in a model (N = 4, M = 21) of a synthetic gene circuit in E. coli [3]. Figure 3: Saddle-node Fθ(u, p) = p+θ1u+θ2u3 and pitchfork Fθ(u, p) = θ1+up+θ2u3 optimised with respect to θ so that predicted bifurcations P(θ) match targets D in control condition p. The right panel shows bifurcations diagrams for the three optimal θ marked by stars on the left panel. The optimisation trajectories in white follow the gradient of the cost, approaching the black lines of global minima in the left panel 3.1 Minimal Models Optimisations of two parameters (θ1, θ2) using simple gradient descent from Flux.jl with learning rate η = 0.01 for the minimal saddle-node and pitchfork models (Figure 1) yield trajectories approach- ing lines of global minima in the cost function (Figures 3) which represent a set of geometrically equivalent models. Two bifurcation diagrams are geometrically equivalent if the number, type and locations of bifurcations match the specified targets D. We can see that the geometrically equivalent lines are contained within larger basins where the correct number and type of bifurcations are present but do not match the locations of targets D. All models within this basin are in some sense topologically equivalent. This hierarchical classification allows us to identify the set of models that satisfy observed qualitative behaviour [5] before any attempt at inferring kinetic parameters, which is done by choosing a model along the line of geometrically equivalent models. Optimisation trajectories for the two minimal models appear mostly circumferential. This is because the models were set up such that the radial direction from the origin in θ space mostly scale kinetics whereas the circumferential direction changes the bifurcation topology. This suggests that the gradients of our cost function seek to change model geometry over kinetics. 3.2 Genetic Toggle Switch In this section we optimise a model where the states share a Hill function relationship with cooperatively n = 2; these models often emerge from mass action kinetics with quasi-steady state approximations and are used to model species concentrations. After re-scaling the equations governing the dynamics of concentrations, the simplified equations for state u1 and u2 become tu1 = a1 + (pu2)2 1 + (pu2)2 µ1u1 tu2 = a2 + (ku1)2 1 + (ku1)2 µ2u2 (11) where ak is the baseline production rate for species k in the absence of the other species. Each species has a finite degradation rate µk. Finally we have two sensitivity constants p and k, one of which is chosen as our control condition. A baseline production rate ak > 1 recovers an inhibitor type hill function for species k and is an activator otherwise. The sensitivities are proportional to the slope of the hill productions. Solving for the steady states, substituting the equation for u1 into u2 and rearranging gives rise to the relationship k µ1 = (1 + ( p µ2 u )2) a2 u u 1 where u := u2µ2 (12) Figure 4: Bifurcation inference for the two-state model (11). A. Optimal parameter estimates θ for the targets D = {4, 5} reveal two clusters of qualitatively different regimes: mutual activation (a1 < 1; cluster 1) and mutual inhibition (a1 > 1; cluster 2). B. Example bifurcation diagrams indicate positively and negatively correlated dependencies between the two model states, as a function of the control condition. which reveals that only a1, a2 and the ratio between the sensitivity and degradation parameters, k µ1 , affect the solutions to this equation, and hence the locations of the bifurcations (Figure 4A). In 98% of 800 runs, optimisation using the ADAM optimiser [32] from Flux.jl with learning rate η = 0.1 converged to one of two clusters: mutual activation (a1 < 1, a2 < 1; cluster 1) and mutual inhibition (a1 > 1, a2 > 1; cluster 2) regimes. Example bifurcation diagrams illustrate how the bifurcation curves of each species are positively correlated in mutual activation and negatively correlated for mutual inhibition (Figure 4B). In order to maintain biological interpretability, optimisation was restricted to the positive parameter regime by transforming the parameters to log-space θ 10θ. At the beginning of each optimisation run an initial θ was chosen in the log-space by sampling from a multivariate normal distribution with mean zero and standard deviation one. 3.3 Complexity Performing one iteration of the optimisation requires the computation of the gradient of the cost (9), requiring a computation of the bifurcation diagram with parameter continuation methods, which includes the evaluation of matrix inversions (10). Instead of evaluating the inversions directly, we solve a system of linear equations, applying the same strategy as implicit layers [22, 23]. This leaves us with the computational bottleneck of calculating the determinant of the state space Jacobian, required in both the bifurcation measure (5) and gradient (10). This calculation scales like N 2 where N is the number of state space variables (Figure 5A). Figure 5: A. Execution time (time to calculate cost gradient) with respect to states N. B. Convergence times (the time it takes to find and match a bifurcation to within 1% of a specified target) with respect to the number of parameters M, comparing against a gradient-free approach: Nelder-Mead. Calculations were performed on an Intel Core i7-6700HQ CPU @ 2.60GHz x 8 without GPU acceleration. For the complexity study, a model was designed so that it is extensible both in the number of parameters M and the number of states N. There are many choices for this; we opted for a model of the form tu1 = sin2p (θ1 sin2p + 1)u1 tun = un 1 (µ2 n + 1)un 2 n N (13) In this model only the first state u1 defines the shape of the bifurcation diagram, while the remaining states are merely linearly proportional to the first. The parameters µn contain sums of θm allowing us a flexible choice on the number of parameters while maintaining stable solutions for the bifurcation diagram. While still tractable on laptop computers for states N < 100 our implementation currently does not scale well for partial differential equations where a large the number of states N arises from discretisation of the spatial variables. The only reason we need this determinant is because it is an indicator of bifurcations. We can address the computational bottleneck by finding a more computationally efficient way of calculating this indicator. One approach would be to take the product of a finite subset of eigenvalues of the system. Note that any more efficient calculation must still permit back-propagation through it. To demonstrate the benefits of the gradient-based aspect of our method we compare convergence times of gradient descent against a gradient-free approach. We use the Nelder-Mead method from Optim.jl [33] and obtain convergence times as the number of parameters M is increased (Figure 5B). We observe that for our method convergence times scale like M compared to M 2 for the gradient-free approach. 4 Conclusion & Broader Impact We proposed a gradient-based approach for inferring the parameters of differential equations that produce a user-specified bifurcation diagram. By applying implicit layers [22, 23] and the generalised Leibniz rule [34] to the geometry of the implicitly defined steady states [35] it is possible to use automatic differentiation methods to efficiently calculate gradients. We defined a bifurcation measure that uses the determinant of the state-space Jacobian as an indicator for bifurcating parameter regimes in the eigenvalue term of the cost function. The gradients of the cost can be efficiently computed using automatic differentiation methods. The computational bottleneck is the evaluation of the state-space Jacobian determinant which limits the implementation to ordinary differential equations. We demonstrated our approach on models with one bifurcation parameter that can give rise to pitchforks and saddle-nodes. The estimated parameters form distinct clusters, allowing us to organise models in terms of topological and geometric equivalence (Figure 3). In the case of the genetic toggle switch (Figure 4) and a more complex model [3] (Figure D.1) we recovered mutual activation and inhibition regimes. In the more complex model we found a damped oscillatory regime that was not known about in the original paper. Although we did not consider limit cycles, the bifurcation measure can be extended to detect PoincaréAndronov-Hopf bifurcations alongside changes in stability of fixed points (see Supplementary E for details). This measure enables detection of the onset of damped oscillations and/or the emergence of limit cycles (Figure E.1). Used together with a steady state solver that detects periodic solutions and gradient-based optimisation, we can specify regions of damped oscillation and limit cycles. Our approach generalises naturally to bifurcation manifolds such as limit point curves or surfaces. This is because the normal components of implicit derivatives can still be calculated for under-determined systems of equations [28, 36, 37]. In the case of manifolds it would be more appropriate to use isosurface extraction algorithms rather than continuation to obtain the steady-state manifold. Our approach does not depend on the details of the steady-state solver and therefore can still be applied. In dynamical systems theory the geometry of state-space determines all of the qualitative behaviours of a system. Our work makes progress towards designing models directly in state-space, rather than the spatial or temporal domain. This is valuable to experimentalists who only have qualitative observations available to them and wish to navigate the space of qualitative behaviours of their system. Our work lies within a trend of progress in the scientific machine learning community, where structured domain-informed models are favoured over flexible models that live in large parameter spaces. 5 Acknowledgements We would like to acknowledge Kieran Cooney for the fruitful conversations that helped guide the derivations and computational approach. A special thanks go to Romain Veltz and the Julia community for helpful pointers on package development and discussions over Slack. This work was supported by Microsoft Research through its Ph D Scholarship Programme and the EPSRC Centre for Doctoral Training in Cross-Disciplinary Approaches to Non-Equilibrium Systems (CANES, EP/L015854/1). [1] U. G. Abdulla, R. Poteau, A. Binder, H. W. Engl, C. Flamm, P. K. Â. Ugler, J. Lu, S. M. Â. Uller, and P. Schuster, Inverse problems in systems biology, Inverse Problems, vol. 25, p. 51, 2009. [2] J. J. Tyson, K. Chen, and B. Novak, Network dynamics and cell physiology, Nature reviews Molecular cell biology, vol. 2, no. 12, pp. 908 916, 2001. [3] P. Grant, G. Szep, O. Patange, J. Halatek, V. Coppard, A. Csikász-Nagy, J. Haseloff, J. Locke, N. Dalchau, and A. Phillips, Interpretation of morphogen gradients by a synthetic bistable circuit, Nature Communications, vol. 11, no. 1, 2020. [4] Y. A. Kuznetsov, Topological Equivalence, Bifurcations, and Structural Stability of Dynamical Systems, in Elements of Applied Bifurcation Theory, pp. 39 76, Springer New York, 2004. [5] M. P. H. Stumpf and E. Roesch, Parameter inference in dynamical systems with co-dimension 1 bifurcations, Royal Society, vol. 6, no. 10, 2019. [6] A. Csikász-Nagy, D. Battogtokh, K. C. Chen, B. Novák, and J. J. Tyson, Analysis of a Generic Model of Eukaryotic Cell-Cycle Regulation, Biophysical Journal, vol. 90, pp. 4361 4379, 6 2006. [7] K. Iwasaki and Y. Kamimura, An inverse bifurcation problem and an integral equation of the Abel type, Inverse Problems, vol. 13, pp. 1015 1031, 1997. [8] J. Lu, H. W. Engl, and P. Schuster, Inverse bifurcation analysis: Application to simple gene systems, Algorithms for Molecular Biology, vol. 1, pp. 1 16, 7 2006. [9] I. Dobson, Distance to Bifurcation in Multidimensional Parameter Space: Margin Sensitivity and Closest Bifurcations, in Bifurcation Control: Theory and Applications, pp. 49 66, Springer, Berlin, Heidelberg, 4 2004. [10] V. Chickarmane, S. R. Paladugu, F. Bergmann, and H. M. Sauro, Bifurcation discovery tool, BIOINFORMATICS APPLICATIONS NOTE, vol. 21, no. 18, pp. 3688 3690, 2005. [11] E. D. Conrad, J. Tyson, R. Laubenbacher, J. Phillips, and M. Renardy, Bifurcation Analysis and Qualitative Optimization of Models in Molecular Cell Biology with Applications to the Circadian Clock, Virginia Tech, 4 2006. [12] J. C. W. Locke, A. J. Millar, and M. S. Turner, Modelling genetic networks with noisy and varied experimental data: the circadian clock in Arabidopsis thaliana, Journal of theoretical biology, vol. 234, no. 3, pp. 383 393, 2005. [13] I. Otero-Muras and J. R. Banga, Optimization-based prediction of fold bifurcations in nonlinear ODE models, IFAC-Papers On Line, vol. 51, pp. 485 490, 1 2018. [14] I. Otero-Muras, P. Yordanov, and J. Stelling, A method for inverse bifurcation of biochemical switches: inferring parameters from dose response curves, BMC Systems Biology, vol. 8, p. 114, 2014. [15] C. Rackauckas and Q. Nie, Differentialequations.jl - a performant and feature-rich ecosystem for solving differential equations in julia, Journal of Open Research Software, vol. 5, no. 1, 2017. [16] C. Rackauckas, Y. Ma, V. Dixit, X. Guo, M. Innes, J. Revels, J. Nyberg, and V. Ivaturi, A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions, ar Xiv preprint ar Xiv:1812.01892, 2018. [17] C. Rackauckas, Y. Ma, J. Martensen, C. Warner, K. Zubov, R. Supekar, D. Skinner, and A. Ramadhan, Universal Differential Equations for Scientific Machine Learning, ar Xiv, 2020. [18] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, Neural Ordinary Differential Equations, NIPs, vol. 109, pp. 31 60, 6 2018. [19] C. Rackauckas, M. Innes, Y. Ma, J. Bettencourt, L. White, and V. Dixit, Diff Eq Flux.jl - A Julia Library for Neural Differential Equations, Ar Xiv, 2019. [20] S. Ranciati, C. Viroli, and E. Wit, Bayesian Smooth-and-Match strategy for ordinary differential equations models that are linear in the parameters, Ar Xiv, 2017. [21] F. Khadivar, I. Lauzana, and A. Billard, Learning dynamical systems with bifurcations, Robotics and Autonomous Systems, vol. 136, p. 103700, 2 2021. [22] A. Look, S. Doneva, M. Kandemir, R. Gemulla, and J. Peters, Differentiable Implicit Layers, Arv Xiv, 10 2020. [23] S. Bai, J. Z. Kolter, and V. Koltun, Deep Equilibrium Models, ar Xiv, 9 2019. [24] P. E. Farrell, C. H. L. Beentjes, and Ã. Birkisson, The computation of disconnected bifurcation diagrams, ar Xiv, 3 2016. [25] R. Veltz, Bifurcation Kit.jl, tech. rep., Inria Sophia-Antipolis, 7 2020. [26] M. Haragus and G. Iooss, Local Bifurcations, Center Manifolds, and Normal Forms in Infinite Dimensional Dynamical Systems. London: Springer London, 2011. [27] Z. Drmaˇc and Z. Bujanovi c, On the Failure of Rank-Revealing QR Factorization Software A Case Study, ACM Transactions on Mathematical Software, vol. 35, 7 2008. [28] S. Jos and R. Schmidt, On the velocity of an implicit surface, ACM Transactions on Graphics, vol. 30, pp. 1 7, 5 2011. [29] J. Revels, M. Lubin, and T. Papamarkou, Forward-Mode Automatic Differentiation in Julia, Arv Xiv, 2016. [30] M. Innes, E. Saba, K. Fischer, D. Gandhi, M. C. Rudilosso, N. M. Joy, T. Karmali, A. Pal, and V. Shah, Fashionable Modelling with Flux, Co RR, vol. abs/1811.01457, 2018. [31] M. Innes, Flux: Elegant Machine Learning with Julia, Journal of Open Source Software, 2018. [32] D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, ar Xiv, 12 2014. [33] P. K Mogensen and A. N Riseth, Optim: A mathematical optimization package for Julia, Journal of Open Source Software, vol. 3, p. 615, 4 2018. [34] H. Flanders, Differentiation Under the Integral Sign, The American Mathematical Monthly, vol. 80, p. 615, 6 1973. [35] R. Goldman, Curvature formulas for implicit curves and surfaces, in Computer Aided Geometric Design, vol. 22, pp. 632 658, Elsevier, 10 2005. [36] M. Tao, J. Solomon, and A. Butscher, Near-Isometric Level Set Tracking, Computer Graphics Forum, vol. 35, pp. 65 77, 8 2016. [37] M. Fujisawa, Y. Mandachi, and K. T. Miura, Calculation of Velocity on an Implicit Surface by Curvature Invariance, Information and Media Technologies, vol. 8, no. 4, pp. 674 680, 2013. [38] M. Kratochvíl, O. Hunewald, L. Heirendt, V. Verissimo, J. Vondrášek, V. P. Satagopam, R. Schneider, C. Trefois, and M. Ollert, Giga SOM.jl: High-performance clustering and visualization of huge cytometry datasets, Giga Science, vol. 9, pp. 1 8, 11 2020.