# Process-constrained batch Bayesian Optimisation

Pratibha Vellanki1, Santu Rana1, Sunil Gupta1, David Rubin2, Alessandra Sutti2, Thomas Dorin2, Murray Height2, Paul Sanders3, Svetha Venkatesh1

1Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong, Australia
[pratibha.vellanki, santu.rana, sunil.gupta, svetha.venkatesh@deakin.edu.au]
2Institute for Frontier Materials, GTP Research, Deakin University, Geelong, Australia
[d.rubindecelisleal, alessandra.sutti, thomas.dorin, murray.height@deakin.edu.au]
3Materials Science and Engineering, Michigan Technological University, USA
[sanders@mtu.edu]

Prevailing batch Bayesian optimisation methods allow all control variables to be freely altered at each iteration. Real-world experiments, however, often have physical limitations that make it time-consuming to alter all settings for each recommendation in a batch. This gives rise to a unique problem in BO: in a recommended batch, a set of variables that are expensive to change experimentally needs to be fixed, while the remaining control variables can be varied. We formulate this as a process-constrained batch Bayesian optimisation problem. We propose two algorithms, pc-BO(basic) and pc-BO(nested). pc-BO(basic) is simpler but lacks a convergence guarantee. In contrast, pc-BO(nested) is slightly more complex but admits convergence analysis, and we show that its regret is sublinear. We demonstrate the performance of both pc-BO(basic) and pc-BO(nested) by optimising benchmark test functions, tuning hyper-parameters of an SVM classifier, optimising the heat-treatment process for an Al-Sc alloy to achieve target hardness, and optimising the short polymer fibre production process.

## 1 Introduction

Experimental optimisation is used to design almost all products and processes, scientific and industrial, around us. It involves optimising input control variables in order to achieve a target output. Design of experiments (DOE) [16] is the conventional laboratory and industrial standard methodology for planning experiments efficiently. The method is rigid: it does not adapt based on the experiments completed so far. This is where Bayesian optimisation offers an effective alternative. Bayesian optimisation [13, 17] is a powerful probabilistic framework for efficient, global optimisation of expensive, black-box functions. The field is undergoing a resurgence, spurred by new theory and problems, and is impacting computer science broadly: tuning complex algorithms [3, 22, 18, 21], combinatorial optimisation [24, 12], and reinforcement learning [4]. Usually, a prior belief in the form of a Gaussian process is maintained over the possible set of objective functions, and the posterior is the refined belief after updating the model with experimental data. The updated model is used to seek the most promising location of the function's extrema using a variety of criteria, e.g. expected improvement (EI) and upper confidence bound (UCB). The maximiser of such a criterion is then recommended for function evaluation. The model is updated and recommendations are made iteratively until the target outcome is achieved. When concurrent function evaluations are possible, Bayesian optimisation returns multiple suggestions; this is termed the batch setting.
*Figure 1: Examples of real-world applications requiring process constraints. (a) Heat treatment for Al-Sc: temperature-time profile (temperature $T$ over time $t$, with stage durations $t_1$ to $t_4$). (b) Experimental setup for short polymer fibre production: coagulant flow ($v_c$), polymer flow ($v_p$), constriction angle ($\alpha$), channel width ($h$), device position ($d$).*

Bayesian optimisation in the batch setting has been investigated by [10, 5, 6, 9, 1], wherein different strategies are used to recommend multiple settings at each iteration. In all these methods, all the control variables are free to be altered at each iteration. However, in some situations it may not be efficient to change all the variables for a single batch, and this motivates our process-constrained Bayesian optimisation.

This work has been directly influenced by the way experiments are conducted in many real-world scenarios with typical limitations on resources. For example, in our work with metallurgists, we were given the task of finding the optimal heat-treatment schedule of an alloy to maximise its strength. Heat treatment involves taking the alloy through a series of exposures to different temperatures for varying durations, as shown in Figure 1a. A heat-treatment schedule can typically last multiple days, so doing one experiment at a time is not efficient. Fortunately, a furnace is big enough to hold multiple samples at the same time. If we have to perform multiple experiments in one batch using only one furnace, then we must design our Bayesian optimisation recommendations in such a way that the temperatures across a batch remain the same, whilst still allowing the durations to vary. Samples would be put in the same oven, but taken out after a different elapsed time for each step of the heat treatment.

Similar examples abound in other domains of process and product design. For short polymer fibre production, a polymer is injected axially into another flow of a solvent in a particular geometric manifold [20]. A representation of the experimental setup, marked with the parameters involved, is shown in Figure 1b. When optimising for yield, it is generally easier to change the flow parameters (pump speed settings) than the device geometry (opening up the enclosure and modifying the physical configuration). Hence, in this case as well, it is beneficial to recommend a batch of suggested experiments at a fixed geometry while allowing the flow parameters to vary. The authors have encountered many such examples, where batch recommendations are constrained by the processes involved, while realising the potential of Bayesian optimisation for real-world applications.

To construct a more familiar application, we use the hyper-parameter tuning problem for Support Vector Machines (SVM). When tuning in parallel using batch Bayesian optimisation, it may be useful if all the parallel training runs finish at the same time. This would require fixing the cost parameter while allowing the other hyper-parameters to vary. Whilst this may or may not be a real concern depending on the use case, we use it here as a case study.

We formulate this unique problem as process-constrained batch Bayesian optimisation. The recommendation schedule needs to fix the set of control variables that are experimentally expensive (in time, cost, or difficulty) to change (the constrained set), while varying all the remaining control variables (the unconstrained set).
Our approach involves incorporating constraints on stipulated control parameters while allowing the others to change in an unconstrained manner. The mathematical formulation of our optimisation problem is as follows: we seek $x^* = \operatorname{argmax}_{x \in \mathcal{X}} f(x)$, and we want a batch Bayesian optimisation sequence $\{\{x_{t,0}, x_{t,1}, \ldots, x_{t,K-1}\}\}_{t=1}^{T}$ such that

$$\forall t: \quad x_{t,k} = [x^{uc}_{t,k} \; x^{c}_{t,k}], \qquad x^{c}_{t,k} = x^{c}_{t,k'} \quad \forall k, k' \in \{0, \ldots, K-1\},$$

where $x^{c}_{t,k}$ is the constrained part of the $k$-th point in the $t$-th batch and, similarly, $x^{uc}_{t,k}$ is the unconstrained part of the $k$-th point in the $t$-th batch. $T$ is the total number of iterations and $K$ is the batch size.

We propose two approaches to solve this problem: basic process-constrained batch Bayesian optimisation (pc-BO(basic)) and nested process-constrained batch Bayesian optimisation (pc-BO(nested)). pc-BO(basic) is an intuitive modification motivated by the work of [5], while pc-BO(nested) is based on a nested Bayesian optimisation method that we describe in Section 3. We formulate the algorithms pc-BO(basic) and pc-BO(nested), and for pc-BO(nested) we present a theoretical analysis showing that the cumulative regret grows sublinearly, so the average regret vanishes with iterations. We demonstrate the performance of pc-BO(basic) and pc-BO(nested) on both benchmark test functions and real-world problems: hyper-parameter tuning for SVM classification on two datasets (breast cancer and biodegradable waste), the industrial problem of the heat-treatment process for an Aluminium-Scandium (Al-Sc) alloy, and the industrial problem of the short polymer fibre production process.

## 2 Related background

### 2.1 Bayesian optimisation

Bayesian optimisation is a sequential method for global optimisation of an expensive, unknown black-box function $f$ with domain $\mathcal{X}$, to find its maximum $x^* = \operatorname{argmax}_{x \in \mathcal{X}} f(x)$ (or minimum). It is especially powerful when the function is expensive to evaluate and has no closed-form expression, but it is possible to generate noisy observations from experiments. The Gaussian process (GP) is commonly used as a flexible way to place a prior over the unknown function [14]. It is completely described by the mean function $m(x)$ and the covariance function $k(x, x')$, which encode our beliefs and uncertainties about the objective function. Noisy observations from the experiments are sequentially appended into the model, which in turn updates our belief about the objective function. The acquisition function is a surrogate utility function that takes a known, tractable closed form and allows us to choose the next query point. It is maximised in place of the unknown objective function, and is constructed to balance exploiting regions of high value (mean) and exploring regions of high uncertainty (variance) across the objective function. The Gaussian process based Upper Confidence Bound (GP-UCB) proposed by [19] is one acquisition function that is shown to achieve sublinear growth in cumulative regret. It is defined at the $t$-th iteration as

$$\alpha_t^{GP\text{-}UCB}(x) = \mu_{t-1}(x) + \sqrt{\beta_t}\,\sigma_{t-1}(x) \qquad (1)$$

where $v = 1$ and $\beta_t = 2\log(t^{d/2+2}\pi^2/3\delta)$ is the confidence parameter, wherein $t$ denotes the iteration number, $d$ represents the dimensionality of the data, and $\delta \in (0, 1)$. We are motivated by GP-UCB based methods. Although our approach can be intuitively extended to other acquisition functions, we do not explore this in the current work.
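To make Eq. (1) concrete, below is a minimal sketch of GP-UCB acquisition maximisation over a random candidate set, assuming scikit-learn's `GaussianProcessRegressor` as the GP posterior. The RBF kernel, the toy objective, the candidate grid, and $\delta = 0.1$ are illustrative assumptions, not choices from the paper.

```python
# Minimal GP-UCB sketch (Eq. 1), assuming a scikit-learn GP posterior.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def beta_t(t, d, delta=0.1):
    # Confidence parameter: beta_t = 2 log(t^{d/2+2} pi^2 / (3 delta))
    return 2.0 * np.log(t ** (d / 2.0 + 2.0) * np.pi ** 2 / (3.0 * delta))

def gp_ucb(gp, candidates, t, d):
    # alpha_t(x) = mu_{t-1}(x) + sqrt(beta_t) * sigma_{t-1}(x)
    mu, sigma = gp.predict(candidates, return_std=True)
    return mu + np.sqrt(beta_t(t, d)) * sigma

# Toy usage: fit a GP to a handful of past observations, then pick the
# UCB maximiser from a random candidate set.
rng = np.random.default_rng(0)
X_obs = rng.uniform(0, 1, size=(5, 2))          # 5 past experiments, d = 2
y_obs = np.sin(3 * X_obs[:, 0]) + X_obs[:, 1]   # hypothetical objective values
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X_obs, y_obs)

candidates = rng.uniform(0, 1, size=(1000, 2))
x_next = candidates[np.argmax(gp_ucb(gp, candidates, t=6, d=2))]
```

The discrete candidate set stands in for the continuous argmax; in practice the acquisition is typically maximised with a multi-start gradient optimiser.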
### 2.2 Batch Bayesian optimisation methods

The GP exhibits an interesting characteristic: its predictive variance depends only on the input attributes, whereas updating its mean requires knowledge of the outcome of the experiment. This leads to a family of strategies for making multiple recommendations. Several batch Bayesian optimisation algorithms exist for the unconstrained case. GP-BUCB [6] recommends multiple batch points using the UCB strategy and the aforementioned characteristic. To fill up a batch, it updates the variances with the available attribute information and temporarily appends the outcomes by substituting them with the most recently computed posterior mean. A similar strategy is used in GP-UCB-PE [5], which optimises the unknown function by incorporating batch elements where uncertainty is high. GP-UCB-PE computes the first batch element using the UCB strategy and recommends the remaining points by relying only on the predictive variance, not the mean. It has been shown that for these GP-UCB based algorithms the regret can be bounded more tightly than for single-recommendation methods. To the best of our knowledge, these existing batch Bayesian optimisation techniques do not address the process-constrained problem presented in this work. The algorithms proposed in this paper are inspired by the previous approaches, but address them in the context of a process-constrained setting.

### 2.3 Constrained-batch vs. constrained-space optimisation

We refer to the parameters that are not allowed to change within a batch (e.g. temperatures for heat treatment, or device geometry for fibre production) as the constrained set, and the other parameters (heat-treatment durations or flow parameters) as the unconstrained set. We emphasise that our usage of constraint differs from the problem settings presented in the literature, for example in [2, 11, 7, 8], where the parameter values are constrained or the function evaluations are constrained by inequalities. In the problem setting we present, all the parameters exist in unconstrained space; within each individual batch, however, the constrained variables must take the same value.

## 3 Proposed method

We recall the maximisation problem from Section 1 as $x^* = \operatorname{argmax}_{x \in \mathcal{X}} f(x)$. In our case $\mathcal{X} = \mathcal{X}^{uc} \times \mathcal{X}^{c}$, where $\mathcal{X}^{c}$ is the constrained subspace and $\mathcal{X}^{uc}$ is the unconstrained subspace.

**Algorithm 1** pc-BO(basic): Basic process-constrained pure exploration batch Bayesian optimisation algorithm.

while ($t <$ MaxIter)
&nbsp;&nbsp;$x_{t,0} = [x^{uc}_{t,0} \; x^{c}_{t,0}] = \operatorname{argmax}_{x \in \mathcal{X}} \alpha^{GP\text{-}UCB}(x_{t,0} \mid \mathcal{D})$
&nbsp;&nbsp;for $k = 1, \ldots, K-1$
&nbsp;&nbsp;&nbsp;&nbsp;$x^{uc}_{t,k} = \operatorname{argmax}_{x^{uc} \in \mathcal{X}^{uc}} \sigma\big(x^{uc}_{t,k} \mid \mathcal{D}, x^{c}_{t,0}, \{x^{uc}_{t,k'}\}_{k' < k}\big)$
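As a companion to Algorithm 1, here is a minimal sketch of one batch-construction iteration of pc-BO(basic), reusing `gp_ucb`, `gp`, `X_obs`, and `y_obs` from the previous snippet. The pure-exploration step is realised with the standard hallucinated-observation trick (refitting on the posterior mean at already-selected points, which shrinks the predictive variance without moving the mean); treating the first input dimension as unconstrained and the second as constrained is an illustrative assumption, not the paper's setup.

```python
# One iteration of pc-BO(basic): UCB for the first point, then
# pure-exploration points with the constrained part held fixed.
import numpy as np

def pc_bo_basic_batch(gp, X_obs, y_obs, t, d, K, rng):
    # First batch element: standard GP-UCB over the full space.
    candidates = rng.uniform(0, 1, size=(1000, d))
    x0 = candidates[np.argmax(gp_ucb(gp, candidates, t, d))]
    x_c = x0[1:]               # constrained part, held fixed for the batch
    batch = [x0]

    for _ in range(K - 1):
        # Hallucinate outcomes at already-selected points using the current
        # posterior mean: the refitted GP keeps the same mean but has reduced
        # variance around those points (variance needs only the inputs).
        X_aug = np.vstack([X_obs, np.array(batch)])
        y_aug = np.concatenate([y_obs, gp.predict(np.array(batch))])
        gp_tmp = gp.__class__(kernel=gp.kernel_, optimizer=None).fit(X_aug, y_aug)

        # Pure-exploration step: vary only the unconstrained dimension with
        # x_c fixed, and pick the candidate with the largest variance.
        uc = rng.uniform(0, 1, size=(1000, 1))
        cand = np.hstack([uc, np.tile(x_c, (1000, 1))])
        _, sigma = gp_tmp.predict(cand, return_std=True)
        batch.append(cand[np.argmax(sigma)])
    return np.array(batch)

# One batch of K = 3 recommendations sharing the same constrained value.
batch = pc_bo_basic_batch(gp, X_obs, y_obs, t=6, d=2, K=3,
                          rng=np.random.default_rng(1))
```

Because the GP predictive variance depends only on the inputs (the characteristic noted in Section 2.2), the hallucinated outcomes never affect which point maximises $\sigma$; they are simply a convenient way to trigger the variance update.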