# Generative Pretraining for Black-box Optimization

Satvik Mashkaria\*, Siddarth Krishnamoorthy\*, Aditya Grover (\*Equal contribution)
Department of Computer Science, University of California, Los Angeles, US. Correspondence to: Siddarth Krishnamoorthy.

**Abstract.** Many problems in science and engineering involve optimizing an expensive black-box function over a high-dimensional space. In the offline black-box optimization (BBO) setting, we assume access to a fixed, offline dataset for pretraining and a small budget for online function evaluations. Prior approaches seek to utilize the offline data to approximate the function or its inverse, but are not sufficiently accurate far from the data distribution. We propose BONET, a generative framework for pretraining a novel model-based optimizer using offline datasets. In BONET, we train an autoregressive model on fixed-length trajectories derived from an offline dataset. We design a sampling strategy to synthesize trajectories from offline data using a simple heuristic of rolling out monotonic transitions from low-fidelity to high-fidelity samples. Empirically, we instantiate BONET using a causally masked Transformer (Radford et al., 2019) and evaluate it on Design-Bench (Trabucco et al., 2022), where we rank the best on average, outperforming state-of-the-art baselines.

## 1. Introduction

Many fundamental problems in science and engineering, ranging from the discovery of drugs and materials to the design and manufacturing of hardware technology, require optimizing an expensive black-box function over a large search space (Larson et al., 2019; Shahriari et al., 2016). The key challenge is that evaluating and optimizing such a black-box function is typically expensive, as it often requires real-world experimentation and exploration of a high-dimensional search space.

Fortunately, for many such black-box optimization (BBO) problems, we often have access to an offline dataset of previous function evaluations. Such an offline dataset can greatly reduce the budget needed for online function evaluation. This leads to the setting of offline black-box optimization (BBO), a.k.a. model-based optimization (MBO). A key difference separates the offline BBO setting from its online counterpart: in offline BBO, we are not allowed to actively query the black-box function during optimization, unlike in online BBO, where most approaches (Snoek et al., 2012; Shahriari et al., 2016) rely on iterative online querying.

One natural approach to offline BBO is to train a surrogate (forward) model that approximates the black-box function using the offline data. Once learned, we can perform gradient ascent in the input space to find a candidate optimum. Unfortunately, this method does not perform well in practice, because the forward model can incorrectly assign high scores to sub-optimal, out-of-distribution points (see Figure 1a). To mitigate this issue, COMs (Trabucco et al., 2021) learns a forward mapping that penalizes high scores on points outside the dataset, but this can have the opposite effect of failing to explore high-fidelity points that lie far from the dataset.
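For concreteness, the following is a minimal sketch of this naive forward-model baseline, not of BONET; the names `fit_surrogate` and `gradient_ascent` are illustrative, and nothing in the ascent loop constrains the iterate to stay where the surrogate is trustworthy, which is exactly the extrapolation pathology of Figure 1a.

```python
import torch
import torch.nn as nn

def fit_surrogate(xs, ys, epochs=500):
    """Fit a small MLP regressor f_hat(x) ~ y on the offline data."""
    model = nn.Sequential(nn.Linear(xs.shape[1], 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(xs).squeeze(-1), ys)
        loss.backward()
        opt.step()
    return model

def gradient_ascent(model, x0, steps=100, lr=1e-2):
    """Ascend f_hat starting from x0; nothing keeps the iterate in-distribution."""
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-model(x)).sum().backward()  # maximizing f_hat = minimizing -f_hat
        opt.step()
    return x.detach()
```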
Further, another class of recent approaches (Kumar & Levine, 2020; Brookes et al., 2019; Fannjiang & Listgarten, 2020) proposes a conditional generative approach that learns an inverse mapping from function values to points. For effective generalization, such a mapping needs to be highly multimodal for high-dimensional functions, which in itself presents a challenge for current approaches.

We propose Black-box Optimization Networks (BONET), a new generative framework for pretraining model-based optimizers on offline datasets. Instead of approximating the surrogate function (or its inverse), we seek to approximate the dynamics of online black-box optimizers using an autoregressive sequence model. Naively, this would require access to several trajectory runs of different black-box optimizers, which is expensive or even impossible in many cases. Our key observation is that we can synthesize trajectories composed of offline points that mimic empirical characteristics of online BBO algorithms, such as Bayes Opt. While one could design many characteristic properties, we build on an empirical observation about the function values of the proposed points. In particular, averaged over multiple runs, online black-box optimizers (e.g., Bayes Opt) tend to show improvements in the function values of the proposed points (Bijl et al., 2016), as shown in Figure 1c. While not exact, we build on this observation to develop a sorting heuristic that constructs synthetic trajectories consisting of offline points ordered monotonically by ascending function value. Even though such a heuristic does not hold uniformly for the trajectory runs of all combinations of black-box optimizers and functions, we show that it is simple, scalable, and effective in practice.

Figure 1: (a) Example of offline BBO on a toy 1D problem. Here, the observed domain ends at the red dashed line. Thus, the true optimum is $x^\star$, whereas gradient ascent on the fitted function can extrapolate incorrectly and output the point $\hat{x}$. (b) Example trajectory on the 2D Branin function. The dotted lines denote the trajectories in our offline dataset, and the solid line refers to our model trajectory, with low-quality blue points and high-quality red points. (c) Function values of trajectories generated by a simple Gaussian process (GP) based Bayes Opt model on several synthetic functions.

Further, we augment every offline point in our trajectories with a regret budget, defined as the cumulative regret of the trajectory from the current point until the end of the trajectory. We train BONET to generate trajectories conditioned on the regret budget of the first point of the trajectory. Thus, at test time, we can generate good candidate points by rolling out a trajectory with a low regret budget. Figure 1b shows an illustration.

We evaluate our method on several real-world tasks from the Design-Bench (Trabucco et al., 2022) benchmark. These tasks are based on real-world problems such as robot morphology optimization, DNA sequence optimization, and optimizing the superconducting temperature of materials, all of which require searching over a high-dimensional space. We achieve a normalized mean score of 0.772 and an average rank of 2.4 across all tasks, outperforming the next best baseline, which achieves a rank of 3.7.

## 2.1. Problem Statement

Let $f : \mathcal{X} \to \mathbb{R}$ be a black-box function, where $\mathcal{X} \subseteq \mathbb{R}^d$ is an arbitrary $d$-dimensional domain.
In black-box optimization (BBO), we are interested in finding the point $x^\star$ that maximizes $f$:

$$x^\star \in \arg\max_{x \in \mathcal{X}} f(x) \qquad (1)$$

Typically, $f$ is expensive to evaluate, and we do not assume direct access to it during training. Instead, we have access to an offline dataset of $N$ previous function evaluations $\mathcal{D} = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, where $y_i = f(x_i)$. For evaluating a black-box optimizer post-training, we allow it to query the black-box function $f$ for a small budget of $Q$ queries and output the point with the best function value obtained. This protocol follows prior work in offline BBO (Trabucco et al., 2021; 2022; Kumar & Levine, 2020; Brookes et al., 2019; Fannjiang & Listgarten, 2020).

**Overview of BONET.** We illustrate our proposed framework for offline BBO in Figure 2 and Algorithm 1. BONET consists of three sequential phases: trajectory construction, autoregressive modeling, and roll-out evaluation. In Phase 1 (Section 2.2), we transform the offline dataset $\mathcal{D}$ into a trajectory dataset $\mathcal{D}_{\text{traj}}$. This is followed by Phase 2 (Section 2.3), where we train an autoregressive model on $\mathcal{D}_{\text{traj}}$. Finally, we evaluate the model by rolling out $Q$ candidate points in Phase 3 (Section 2.4).

Figure 2: Schematic for BONET. In Phase 1, we construct a trajectory dataset $\mathcal{D}_{\text{traj}}$ using SORT-SAMPLE. In Phase 2, we learn an autoregressive model on $\mathcal{D}_{\text{traj}}$. In Phase 3, we condition the model on an offline prefix sequence and unroll it further to obtain candidate proposals $\hat{x}_{P+1:T}$.

Algorithm 1: Black-box Optimization Networks (BONET)

```
Input:  offline dataset D, evaluation regret budget R_hat, prefix length P,
        query budget Q, trajectory length T, number of trajectories num_trajs,
        smoothing parameter K, temperature tau, number of bins N_B
Output: a set of proposed candidate points X with |X| <= Q

 1: Phase 1: SORT-SAMPLE
 2: Construct bins {B_1, ..., B_{N_B}} from D, each covering an equal y-range (Section 2.2)
 3: Compute the per-bin sample counts (n_1, ..., n_{N_B}) from the bin scores, using K and tau
 4: D_traj <- {}
 5: for i = 1, ..., num_trajs do
 6:     For each bin B_j, uniformly sample n_j points at random; concatenate them to form a trajectory
 7:     Sort the trajectory in ascending order of function value
 8:     Represent it as (R_1, x_1, R_2, x_2, ..., R_T, x_T) following Equation (3), and append it to D_traj
 9: end for
10: Phase 2: Training
11: Train the model g_theta to maximize the log-likelihood of D_traj using the loss in Equation (4)
12: Phase 3: Evaluation
13: Construct a trajectory from D following Phase 1, and delete the last T - P points
14: Compute R_t and feed (R_t, x_t) into g_theta sequentially for t = 1, ..., P
15: Roll out g_theta autoregressively while feeding R_t = R_hat for t = P + 1, ..., T
16: X is the set of the last min(Q, T - P) rolled-out points
```
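As a point of reference for step 16 and the protocol of Section 2.1, the following is a minimal sketch, in our own notation rather than released code, of how a set of proposed candidates is scored: the optimizer hands over at most $Q$ points, each point costs one online query to $f$, and the best observed value is reported.

```python
import numpy as np

def evaluate(candidates, f, Q):
    """Score at most Q proposed points with online queries; return the best."""
    candidates = candidates[:Q]                     # enforce the query budget |X| <= Q
    values = np.array([f(x) for x in candidates])   # the only online evaluations of f
    best = int(np.argmax(values))
    return candidates[best], float(values[best])

# Toy usage with a hidden 1-D quadratic and random proposals:
rng = np.random.default_rng(0)
f = lambda x: -float(np.sum((x - 0.3) ** 2))        # black-box, maximized at x = 0.3
proposals = list(rng.uniform(-1.0, 1.0, size=(256, 1)))
x_best, y_best = evaluate(proposals, f, Q=128)
```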
## 2.2. Phase 1: Constructing Trajectories

Our key motivation in BONET is to train a model to mimic the sequential behavior of online black-box optimizers. The difficulty is that we cannot generate trajectories by actively querying the black-box function during training. In BONET, we overcome this difficulty by synthesizing trajectories purely from an offline dataset, based on two guiding desiderata. First, the procedure for synthesizing trajectories should scale efficiently to high-dimensional data points and large offline datasets. Second, each trajectory should mimic characteristic behaviors commonly seen in online black-box optimizers.

We identify one such characteristic of interest. In particular, we note that the moving average of function values of points proposed by such black-box optimizers tends to improve over the course of their runs, barring local perturbations (e.g., due to exploration). While exceptions can and do exist, this phenomenon is commonly observed in practice for optimizers such as Bayes Opt (Bijl et al., 2016). We also illustrate this behavior in Figure 1c for some commonly used test functions optimized via Bayes Opt.

**Sorted Trajectories.** We propose to satisfy the above desiderata in BONET by following a sorting heuristic. Specifically, given a set of $T$ offline points, we construct a trajectory of length $T$ by simply sorting the points in ascending order of function value. We note that sorting is just a heuristic we use for constructing synthetic trajectories from the offline dataset; a general optimizer need not follow this behavior on arbitrary functions. We also perform ablations on different heuristics in Appendix C.1. Further, we note that sorting does not provide any guidance on the rate, i.e., the relative spacing between points, at which the function values increase. This rate is important for controlling the sample budget for black-box optimization. Next, we discuss a sampling strategy for explicitly controlling this rate.

**Sampling Strategies for Offline Points.** So far, we have proposed a simple heuristic for converting a set of offline points into a sorted trajectory. To obtain these points from the offline dataset, one default strategy is to sample $T$ points uniformly at random from $\mathcal{D}$ and sort them. However, we found this strategy to not work well in practice. Intuitively, we might expect a large volume of the search space to consist of points with low function values. Thus, if our offline dataset is uniformly distributed across the domain, the probability of obtaining a high-quality point under uniform sampling will be very low.

Figure 3: Plots showing the distribution of function values in the offline dataset $\mathcal{D}$ (left) and the trajectories in $\mathcal{D}_{\text{traj}}$ (right) for the Ant Morphology benchmark (Trabucco et al., 2022). Notice how the overall density of points with high function values is up-weighted after our re-weighting.

To counter this challenge, we propose a two-step sampling strategy based on binning followed by importance re-weighting. Our formulation is motivated by a similar strategy proposed by Kumar & Levine (2020) for loss re-weighting. First, we use the function values to partition the offline dataset $\mathcal{D}$ into $N_B$ bins of equal width, i.e., each bin covers a value range of equal length. Next, we assign each bin a different sampling probability, such that (a) bins with a high average function value are more likely to be sampled, and (b) bins with more points are sampled more often. The former helps minimize the budget, whereas the latter ensures diversity in trajectories. Based on these two criteria, the score $s_i$ for a bin $B_i$ is given by

$$s_i = \frac{|B_i|}{|B_i| + K} \exp\left(-\frac{|\hat{y} - y_{b_i}|}{\tau}\right) \qquad (2)$$

where $\hat{y}$ is the best function value in the dataset $\mathcal{D}$, $|B_i|$ is the number of points in the $i$-th bin, and $y_{b_i}$ is the midpoint of the interval corresponding to bin $B_i$. The first term, $\frac{|B_i|}{|B_i| + K}$, assigns a higher weight to larger bins, with $K$ acting as a smoothing parameter. The second term gives higher weight to good bins via an exponential weighting scheme with temperature $\tau$. More details about $K$ and $\tau$ can be found in Appendix B.
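The following is a minimal sketch of this scoring step under our assumptions (equal-width binning over the observed value range; the name `bin_scores` and its signature are illustrative, not from a released implementation):

```python
import numpy as np

def bin_scores(ys, num_bins, K, tau):
    """Per-bin sampling scores following Equation (2):
    s_i = |B_i| / (|B_i| + K) * exp(-|y_hat - y_{b_i}| / tau)."""
    ys = np.asarray(ys)
    edges = np.linspace(ys.min(), ys.max(), num_bins + 1)   # equal-width bins over the y-range
    counts, _ = np.histogram(ys, bins=edges)                # |B_i| for each bin
    midpoints = 0.5 * (edges[:-1] + edges[1:])              # y_{b_i}
    y_hat = ys.max()                                        # best offline value
    size_term = counts / (counts + K)                       # favors larger bins, smoothed by K
    value_term = np.exp(-np.abs(y_hat - midpoints) / tau)   # favors high-value bins
    return size_term * value_term
```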
Finally, we use these scores to sample $n_i$ points from each bin $B_i$, proportionally to $s_i$:

$$n_i = \left\lfloor T \cdot \frac{s_i}{\sum_j s_j} \right\rfloor \;\text{ for } i \in \{2, \ldots, N_B\}, \qquad n_1 = T - \sum_{i>1} n_i,$$

making the overall length of each trajectory equal to $T$. In Figure 3, we illustrate the shift in the distribution of function values due to our sampling strategy. We refer to the combined strategy of sampling and then sorting as SORT-SAMPLE in Figure 2 and Algorithm 1.

**Augmenting Trajectories with Regret Budgets.** Our sorted trajectories heuristically reflect roll-outs of implicit black-box optimizers. However, they do not provide any information on the rate at which a trajectory approaches the optimal value. A natural choice for such a knob is cumulative regret. Moreover, as we shall show later, cumulative regret provides BONET a simple and effective knob for generalizing outside the offline dataset during evaluation. Hence, we propose to augment each data point $x_i$ in a trajectory with a Regret Budget (RB). The RB $R_i$ at timestep $i$ is defined as the cumulative regret of the trajectory starting at point $x_i$:

$$R_i = \sum_{j=i}^{T} \left( f(x^\star) - f(x_j) \right)$$

Intuitively, a high (low) value of $R_i$ is likely to give the model a high (low) budget to explore diverse points. Note that we only assume knowledge of an estimate of $f(x^\star)$, and not of $x^\star$ itself. Thus, each trajectory in our desired set $\mathcal{D}_{\text{traj}}$ can be represented as:

$$\mathcal{T} = (R_1, x_1, R_2, x_2, \ldots, R_T, x_T) \qquad (3)$$

We henceforth refer to $R_1$ as the Initial Regret Budget (IRB). It is significant for evaluating our model in Phase 3 (Section 2.4), as we can specify a low IRB to induce the model to output points close to the optimum.

## 2.3. Phase 2: Training the Model

Given our trajectory dataset, we design our BBO agent as a conditional autoregressive model and train it to maximize the likelihood of trajectories in $\mathcal{D}_{\text{traj}}$. More formally, we denote our model parameterized by $\theta$ as $g_\theta(x_t \mid x_{<t}, R_{\leq t})$.
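To make the trajectory format of Equation (3) concrete, here is a minimal sketch of the regret-budget augmentation from Phase 1; it is our illustration, with `f_star` standing in for the assumed estimate of $f(x^\star)$, and the name `build_trajectory` is ours.

```python
import numpy as np

def build_trajectory(xs, ys, f_star):
    """Sort points by function value and attach Regret Budgets (Equation 3).

    xs: (T, d) sampled offline points; ys: (T,) their function values;
    f_star: an estimate of the optimal value f(x*).
    Returns the interleaved sequence [(R_1, x_1), ..., (R_T, x_T)].
    """
    order = np.argsort(ys)                        # ascending function values
    xs, ys = np.asarray(xs)[order], np.asarray(ys)[order]
    regrets = f_star - ys                         # per-point regret f(x*) - f(x_j)
    # R_i is a suffix sum: cumulative regret from step i to the trajectory's end.
    budgets = np.cumsum(regrets[::-1])[::-1]
    return list(zip(budgets, xs))                 # R_1 is the Initial Regret Budget
```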
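Similarly, a minimal sketch of one Phase 2 training step, under our assumptions: we posit a model interface `g(budgets, points)` whose causally masked output at slot $t$ depends only on $R_{\leq t}$ and $x_{<t}$, and a fixed-variance Gaussian likelihood, so the negative log-likelihood reduces to a mean squared error up to constants. The exact parameterization of the loss in Equation (4) may differ.

```python
import torch
import torch.nn as nn

def nll_step(g, optimizer, budgets, points):
    """One maximum-likelihood training step on a batch of trajectories.

    budgets: (B, T) regret budgets; points: (B, T, d) trajectory points.
    g is assumed causally masked, so g(budgets, points)[:, t] is the
    predicted mean of g_theta(x_t | x_{<t}, R_{<=t}).
    """
    optimizer.zero_grad()
    means = g(budgets, points)                    # (B, T, d) predicted means
    loss = nn.functional.mse_loss(means, points)  # Gaussian NLL up to constants
    loss.backward()
    optimizer.step()
    return loss.item()
```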