Journal of Artificial Intelligence Research 75 (2022) 1633-1699. Submitted 05/22; published 12/22.
© 2022 AI Access Foundation. All rights reserved.

Automated Dynamic Algorithm Configuration

Steven Adriaensen (adriaens@cs.uni-freiburg.de)
André Biedenkapp (biedenka@cs.uni-freiburg.de)
Gresa Shala (shalag@cs.uni-freiburg.de)
Noor Awad (awad@cs.uni-freiburg.de)
University of Freiburg, Machine Learning Lab

Theresa Eimer (t.eimer@ai.uni-hannover.de)
Marius Lindauer (m.lindauer@ai.uni-hannover.de)
Leibniz University Hannover, Institute of Artificial Intelligence

Frank Hutter (fh@cs.uni-freiburg.de)
University of Freiburg, Machine Learning Lab & Bosch Center for Artificial Intelligence

Abstract

The performance of an algorithm often critically depends on its parameter configuration. While a variety of automated algorithm configuration methods have been proposed to relieve users from the tedious and error-prone task of manually tuning parameters, there is still a lot of untapped potential, as the learned configuration is static, i.e., parameter settings remain fixed throughout the run. However, it has been shown that some algorithm parameters are best adjusted dynamically during execution. Thus far, this is most commonly achieved through hand-crafted heuristics. A promising recent alternative is to automatically learn such dynamic parameter adaptation policies from data. In this article, we give the first comprehensive account of this new field of automated dynamic algorithm configuration (DAC), present a series of recent advances, and provide a solid foundation for future research in this field. Specifically, we (i) situate DAC in the broader historical context of AI research; (ii) formalize DAC as a computational problem; (iii) identify the methods used in prior art to tackle this problem; and (iv) conduct empirical case studies for using DAC in evolutionary optimization, AI planning, and machine learning.

1. Introduction

Designing robust, state-of-the-art algorithms requires careful design of multiple components. It is infeasible to know how these components will interact for all possible applications. This is particularly true in the field of artificial intelligence (AI), which pursues ever more general problem-solving methods. This generality necessarily comes at the cost of increased uncertainty about the problem instances the algorithm will have to solve in practice. To account for this uncertainty, it is common practice to expose difficult design choices as parameters of the algorithm, allowing users to customize them to their specific use case. These algorithm parameters can be numerical (e.g., the crossover rate or population size in evolutionary algorithms, and the learning rate or batch size in deep learning), but also categorical (e.g., the choice of optimizer in deep learning, or the choice of heuristic or search operator in classical planning and meta-heuristics).

It is widely recognized that appropriate parameter settings are often instrumental for AI algorithms to reach a desired performance level (Birattari et al., 2002; Hutter et al., 2010; Probst et al., 2019). In this paper, we will use the term algorithm configuration (AC) to refer to the process of determining a policy for setting algorithm parameters so as to maximize performance across a problem instance distribution. AC has been widely studied, both in general (Birattari et al., 2002; Hutter et al., 2009; Ansótegui et al., 2009; Hutter et al., 2011; López-Ibáñez et al., 2016), as well as in specific research communities (Lobo et al., 2007; Snoek et al., 2012; Feurer & Hutter, 2019). In this work, we focus on a particular kind of AC that is both (i) automated and (ii) dynamic. This general framework was recently proposed in a conference paper by Biedenkapp et al.
(2020), and in this article we provide the first comprehensive treatment of the topic.

Dynamic vs. Static AC: In static AC, parameter settings are fixed prior to execution, using the information available at that time, and remain invariant during execution. For example, in evolutionary optimization, the population size is commonly set statically, e.g., as a function of the input dimensionality. In contrast, in dynamic AC (DAC), parameter settings are varied during execution using information that becomes available at run time. For example, in machine learning, while static AC would choose a learning rate, possibly dependent on meta-data (e.g., size or modality of the dataset), DAC would propose a learning rate schedule that could additionally be a function of time, alignment of past gradients, training/validation losses, etc.

While not all parameters can be varied dynamically, in practice many can, and it often makes sense to do so. As a general motivating use case, consider parameters that (indirectly) control the exploration/exploitation trade-off: Often, it makes sense to explore more early on, and to exploit this knowledge in later stages. Even if the optimal configuration happens to be static, predicting it upfront may be very hard, yet the best static configuration may quickly become apparent while solving the problem. For instance, if our learning rate is too high, the training loss may diverge (Bengio, 2012). Another use case arises in the context of anytime algorithms, which aim to return as good a solution as possible under an unknown termination criterion (Aine et al., 2009; López-Ibáñez & Stützle, 2014). Here, early exploitation, to quickly find a good solution, combined with continual but delayed exploration, is often desirable.
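The learning-rate example above can be made concrete. The following minimal sketch (our own illustration; the particular decay schedule and divergence signal are arbitrary choices, not taken from the cited literature) contrasts a static configuration with a dynamic configuration policy that reacts to run-time information:

```python
def static_policy():
    """Static AC: one learning rate, fixed before the run starts."""
    return 0.1

def dynamic_policy(step, train_loss_increased):
    """Dynamic AC: the learning rate is a function of run-time state.

    Here (illustratively): exponential decay over time, plus a sharper
    cut whenever the training loss went up (a crude divergence signal).
    """
    lr = 0.1 * (0.99 ** step)
    if train_loss_increased:
        lr *= 0.5
    return lr

# The static policy returns the same value at every step ...
assert static_policy() == static_policy()
# ... while the dynamic policy adapts to time and feedback:
assert dynamic_policy(10, False) < dynamic_policy(0, False)
assert dynamic_policy(10, True) < dynamic_policy(10, False)
```

Note that the dynamic policy's inputs (the step count and a loss-based signal) are exactly the kind of information that only becomes available at run time.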
DAC has been an active research area that has produced various highly practical algorithms leveraging dynamic parameter adaptation mechanisms to empirically outperform their static counterparts, e.g., Reactive Tabu Search (Battiti & Tecchiolli, 1994), CMA-ES (Hansen et al., 2003), and Adam (Kingma & Ba, 2015). Beyond these empirical successes and dedicated case studies (e.g., Senior et al., 2013; van Rijn et al., 2018), the potential of DAC has also been shown theoretically (Moulines & Bach, 2011; Warwicker, 2019; Doerr & Doerr, 2020; Speck et al., 2021).

Automated vs. Manual AC: The difference between manual and automated AC is who performs AC: a human or a machine. Over the last two decades, a variety of general-purpose automated algorithm configurators have been proposed that effectively relieve users from the tedious and time-consuming task of optimizing parameter settings manually (Hutter et al., 2009; Ansótegui et al., 2009; Kadioglu et al., 2010; Xu et al., 2010; Hutter et al., 2011; Seipp et al., 2015; López-Ibáñez et al., 2016; Falkner et al., 2018; Pushak & Hoos, 2020). However, there is still a lot of untapped potential, as all of these tools perform static AC. In contrast, dynamic AC is mostly done manually. Clearly, the human does not directly adjust the parameters during execution; rather, the mechanisms doing this automatically, e.g., learning rate schedules, are products of human engineering. In this work, we will consider deriving such dynamic configuration policies in an automated and data-driven fashion.

Summary of Contributions: In this article, we provide the first comprehensive account of automated DAC. It subsumes and extends four prior conference papers, in which we

1. established DAC as a new meta-algorithmic framework and proposed solving it using contextual reinforcement learning (Biedenkapp et al., 2020);
2. applied DAC to evolutionary optimization, tackling the problem of step-size adaptation in CMA-ES (Hansen et al., 2003), and showed that existing manually-designed heuristics can be used to guide the learning of DAC policies (Shala et al., 2020);
3. applied DAC to AI planning, tackling the problem of heuristic selection in Fast Downward (Helmert, 2006), and showed how DAC subsumes static algorithm configuration and can improve upon the best possible algorithm selector (Speck et al., 2021); and
4. presented DACBench, the first benchmark library for DAC, facilitating reproducible results through a unified interface (Eimer et al., 2021b).

Here, we go well beyond this previous work, by

(i) more thoroughly discussing and classifying related work in different areas (Section 2), placing recent work on automated DAC in its scientific and historical context;
(ii) establishing a formal problem formulation (Section 3), offering a novel theoretical perspective on DAC and its relation to existing computational problems;
(iii) discussing possible methods for solving DAC problems (Section 4), beyond reinforcement learning, and classifying previous work according to their methodology;
(iv) extending and using DACBench (Section 5) to perform empirical case studies that
- demonstrate recent successes of automated DAC,
- provide empirical validation for the benchmark library, and
- show that DAC presents a practical alternative to static AC, in various areas of AI: evolutionary optimization (Section 6.1), AI planning (Section 6.2), and machine learning (Section 6.3); and
(v) discussing current limitations of DAC (Section 7).

As such, we provide the first comprehensive overview of automated DAC, a standard reference and a solid foundation for future research in this area.

2. Related Work

Automated DAC is a recent and underexplored research area. However, it did not arise out of thin air; rather, it closely relates to, builds on, and tries to consolidate past research efforts.
In this section, we place recent work on automated DAC in its scientific and historical context. We start by introducing the terminology we use (Section 2.1). Then, we situate DAC in the broader context of AI (Section 2.2). Finally, we discuss some specific prior art, covering historical work as well as the most recent developments (Section 2.3).

2.1 Terminology

Algorithm parameters are omnipresent in computer science. Unsurprisingly, no single set of terms has been used consistently when discussing the problem of how to best set them. In this section, we briefly clarify some of the terms we use, relating them to known alternatives. We use the term algorithm configuration (AC) to refer to the process of determining a policy for setting an algorithm's parameters so as to maximize performance (or equivalently, minimize cost) across an input distribution. In the classical AC literature (Birattari et al., 2002; Hutter et al., 2009; Ansótegui et al., 2009), this process results in a single parameter setting (i.e., a complete assignment of values to parameters), called a configuration. Later work generalized AC to produce configurations that are a function of the context in which they are used, e.g., the problem instance at hand (Kadioglu et al., 2010; Xu et al., 2010), and most recently the dynamic execution state (Biedenkapp et al., 2020). We will use the term configuration policy to refer to the result of AC in general. To disambiguate the aforementioned AC variants, we add the prefixes per-distribution (also classical), per-instance, and dynamic, respectively. Finally, while AC terminology was introduced in the context of attempts to automate this process, the term itself does not imply automation; i.e., we add the prefixes automated and manual to specify whether configuration policies are determined automatically or through a manual engineering process, respectively.
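The three policy variants differ only in what their decisions may depend on. As a minimal sketch (the type names and placeholder types are our own, purely illustrative):

```python
from typing import Callable, Dict

Theta = Dict[str, float]   # a configuration: parameter name -> value
Instance = str             # placeholder representation of a problem instance
State = dict               # placeholder representation of an execution state

# Per-distribution (classical) AC: the policy is one fixed configuration.
PerDistributionPolicy = Theta
# Per-instance AC: the configuration may depend on the instance at hand.
PerInstancePolicy = Callable[[Instance], Theta]
# Dynamic AC: it may additionally depend on the execution state.
DynamicPolicy = Callable[[State, Instance], Theta]

# Each variant is a special case of the next:
theta: PerDistributionPolicy = {"learning_rate": 0.1}
per_instance: PerInstancePolicy = lambda i: theta        # ignores i
dynamic: DynamicPolicy = lambda s, i: per_instance(i)    # ignores s
assert dynamic({}, "some instance") == {"learning_rate": 0.1}
```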
In this work, we follow a meta-algorithmic approach to automating AC: We will treat AC as a computational problem to be solved by executing an algorithm. Hence, we have problem instances and algorithms at two different levels and will use the prefixes (D)AC and target to disambiguate these: For example, research on automated DAC aims to find a DAC algorithm for tackling the general DAC problem. In a given DAC problem instance, we aim to find a policy for configuring the parameters of a given target algorithm so as to optimize its performance across a distribution of target problem instances. We also use DAC method and DAC scenario as synonyms for DAC algorithm and DAC problem instance, respectively.

In machine learning, the problem of setting the hyperparameters of the learning pipeline is known as hyperparameter optimization (HPO, Feurer & Hutter, 2019). We consider the more general problem of setting the parameters of any target algorithm and therefore adopt a more general terminology (Eggensperger et al., 2018). In meta-learning terms, AC problem solving corresponds to the outer loop and target problem solving to the inner loop. More generally, assuming the target is optimization, AC can be seen as a bilevel optimization problem (Bard, 2013), where the target algorithm optimizes the inner objective and the AC algorithm the outer objective. When a parameter maps directly onto the use of a certain sub-procedure, AC has also been called operator selection. In metaheuristics (Glover & Kochenberger, 2006), these sub-procedures are commonly called heuristics, and in the hyper-heuristic literature (Pillay & Qu, 2018), the procedure selecting the (low-level) heuristic is called a selection hyper-heuristic (Drake et al., 2020), corresponding to the configuration policy in our terminology. In heuristic optimization, the terms parameter tuning and parameter control are commonly used to refer to static and dynamic algorithm configuration, respectively (Lobo et al., 2007).
Also, the terms online (during use) and offline (before use) are sometimes used as synonyms for dynamic and static, respectively. In this work, we refrain from doing so, reserving these terms to refer to when (D)AC takes place (see Figure 1). In the offline setting, AC takes place in a dedicated configuration phase (similar to training in machine learning) in which we determine which configuration to use later to solve the problems of actual interest to the user (i.e., at use time). In the online setting, AC happens at use time (Fitzgerald, 2021). In that sense, offline and dynamic are not mutually exclusive. In fact, most prior art does DAC offline, determining a dynamic policy offline using a training set, and at use time simply executing that dynamic policy on new problem instances.

Figure 1: Offline vs. online learning of DAC policies.

2.2 Related Research Areas

While automating DAC is a relatively understudied problem, much research has been performed studying related problems. In what follows, we briefly characterize this work and how it relates to automating DAC. See Appendix B for a more formal treatment of this topic, where we provide problem definitions, possible reductions, and prove their correctness.

2.2.1 Automated Design of Algorithms / Components

The idea of letting computers, rather than humans, design algorithms has been studied in many different communities, using a variety of different methods. Some well-known, historical examples are program synthesis, using logical inference (Manna & Waldinger, 1980), and genetic programming, using evolutionary algorithms (Koza, 1992).
Recent advances in machine learning have prompted a surge in approaches that learn algorithms, e.g., Neural Turing machines (Graves et al., 2014), learning-to-learn (L2L; Andrychowicz et al., 2016; Lv et al., 2017; Bello et al., 2017; Metz et al., 2020), and learning-to-optimize (L2O; Li & Malik, 2017; Kool et al., 2018; Chen et al., 2021). Generally speaking, algorithm parameters can be seen as algorithmic design choices that are left open at design time. In that sense, automated configuration is naturally viewed as a way of automating part of the algorithm design process. This approach has been referred to as programming by optimization (PbO; Hoos, 2012). While previous PbO applications used static AC approaches, the original PbO philosophy envisioned the possibility of varying design decisions at runtime, something naturally achieved by DAC.1

1. While we are not aware of any work under the PbO banner considering the dynamic setting, most prior art automating DAC (Section 2.3) could be viewed as PbO when taking an algorithm design perspective.

A key difference between PbO and the aforementioned design automation approaches is that in PbO, algorithms are not designed "from scratch"; instead, only design choices that are difficult for the human designer are made automatically by the configurator. For instance, DAC aims to design learning rate schedules (e.g., Daniel et al., 2016), but not entire optimizers as in L2L/L2O. In summary, DAC can be viewed as automatically designing parameter-controlling components, and DAC-powered PbO as a general semi-automated algorithm design approach that enables the human designer to bias the design process by embedding prior knowledge (e.g., obtained through decades of algorithmic research), thereby reducing the computational requirements and improving generalization. It is worth noting that other semi-automated algorithm design approaches exist.
For instance, genetic programming is nowadays often used to design specific algorithm components, e.g., in the hyper-heuristic literature (Pillay & Qu, 2018), as a procedure for automatically designing heuristics, referred to as a generation hyper-heuristic (Burke et al., 2009). Finally, when the designed components are reusable and control parameters (e.g., reusable selection hyper-heuristics, as in Fontoura et al., 2017), we view this approach as automated DAC.

2.2.2 Meta-Algorithmic Frameworks

Algorithm Selection: Problems can be solved using a variety of different algorithms. For example, if we want to sort a sequence of numbers, we could do so using insertion sort, merge sort, quick sort, etc. In algorithm selection, we determine a mapping from features of the problem instance (e.g., sequence length) to the algorithm best suited to solve it (e.g., the one that sorts the sequence fastest). While first formalized by Rice (1976), this computational problem only received widespread attention two decades later, when it was independently rediscovered by Fink (1998), and Leyton-Brown et al. (2003) proposed to solve it using machine learning methods. This approach has resulted in various successful applications, e.g., SATzilla (Xu et al., 2008), a portfolio solver selecting between state-of-the-art SAT solvers, which won multiple (gold) medals at the 2007 and 2009 SAT competitions. We refer to Kotthoff (2014) and Kerschke et al. (2019) for surveys on this topic.

Algorithm Scheduling: It is often difficult to efficiently predict which algorithm will perform best on a given problem instance. In many settings, poor choices may require orders of magnitude more time than optimal choices, and tend to dominate average performance. In algorithm scheduling, instead of selecting a single algorithm, we aim to find an optimal time allocation. Automated algorithm scheduling was first extensively studied in seminal work by Huberman et al.
(1997) and Gomes and Selman (2001), and follow-up work, e.g., by Hoos et al. (2015), typically focuses on finding a fixed time allocation that works best on average across instances (i.e., per-distribution). These kinds of algorithm schedules are also very popular in the AI planning community, e.g., in Fast Downward Stonesoup (Helmert et al., 2011). Scheduling has also been combined with algorithm selection to find instance-specific schedules (Kadioglu et al., 2011; Lindauer et al., 2016). Dynamic scheduling approaches allocate resources to the algorithms based on runtime information (e.g., Carchrae & Beck, 2004; Gagliolo & Schmidhuber, 2006; Kadioglu et al., 2017; Nguyen et al., 2021). This allows them to exploit the fact that, while it may be difficult to predict which algorithm performs best in advance, the algorithms' relative performance may become apparent early on in their executions. DAC also takes advantage of this property. However, unlike DAC, dynamic scheduling is restricted to allocating resources to independent processes; i.e., in scheduling, no information is exchanged between algorithm runs, and resources allocated to all runs but the one producing the eventual solution are effectively wasted.

Algorithm Configuration: While algorithm selection chooses between multiple target algorithms on a per-instance basis, classical per-distribution algorithm configuration (AC) is concerned with finding the parameter setting of a single algorithm that performs best across all given instances.
As the space of possible configurations grows exponentially with the number of parameters, research on AC has traditionally focused on (i) efficient search methods, e.g., local search (Hutter et al., 2009), genetic algorithms (Ansótegui et al., 2009), and Bayesian optimization (Hutter et al., 2011); and (ii) efficient evaluation of configurations, e.g., using racing (Birattari et al., 2002), adaptive capping (Hutter et al., 2009), structured procrastination (Kleinberg et al., 2017), and multi-fidelity optimization (Li et al., 2018). This line of work has resulted in a variety of automated tools, known as configurators, that for any given target algorithm quickly find a configuration that performs well on average across a set of target problem instances, e.g., ParamILS (Hutter et al., 2009), GGA (Ansótegui et al., 2009, 2015, 2021), SMAC (Hutter et al., 2011), irace (López-Ibáñez et al., 2016), and Golden Parameter Search (Pushak & Hoos, 2020); as well as various theoretical insights (Kleinberg et al., 2017; Weisz et al., 2019; Hall et al., 2019, 2020). Configuration has further been combined with algorithm selection (Kadioglu et al., 2010; Xu et al., 2010) and algorithm scheduling (Seipp et al., 2015). However, all of these consider determining a static configuration policy, and the pursuit of similar automated tools and theory for DAC is a natural extension of this line of work. That being said, other works have previously explored tackling the dynamic setting (see Section 2.3), some even using the aforementioned static AC tools (see Section 4.2).

2.2.3 Adaptive Operator Selection and Parameter Control

Heuristic Approaches: The potential of varying parameters during execution is widely recognized and has been studied extensively in various areas of AI.
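To illustrate the evaluation-efficiency theme, the following is a minimal successive-halving-style sketch in the spirit of multi-fidelity optimization (our own toy version, not the algorithm of any of the works cited above): evaluate many configurations at a small budget, keep the better fraction, and re-evaluate the survivors at a larger budget.

```python
def successive_halving(configs, cost_at_budget, min_budget=1, eta=2):
    """Keep the best 1/eta of the configurations at each round, while
    multiplying the evaluation budget by eta, until one survivor remains."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        scores = {theta: cost_at_budget(theta, budget) for theta in survivors}
        survivors.sort(key=lambda theta: scores[theta])
        survivors = survivors[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]

# Toy scenario: cost of configuration theta evaluated at budget b;
# theta = 1 is best at every budget, so it survives all rounds.
cost = lambda theta, b: abs(theta - 1) + 1.0 / b
assert successive_halving([0, 1, 2, 3], cost) == 1
```

The point of the sketch is the resource allocation pattern: most configurations are only ever evaluated cheaply, and the full budget is spent on few survivors.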
For instance, in heuristic optimization, this problem has been studied in the context of parameter control for evolutionary algorithms (Aleti & Moser, 2016), reactive search (Battiti et al., 2008), and selection hyper-heuristics (Drake et al., 2020). In machine learning, one hyperparameter that is typically varied is the learning rate, e.g., using global learning rate schedules (Loshchilov & Hutter, 2017; Smith, 2017) or adaptive gradient methods (Kingma & Ba, 2015) adopting weight-specific step sizes. These works typically consider the dynamic configuration policy as given and present an empirical/theoretical analysis thereof. Furthermore, the policies themselves were designed by human experts. In contrast, automated DAC is concerned with finding such policies automatically in a data-driven fashion. That being said, prior art automating DAC does exist and is discussed in Section 2.3. Before doing so, we will briefly discuss a broad class of methods that rely less on human expert knowledge, but that we nonetheless do not generally regard as automated DAC.

Online Learning Approaches: Many parameter control mechanisms integrate complex feedback loops, learning, and optimization mechanisms, creating the potential that the DAC policy is not entirely predetermined by the human, but is rather learned online, while solving the problem instance at hand. Everything depends on the relative contributions to performance of (i) the exploration of the hand-crafted DAC algorithm, and (ii) the exploitation of the DAC policy it learns. In an offline setting, distinguishing between (i) and (ii) is easy, as (i) does not occur at test/use time. In online settings, both are intertwined by nature. Note that this does not rule out "online DAC", but rather necessitates dedicated analysis showing that learning indeed takes place.
Furthermore, in Section 3.2, we will define DAC as the problem of finding dynamic configuration policies that generalize across a distribution of target problem instances. Therefore, in our nomenclature, in order to qualify as automated DAC, an approach must demonstrate the ability to successfully transfer experience across runs of the target algorithm on target problem instances drawn from the same distribution. In machine learning terms, automated DAC does not only require learning, but also meta-learning. Please note that most previous online learning approaches to parameter control (e.g., Müller et al., 2002; Carchrae & Beck, 2004; Chen et al., 2005; Eiben et al., 2006; Prestwich, 2008; Sakurai et al., 2010; Wessing et al., 2011; Gaspero & Urli, 2012; Sabar et al., 2013; Schaul et al., 2013; Karafotias et al., 2014; Baydin et al., 2018) trivially do not meet this criterion, as no information is transferred across runs on different instances. Note that massively parallel online HPO methods such as Population Based Training (PBT, Jaderberg et al., 2017) also fall into this category.

2.3 Prior Art: Automated Dynamic Algorithm Configuration

The term dynamic algorithm configuration (DAC) was only recently introduced by Biedenkapp et al. (2020). However, various authors had previously (or, in a few cases, concurrently) investigated the possibility of automatically determining policies for varying the configuration of an algorithm on the fly. In what follows, we give a brief overview of the literature on automated DAC avant la lettre.2 Here, we discuss these works by application domain; a methodological overview is presented in Section 4. Pioneering work by Lagoudakis and Littman (2000, 2001) explored this setting in the context of recursive algorithm selection, observing that sub-problems are better solved using different algorithms (e.g., sorting sub-sequences using different sorting algorithms).
While initial results were promising, their approach was limited to recursive target algorithms. Pettinger and Everson (2002) considered a more general setting, learning a policy jointly selecting mutation and crossover operators in a genetic algorithm, per generation, based on statistics of the current population.3 Various other works have explored automating DAC in the context of genetic algorithms (Fialho et al., 2010; Andersson et al., 2016), evolution strategies (López-Ibáñez et al., 2012; Sharma et al., 2019), and heuristic optimization in general (Battiti & Campigotto, 2012; López-Ibáñez & Stützle, 2014; Ansótegui et al., 2017; Fontoura et al., 2017; Sae-Dan et al., 2020). Similar investigations were also conducted in various other communities, e.g., machine learning (Daniel et al., 2016; Hansen, 2016; Fu, 2016; Xu et al., 2019; Almeida et al., 2021), AI planning (Gomoluch et al., 2019, 2020), exact search (Bhatia et al., 2021), and quadratic programming (Getzelman & Balaprakash, 2021; Ichnowski et al., 2021).

2. Appendix A provides a more detailed description. We also maintain a list of work on automated DAC: https://www.automl.org/automated-algorithm-design/dac/literature-overview/
3. Notably, direct follow-up works (Chen et al., 2005; Sakurai et al., 2010) no longer transferred experience across runs and are therefore not considered prior art automating DAC (see Section 2.2.3).

Biedenkapp et al. (2020) introduced DAC in an attempt to consolidate these isolated efforts and to raise the level of generality in pursuit of algorithms similar to those that exist for static AC. Direct follow-up work has provided additional evidence for the practicality of DAC by learning step-size adaptation in CMA-ES (Shala et al., 2020), and by learning to select heuristics in the Fast Downward planner (Speck et al., 2021).
These application domains, together with the learning rate control setting of Daniel et al. (2016), have later been released as part of a benchmark suite called DACBench (Eimer et al., 2021b), offering a unified interface that facilitates comparisons between different DAC methods across different DAC scenarios. In this article, we extend this initial discussion of Biedenkapp et al. (2020) and present a thorough empirical comparison of AC and DAC on these three different real-world DAC applications (Daniel et al., 2016; Shala et al., 2020; Speck et al., 2021) using the unified DACBench interface.

3. Problem Definition

In this section, we formalize the computational problem underlying DAC. We first introduce formulations for static AC variants (Section 3.1), and then define the dynamic AC problem (Section 3.2).

3.1 Static Algorithm Configuration

In algorithm configuration, we have some target algorithm A with parameters p1, p2, ..., pk that we would like to configure, i.e., assign a value in the domains Θ1, Θ2, ..., Θk, respectively. Furthermore, we may wish to exclude certain invalid combinations, giving rise to the space of candidate configurations Θ ⊆ Θ1 × Θ2 × ⋯ × Θk, called the configuration space of A. In classical per-distribution algorithm configuration, we aim to determine a single θ ∈ Θ that minimizes a given cost metric c in expectation across instances i ∈ I of our target problem distribution D. This problem can be formalized as follows:

Definition 1: Classical / Per-distribution Algorithm Configuration (AC)
Given ⟨A, Θ, D, c⟩:
- A target algorithm A with configuration space Θ
- A distribution D over target problem instances with domain I
- A cost metric c : Θ × I → R assessing the cost of using A with θ ∈ Θ on i ∈ I
Find θ* ∈ arg min_{θ ∈ Θ} E_{i ∼ D}[c(θ, i)].

In practice, A, D, and c are not given in closed form.
Instead, c is typically a black-box procedure that executes A with configuration θ on a problem instance i and quantifies cost as a function of the desirability of this execution, e.g., how long the execution took, the quality of the solution it found, or a measure of anytime performance (López-Ibáñez & Stützle, 2014). Note that D is our true target distribution, i.e., the likelihood that A is presented with an instance i at use time. In the online setting, we are given a sequence of samples from the actual distribution in real time (Fitzgerald, 2021). In the offline setting, we typically do not have access to i ∼ D, and are instead given a set of instances sampled i.i.d. from some representative training distribution D̂ ≈ D.

Note that unless a single configuration is non-dominated on all instances, better performance may be achieved by making the choice of θ dependent on the problem instance i at hand. This extension is known as:

Definition 2: Per-instance Algorithm Configuration (PIAC)
Given ⟨A, Θ, D, Ψ, c⟩:
- A target algorithm A with configuration space Θ
- A distribution D over target problem instances with domain I
- A space of per-instance configuration policies ψ ∈ Ψ, with ψ : I → Θ, that choose a configuration θ ∈ Θ for each instance i ∈ I
- A cost metric c : Ψ × I → R assessing the cost of using A with ψ ∈ Ψ on i ∈ I
Find ψ* ∈ arg min_{ψ ∈ Ψ} E_{i ∼ D}[c(ψ, i)].

Note that the definition above is highly general. For example, by specifying Ψ accordingly, PIAC can put arbitrary hard constraints on the configuration policies of interest. As a consequence, classical per-distribution AC can be seen as the special case of PIAC considering only constant ψ, i.e., Ψ = {ψ | ψ(i) = ψ(i′), ∀ i, i′ ∈ I}. More generally, configuration policies could be restricted to be a function of specific features of i, or to belong to a specific (e.g., linear) function class.
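Definitions 1 and 2 can be illustrated on a toy scenario (all names and the quadratic cost are our own illustrative choices; real configurators are far more sophisticated). Here, Θ is a small finite set, the training set stands in for D, and, for illustration, the per-instance policy is computed by direct minimization on each instance:

```python
# Toy scenario: configurations and instances are numbers, and the cost of
# running A with theta on i is (theta - i)^2, a stand-in for the black-box
# procedure that would actually execute A.
thetas = [0.0, 1.0, 2.0]
train_instances = [0.0, 2.0]
cost = lambda theta, i: (theta - i) ** 2

# Definition 1 (classical AC): one theta minimizing the empirical mean cost.
mean_cost = lambda theta: sum(cost(theta, i) for i in train_instances) / len(train_instances)
theta_star = min(thetas, key=mean_cost)

# Definition 2 (pure functional PIAC): a policy psi choosing theta per instance.
psi = lambda i: min(thetas, key=lambda theta: cost(theta, i))

# The per-instance policy is at least as good on every training instance:
assert all(cost(psi(i), i) <= cost(theta_star, i) for i in train_instances)
assert theta_star == 1.0 and psi(0.0) == 0.0 and psi(2.0) == 2.0
```

The final assertions show the "non-dominated" point from the text: no single θ is optimal for both instances, so the per-instance policy strictly improves on the best static configuration here.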
Note that this definition is also strictly more general than unconstrained PIAC, which is itself a special case. Also worth noting is that the cost metric c in this definition can be any function of ψ (and i), allowing PIAC to put arbitrary soft constraints on Ψ. In particular, we do not constrain c to be a function of ψ(i), but allow it to quantify non-functional aspects of ψ, e.g., its minimal description length or computational complexity. That being said, most practical PIAC approaches are limited to minimizing E_{i ∼ D}[c′(ψ(i), i)], given some c′ : Θ × I → R ("pure functional" PIAC).

Algorithm 1: Stepwise execution of a dynamically configured target algorithm A
Input: Dynamic configuration policy π ∈ Π; target problem instance i ∈ I
Output: Solution for i found by A (following π)
1: procedure A(π, i)
2:   s ← init(i)            ▷ Initial state by starting the execution of A on i
3:   while ¬is_final(s, i) do
4:     θ ← π(s, i)          ▷ Reconfiguration point: use π to choose the next θ
5:     s ← step(s, i, θ)    ▷ Continue executing A using θ
6:   return s               ▷ Execution terminated: return solution

3.2 Dynamic Algorithm Configuration

In dynamic AC, we aim to optimally vary θ ∈ Θ while executing A. In order to formalize this problem, we need to define points of interaction at which A can be reconfigured. To this end, we decompose the execution of A with dynamic configuration policy π ∈ Π on problem instance i ∈ I as shown in Algorithm 1. Here, we start by executing an init sub-routine, bringing A into some initial state s ∈ S depending only on i. Subsequently, we iteratively execute step to determine the next state s ∈ S of A as a function of the current state s, instance i, and configuration π(s, i) ∈ Θ. This process continues until is_final(s, i) signals termination, and s is returned as the solution.
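Algorithm 1 translates directly into code. A minimal sketch, with a toy target "algorithm" whose state is a simple counter (all concrete names and choices here are ours):

```python
def run(pi, i, init, step, is_final):
    """Stepwise execution of a dynamically configured target algorithm A
    (cf. Algorithm 1): at every reconfiguration point, pi chooses the next theta."""
    s = init(i)                 # initial state: start executing A on i
    while not is_final(s, i):
        theta = pi(s, i)        # reconfiguration point
        s = step(s, i, theta)   # continue executing A using theta
    return s                    # execution terminated: return solution

# Toy target algorithm: the state is a countdown; theta is the decrement size.
init = lambda i: i
step = lambda s, i, theta: s - theta
is_final = lambda s, i: s <= 0
pi = lambda s, i: 2 if s > 4 else 1   # dynamic policy: big steps first
assert run(pi, 9, init, step, is_final) == 0
```

Note how the policy here genuinely uses the execution state s: it takes large steps far from termination and small steps near it, which no static θ could do.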
When such a decomposition ⟨init, step, is_final⟩ is given, we call A stepwise reconfigurable and define DAC as follows:

Definition 3: Dynamic Algorithm Configuration (DAC)
Given ⟨A, Θ, D, Π, c⟩:
- A stepwise reconfigurable target algorithm A with configuration space Θ
- A distribution D over target problem instances with domain I
- A space of dynamic configuration policies π ∈ Π, with π : S × I → Θ choosing a configuration θ ∈ Θ for each instance i ∈ I and state s ∈ S of A
- A cost metric c : Π × I → R assessing the cost of using π ∈ Π on i ∈ I
Find π* ∈ arg min_{π∈Π} E_{i∼D}[c(π, i)]

Here, we define DAC as a generalization of PIAC, considering configuration policies that depend not only on i, but also on the dynamically changing state s ∈ S of the target algorithm A, i.e., Ψ ≅ {π | π(s, i) = π(s′, i), ∀s, s′ ∈ S, i ∈ I}. This dynamic state, by definition, provides all information required for continuing the execution of A, but can additionally contain arbitrary features of the execution thus far. As in PIAC, c can be an arbitrary function of π (and i). However, often the total cost of executing A with π on i can be decomposed and attributed to the T individual execution steps. Formally: in DAC with stepwise decomposable cost, we are given functions c_init, c_step such that

c(π, i) = c_init(i) + Σ_{t=0}^{T−1} c_step(s_t, i, π(s_t, i)),
where s_t = init(i) for t = 0, s_t = step(s_{t−1}, i, π(s_{t−1}, i)) for t > 0, and is_final(s_t, i) ⟺ t = T.

Note that, by this definition, a stepwise decomposable cost metric can, but does not have to, be dense; e.g., attributing the full cost (e.g., final solution quality) to the final step (and zero otherwise) is a valid decomposition. However, since c_init and c_step only depend on i and π(s_t, i), a stepwise decomposable cost metric cannot measure non-functional aspects of π. Rather, it captures the notion of "pure functional" DAC and subsumes pure functional PIAC, but not PIAC in general.

4. Solution Methods
In this section, we discuss methods for solving DAC.
As discussed in Section 2.2.3, DAC has so far primarily been solved manually, i.e., dynamic configuration policies have been determined by humans rather than in an automated, data-driven way. In Section 2.3, we discussed previous work exploring automated DAC, and in what follows we give an overview of the methods used for doing so. Please note that no dedicated, general DAC solvers exist to date. Instead, prior art can be viewed as solving DAC by reduction to some other well-studied computational problem.4 Considering the fact that most of this work has been performed in isolation and tackles very different DAC scenarios, the high-level solution approaches followed are remarkably similar. In particular, we will roughly distinguish between two approaches, DAC by reinforcement learning (Section 4.1) and DAC by optimization (Section 4.2), and discuss their relative strengths and weaknesses (Section 4.3).

4.1 DAC by Reinforcement Learning
In reinforcement learning (RL, Sutton & Barto, 2018), an agent learns to optimize an unknown reward signal by means of interaction with an unknown environment. The RL agent takes actions a ∈ A, observes a transition from the current state s ∈ S of the environment to T(s, a) ∈ S, receives a reward R(s, a) ∈ R, and learns for any state the action maximizing its expected future reward. Formally, the RL agent solves a Markov decision problem ⟨S, A, T, R⟩ (MDP, Definition 10 in Appendix B.3.5). Here, the transition T and reward R are given in the form of black-box methods. Also, the state space S is typically not given explicitly; instead, we are given a procedure for generating initial states and can generate further states using T. The RL problem described above is closely related to DAC, and prior art has commonly solved DAC using reinforcement learning methods. In DAC by RL, the environment consists of the target algorithm A solving some target problem instance i ∈ I.
The state of this environment is s = (s′, i), with s′ ∈ S the state of the algorithm; initial states take the form (init(i), i) with i ∼ D. At every reconfiguration point, the RL agent interacts with this algorithm, choosing a configuration θ ∈ Θ as its action. The transition dynamics of the environment are fully determined by stepwise algorithm execution, i.e., T((s′, i), θ) = (step(s′, i, θ), i), and the reward is R((s′, i), θ) = −c_step(s′, i, θ). See Figure 2 for an illustration of this approach. The power of this reduction lies in the fact that the resulting MDP can be solved using the full gamut of existing reinforcement learning methods.

Traditional RL: Early DAC by RL work (e.g., Lagoudakis & Littman, 2000, 2001; Pettinger & Everson, 2002; Battiti & Campigotto, 2012) used traditional value-based RL methods that learn the optimal state-action value function Q*(s, a) and return the policy π(s) ∈ arg max_{a∈A} Q*(s, a). These methods work well when S × A is small enough to be represented explicitly by a table, but do not scale up. Note that both S and A = Θ are typically too large in DAC to be modelled in tables.

Modern RL: Over the last decade, a series of methodological advances have given rise to a new generation of RL methods that can tackle complex real-world problems (Mnih et al., 2015; Silver et al., 2016; Barozet et al., 2020; Lee et al., 2020), and that have also been successfully applied to DAC.

4. In Appendix B, we define these related computational problems and discuss reductions more formally.

Figure 2: Illustration of DAC by Reinforcement Learning (DAC components in blue)
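To make this reduction concrete, the following self-contained sketch (a hypothetical toy scenario, not from the literature) casts a stepwise-reconfigurable algorithm as an episodic environment and applies tabular Q-learning, the traditional value-based approach, with reward equal to the negative stepwise cost:

```python
import random
random.seed(0)

# Toy stepwise-reconfigurable algorithm: the (fully observed) state is the step
# counter t; configuration theta in {0, 1}; theta=0 is cheap early, theta=1 late.
T = 6
def step_cost(t, theta):
    return 0.0 if theta == (0 if t < 3 else 1) else 1.0

# Tabular Q-learning on the induced MDP (reward = negative stepwise cost).
Q = {(t, a): 0.0 for t in range(T) for a in (0, 1)}
alpha, gamma, eps = 0.5, 1.0, 0.3
for episode in range(2000):
    for t in range(T):
        a = random.choice((0, 1)) if random.random() < eps else \
            max((0, 1), key=lambda b: Q[(t, b)])
        r = -step_cost(t, a)  # reward signal R
        future = 0.0 if t + 1 == T else max(Q[(t + 1, b)] for b in (0, 1))
        Q[(t, a)] += alpha * (r + gamma * future - Q[(t, a)])

# The greedy policy recovers the phase-dependent optimal configuration schedule.
policy = [max((0, 1), key=lambda b: Q[(t, b)]) for t in range(T)]
assert policy == [0, 0, 0, 1, 1, 1]
```

In a realistic DAC scenario, the state would comprise features of the execution rather than a bare step counter, and S × Θ is far too large for a table, motivating the function-approximation methods discussed next.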
In particular, modern RL methods based on deep neural networks can effectively learn useful representations that allow them to handle complex state and action spaces, using, e.g., (double) deep Q-learning (DDQN, van Hasselt et al., 2016; Sharma et al., 2019; Speck et al., 2021; Bhatia et al., 2021), modern actor-critic methods (Ichnowski et al., 2021), and policy gradient methods (Daniel et al., 2016; Xu et al., 2019; Gomoluch et al., 2019; Shala et al., 2020; Getzelman & Balaprakash, 2021; Almeida et al., 2021).

Contextual RL: It is worth noting that standard RL methods are not instance-aware and will generally not choose their initial state (see Figure 2, where i is hidden inside the environment). This is one of the reasons Biedenkapp et al. (2020) proposed to model DAC as a contextual MDP (cMDP, Hallak et al., 2015), which consists of a collection of MDPs, one for each instance i (see Definition 11 in Appendix B.3.5). Each MDP M(i) shares a common state space S and action space A as in traditional RL, but possesses an instance-specific transition function T_i and reward function R_i. This more general formulation allows DAC practitioners to explicitly model variation between instances: variations in transition dynamics model the differences in target algorithm behaviour between instances (i.e., how the target algorithm progresses in solving an instance), while different reward functions reflect the instance-specific objectives. Although a single MDP can capture these dependencies implicitly, the explicit model allows the contextual RL agent to directly exploit this knowledge. For example, instances may vary in difficulty. A contextual RL agent, being aware of different instances and their characteristics, can more easily learn this, allowing the agent to more accurately assign credit for high/low rewards to (i) following a good/poor policy or (ii) solving easy/hard instances.
Furthermore, the agent can choose which MDP M(i) it interacts with, e.g., to gather more experience on harder instances (Klink et al., 2020; Eimer et al., 2021a).

4.2 DAC by Optimization
Not all prior art automating DAC has done so using reinforcement learning. Instead, some previous works can be viewed as reformulating DAC as a (non-sequential) optimization problem: given a search space Π and an objective function f(π) = E_{i∼D}[c(π, i)], find π* ∈ arg min_{π∈Π} f(π). This approach is illustrated in Figure 3. Optimization covers a wide variety of different methods. In what follows, we give an overview of those used in prior art for "DAC by optimization", and distinguish between different variants of optimization depending on (i) the search space representation, and (ii) what information about f is used.

Figure 3: Illustration of DAC by Optimization (DAC components in blue)

Noisy Black Box Optimization: In black box optimization (BBO), the only interaction between f and the optimizer is through an evaluation procedure e that returns f(π) for any given π ∈ Π. A wide variety of black box optimizers exist, specialized for particular kinds of representations. In the reduction, dynamic configuration policies can be represented in a variety of different ways. For example, prior art (Gomoluch et al., 2020) represents policies as real-valued vectors that correspond to the weights of a neural network policy, and optimizes these using evolution strategies. It is worth noting that a similar approach is currently state-of-the-art in learning-to-learn (Metz et al., 2020) (see Section 2.2.1). However, one could go further and also vary the architecture and optimize directly in the space of neural networks, e.g., using methods from neuroevolution (Stanley & Miikkulainen, 2002; Stanley et al., 2021). Alternatively, one could follow a symbolic approach, like Fontoura et al. (2017), representing policies as programs and optimizing them using genetic programming (Koza, 1992).
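The weight-vector representation can be sketched in a few lines (a hypothetical toy setting: the policy is a per-step learning-rate schedule for gradient descent, evaluated as the average final error over a finite instance sample and improved with a simple (1+1) evolution strategy):

```python
import random
random.seed(1)

T = 10  # number of reconfiguration points (execution steps)

def cost(lam, i):
    """Run gradient descent on f(x) = (x - i)^2 for T steps, using the
    per-step learning rates lam[0..T-1]; cost = final distance to optimum."""
    x = 0.0
    for t in range(T):
        x = x - lam[t] * 2.0 * (x - i)
    return abs(x - i)

def evaluate(lam, instances):
    """Noisy black-box objective e: mean cost over a finite instance sample."""
    return sum(cost(lam, i) for i in instances) / len(instances)

instances = [random.uniform(-5, 5) for _ in range(8)]  # sampled instance set

# (1+1) evolution strategy: mutate the policy parameters, keep non-worse mutants.
lam = [0.0] * T
best = evaluate(lam, instances)
for _ in range(300):
    cand = [w + random.gauss(0, 0.1) for w in lam]
    c = evaluate(cand, instances)
    if c <= best:
        lam, best = cand, c

assert best < evaluate([0.0] * T, instances)  # improved over the initial policy
```

Note that the optimizer only ever observes the scalar e(π); the per-step information exploited by RL remains hidden, which is precisely the trade-off discussed in Section 4.3.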
Note that this freedom comes with responsibility, i.e., making an appropriate choice of representation may be crucial to achieving satisfactory performance. Next to representation, another difficult choice in this reduction is the evaluation procedure. Since D is unknown, e cannot evaluate f exactly in general. Instead, we typically evaluate the cost on some finite sample of target problem instances I′ ⊂ I with i ∼ D for all i ∈ I′, and e(π) = (1/|I′|) Σ_{i∈I′} c(π, i). However, the choice of |I′| still poses a trade-off between accuracy and cost of evaluation to the DAC by BBO practitioner.

Static Algorithm Configuration: We can also solve DAC using classical static algorithm configurators (e.g., SMAC and irace). Assuming we choose a parametric representation Λ for the policy space, i.e., Π = {π_λ | λ ∈ Λ}, the DAC problem can be reformulated as classical AC, where we configure the parameters λ of the dynamic configuration policy π_λ, instead of configuring the parameters θ of the target algorithm.5 While solving DAC using static AC may at first sight seem contradictory, this reduction gives rise to a highly practical solution approach that has been explored extensively in prior art (Fialho et al., 2010; López-Ibáñez et al., 2012; López-Ibáñez & Stützle, 2014; Andersson et al., 2016; Ansótegui et al., 2017; Sae-Dan et al., 2020). An important benefit specific to this approach is that algorithm configurators are instance-aware and therefore automate the trade-off between the accuracy and cost of evaluation (using so-called racing mechanisms), and can even vary I′ ⊂ I dynamically to focus evaluation on those instances providing the most useful information.

Gradient-based Optimization: In AC, we typically use gradient-free optimization. The motivation is that we cannot generally compute analytical gradients.
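When init, step, and c_step happen to be differentiable, however, the chain rule yields exact gradients of the total cost with respect to the policy parameters. A minimal hand-derived sketch (hypothetical toy dynamics x ← (1 − λ)x with final cost x_T², forward-mode accumulation of dx/dλ), cross-checked against finite differences:

```python
def cost_and_grad(lam, x0, T):
    """Differentiate through T execution steps of x <- (1 - lam) * x.
    The state carries its own derivative dx = dx/dlam (forward mode)."""
    x, dx = x0, 0.0
    for _ in range(T):
        # chain rule: d/dlam[(1 - lam) * x] = (1 - lam) * dx - x
        x, dx = (1.0 - lam) * x, (1.0 - lam) * dx - x
    return x * x, 2.0 * x * dx  # cost x_T^2 and its derivative w.r.t. lam

lam, x0, T = 0.3, 2.0, 5
c, g = cost_and_grad(lam, x0, T)

# Finite-difference cross-check of the analytic gradient.
h = 1e-6
fd = (cost_and_grad(lam + h, x0, T)[0] - cost_and_grad(lam - h, x0, T)[0]) / (2 * h)
assert abs(g - fd) < 1e-4
```

An automatic differentiation framework performs exactly this bookkeeping (typically in reverse mode) for arbitrary differentiable step and cost functions.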
While this is true in general, we would like to argue that the specific cases where we can actually compute them are more prominent than one might expect. Assuming a stepwise decomposable cost, we can compute the derivative ∇_λ c_i = ∂c(π_λ, i)/∂λ from the derivatives of the stepwise cost, the step function, and the policy, using the chain rule (see Appendix B.3.6). When c_step, step, and π can be implemented using the operations of an automatic differentiation framework (e.g., autograd in PyTorch, Paszke et al., 2017), these gradients can be calculated efficiently and reliably, without requiring any additional mathematical knowledge from the DAC practitioner. In fact, in the machine learning community, in particular in meta-learning, differentiating through the entire learning process is almost standard practice (Maclaurin et al., 2015; Andrychowicz et al., 2016; Finn et al., 2017). The potential benefit of this extra piece of information is not to be underestimated. DAC policies may have many parameters, e.g., a neural network with thousands of weights. Gradient-based optimization is an efficient way to navigate extremely high-dimensional spaces, as is evidenced by deep neural networks with millions of parameters being trained almost exclusively using simple first-order optimization methods. That being said, gradients for DAC are no silver bullet. Computing them, when possible at all, may require too many computational resources. Furthermore, gradients only provide local information, i.e., an infinitesimal change to every parameter that is guaranteed to reduce cost. When f is particularly rugged, gradients may not provide information about the effect of any reasonably sized change. This phenomenon has, in fact, been observed in the context of learning-to-learn (Metz et al., 2019).

4.3 Reinforcement Learning vs. Optimization?
We now discuss the relative strengths and weaknesses of both approaches, and argue for the potential of combining them.

Why DAC by RL?
The sequential nature of the problem is arguably the key feature that sets DAC apart from static AC: in static AC, we only have to select a single configuration, while in DAC we must select a sequence of such configurations. RL provides a very general framework for tackling sequential decision problems and was presented as the method of choice for DAC by Biedenkapp et al. (2020). DAC by optimization approaches reduce DAC to a non-sequential optimization problem. In doing so, valuable information about the problem is lost that may otherwise be used to solve it more efficiently (Adriaensen & Nowé, 2016). While executing a target algorithm, an RL agent observes at every step what configurations were used, the (immediate) costs this incurred, and how this affected the dynamic state of the algorithm. In contrast, the same evaluation provides a black box optimizer with a single value (i.e., the sum of costs incurred) at the end of the run. This inherent relative sample-inefficiency of black box optimization is particularly problematic when target algorithm execution is costly, e.g., takes multiple hours, and/or policy spaces are extremely large or unconstrained.

5. We prove the correctness of this reduction in Appendix B.3.1.

Why DAC by Optimization? Previous work has shown that optimization can be a practical alternative to RL in simulated environments (Mannor et al., 2003; Szita & Lőrincz, 2006; Salimans et al., 2017; Chrabaszcz et al., 2018; Majid, 2021). While RL aims to exploit sequential information, contemporary RL methods do not always do so successfully. Also, in some scenarios, this information may not add much value, or may even be deceptive (e.g., delayed rewards). Finally, these mechanisms add considerable computational overhead, and complicate implementation.
In contrast, optimization methods tend to be simpler, have fewer failure modes, and their often-parallel nature makes them well-suited for modern high-performance computing infrastructure. Adding to these limitations of RL methods are limitations of the reduction itself. While DAC is generally reducible to a (noisy) black box optimization problem, the previously discussed reduction to an MDP implicitly assumes (i) the cost function c to be stepwise decomposable6 and (ii) the space of policies Π to be unconstrained.7 As a consequence, it cannot be used when optimizing non-functional aspects of the policy (e.g., the resources it requires to make decisions) or to impose arbitrary hard constraints on Π (e.g., which of these N policies is best?).

Beyond RL or Optimization: Our discussion thus far focused on contrasting both approaches. In what remains, we look at their relation, and argue for the potential of combining them. First, our "sequential vs. non-sequential" discussion can be extended to a method's ability to exploit a certain characteristic of DAC, or not. A good example of a cross-cutting characteristic is instance-awareness: both contextual RL and static AC can be viewed as instance-aware extensions of RL and black box optimization, respectively. Second, the pitfalls of RL also apply to approaches exploiting other characteristics. For example, gradients in optimization can be similarly deceptive (e.g., the exploding/vanishing gradients problem) as immediate rewards. Therefore, while artificially hiding information is useless, blindly relying on it introduces failure modes, and general DAC methods should be carefully designed to only rely on information that is available and useful for the scenario at hand. In the context of "sequential vs. non-sequential", this suggests the importance of combining reinforcement learning and optimization.
Further underpinning this conjecture is the observation that state-of-the-art static AC methods combine optimization with machine learning, and reinforcement learning is essentially a dynamic extension of the latter.

6. As defined at the end of Section 3.2 ("pure functional" DAC). This is not to be confused with a dense reward signal which, while desirable for RL, is not a requirement for MDP reducibility.
7. MDP extensions exist, e.g., constrained MDPs (CMDPs, Altman, 1999), that support hard constraints. However, CMDPs conventionally constrain agent behavior (actions taken in encountered states) and still cannot encode arbitrary non-functional hard constraints in policy space.

5. Benchmark Library
In this section, we present DACBench (Eimer et al., 2021b), a novel benchmark library for DAC that we will be using in our experiments in Section 6. We have seen related fields like hyperparameter optimization, static algorithm configuration, and algorithm selection profit greatly from focusing on shared benchmark problems (Eggensperger et al., 2013; Hutter et al., 2014; Bischl et al., 2016). In these meta-algorithmic domains, standardizing the target algorithm setup not only increased the accessibility of the field by reducing some of the specialized knowledge required to get started, but also made comparisons between different methods more reliable and reproducible. DACBench provides such a standard for DAC. In what follows, we give a brief overview of the interface it provides, the benchmarks it implements, and the prior empirical validation it has undergone. We also discuss novel developments and highlight extensions that were motivated by and/or made specifically in the context of this work.8

Interface: DACBench builds upon a common RL interface, OpenAI's gym (Brockman et al., 2016), as it provides a flexible template for stepwise interaction with the target algorithm.
The target algorithm's init is handled by the gym.Env.reset method, with each stepwise interaction handled by the gym.Env.step method. DACBench extends the gym.Env.reset method to provide the ability to select the problem instance i to be solved. Listing 1 shows how DAC components are mapped onto the gym interface in the benchmarks. These essentially implement the DAC by contextual RL reduction discussed in Section 4.1. The result is a simple-to-use interface, allowing DAC researchers to work across application domains without requiring domain expertise, and providing an easy-to-use template for applying DAC to new domains. While the interface is modelled after the RL formulation of DAC, it can be used with the variety of approaches described in Section 4.2. That being said, the original DACBench interface strongly focused on conventional RL. In the scope of this work, we have extended the interface from the first release of DACBench. In accordance with our proposed definition of DAC, we have taken a broader perspective beyond standard RL, and made various interface changes to provide better support for alternative approaches. For example, users can now specify rich structured configuration spaces, as opposed to the simplistic action spaces supported by conventional RL methods. Directly controlling instance progression is easier now as well, providing a better base for developing instance-aware solution methods for DAC.

Benchmarks: An overview of the benchmarks currently included in DACBench is given in Table 1. It includes several benchmarks that we have either added in the latest release or at least improved significantly. The original SGD-DL benchmark (see Section 6.3 for a thorough description) was extended to mimic the experimental setup of Daniel et al. (2016) as closely as possible.
The CMA-ES benchmarks (CMAStepSize and ModCMA) are now based on IOHProfiler (Doerr et al., 2018) and thus provide a DAC interface for a well-known and important tool in the EA community.9 TheoryBench is a completely new benchmark, published by Biedenkapp et al. (2022), where one is to dynamically configure the mutation rate of a (1+1) random local search algorithm for the LeadingOnes problem. This is a particularly interesting setting, as the exact runtime distribution is very well understood (Doerr, 2019). In particular, it is possible to compute optimal dynamic configuration policies for various problem sizes and configuration spaces. Finally, a continuous Sigmoid variation and SGD on polynomials provide additional artificial benchmarks for efficient evaluation of DAC algorithms.

8. A new version of https://github.com/automl/DACBench (v. 0.1) was released alongside this article.
9. The original pycma version of CMAStepSize is still supported and used in Section 6.1.

class DACEnv(gym.Env):
    def __init__(self, A, Θ, D, Πφ, c_step):
        self.A, self.D, self.c_step, self.φ = A, D, c_step, Πφ.φ
        self.action_space = Θ

    def reset(self, i=None):
        i = sample(self.D) if i is None else i  # default: draw an instance i from D
        s = self.A.init(i)
        self.state = (s, i)
        return self.φ(s, i)

    def step(self, θ):
        s, i = self.state
        s = self.A.step(s, i, θ)
        r = -self.c_step(s, i, θ)
        done = self.A.is_final(s, i)
        self.state = (s, i)
        return self.φ(s, i), r, done, None

Listing 1: A generic Python implementation of a gym environment using the components of a DAC scenario (in blue) with decomposable cost and an input-constrained policy space Πφ. Practical DACBench benchmarks implement a similar mapping, but DAC components are typically not strictly separated, e.g., A.step and c_step would typically be calculated jointly. Note that the gym.Env.step method, despite its name, does far more than merely computing A.step: it implements the transition dynamics (T) and the reward signal (R). Furthermore, unlike A.step, it is stateful, does not take the state (s, i) as input, and does not necessarily return the new state. Instead, it more generally returns what is called an observation φ(s, i), which may abstract arbitrary aspects of the internal state, i.e., DACBench technically reduces DAC to a contextual partially observable MDP (cPOMDP). Note that the learned policy in POMDPs is a function of the observable state, and hence φ can be viewed as modeling a policy space constraint of the form Πφ = {π | φ(s, i) = φ(s′, i′) ⟹ π(s, i) = π(s′, i′)}.

Benchmark     Domain    Status    Description
Sigmoid       Toy       Extended  Control k parameters, each tracing a different sigmoid (Biedenkapp et al., 2020).
Luby          Toy       Original  Select the correct next term in a shifted Luby sequence (Biedenkapp et al., 2020).
CMAStepSize   EA        Extended  Control the step size in CMA-ES (Shala et al., 2020).
FastDownward  Planning  Original  Control heuristic selection in Fast Downward (Speck et al., 2021).
SGD-DL        DL        Extended  Control SGD for neural network training (Daniel et al., 2016).
TheoryBench   EA        New       Control the mutation rate of (1+1)RLS for LeadingOnes (Biedenkapp et al., 2022).
ModCMA        EA        New       Control design choices (e.g., the base sampler used) of CMA-ES (Vermetten et al., 2019).
ToyGD         Toy       New       Control the learning rate of gradient descent on polynomial functions.

Table 1: DACBench benchmarks. Status compares the current state of each benchmark to the benchmarks originally introduced by Eimer et al. (2021b).

Empirical Validation: DACBench is a very recent library. As a consequence, it has not yet been used in prior art. Eimer et al. (2021b) focused on providing a unified interface to a variety of benchmarks, and analyzed specific properties of these benchmarks based on the behavior of static policies and simple hand-crafted dynamic policies.
Here, we describe the first applications of practical DAC methods to these benchmarks, and provide important empirical validation for DACBench.

6. Empirical Case Studies
In this section, we discuss in more detail three successful applications of automated DAC in different areas of AI: evolutionary optimization (Shala et al., 2020, Section 6.1), AI planning (Speck et al., 2021, Section 6.2), and machine learning (Daniel et al., 2016, Section 6.3). The primary purpose of this section is to complement the general, big-picture discussions in previous sections with concrete practical examples of automated DAC. Here, we cover our own work in this area (Shala et al., 2020; Speck et al., 2021), supplemented with a machine learning application (Daniel et al., 2016) for diversity. In these case studies, we also conducted additional experiments to answer the following research questions.

RQ1: Can we reproduce the main results of the original papers using DACBench? Since it is well known that RL results are hard to reproduce (Henderson et al., 2018), we believe it to be important, in order to provide a solid foundation for experimental work in the field, to repeat the original experiments, this time using the publicly available re-implementations provided by DACBench (i.e., the CMAStepSize, FastDownward, and SGD-DL benchmarks), and to compare the results obtained to those of the original papers. Beyond insights into the reproducibility of the prior work, this analysis provides empirical validation for DACBench: it is the first study investigating whether, and to what extent, the benchmarks in DACBench permit reproducing the original results. Further, it is worth noting that the work by Daniel et al. (2016) is closed source, and that this is the first reproduction of their experiments with open-source code.

RQ2: Does DAC outperform static AC in practice? Theoretically, an optimal DAC policy will be at least as good as an optimal static AC policy.
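This ordering can be illustrated on a hypothetical toy cost matrix: the single best solver (SBS) is the one configuration with the best average cost, the virtual best solver (VBS) picks the best configuration per instance, and in terms of cost VBS ≤ SBS always holds, with an optimal dynamic policy at least matching the VBS.

```python
# Hypothetical cost matrix c(theta, i): no configuration dominates on both instances.
COST = {
    ("theta_a", "i1"): 2.0, ("theta_a", "i2"): 5.0,
    ("theta_b", "i1"): 4.0, ("theta_b", "i2"): 3.0,
}
thetas, instances = ("theta_a", "theta_b"), ("i1", "i2")

# SBS: best single configuration on average (limit of classical static AC).
sbs = min(sum(COST[(th, i)] for i in instances) / len(instances) for th in thetas)

# VBS: per-instance best configuration, averaged (limit of PIAC).
vbs = sum(min(COST[(th, i)] for th in thetas) for i in instances) / len(instances)

assert (sbs, vbs) == (3.5, 2.5) and vbs <= sbs
```

A dynamic policy can, in principle, improve on the VBS as well by exploiting within-run state; RQ2 asks whether practical DAC methods realize such gains within a realistic budget.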
In practice, however, the superiority of DAC is not guaranteed, since practical DAC methods may not be capable of finding an optimal/better DAC policy, and/or doing so may require more computational resources than available. To investigate this, for each scenario in our case studies, we compare the anytime performance of the DAC method used to that of static AC baselines: we run SMAC (as a classical AC method, Hutter et al., 2011; Lindauer et al., 2022) and Hydra10 (as a PIAC method, Xu et al., 2010) on the same problem, and compare the performance of the best dynamic/static policies found at any time during the configuration process. We further include the theoretical upper bounds for classical AC (SBS = min_{θ∈Θ} (1/|I|) Σ_{i∈I} c(θ, i)) and PIAC (VBS = (1/|I|) Σ_{i∈I} min_{θ∈Θ} c(θ, i)) as a reference, to distinguish practical from inherent limitations of static AC.11 In what follows, we discuss our three case studies, in each case presenting an introduction to the domain, the problem formulation as an instance of DAC, the solution method, the experimental setup, the results, and a discussion thereof.12

10. Hydra combines SMAC with an algorithm selection method of choice. Since most of the considered benchmarks do not have instance features, we will assume an oracle selecting the best configuration in the portfolio. We treat the maximum size of the portfolio as a case-study-dependent hyperparameter and detail this choice in the respective experimental setups.

6.1 Step Size Adaptation in CMA-ES
The first problem we consider is to dynamically set the step-size parameter of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES, Hansen et al., 2003), an evolutionary algorithm for continuous black box optimization. Each generation g, CMA-ES evaluates the objective value f of λ individuals x_1^{(g+1)}, ..., x_λ^{(g+1)} sampled from a non-stationary multivariate Gaussian distribution N(µ^{(g)}, σ^{(g)2} C^{(g)}).
Then, based on the outcome of these evaluations, CMA-ES heuristically adapts the parameters µ, σ, C of the search distribution, aiming to increase the likelihood of generating better individuals in the next generation. In particular, the step-size parameter σ controls the scale of the search distribution, and CMA-ES by default adjusts it using Cumulative Step Length Adaptation (CSA, Hansen & Ostermeier, 1996). Note that variants of CMA-ES exist, e.g., IPOP-CMA-ES (Auger & Hansen, 2005), that also adapt the population size λ dynamically. While these hand-crafted parameter control heuristics are arguably key to CMA-ES's success, they are by no means optimal and implicitly make assumptions about the task distribution. For instance, these heuristics have (hyper)parameters themselves, and it has been shown that tuning these can further improve IPOP-CMA-ES's performance (Liao et al., 2011; López-Ibáñez et al., 2012). In Shala et al. (2020), we investigated the possibility of learning step-size adaptation in a data-driven fashion, optimized for the task distribution at hand, i.e., automated DAC.

Problem Formulation: Below, we briefly detail each of the DAC components:

A, Θ: The target algorithm to configure in this scenario is CMA-ES. As in Shala et al. (2020), we use the pycma distribution of CMA-ES. Its interface allows for stepwise execution of CMA-ES. CMA-ES is initialized with a given initial mean µ^{(0)} and step-size σ^{(0)} (C^{(0)} = 1). Each generation g, we
1. sample λ individuals x_1^{(g+1)}, ..., x_λ^{(g+1)} from N(µ^{(g)}, σ^{(g)2} C^{(g)});
2. evaluate the objective function values f(x_1^{(g+1)}), ..., f(x_λ^{(g+1)}) of these individuals;
3. adapt the distribution parameters µ^{(g+1)}, σ^{(g+1)}, C^{(g+1)} for the next generation.
In this final step, the mean µ and covariance C are adapted as usual in CMA-ES, while the step-size σ is to be reconfigured dynamically in the range Θ = R+.

11. Note that the acronyms SBS (single best solver) and VBS (virtual best solver) stem from the algorithm selection literature. More details on how these theoretical bounds were determined can be found in the experimental setup of the respective case studies.
12. Code for reproducing these experiments is publicly available: https://github.com/automl/2022_JAIR_DAC_experiments

D, I: Instances correspond to tuples consisting of a black box function f and an initial search distribution. Here, the latter is isotropic and defined by an initial mean µ^{(0)} and step-size σ^{(0)}.

Π: The policies are constrained to be functions of a specific observable state composed of: (i) the current step-size value σ^{(g)}; (ii) the current cumulative path length p_σ^{(g)} (Hansen & Ostermeier, 1996); (iii) the history of changes in objective value, i.e., the normalized differences between successive objective values, from the h previous iterations; and (iv) the history of step-sizes from the h previous iterations.

c: The cost metric used is the likelihood of outperforming CSA. Assuming we perform two runs of CMA-ES, one using π, the other CSA, it measures how likely the latter is to obtain a better final solution than the former. We estimate this probability based on pairwise comparisons of n = 25 runs varying only the random seed of CMA-ES, i.e., Σ_{j}^{n} Σ_{k}^{n} 1_{π_j