# doubleended_synthesis_planning_with_goalconstrained_bidirectional_search__3ba04127.pdf

Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search

Kevin Yu MIT kyu3@mit.edu

Jihye Roh MIT jroh99@mit.edu

Ziang Li Georgia Tech ziang@gatech.edu

Wenhao Gao MIT whgao@mit.edu

Runzhong Wang MIT runzhong@mit.edu

Connor W. Coley MIT ccoley@mit.edu

Computer-aided synthesis planning (CASP) algorithms have demonstrated expertlevel abilities in planning retrosynthetic routes to molecules of low to moderate complexity. However, current search methods assume the sufficiency of reaching arbitrary building blocks, failing to address the common real-world constraint where using specific molecules is desired. To this end, we present a formulation of synthesis planning with starting material constraints. Under this formulation, we propose Double-Ended Synthesis Planning (DESP), a novel CASP algorithm under a bidirectional graph search scheme that interleaves expansions from the target and from the goal starting materials to ensure constraint satisfiability. The search algorithm is guided by a goal-conditioned cost network learned offline from a partially observed hypergraph of valid chemical reactions. We demonstrate the utility of DESP in improving solve rates and reducing the number of search expansions by biasing synthesis planning towards expert goals on multiple new benchmarks. DESP can make use of existing one-step retrosynthesis models, and we anticipate its performance to scale as these one-step model capabilities improve.

1 Introduction

Synthesis planning proposing a series of chemical reactions starting from purchasable building blocks to synthesize one or more target molecules is a fundamental task in chemistry. For decades, chemists have approached the challenge of synthesis planning with retrosynthetic analysis [1, 2], the strategy by which a target molecule is recursively broken down into simple precursors with reversed reactions. In recent years, advances in machine learning have enabled a multitude of computer-aided synthesis planning (CASP) algorithms [3 6] that navigate a combinatorially large space of reactions to propose chemically sensible routes to a variety of drug-like molecules within seconds to minutes. However, fully data-driven algorithms still underperform when generalizing to realistic use cases such as planning for more complex targets or in constrained solution spaces. In practice, expert chemists may plan syntheses with specific starting materials in mind, called structure-goals" [1], that constrain the solution space. For instance, efficient syntheses of highly complex drugs are often most practical when synthesized from naturally-occurring starting materials that share complex features with the target, a practice known as semi-synthesis" [7, 8]. There is also immense interest in identifying pathways from waste or sustainable feedstocks to useful chemicals [9 11], but existing methods have thus far relied on heuristics and brute-force enumeration of reactions.

Though algorithms for planning synthetic routes from expert-specified starting materials have been proposed [12, 13], the vast majority of CASP algorithms today cannot address starting material-

38th Conference on Neural Information Processing Systems (Neur IPS 2024).

a. Prior Work: Single-ended search b. This Work: Double-ended search

Non-buyable

Building block

expert-speciﬁed starting material

arbitrary building blocks

Path in solution

Explored paths

Figure 1: (a) Existing search methods are single-ended, and aim to identify a synthetic route where all leaf nodes meet certain termination criteria, e.g., buyability. (b) DESP is a bidirectional search algorithm that enables a double-ended starting material-constrained search, better reflecting certain real-world use cases in complex molecule synthesis planning. Empirically, double-ended search finds starting material-constrained solutions with fewer node expansions.

constrained use cases, as they assume that solution states may comprise any combination of building blocks. It is non-trivial to extend from general" retrosynthesis planning to the constrained setting; by requiring a solution to contain a specific goal molecule, starting material-constrained synthesis planning presents the challenge of simultaneously guiding a search towards this goal molecule and any other necessary buyable molecules.

In this paper, we address these challenges by proposing a strategy for starting material-constrained synthesis planning with a bidirectional search algorithm and a goal-conditioned cost network learned offline from expert trajectories implicit to a validated reaction corpus. Given a target molecule and one or more specified starting materials, our Double-Ended Synthesis Planning (DESP) algorithm takes advantage of the natural reversibility of retrosynthesis by instantiating two AND-OR search graphs and alternately performing retrosynthetic expansions and forward synthetic expansions. We present two variations of DESP based on front-to-end (F2E) and front-to-front (F2F) bidirectional search. In F2E search, each direction of the search is conditioned on the root node of the opposing search graph, while in F2F, each search is conditioned on the closest" nodes of the opposing search graph. In both cases, finding solutions is accelerated when the bottom-up" search graph converges with the top-down retrosynthesis search graph. Each step of selection and expansion of bottom-up nodes is conditioned on a specific molecule in the retrosynthetic graph, and we devise a means of utilizing both our goal-conditioned cost network and an existing cost network for general retrosynthesis in the top-down search policy. The goal-conditioned cost network, which we term the synthetic distance" network, is trained offline based on the observation that multi-step synthetic routes can be interpreted as expert plans where any of the non-root molecules represents a starting material goal for the final target molecule, thus bypassing the need for self-play using reinforcement learning (RL). In order to train the network on negative experiences", we also sample pairs of molecules between which no path exists through known reactions. Our contributions can be summarized as follows:

We provide a formulation of starting material-constrained synthesis planning and release the first benchmarks for evaluating algorithms on this task, including new real-world benchmarks collected from the Pistachio database [14] addressing redundancies in the widely-used USPTO-190 test set. We present a starting material-constrained neural bidirectional search algorithm to tackle doubleended synthesis planning. Specifically, we present a cost network that estimates the synthetic distance" between molecules (instead of the distance between a molecule and arbitrary purchasable building blocks) and an A*-like bidirectional search algorithm that strictly reflects the constraints. We present strong empirical results that illustrate the efficiency of double-ended synthesis planning. Compared to existing algorithms, DESP expands fewer nodes and solves more targets under goal constraints, demonstrating its value in biasing CASP algorithms towards expert goals.

2 Background

2.1 Related work

Computer-aided retrosynthetic analysis Retrosynthetic analysis has traditionally been formulated as a tree search problem, where each step involves searching for chemically feasible transformations and corresponding reagents to derive the product molecule. In defining the feasible transformation,

template-based methods select graph transformation rules to apply based on expert rules [15] or use data-driven methods [16 18], such as a neural network policy trained on a reaction corpus [19]. Template-free methods frame one-step retrosynthesis prediction as a translation task of SMILES strings [20, 21] or a graph-edit prediction [22]. In searching for multi-step synthetic pathways, the focus of late has been on selecting non-terminal nodes for expansion. Initial efforts relied on expert-defined rules and heuristics [2, 15], whereas more recent efforts combine neural networks and Monte Carlo Tree Search (MCTS) [3], as well as AND-OR graph searches that address the hypergraph complexity of reaction routes [23, 6, 4, 24]. Notably, Chen et al. [6] proposed Retro*, a neural-guided A*-like search algorithm that we build on as part of our approach. Much additional work has been done to improve multi-step CASP algorithms [25 32], primarily via improvements of single-step policies in a multi-step context or value functions for improved search guidance. Unlike DESP, these methods do not address the problem formulation where the pathway search is constrained by one or more starting materials, as shown Fig. 4. One exception is GRASP [13], which utilizes RL with hindsight experience replay [33] for goal-conditioned value estimation. Additionally, starting material-oriented planning capablities were implemented in the LHASA program [12] but rely entirely on expert-defined rules. In this work, we instead train a cost network offline using historical reaction data and use bidirectional search to augment the retrosynthesis planner.

Synthesizable molecular design Recent advances in computer-aided molecular design have introduced novel approaches to synthesis planning. To ensure high synthetic accessibility of designed molecules, researchers have proposed assembling compounds in silico by applying valid chemical transformations to purchasable building blocks, effectively searching for molecules within a reaction network [34 39]. The advent of deep generative modeling has further enabled the generation of synthetic pathways with neural models [40 44]. These methods commonly employ a bottom-up strategy, constructing synthetic pathways from building blocks to the final product. Gao et al. [42] proposed that this framework could enable bottom-up synthesis planning," in which the goal of generation is to match a specified target molecule, and demonstrated the successful application of this approach despite a low empirical reconstruction rate. In this work, we build upon Gao et al. [42] s method of conditional synthetic route generation by increasing the number of reaction templates, training on a larger reaction corpus, and integrating the models into a bidirectional search algorithm.

Bidirectional search Bidirectional search is a general strategy that can accelerate search in problems that involve start and goal states by interleaving a traditional search from the start state and a reverse search from the goal state [45], usually guided with either neural networks or expert heuristics. It has demonstrated utility in problems such as robotic path planning [46, 47], program synthesis [48], traffic management [49], and puzzle solving [50]. However, the application of bidirectional search in synthesis planning has not been explored. When integrating an informed method of evaluating nodes, bidirectional search can be divided into front-to-end (F2E) and front-to-front (F2F) strategies [51, 52]. In F2E search, evaluations are made by estimating the minimal cost of a path between a frontier node and the opposite goal, while in F2F search, evaluations are made by estimating the minimal cost of a path between opposing frontier nodes. In this work, we implement both F2E and F2F variants of DESP to observe the empirical differences between the strategies in the synthesis planning setting.

2.2 Formulation of general and starting material-constrained synthesis planning problems

General synthesis planning In this work, we only consider template-based retrosynthesis methods, though any single-step model is compatible with our algorithm. Let M be the set of all valid molecules, R be the set of all valid reactions, and T be the set of all valid reaction templates. A reaction Ri R is a tuple (ri, pi, ti), comprising a set of reactants ri M, a single product pi M, and a retro template ti T . A retro template t is a function t : M 2M that maps a product to precursors such that i : ri ti(pi). Likewise, a forward template t T is a function t : 2M M where i : pi t (ri).

Given target molecule p M and set of building blocks (BBs) B M, synthesis planning finds a valid synthetic route a set of reactions S = {R1, . . . Rn} that satisfies the following constraints.

Constraint 1 (Synthesize all non-BBs). i : m ri, m / B = j s.t. m = pj;

Constraint 2 (Target is final molecule synthesized). i s.t. pi = p , i : p / ri;

Finally, we require that the graph formed by S is a directed acyclic graph (DAG), where for each i, the product pi is mapped to a node, which has a directed edge to a node mapping to reaction Ri, which in turn has directed edges to nodes mapping to the reactants ri.

Starting material-constrained synthesis planning Given a specific starting material (s.m.) r M, in addition to Constraint 2, a valid synthetic route satisfies the following constraints. Constraint 3 (s.m. used). i s.t. r ri, j s.t. r pj; Constraint 4 (s.m. not necessarily BB). i : m ri, m / B {r } = j s.t. m = pj;

Fig. 1b illustrates an example of a valid starting material-constrained route found through bidirectional search. DESP can also be used for the more general form of the problem in which a set of potential starting materials {r 1, . . . r n} is given on input, and at least one leaf node must map to r i for some 1 i n. For simplicity, we only consider the single r case unless otherwise specified.

DESP is built on the Retro* algorithm [6] and recent advances that enable conditional generation of synthetic routes from the bottom up [41, 42].

3.1 Definition of synthetic distance, a goal-conditioned cost function

Like Retro* [6], DESP is an A*-like search and thus requires a method of evaluating the expected cost of various frontier nodes. We follow Retro* and use the notation of Vt(m|T), Vm, and rn functions (Section A.2 details Retro* and these functions). We also define a function c : R R which maps a reaction to a scalar cost. For a valid synthetic route S = {R1, . . . , Rn}, the total cost of S is Pn i=1 c(Ri). Vm represents the minimum total cost across every valid synthetic route to molecule m, and is learned in Retro* and DESP to bias the search towards B.

However, to maintain consistency in guiding A* search in the starting material-constrained setting, we require not only an estimate of the cost of synthesizing molecule m from arbitrary building blocks, but also an estimate of the cost of synthesizing molecule m2 from m1 specifically (in addition to other arbitrary building blocks). As such, we define a new function D : M M R, which we term synthetic distance, as it effectively represents the minimum cost distance between two molecules in G, the graph constructed from the set of all possible reaction tuples R. More precisely, the synthetic distance from m1 to m2 is the difference between the minimum cost of synthesizing m2 across all valid synthetic routes containing m1 and the minimum cost of synthesizing m1 across all valid synthetic routes in general. Learning D then allows for the guidance of both top-down search towards the starting material and bottom-up search towards the target with rapid node comparisons.

3.2 DESP algorithm overview

In practice, synthesis planning problems are generally approached by simulating a search through the complete reaction graph G. We follow Xie et al. [30] in considering an AND-OR graph structure for search graphs, in which molecules are represented by OR nodes (only one child must be solved) and reactions are represented by AND nodes (all children must be solved). In implementing most synthesis planning algorithms [3, 6], one initializes the search graph G = {p }. With DESP, we instead initialize two search graphs GR = {p }, GF = {r } and introduce two expansion policies, one for top-down" retrosynthesis expansions on GR and another for bottom-up" forward expansions on GF . This allows us to perform a bidirectional graph search between the target and goal molecules by interleaving retro and forward expansions, with the goal of the two search graphs converging to more efficiently find a valid synthetic route. In this work, we implement F2E and F2F variants of DESP. Notably, our implementation of F2F performs node comparisons to all nodes in the opposing search graph rather than just frontiers. For m GR, we define a goal function γ : M M such that γ(m) = r in F2E and γ(m) = arg ming GF D(g , m) in F2F. Likewise, for m GF , let γ(m) = p in F2E and γ(m) = arg ming GR D(m, g ) in F2F.

The following quantities or functions are relevant in the algorithm: rn, Vt(m|G), and Vm from Retro*, and somewhat analogously dn, Dt(m|GR), and Dm. We briefly define the new quantities: (1) Dm represents D(γ(m), m). (2) dn(m|GR) represents the distance numbers" of a top node m.

1. Select promising frontier node from top 2. Expand the selected node and Update search graph

a. DESP algorithm

b. Expansion procedures

Classiﬁers rank templates

Obtain top n templates

Predict ﬁngerprint of building block

Apply templates

If unimolecular fwd. exp. or retro exp.

0 1 1 0 0 1 ... 0 1

Perform k-NN search on building block database

Building block

Add product or precursors to search graph

If F2E: Target

Selected m2

Retro expand

Forward expand

3. Select promising frontier node from bottom 4. Expand the selected node and Update search graph

If bimolecular fwd. exp.

Searchers met here

Still needs to reach BBs

If F2F: Closest m3

Figure 2: (a) DESP algorithm. Evaluation of top nodes is based on both Vm and Dm. For F2E search, synthetic distance is calculated between a molecule and the opposing goal, while for F2F, it is calculated based on the closest opposing molecule. (b) Overview of the one-step expansion procedures.

This is a multiset of values Dm Vm for related frontier nodes collected for dynamic programming from the bottom-up during the update phase. (3) Dt(m|GR) is a multiset of all values of Dm Vm across frontier nodes in the minimum cost synthetic route to the target p through molecule m. At a high level, we introduce these quantities and new policies to account for the fact that only one subgoal in a valid synthetic route needs to reach r . The top-down searcher of DESP is thus an extension of Retro* that simultaneously utilizes general retrosynthesis and synthetic distance cost networks.

Like most CASP algorithms, DESP cycles between steps of selection, expansion, and update until the termination criteria are satisfied. However, DESP also alternates between performing these steps for the top-down and bottom-up search graphs (Fig. 2), with each search having its own policies.

Selection For top-down selection, we select an frontier molecule node that minimizes the expected total cost of synthesizing the target p from the goal molecule r through that node. Let FR represent the set of frontier top molecules and FF represent the set of frontier bottom molecules. Then,

mselect,R arg min m FR [Vt(m|GR) + min(Dt(m|GR))] (1)

The bottom-up selection policy is identical to that of Retro*.

mselect,F arg min m FF Vt(m|GF ) (2)

Expansion For top-down expansion, we follow other AND-OR-based algorithms in calling a single-step retrosynthesis model, applying the top n predicted templates to the selected node and adding the resulting reactions and their precursors as nodes to the graph. For each added molecule node mi, we initialize: (1) rn(mi|GR) Vmi, equal to the Retro* value function, and (2) dn(mi|GR) {Dmi Vmi} = {D(γ(mi), mi) Vm}.

For bottom node m, we perform the forward expansion procedure detailed in Section 3.3, conditioned on γ(m). For each added product pi, we then initialize rn(pi|GF ) Vpi = D(pi, γ(pi))

Update For GR, we propagate updates to relevant values up the graph and then back down to related nodes, similar to other AND-OR algorithms. As the update rules for the Retro* quantities are the same, we only provide the update rules for the new quantities, and details of the Retro* updates is in Section A.2. GF is also updated according to the Retro* algorithm (as branching from multiple product OR nodes is not allowed in forward expansions), so the following new updates only apply to GR. We first uppropagate, performing updates up the graph for reaction (AND) nodes R and molecule (OR) nodes m, where the ch and pr functions return the children and parent nodes for an input node:

m ch(R) dn(m|GR) (3)

( [Dm Vm] if x FR dn arg min R ch(m) rn(R) GR otherwise (4)

We then downpropagate the following updates:

Dt(R|GR) dn(pr(R)|GR) dn arg min R ch(pr(R)) rn(R |GR) GR + dn(R|GR) (5)

Dt(m|GR) Dt arg min R pr(m) rn(R|GR) GR (6)

Justification for the rules and additional details, including DESP pseudocode, is in Section A.5.

3.3 Forward expansion policy with conditional generation of one-step reactions

To perform forward one-step synthesis expansions, we adapt the approach of Gao et al. [42]. Let zn m : M Rn and zn t : T Rn be functions mapping a molecule and template (respectively) to n-dimensional embeddings. We define two target functions:

ft : M M T σ(MLPt(zn m(m1) zn m(m2))) (7)

fb : M M T B k-NNB(MLPb(zn m(m1) zn m(m2) zn t (t ))) (8)

Together, the learned approximations of ft and fb define our forward expansion policy (Algorithm 1), which generates forward reactions for the expanded node m1 conditioned on m2. Algorithm 1: FORWARD_EXPAND(m1, m2, GF , N, K) m1: molecule selected for expansion, m2: molecule to condition expansion on, GF : bottom search graph, N: num. templates to propose, K: num. building blocks to search t TOP_N(σ(MLPt(zm(m1) zm(m2)))) ; /* Get top N forward templates */ for i 1 to N do

if t [i] is unimolecular then

p t [i](m1) ; /* Apply fwd. template to m */ GF .ADD_RXN({m1}, p, t [i]) ; /* Add reaction and product to GF */ else /* t [i] is bimolecular */ b KNNB(MLPb(zm(m1) zm(m2) zt(t [i]))) ; /* Get K nearest BBs by cosine sim. */ j 1 to K: GF .ADD_RXN({m1, b}, t [i](m1, b[j]), t [i]) ; /* Apply t [i] */ end end

3.4 Extracting multi-step reaction data from a reaction corpus for offline learning

To learn ft, fb, and D, we approximate G by generating the incomplete network from a reaction dataset. In this work, we use the public USPTO-Full dataset [53, 54] of approximately 1 million deduplicated reactions. The dataset is filtered and processed (details in Section A.3), and a template set TUSPTO is extracted with RDChiral [55]. The dataset is randomly divided into training and validation splits with ratio 90:10. From the training split RUSPTO we construct the graph GUSPTO. We filter reactions that (1) involve more than 2 reactants or (2) whose product cannot be recovered by applying the forward template t , yielding RFWD, GFWD, and T FWD.

Table 1: Benchmark dataset summary. Avg. In-Dist. % is the mean percentage of reactions in each route within the top 50 suggestions of the retro model. Unique Rxn.% is the ratio of deduplicated reactions to total reactions across all routes. Avg. # Rxns. is the mean number of reactions in each route, and Avg. Depth is the mean number of reactions in the longest path of each route.

Dataset # Routes Avg. In-Dist. % Unique Rxn. % Avg. # Rxns. Avg. Depth

USPTO-190 190 65.6 50.5 6.7 6.0 Pistachio Reachable 150 100 86.1 5.5 5.4 Pistachio Hard 100 59.9 95.2 7.5 7.2

To learn ft and fb, a full enumeration of all pathways (until reaching nodes in B) rooted at p is performed for each molecule node p in GFWD. Reaction nodes in the enumerated pathways then each provide a training example for ft and fb. Likewise, we enumerate pathways in GUSPTO, and each molecule node m in a pathway yields a training example for learning D(m, p ). The details for our training procedures are described in Section A.4.

Notably, we inject negative" examples into our training set for D, as the distribution of costs is highly skewed towards low values. We define a modified MSE loss function as in Kim et al. [56] for learning D:

L = (ypred ytrue)2 if ytrue Dmax max(0, Dmax + 1 ypred)2 else (9)

This strategy allows the model to default to an approximate value of Dmax + 1 for any highly distant" molecule inputs. Now, for each target p , we randomly sample a molecule m GUSPTO with no path to p and Tanimoto similarity < 0.7 and add a training example with label . In this work, we use Dmax = 9.

4 Experiments

Our experiments are designed to answer the following: (1) Does DESP significantly improve the performance of starting material-constrained synthesis planning compared to baseline methods? (2) To what extent do D and bidirectional search account for the performance of DESP? (3) Can DESP find routes to more complex targets than baseline methods? (4) What empirical differences do we see between F2E and F2F strategies?

4.1 Experimental setup

Datasets Few public datasets of multi-step synthetic routes exist. Previous works in multi-step synthesis planning have widely used the USPTO-190 dataset [6], a set of 190 targets with corresponding routes extracted from the USPTO-Full dataset. Others have tested on targets sampled from databases such as Ch EMBL or GDB17 [57, 27, 31], but their lack of ground truth routes precludes the systematic selection of starting materials for our task. Pa Routes [58] has been proposed as an evaluation set, but they do not provide a standardized training set to prevent data leakage.

In addition to USPTO-190, because of its large proportion of out-of-distribution and redundant reactions (Table 1), we create and release two additional benchmark sets, which we call Pistachio Reachable and Pistachio Hard. Details of their construction are provided in Section A.6. To obtain the ground-truth goal molecules for each of our test sets, we find the longest path from target to leaf node in each route DAG and pick the leaf node with more heavy atoms. For the building block set B, we canonicalize all SMILES strings in the set of 23 million purchasable building blocks from e Molecules used by Chen et al. [6].

Model training As in [6], we train a single-step retrosynthesis MLP (Neural Sym) and Retro* cost network on our processed training split of USPTO-Full. The synthetic distance and forward expansion models are trained as described in Sections 3.4 and A.4.

Multi-step algorithms Because we utilize an AND-OR search graph with no duplicate molecule nodes, our implementation of Retro* is more comparable to Retro Graph [30], but we do not employ

Table 2: Summary of starting material-constrained planning performance across the three benchmarks. Solve rate refers to the percentage of (p , r ) pairs in the dataset solved at the given expansion limits. The average number of expansions N is given for each method, with a max budget of 500.

Algorithm USPTO-190 Pistachio Reachable Pistachio Hard

Solve Rate (%) N Solve Rate (%) N Solve Rate (%) N

N=100 300 500 50 100 300 100 300 500

Random 4.2 4.7 4.7 479 16.0 26.7 40.7 325 6.0 12.0 13.0 452 BFS 12.1 20.0 24.2 413 48.7 57.3 74.0 169 16.0 26.0 29.0 390 MCTS 20.5 32.1 35.3 364 52.0 72.7 85.3 111 27.0 31.0 32.0 361 Retro* 25.8 33.2 35.8 351 70.7 78.0 92.7 73 32.0 35.0 37.0 342 GRASP 15.3 21.1 23.7 410 46.7 51.3 66.7 198 14.0 22.0 29.0 402

Bi-BFS 20.0 26.3 28.4 382 66.0 75.3 86.0 101 28.0 34.0 38.0 341 Retro*+D 27.4 32.6 37.4 348 77.3 87.3 96.0 49 31.0 40.0 42.0 323 DESP-F2E 30.0 35.3 39.5 340 84.0 90.0 96.0 41 35.0 44.0 50.0 300 DESP-F2F 29.5 34.2 39.5 336 84.5 88.9 97.3 38 39.0 45.0 48.0 293

their GNN guided value estimation and thus refer to the algorithm as Retro* for simplicity. This serves as both a baseline and ablated version of DESP (without bidirectional search or D). In addition, we test the performance of random selection, breadth-first search (BFS), bidirectional-BFS, and MCTS. Finally, we integrate our single-step model into GRASP [13] using the authors published code. Since their data is not publicly available, we retrained their model on 10,000 randomly sampled targets in our training set and run their search implementation on each benchmark. For all methods, we enforce a maximum molecule depth of 11, a maximum of 500 total expansions (retro or forward), and apply 50 retro templates per expansion. For DESP, we also enforce a maximum molecule depth of 6 for the bottom-up search, apply 25 forward templates per expansion, and use the top 2 building blocks found in the k-NN search. Due to the asymmetry of the bidirectional search, we also introduce a hyperparameter λ, the number of times we repeat a select, expand, and update cycle for GR before performing one cycle for GF . For all experiments, we set λ = 2. Details and tabular summaries of the evaluations performed and hyperparameters chosen are provided in Section A.7.

4.2 Results

Table 3: Average stdev of the number reactions in proposed routes. Comparisons are made across (p , r ) pairs solved by all methods.

Dataset USPTO-190 Pistachio Easy Pistachio Hard

# Routes 63 139 36

Avg. # Rxns.

Retro* 5.56 2.37 4.94 2.27 4.81 2.09 Retro*+D 5.87 2.37 4.92 2.27 4.80 2.08 DESP-F2E 5.56 2.55 4.86 2.17 4.67 2.35 DESP-F2F 5.95 3.93 5.17 2.37 4.78 2.60

Though it is notoriously difficult to quantitatively evaluate synthetic routes proposed in silico without expert evaluation, there are widelyused metrics thought to correlate with successful algorithms, such as higher solve rate (under varying computational budgets), lower average number of expansions, and lower average number of reactions in found routes [59, 57]. We focus on these metrics, as they are arguably most related to a search algorithm s efficiency. Because all methods employ the same one-step model and set of templates from USPTO-Full, we treat their proposals as equally chemically feasible.

Improvement on starting material-constrained synthesis planning Quantitative benchmarking results are summarized in Table 2. Both variants of DESP outperform all baseline methods in terms of solve rate and average number of expansions across all test sets. The solve rates of baseline methods on USPTO-190 are noticeably lower than commonly reported ranges for general synthesis planning [6], as the starting material constraint increases the difficulty of the task. Notably, unlike the other benchmarked methods, the Random, BFS, MCTS, and Retro* are standard single-ended search methods that do not make any use of the starting material constraint information.

Ablation studies To investigate the contributions of D and bidirectional search, we conduct an ablation study by running Retro* guided by D on all benchmarks (denoted as Retro*+D in Tables 2 and 3). We find that incorporating D generally results in higher solve rates and fewer average

a. b. Pistachio Hard Pistachio Hard

SCScore SAScore

Combined Benchmarks

# Fwd. Reactions

Figure 3: Ablation study. (a) Solve rate as a function of the binned complexity of target molecules in Pistachio Hard. (b) Number of forward reactions in DESP routes across all benchmark sets.

Appel reduction

hydrogenation

starting material

Figure 4: Exemplary synthetic route for a test case that DESP-F2F was able to solve but Retro* was unable to solve. DESP-F2F was able to match every step of the reference route in this case.

expansions across all datasets, but still does not outperform DESP, demonstrating the role of both D and bidirectional search in improving planning efficiency. As an indicator of route quality, we also investigate the average number of reactions in the outputs of DESP (Table 3). DESP-F2E is able to find shorter routes on average when compared to either Retro* or Retro* guided by D. An example of a route solved by DESP-F2F (but not by Retro*) is visualized in Fig. 4.

Performance on complex targets To investigate the degree to which DESP improves planning performance on complex targets, we bin each target in Pistachio Hard by two commonly-used metrics of synthetic complexity, SCScore [60] and SAScore [61]. Both variants of DESP equal or outperform Retro* on solve rates across all complexity ranges (Fig. 3a). This indicates that, in the starting material-constrained setting, DESP can improve planning performance on targets of higher complexity, a regime which current CASP algorithms struggle with.

F2E and F2F comparisons Though DESP-F2F consistently expands slightly fewer nodes on average, the empirical differences in efficiency between F2E and F2F are small. However, DESP-F2E is able to find noticeably shorter routes on average than DESP-F2F, which finds routes even longer than Retro* on multiple benchmarks (Table 3). A likely reason for this difference is due to the lack of consideration of the pathway depth of existing nodes in front-to-front search, which we elaborate on in Section A.8. We also investigate the extent to which reactions from forward expansions end up in the solutions of each variant. As visualized in Fig. 3b, DESP-F2F incorporates more forward reactions, while DESP-F2E solutions are dominated by top-down search almost half the time. We hypothesize

that the difficulty of bottom-up synthesis planning [42] further contributes to the increased length of DESP-F2F solutions, as DESP-F2F empirically relies more on forward reactions and thus is likely more sensitive to the performance of the forward models.

5 Conclusion

In this work, we introduce DESP, a novel framework for bidirectional search as applied to computeraided synthesis planning. DESP biases searches towards user-specified starting materials with a combination of a learned synthetic distance network and bottom-up generation of part of the synthetic route. This represents a task that aligns with a common use case in complex molecule synthesis planning. We demonstrate the efficiency of DESP on the USPTO-190 dataset and two new test sets derived from the Pistachio database. When compared to existing methods, both variants of DESP reduce the number of expansions required to find solutions that satisfy the specified goal, while DESP-F2E also finds more routes with fewer reactions per route. We anticipate that with future improvements to the synthetic distance network and bottom-up synthesis planning, bidirectional synthesis planning can emerge as an effective and efficient solution to navigating constraints and aligning computer-aided synthesis planning with the goals of domain experts. Additional outlook is provided in Section A.8.

Code and Data Availablity

Relevant code with documentation can be found at https://github.com/coleygroup/desp.

Acknowledgments and Disclosure of Funding

This research was supported by the Machine Learning for Pharmaceutical Discovery and Synthesis consortium. J.R. acknowledges funding support from the NSF Center for Computer Assisted Synthesis (C-CAS) under Grant CHE-2202693. W.G. is supported by the Google Ph.D. Fellowship and Office of Naval Research under grant number N00014-21-1-2195. We thank Prof. Sarah Reisman and Prof. Richmond Sarpong for discussions during the ideation of the project. We thank Dr. Roger Sayle and Next Move Software for granting us permission to release the Pistachio-derived benchmark sets. We thank Prof. Yunan Luo for providing computational resources used in an earlier prototype of DESP. We thank Dr. Babak Mahjour for discussions around the chemical feasibility of proposed routes.

[1] E. J. Corey and Xue-Min Cheng. Logic of chemical synthesis. Wiley, New York, new ed. edition, 1995. ISBN 978-0-471-50979-0 978-0-471-11594-6.

[2] David A. Pensak and E. J. Corey. LHASA Logic and Heuristics Applied to Synthetic Analysis. In Computer-Assisted Organic Synthesis, volume 61 of ACS Symposium Series, pages 1 32. American Chemical Society, June 1977. ISBN 978-0-8412-0394-5. doi: 10.1021/bk-1977-0061. ch001. URL https://doi.org/10.1021/bk-1977-0061.ch001. Section: 1.

[3] Marwin H. S. Segler, Mike Preuss, and Mark P. Waller. Planning chemical syntheses with deep neural networks and symbolic AI. Nature, 555(7698):604 610, March 2018. ISSN 1476-4687. doi: 10.1038/nature25978. URL https://www.nature.com/articles/nature25978. Publisher: Nature Publishing Group.

[4] Akihiro Kishimoto, Beat Buesser, Bei Chen, and Adi Botea. Depth-First Proof-Number Search with Heuristic Edge Cost and Application to Chemical Synthesis Planning. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/hash/ 4fc28b7093b135c21c7183ac07e928a6-Abstract.html.

[5] Philippe Schwaller, Riccardo Petraglia, Valerio Zullo, Vishnu H. Nair, Rico Andreas Haeuselmann, Riccardo Pisoni, Costas Bekas, Anna Iuliano, and Teodoro Laino. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chemical Science, 11(12):3316 3325, March 2020. ISSN 2041-6539. doi: 10.1039/

C9SC05704H. URL https://pubs.rsc.org/en/content/articlelanding/2020/sc/ c9sc05704h. Publisher: The Royal Society of Chemistry.

[6] Binghong Chen, Chengtao Li, Hanjun Dai, and Le Song. Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search, June 2020. URL http://arxiv.org/abs/2006. 15820. ar Xiv:2006.15820 [cs, stat].

[7] Iwao Ojima, Ivan Habus, Mangzhu Zhao, Martine Zucco, Young Hoon Park, Chung Ming Sun, and Thierry Brigaud. New and efficient approaches to the semisynthesis of taxol and its C-13 side chain analogs by means of β-lactam synthon method. Tetrahedron, 48(34): 6985 7012, January 1992. ISSN 0040-4020. doi: 10.1016/S0040-4020(01)91210-4. URL https://www.sciencedirect.com/science/article/pii/S0040402001912104.

[8] Zachary G. Brill, Matthew L. Condakes, Chi P. Ting, and Thomas J. Maimone. Navigating the Chiral Pool in the Total Synthesis of Complex Terpene Natural Products. Chemical Reviews, 117(18):11753 11795, September 2017. ISSN 0009-2665. doi: 10.1021/acs.chemrev.6b00834. URL https://doi.org/10.1021/acs.chemrev.6b00834. Publisher: American Chemical Society.

[9] Agnieszka Wołos, Dominik Koszelewski, Rafał Roszak, Sara Szymku c, Martyna Moskal, Ryszard Ostaszewski, Brenden T. Herrera, Josef M. Maier, Gordon Brezicki, Jonathon Samuel, Justin A. M. Lummiss, D. Tyler Mc Quade, Luke Rogers, and Bartosz A. Grzybowski. Computerdesigned repurposing of chemical wastes into drugs. Nature, 604(7907):668 676, April 2022. ISSN 1476-4687. doi: 10.1038/s41586-022-04503-9. URL https://www.nature.com/ articles/s41586-022-04503-9. Publisher: Nature Publishing Group.

[10] Lauren M. Lopez, Quan Zhang, Orion Dollar, Jim Pfaendtner, Brent H. Shanks, and Linda J. Broadbelt. Application of automated network generation for retrosynthetic planning of potential corrosion inhibitors. Molecular Systems Design & Engineering, 9(4):352 371, 2024. doi: 10.1039/D3ME00162H. URL https://pubs.rsc.org/en/content/articlelanding/ 2024/me/d3me00162h. Publisher: Royal Society of Chemistry.

[11] Anna Z adło Dobrowolska, Karol Molga, Olga O. Kolodiazhna, Sara Szymku c, Martyna Moskal, Rafał Roszak, and Bartosz A. Grzybowski. Computational synthesis design for controlled degradation and revalorization. Nature Synthesis, pages 1 12, April 2024. ISSN 27310582. doi: 10.1038/s44160-024-00497-6. URL https://www.nature.com/articles/ s44160-024-00497-6. Publisher: Nature Publishing Group.

[12] A. Peter Johnson, Chris Marshall, and Philip N. Judson. Starting material oriented retrosynthetic analysis in the LHASA program. 1. General description. Journal of Chemical Information and Computer Sciences, 32(5):411 417, September 1992. ISSN 0095-2338. doi: 10.1021/ci00009a003. URL https://doi.org/10.1021/ci00009a003. Publisher: American Chemical Society.

[13] Yemin Yu, Ying Wei, Kun Kuang, Zhengxing Huang, Huaxiu Yao, and Fei Wu. GRASP: Navigating Retrosynthetic Planning with Goal-driven Policy. Advances in Neural Information Processing Systems, 35:10257 10268, December 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/hash/ 42beaab8aa8da1c77581609a61eced93-Abstract-Conference.html.

[14] Pistachio, January 2024. URL https://www.nextmovesoftware.com/pistachio.html.

[15] Sara Szymku c, Ewa P. Gajewska, Tomasz Klucznik, Karol Molga, Piotr Dittwald, Michał Startek, Michał Bajczyk, and Bartosz A. Grzybowski. Computer-Assisted Synthetic Planning: The End of the Beginning. Angewandte Chemie International Edition, 55(20):5904 5937, 2016. ISSN 1521-3773. doi: 10.1002/anie.201506101. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/anie.201506101. _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/anie.201506101.

[16] Connor W. Coley, Luke Rogers, William H. Green, and Klavs F. Jensen. Computer Assisted Retrosynthesis Based on Molecular Similarity. ACS Central Science, 3(12):1237 1245, December 2017. ISSN 2374-7943. doi: 10.1021/acscentsci.7b00355. URL https: //doi.org/10.1021/acscentsci.7b00355. Publisher: American Chemical Society.

[17] Hanjun Dai, Chengtao Li, Connor Coley, Bo Dai, and Le Song. Retrosynthesis Prediction with Conditional Graph Logic Network. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_ files/paper/2019/hash/0d2b2061826a5df3221116a5085a6052-Abstract.html.

[18] Shuan Chen and Yousung Jung. Deep Retrosynthetic Reaction Prediction using Local Reactivity and Global Attention. JACS Au, 1(10):1612 1620, October 2021. doi: 10.1021/jacsau.1c00246. URL https://doi.org/10.1021/jacsau.1c00246. Publisher: American Chemical Society.

[19] Marwin H. S. Segler and Mark P. Waller. Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction. Chemistry A European Journal, 23 (25):5966 5971, 2017. ISSN 1521-3765. doi: 10.1002/chem.201605499. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/chem.201605499. _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/chem.201605499.

[20] Bowen Liu, Bharath Ramsundar, Prasad Kawthekar, Jade Shi, Joseph Gomes, Quang Luu Nguyen, Stephen Ho, Jack Sloane, Paul Wender, and Vijay Pande. Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models. ACS Central Science, 3 (10):1103 1113, October 2017. ISSN 2374-7943. doi: 10.1021/acscentsci.7b00303. URL https://doi.org/10.1021/acscentsci.7b00303. Publisher: American Chemical Society.

[21] Philippe Schwaller, Teodoro Laino, Théophile Gaudin, Peter Bolgar, Christopher A. Hunter, Costas Bekas, and Alpha A. Lee. Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. ACS Central Science, 5(9):1572 1583, September 2019. ISSN 2374-7943. doi: 10.1021/acscentsci.9b00576. URL https://doi.org/10.1021/ acscentsci.9b00576. Publisher: American Chemical Society.

[22] Mikołaj Sacha, Mikołaj Bła z, Piotr Byrski, Paweł D abrowski-Tuma nski, Mikołaj Chromi nski, Rafał Loska, Paweł Włodarczyk-Pruszy nski, and Stanisław Jastrz ebski. Molecule Edit Graph Attention Network: Modeling Chemical Reactions as Sequences of Graph Edits, May 2021. URL http://arxiv.org/abs/2006.15426. ar Xiv:2006.15426 [physics, stat].

[23] Abraham Heifets and Igor Jurisica. Construction of New Medicines via Game Proof Search. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1):1564 1570, 2012. ISSN 23743468. doi: 10.1609/aaai.v26i1.8331. URL https://ojs.aaai.org/index.php/AAAI/ article/view/8331. Number: 1.

[24] Austin Tripp, Krzysztof Maziarz, Sarah Lewis, Marwin Segler, and José Miguel Hernández Lobato. Retro-fallback: retrosynthetic planning in an uncertain world, April 2024. URL http://arxiv.org/abs/2310.09270. ar Xiv:2310.09270 [cs].

[25] John S. Schreck, Connor W. Coley, and Kyle J. M. Bishop. Learning Retrosynthetic Planning through Simulated Experience. ACS Central Science, 5(6):970 981, June 2019. ISSN 23747943. doi: 10.1021/acscentsci.9b00055. URL https://doi.org/10.1021/acscentsci. 9b00055. Publisher: American Chemical Society.

[26] Siqi Hong, Hankz Hankui Zhuo, Kebing Jin, Guang Shao, and Zhanwen Zhou. Retrosynthetic planning with experience-guided Monte Carlo tree search. Communications Chemistry, 6(1): 1 14, June 2023. ISSN 2399-3669. doi: 10.1038/s42004-023-00911-8. URL https://www. nature.com/articles/s42004-023-00911-8. Publisher: Nature Publishing Group.

[27] Guoqing Liu, Di Xue, Shufang Xie, Yingce Xia, Austin Tripp, Krzysztof Maziarz, Marwin Segler, Tao Qin, Zongzhang Zhang, and Tie-Yan Liu. Retrosynthetic Planning with Dual Value Networks, March 2024. URL http://arxiv.org/abs/2301.13755. ar Xiv:2301.13755 [cs].

[28] Junsu Kim, Sungsoo Ahn, Hankook Lee, and Jinwoo Shin. Self-Improved Retrosynthetic Planning, June 2021. URL http://arxiv.org/abs/2106.04880. ar Xiv:2106.04880 [cs, q-bio] version: 1.

[29] Songtao Liu, Zhengkai Tu, Minkai Xu, Zuobai Zhang, Lu Lin, Rex Ying, Jian Tang, Peilin Zhao, and Dinghao Wu. Fusion Retro: molecule representation fusion via in-context learning for retrosynthetic planning. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of ICML 23, pages 22028 22041, <conf-loc>, <city>Honolulu</city>, <state>Hawaii</state>, <country>USA</country>, </conf-loc>, July 2023. JMLR.org.

[30] Shufang Xie, Rui Yan, Peng Han, Yingce Xia, Lijun Wu, Chenjuan Guo, Bin Yang, and Tao Qin. Retro Graph: Retrosynthetic Planning with Graph Search. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2120 2129, August 2022. doi: 10.1145/3534678.3539446. URL http://arxiv.org/abs/2206.11477. ar Xiv:2206.11477 [cs].

[31] Dengwei Zhao, Shikui Tu, and Lei Xu. Efficient retrosynthetic planning with MCTS exploration enhanced A* search. Communications Chemistry, 7(1):1 12, March 2024. ISSN 23993669. doi: 10.1038/s42004-024-01133-2. URL https://www.nature.com/articles/ s42004-024-01133-2. Publisher: Nature Publishing Group.

[32] Jiasheng Guo, Chenning Yu, Kenan Li, Yijian Zhang, Guoqiang Wang, Shuhua Li, and Hao Dong. Retrosynthesis Zero: Self-Improving Global Synthesis Planning Using Reinforcement Learning. Journal of Chemical Theory and Computation, May 2024. ISSN 1549-9618. doi: 10. 1021/acs.jctc.4c00071. URL https://doi.org/10.1021/acs.jctc.4c00071. Publisher: American Chemical Society.

[33] Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob Mc Grew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight Experience Replay, February 2018. URL http://arxiv.org/abs/1707.01495. ar Xiv:1707.01495 [cs].

[34] H Maarten Vinkers, Marc R de Jonge, Frederik FD Daeyaert, Jan Heeres, Lucien MH Koymans, Joop H van Lenthe, Paul J Lewi, Henk Timmerman, Koen Van Aken, and Paul AJ Janssen. Synopsis: synthesize and optimize system in silico. Journal of medicinal chemistry, 46(13): 2765 2773, 2003.

[35] Markus Hartenfeller, Heiko Zettl, Miriam Walter, Matthias Rupp, Felix Reisen, Ewgenij Proschak, Sascha Weggen, Holger Stark, and Gisbert Schneider. Dogs: reaction-driven de novo design of bioactive compounds. PLo S computational biology, 8(2):e1002380, 2012.

[36] Alexander Button, Daniel Merk, Jan A. Hiss, and Gisbert Schneider. Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis. Nature Machine Intelligence, 1(7):307 315, July 2019. ISSN 2522-5839. doi: 10.1038/s42256-019-0067-7. URL https://www.nature.com/articles/s42256-019-0067-7. Publisher: Nature Publishing Group.

[37] Ksenia Korovina, Sailun Xu, Kirthevasan Kandasamy, Willie Neiswanger, Barnabas Poczos, Jeff Schneider, and Eric P. Xing. Chem BO: Bayesian Optimization of Small Organic Molecules with Synthesizable Recommendations, October 2019. URL http://arxiv.org/abs/1908. 01425. ar Xiv:1908.01425 [physics, stat].

[38] Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam M. J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, and Yoshua Bengio. Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning, May 2020. URL http://arxiv.org/abs/2004.12485. ar Xiv:2004.12485 [cs].

[39] Kyle Swanson, Gary Liu, Denise B Catacutan, Autumn Arnold, James Zou, and Jonathan M Stokes. Generative ai for designing and validating easily synthesizable and structurally novel antibiotics. Nature Machine Intelligence, 6(3):338 353, 2024.

[40] John Bradshaw, Brooks Paige, Matt J. Kusner, Marwin H. S. Segler, and José Miguel Hernández Lobato. A Model to Search for Synthesizable Molecules, December 2019. URL http: //arxiv.org/abs/1906.05221. ar Xiv:1906.05221 [physics, stat].

[41] John Bradshaw, Brooks Paige, Matt J. Kusner, Marwin H. S. Segler, and José Miguel Hernández Lobato. Barking up the right tree: an approach to search over molecule synthesis DAGs. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS 20, pages 6852 6866, Red Hook, NY, USA, December 2020. Curran Associates Inc. ISBN 978-1-71382-954-6.

[42] Wenhao Gao, Rocío Mercado, and Connor W. Coley. Amortized Tree Generation for Bottom-up Synthesis Planning and Synthesizable Molecular Design, March 2022. URL http://arxiv. org/abs/2110.06389. ar Xiv:2110.06389 [cs, q-bio].

[43] Miruna Cretu, Charles Harris, Julien Roy, Emmanuel Bengio, and Pietro Liò. Syn Flow Net: Towards Molecule Design with Guaranteed Synthesis Pathways, May 2024. URL http: //arxiv.org/abs/2405.01155. ar Xiv:2405.01155 [cs, q-bio].

[44] Shitong Luo, Wenhao Gao, Zuofan Wu, Jian Peng, Connor W. Coley, and Jianzhu Ma. Projecting Molecules into Synthesizable Chemical Spaces, June 2024. URL http://arxiv.org/abs/ 2406.04628. ar Xiv:2406.04628 [cs, q-bio].

[45] Ira Pohl. BI-DIRECTIONAL AND HEURISTIC SEARCH IN PATH PROBLEMS. 1969.

[46] Ping He, Zhixian Xu, Xiaoqing Long, Kang Hou, and Yu Xiang. Path Planning of Mobile Robot Based on Improved A-Star Bidirectional Search Algorithm. In 2023 IEEE International Conference on Unmanned Systems (ICUS), pages 1517 1522, October 2023. doi: 10.1109/ICUS58632. 2023.10318429. URL https://ieeexplore.ieee.org/abstract/document/10318429. ISSN: 2771-7372.

[47] Chenming Li, Han Ma, Jiankun Wang, and Max Q.-H. Meng. Bidirectional Search Strategy for Incremental Search-based Path Planning. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7311 7317, October 2023. doi: 10.1109/IROS55552. 2023.10342039. URL https://ieeexplore.ieee.org/abstract/document/10342039. ISSN: 2153-0866.

[48] Simon Alford, Anshula Gandhi, Akshay Rangamani, Andrzej Banburski, Tony Wang, Sylee Dandekar, John Chin, Tomaso Poggio, and Peter Chin. Neural-Guided, Bidirectional Program Search for Abstraction and Reasoning. In Rosa Maria Benito, Chantal Cherifi, Hocine Cherifi, Esteban Moro, Luis M. Rocha, and Marta Sales-Pardo, editors, Complex Networks & Their Applications X, pages 657 668, Cham, 2022. Springer International Publishing. ISBN 978-3030-93409-5. doi: 10.1007/978-3-030-93409-5_54.

[49] A. Lakshna, S. Gokila, and K. Ramesh. Shortest Route * Bidirectional Search Algorithm for Predicting Shortest Routes in Traffic. In 2023 4th International Conference on Smart Electronics and Communication (ICOSEC), pages 533 539, September 2023. doi: 10.1109/ICOSEC58147. 2023.10275995. URL https://ieeexplore.ieee.org/abstract/document/10275995.

[50] Robert Holte, Ariel Felner, Guni Sharon, and Nathan Sturtevant. Bidirectional Search That Is Guaranteed to Meet in the Middle. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1), March 2016. ISSN 2374-3468. doi: 10.1609/aaai.v30i1.10436. URL https://ojs.aaai.org/index.php/AAAI/article/view/10436. Number: 1.

[51] Lenie Sint and Dennis De Champeaux. An Improved Bidirectional Heuristic Search Algorithm. Journal of the ACM, 24(2):177 191, April 1977. ISSN 0004-5411, 1557-735X. doi: 10.1145/ 322003.322004. URL https://dl.acm.org/doi/10.1145/322003.322004.

[52] H. Kaindl and G. Kainz. Bidirectional Heuristic Search Reconsidered. Journal of Artificial Intelligence Research, 7:283 317, December 1997. ISSN 1076-9757. doi: 10.1613/jair.460. URL https://www.jair.org/index.php/jair/article/view/10199.

[53] Daniel Mark Lowe. Extraction of chemical structures and reactions from the literature. October 2012. URL http://www.dspace.cam.ac.uk/handle/1810/244727.

[54] Daniel Lowe. Chemical reactions from US patents (1976-sep2016). URL https://figshare. com/articles/Chemical_reactions_from_US_patents_1976-Sep2016_/5104873.

[55] Connor W. Coley, William H. Green, and Klavs F. Jensen. RDChiral: An RDKit Wrapper for Handling Stereochemistry in Retrosynthetic Template Extraction and Application. Journal of Chemical Information and Modeling, 59(6):2529 2537, June 2019. ISSN 1549-9596. doi: 10.1021/acs.jcim.9b00286. URL https://doi.org/10.1021/acs.jcim.9b00286. Publisher: American Chemical Society.

[56] Hyeongwoo Kim, Kyunghoon Lee, Chansu Kim, Jaechang Lim, and Woo Youn Kim. DFRscore: Deep Learning-Based Scoring of Synthetic Complexity with Drug-Focused Retrosynthetic Analysis for High-Throughput Virtual Screening. Journal of Chemical Information and Modeling, 64(7):2432 2444, April 2024. ISSN 1549-9596. doi: 10.1021/acs.jcim.3c01134. URL https://doi.org/10.1021/acs.jcim.3c01134. Publisher: American Chemical Society.

[57] Krzysztof Maziarz, Austin Tripp, Guoqing Liu, Megan Stanley, Shufang Xie, Piotr Gai nski, Philipp Seidl, and Marwin Segler. Re-evaluating Retrosynthesis Algorithms with Syntheseus, February 2024. URL http://arxiv.org/abs/2310.19796. ar Xiv:2310.19796 [cs, q-bio].

[58] Samuel Genheden and Esben Bjerrum. Pa Routes: towards a framework for benchmarking retrosynthesis route predictions. Digital Discovery, 1(4):527 539, August 2022. ISSN 2635-098X. doi: 10.1039/D2DD00015F. URL https://pubs.rsc.org/en/content/ articlelanding/2022/dd/d2dd00015f. Publisher: RSC.

[59] Paula Torren-Peraire, Alan Kai Hassen, Samuel Genheden, Jonas Verhoeven, Djork-Arne Clevert, Mike Preuss, and Igor Tetko. Models Matter: The Impact of Single-Step Retrosynthesis on Synthesis Planning. Digital Discovery, 3(3):558 572, 2024. ISSN 2635-098X. doi: 10.1039/D3DD00252G. URL http://arxiv.org/abs/2308.05522. ar Xiv:2308.05522 [physics, q-bio].

[60] Connor W. Coley, Luke Rogers, William H. Green, and Klavs F. Jensen. SCScore: Synthetic Complexity Learned from a Reaction Corpus. Journal of Chemical Information and Modeling, 58(2):252 261, February 2018. ISSN 1549-9596. doi: 10.1021/acs.jcim.7b00622. URL https://doi.org/10.1021/acs.jcim.7b00622. Publisher: American Chemical Society.

[61] Peter Ertl and Ansgar Schuffenhauer. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of Cheminformatics, 1(1):8, June 2009. ISSN 1758-2946. doi: 10.1186/1758-2946-1-8. URL https://doi.org/10.1186/1758-2946-1-8.

[62] David Weininger. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28 (1):31 36, Feb 1988. doi: 10.1021/ci00057a005.

[63] Gregory Landrum. RDKit. URL https://www.rdkit.org/.

[64] Raymond E. Carhart, Dennis H. Smith, and R. Venkataraghavan. Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Computer Sciences, 25(2):64 73, May 1985. ISSN 0095-2338. doi: 10.1021/ci00046a002. URL https://doi.org/10.1021/ci00046a002. Publisher: American Chemical Society.

[65] Connor W. Coley, Dale A. Thomas, Justin A. M. Lummiss, Jonathan N. Jaworski, Christopher P. Breen, Victor Schultz, Travis Hart, Joshua S. Fishman, Luke Rogers, Hanyu Gao, Robert W. Hicklin, Pieter P. Plehiers, Joshua Byington, John S. Piotti, William H. Green, A. John Hart, Timothy F. Jamison, and Klavs F. Jensen. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science, 365(6453):eaax1566, August 2019. doi: 10.1126/science.aax1566. URL https://www.science.org/doi/10.1126/science. aax1566. Publisher: American Association for the Advancement of Science.

[66] Michał Zawalski, Michał Tyrolski, Konrad Czechowski, Tomasz Odrzygó zd z, Damian Stachura, Piotr Pi ekos, Yuhuai Wu, Łukasz Kuci nski, and Piotr Miło s. Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search, May 2024. URL http://arxiv.org/abs/ 2206.00702. ar Xiv:2206.00702.

[67] José Cambronero, Sumit Gulwani, Vu Le, Daniel Perelman, Arjun Radhakrishna, Clint Simon, and Ashish Tiwari. Flash Fill++: Scaling Programming by Example by Cutting to the Chase. Proceedings of the ACM on Programming Languages, 7(POPL):952 981, January 2023. ISSN 2475-1421. doi: 10.1145/3571226. URL https://dl.acm.org/doi/10.1145/3571226.

A Appendix / supplemental material

A.1 Summary of notation

Symbol Note

M set of all valid molecules R set of all valid reactions T set of all valid retro templates T set of all valid single-product fwd. templates B set of building blocks, where B M G the AND-OR graph constructed from all possible reaction tuples R R single-product reaction t retro reaction template t fwd. reaction template p target molecule r starting material goal S valid synthetic route c reaction cost function γ(m) given m, opposing molecule to condition selection or expansion on GR top-down search graph GF bottom-up search graph FR frontier molecule nodes in GR FF frontier molecule nodes in GF Vm (retro) estimated minimum cost of synthesizing m Vm (fwd.) estimated value of D(m, γ(m)) rn(m|G) reaction number," estimated cost of synthesizing m given search graph G Vt(m|G) estimated cost of synthesizing p using m given search graph G D synthetic distance (network) Dm estimated value of D(γ(m), m) dn(m|G) distance numbers," multiset of descendent Dm Vm values for m in G Dt(m|G) multiset of related Dm Vm values for m in G ft forward template predictor model fb building block predictor model L loss function Dmax maximum value of D considered in L λ # retro expansions to perform before one fwd. expansion

A.2 Retro* algorithm details

Retro* defines the following quantities:

1. Vm. For a molecule m, Vm is an unconditional estimate of the minimum cost required to synthesize m. It is estimated by a neural network.

2. rn(m|G). For a molecule m, given search graph G, the reaction number" rn(m|G) represents the estimated minimum cost of synthesizing m.

3. Vt(m|G). For a molecule m, given search graph G with goal p , Vt(m|G) represents the estimated minimum cost of synthesizing p using m.

Retro* also cycles between selection, expansion, and update phases. We implement Retro* as follows.

Selection The molecule in the set of frontier nodes F that minimizes the expected cost of synthesizing p given the current search graph G is selected:

mselect = arg min m F Vt(m|G) (10)

Expansion As in Alg. 2, a one-step retrosynthesis model is called on the selected node and the resulting reactions and precursors are added to G. Each molecule node is then initialized with rn(m|G) Vm.

Update First, reaction number values are uppropagated to ancestor nodes. For reaction node R, the reaction number is updated as the sum of its childrens reaction numbers.

rn(R|G) c(R) + X

m ch(R) rn(m|G) (11)

For molecule node m, the reaction number becomes the minimum reaction number among its children. rn(m|G) min R ch(m) rn(R|G) (12)

Next, Vt(m|p ) values are downpropagated to descendent nodes. Starting from p itself,

Vt(p |G) rn(p |G) (13)

For subsequent reaction nodes R, the value is updated

Vt(R|G) rn(R|G) rn(pr(R)|G) + Vt(pr(R)|G) (14)

Finally, for molecule node m that is not p ,

Vt(m|G) min R pr(m) rn(R|G) (15)

A.3 Reaction pre-processing

Reactions in the USPTO-Full dataset are represented with simplified molecular-input line-entry system (SMILES) [62] strings, where the SMILES string of reactants, reagents, and products are separated by > as REACTANTS>REAGENTS>PRODUCTS. Each field can have one or more chemical species delineated with a dot (.) or be left blank in the case of reagents.

For processing reaction SMILES, multi-product reaction SMILES are first separated into singleproduct reaction SMILES by creating separate entries for each product species with the same reactants and reagents. Each single-product reaction SMILES then undergoes the following process:

1. Reagents in the SMILES string are moved to the reactant side. 2. Chemical species with identical atom mapped SMILES in both reactants and products are moved to reagents. 3. Any products that do not contain at least one mapped atom or have fewer than 5 heavy atoms are removed. 4. Any atom mapping numbers that exist exclusively on either the reactant side or product side are removed. 5. Any reactants without atom mapping are moved to the reagent side.

Resulting reaction SMILES without either reactants or products are then filtered out.

A.4 Model training details

Dataset construction To learn ft and fb, a full enumeration of all pathways (until reaching nodes in B) rooted at p is performed for each molecule node p in GFWD. For learning ft, each reaction node Ri = (ri, pi, ti) is then used as a training example for each reactant mj ri with input zn1 m (mj) zn1 m (p ) and one-hot encoded label ti. Likewise, for learning fb, each reaction node Ri = ({m1, m2}, pi, ti) that is bimolecular and involves at least one building block yields a training example with input zn1 m (m1) zn1 m (p ) zn1 t (ti) and output zn2 m (m2) if m2 B and with input zn1 m (m2) zn1 m (p ) zn1 t (ti) and output zn2 m (m1) if m1 B. The procedure for such training example generation is illustrated in Fig. 5. With n1 = 2048, n2 = 256, we use the RDKit implementation of the Morgan Fingerprint [63] with radius 2 for zm and the Atom Pair fingerprint [64] for zt.

Because D is used to bias both the top-down and bottom-up searches, we perform the same pathway enumeration for all molecule nodes p / B, p GUSPTO. In this case, however, we only consider p for which we find valid synthetic routes. Training examples are then extracted for all other molecule nodes mi in a solved search graph, with input zn m(mi) zn m(p ) and label Vp (mi|GR) rn(mi|GR), with n = 512. For calculating this label, we propagate the Retro* functions as described in Section

A.2 such that D can be calculated as the minimum cost of synthesizing p subtracted by the minimum cost of synthesizing mi. Here, we set c(Ri) = 1 for all Ri, as a synthetic route s number of steps is an important metric in evaluating the route cost, and it is otherwise difficult to objectively quantify the cost of a reaction. This training example generation is also depicted in Fig. 5. Finally, to obtain additional training examples, we also recover pairs of (m, p ) where p was not solved" by the enumerative search but would have been solved if m B.

For validation of the ft and fb models, we construct the graph Gval corresponding to all reactions across both the training and validation splits. We perform the same pathway enumeration described above, and each training example" that corresponds to a reaction not in the original training split is used as a validation example. For validation of the D model, we construct Gval from RUSPTO and perform the pathway enumeration only on p / GUSPTO to obtain validation examples.

a. Enumerate full search tree across reaction network

Non-buyable

Building block

b. Extract distance and forward reaction data from subgraph

Figure 5: Illustration of data extraction procedure for offline training of ft, fb, and D. (a) For each target, the full search graph is enumerated by recursively tracing outgoing edges and propagating Retro* quantities. (b) For each bimolecular reaction with at least one buyable reactant, training examples for ft and fb are labeled. For each molecule node m other than the target, D(m, p ) is computed.

Figure 6: Predicted vs. actual values of synthetic distance on the validation examples. Actual values above 9 are set to 10.

Model hyperparameters All models are MLPs trained with the Adam optimizer, early stopping (patience 2), and decayed learning rate on plateau with factor 0.3 and patience 1 on a single NVIDIA RTX 4090. The following table summarizes the hyperparameters and details of each model used in experiments.

Model Batch Size Dropout Activation # Hidden Layers Hidden Units Learning Rate Input Dim. Output Dim.

Retro Template 2048 0.5 Sigmoid 2 2048 0.002758 2048 270794 Fwd Template 4096 0.4 Si LU 2 1024 0.005113 4096 196339 BB Model 4096 0.3 Re LU 3 2048 0.001551 6144 256 Retro* Vm 4096 0.2 Si LU 4 128 0.0025 2048 1 Synthetic Dist. D 4096 0.3 Sigmoid 4 256 0.00489 1024 1

Hyperparameters were selected based on best performance on the validation split while performing a Bayesian search through the following parameter space:

1. Dropout: [0.1, 0.2, 0.3, 0.4, 0.5]

2. Activation: [Re LU, Si LU, Sigmoid, Tanh]

3. Hidden layers: [2, 3, 4]

4. Hidden units: [1024, 2048] for retro, forward, and BB. [128, 256] for D and Vm

5. Learning rate: [0.00001 - 0.01]

The template relevance module from the open-source ASKCOS codebase was used to train the one-step retro model.1

A.5 DESP additional details

rn = 8 + 2 + 1 = 11 dn = [1 - 8, 4 - 2] = [-7, 2]

Vm = 2 Dm = 4 Vm = 4 Dm = 4

rn = min([5, 11]) = 5 dn = dn(arg minr([5, 11])) = [0]

rn = 4 + 1 = 5 dn = [4 - 4] = [0]

Vm = 8 Dm = 1

Vt = 11 Dt = [-7, 2]

Vt = 11 Dt = [-7, 2] Vt = 5 Dt = [0]

Vt = 5 Dt = [0]

Vt = 5 Dt = [0]

Vt = 11 Dt = [-7, 2]

Vt + min(Dt) = 4 Vt + min(Dt) = 5

a. Uppropagation b. Downpropagation

Figure 7: Simple visual example of DESP update procedure for guiding top-down search with synthetic distance, where each reaction has cost 1. In the unguided Retro* algorithm, the left-most frontier node would be chosen for expansion, as a route through that node minimizes Vt(m|G), the expected cost to the target from building blocks (5 reaction steps). In DESP, either of the other two nodes would be prioritized, as the middle frontier node is predicted to be only 1 reaction step away from the starting material, resulting in only Vt(m|G) + min Dt(m|G) = 4 predicted reaction steps total in the final solution.

Algorithm 2: RETRO_EXPAND(m, GR, N) m: expanded molecule node, GR: top search graph, N: num. templates to propose t TOP_N(σ(MLPR(zm(m))))) ; /* Get top N templates from retro model */ for i 1 to N do

r t[i](m) ; /* Apply retro template to m */ GR.ADD_RXN(r, m, t[i]); /* Add reaction and precursors to GR */ end

1Template relevance module can be found at https://gitlab.com/mlpds_mit/askcosv2/retro/ template_relevance.

Algorithm 3: DESP(p , r , N1, N2, K, L, λ, s) p : target, r : starting material goal, N1: num. retro templates to propose, N2: num. forward templates to propose, K: num. building blocks to search, L: max num. expansions, λ: num. times to expand top before expanding bottom, s: strategy (F2E or F2F) GR {p } ; /* Initialize search graphs */ GF {r }; l 0; while not solved OR l L do

for i 1 to λ do

m arg minm FR [Vt(m|GR) + min(Dt(m|GR))] ; /* Select best top */ RETRO_EXPAND(m, GR, N1) ; /* Expand with Alg.2 */ TOP_UPDATE(GR); /* Update GR with Section 3 rules */ if met bottom then

mmet.rn 0 ; /* Set expected cost of met node to 0 */ BOT_UPDATE(GF ); /* Retro* updates on GF */ end l l + 1; end m arg minm FF Vt(m|GF ) ; /* Select best bot */ if s = F2E then

FORWARD_EXPAND(m, p , GF , N2, K); /* Expand conditioned on p */ BOT_UPDATE(GF , s); /* Retro* updates GF */ else if s = F2F then

q arg minq GR D(m, q); /* Find closest node */ FORWARD_EXPAND(m, q, GF , N2, K); /* Expand conditioned on q */ BOT_UPDATE(GF , s); /* Retro* updates on GF */ if met top then

mmet.rn 0, mmet.dn [0] ; /* Set expected costs of met node to 0 */ TOP_UPDATE(GF ); /* Section 3 updates on GR */ end l l + 1; end

Design of new quantities and update rules We recall that the minimum total cost of synthesizing the target p from a molecule m under the Retro* framework is estimated as:

Vt(m|GR) = X

r Ar(m|GR) c(r) + X

m Vm(GR),pr(m ) Ar(m|GR) rn(m |GR) (16)

where A(m|GR) represents the set of reaction node ancestors of m and Vm(GR) represents the set of molecule nodes in the search graph. This is equivalent to

Vt(m|GR) = g(m|GR) + X

m N(m|GR) Vm (17)

where g(m|GR) aggregates the current cost from all reaction nodes in GR contributing to Vt(m|GR), and N(m|GR) FR accordingly represents the set of frontier top nodes for the subgraph of GR corresponding to nodes contributing to Vt(m|GR). If we add the constraint that one frontier node must implicitly be the ancestor of r , the estimate of the minimal cost then becomes:

V t (m|GR) = g(m|GR) + min mj N(m|GR)

mi N(m|GR),mi!=mj Vmi + D(r , mj)

= g(m|GR) + X

mi N(m|GR) Vmi + min mj N(m|GR)

D(r , mj) Vmj (19)

= Vt(m|GR) + min mj N(m|GR) Dmj (20)

Our update rules are implemented such that Dt(m|GR) = minmj N(m|GR) Dmj, thus justifying our design of the selection and update procedures. Note that this design relies on the assumption that N(m|GR) remains static upon adding the goal node constraint, when in reality the introduction of D may change the optimal set of frontier nodes to consider in the search graph. To avoid the combinatorial complexity of this situation and retain the efficiency from dynamic programming for our update policy, we maintain this assumption and find that introducing D in this way empirically works well (Section 4.2). A simple visual example of the update procedures is provided in Figure 7.

A.6 New benchmark set details

We follow the test set extraction procedure of Chen et al. [6], applied within patents of the Pistachio dataset [14] (version: 2023Q4) to obtain 1,004,092 valid synthetic routes. We randomly sample synthetic routes from this set until we obtained 150 routes that satisfied the following constraints: (1) No reactions in the route are found in the training data. (2) No reactions are shared between any routes within the test set. (3) All reactions are found in the top 50 proposals of our single-step retrosynthesis model. (4) No two targets in the test set have a Tanimoto similarity of more than 0.7. (5) We enforce a minimum number of routes for different route lengths (Fig. 8, Fig. 9). We term this set of 150 routes Pistachio Reachable. We perform the same procedure but modify condition (2) to require only 50% or more of the reactions to be reproducible (in-distribution) and obtain 100 routes which we term Pistachio Hard. Due to a bug in our implementation of criterion (2), a small number of routes share the same reaction in the final datasets, but the degree of inter-route reaction duplication is still significantly less than that of USPTO-190 for both benchmark sets (Table 1).

Figure 8: Distribution of reaction counts in Pistachio Reachable.

A.7 Additional experimental details

Implementation details For random search, all node selections were performed at random among frontier molecule nodes. For BFS, the molecule with the lowest depth was selected at each step, with precedence for nodes whose parent reactions had the highest plausibility scores from the retro one-step model. MCTS was run by integrating our one-step model into the open-source ASKCOS code base [65]. For Retro*, we removed the synthetic distance network and bottom-up expansions from our DESP implementation. Notably, reaction costs for Retro* and DESP are both calculated as log p, where p is the template plausibility from the one-step model (retro or forward). For GRASP, we used the authors search implementation [13]. For a fair comparison with the AND-OR graph structure, we did not increment the iteration counter when a molecule that had previously been expanded was expanded again. In training the GRASP value network, we use the authors reported hyperparameters where applicable and the default values in their code base otherwise.

Figure 9: Distribution of reaction counts in Pistachio Hard.

Table 4: Summary of hyperparameters used for evaluated algorithms.

Algorithm Hyperparameter Value

All Max # expansions 500 Max mol. depth (top) 11 Max mol. depth (bottom) 6 Max # retro templates 50

DESP / Bi-BFS Max # fwd. templates 25 Max # BBs retrieved 2 λ 2

MCTS Exploration weight 1.0

Table 5: Summary of components of each evaluated algorithm.

Algorithm Selection Policy Guidance Bidirectional?

Random Randomly selected frontier node None No BFS Lowest depth frontier node, ties broken by reaction cost None No MCTS Start from root and use UCB to select children None No until reaching frontier node Retro* Node minimizing Vt(m|G) BB No GRASP Same as MCTS BB or s.m. No Bi-BFS Same as BFS None Yes Retro* + D Node minimizing Vt(m|G) + min(Dt(m|G)) BB and s.m. No DESP Alternate between top-down and bottom-up, BB and s.m. and target Yes both using Retro* and D

Approximate nearest neighbors search In selecting building blocks for the forward expansion with k-NN search, the Python library Faiss is used. A 256-dimension Morgan fingerprint of each building block is stored in a vector database and compressed using product quantization for approximate nearest neighbor search at dramatically faster speeds and significantly lower memory overhead.

Compute All experiments were performed on a 32-core AMD Threadripper Pro 5975WX processor and with a single NVIDIA RTX 4090 GPU. Running experiments on all benchmark sets for a given

method took around 10 hours. DESP requires 3 GB of GPU memory to store the building block database for fast k-NN.

A.8 Limitations and Outlook

Convergent syntheses Convergent synthetic routes, in which multiple non-BBs are combined, are often desirable in chemistry due to their relative efficiency. The top-down search has no problems proposing convergent routes. However, the bottom-up searcher in DESP only performs forward expansions and thus cannot handle convergent routes by adding and merging new synthetic trees. Resultantly, the bottom-up search can only plan one branch if the final route requires convergent steps. Implementing additional modules of Syn Net [42] into the bottom-up search would enable planning of convergent synthetic routes and potentially further reduce the average number of reactions in solutions and improve solve rates.

GPU reliance and computational overhead DESP requires GPU acceleration to tractably perform a k-NN search over 23 million building blocks in the forward expansion policy. DESP-F2F also requires GPU inference to rapidly perform node comparisons at each iteration. In all, forward expansions take around 50% more time than retro expansions, though this is in part because our implementation of forward synthesis applies retro templates to each product proposed by the forward model to ensure template reversibility (i.e., to confirm that the increased success in finding routes during the bidirectional search is not an artifact of having access to different transformations), which creates additional overhead. Overall, we view these limitations primarily as engineering problems that do not take away from the empirical benefits demonstrated in the paper. In principle, one could also implement DESP-F2E as a parallel bidirectional search in pursuit of additional efficiency gains.

Building block specification Though DESP is designed to address starting material-constrained synthesis planning, we envision that future work could optimize bidirectional search to improve general retrosynthesis capabilities by conditioning on one or more starting materials instead of constraining the solution space. These starting materials could be expert-designed or predicted algorithmically as in Gao et al. [42].

DESP-F2F implementation In general, neither DESP-F2E or DESP-F2F guarantee that the costoptimal route is found upon termination. Additionally, our implementation of DESP-F2F does not take into account the total known cost of the opposing graph s nodes Vt(m |GF ) rn(m |GF ) when calculating dn(m|GR), and likewise the value of rn(m|GF ) does not take into account Vt(m |GR) rn(m |GR). As a result, the selection policy DESP-F2F selects nodes that minimize the lowest expected cost of reaching the opposing search graph, but does not select to minimize the lowest expected cost of the final route directly. This is likely a primary contributor to DESP-F2F finding longer routes on average than DESP-F2E. As the values of Vt(m|G) change after each expansion, it would be computationally expensive to re-compare nodes across the search graphs at each iteration. We have not devised an efficient means of handling the number of re-comparisons that would be required and leave such optimizations to future exploration.

Sub-goal and divide-and-conquer strategies Goal-oriented synthesis planning bring to mind potential methods that involve sub-goals or divide-and-conquer strategies that have been utilized in general purpose planning [66] or program synthesis [67]. In general, there are rich bodies of literature in fields like hierarchical planning and program synthesis that remain largely untapped in applications to computer-aided synthesis planning.

Neur IPS Paper Checklist

Question: Do the main claims made in the abstract and introduction accurately reflect the paper s contributions and scope? Answer: [Yes] Justification: The contributions are explicitly enumerated in Section 1 and are each backed by our methods (Section 3) and experimental results (Section 4). Guidelines:

The answer NA means that the abstract and introduction do not include the claims made in the paper. The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers. The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings. It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper. 2. Limitations

Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: The limitations are explicitly outlined and discussed in Section A.8. Guidelines:

The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. The authors are encouraged to create a separate "Limitations" section in their paper. The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be. The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated. The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon. The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size. If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness. While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren t acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations. 3. Theory Assumptions and Proofs

Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [NA]

Justification: This paper does not include theoretical results. Guidelines:

The answer NA means that the paper does not include theoretical results. All the theorems, formulas, and proofs in the paper should be numbered and crossreferenced. All assumptions should be clearly stated or referenced in the statement of any theorems. The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition. Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material. Theorems and Lemmas that the proof relies upon should be properly referenced. 4. Experimental Result Reproducibility

Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [Yes] Justification: Experimental setup is provided in Section 4 and further details are provided in Section A.7. Source code is also included in the supplementary material. Guidelines:

The answer NA means that the paper does not include experiments. If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not. If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable. Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general. releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed. While Neur IPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm. (b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully. (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset). (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results. 5. Open access to data and code

Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?

Answer: [Yes] Justification: We provide the full source code and new benchmarks for DESP in our submission and will release the exact model weights, processed data files, and data processing scripts upon publication. Trained model weights are too large to share through an anonymous Github repo during review. Guidelines:

The answer NA means that paper does not include experiments requiring code. Please see the Neur IPS code and data submission guidelines (https://nips.cc/ public/guides/Code Submission Policy) for more details. While we encourage the release of code and data, we understand that this might not be possible, so No is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark). The instructions should contain the exact command and environment needed to run to reproduce the results. See the Neur IPS code and data submission guidelines (https: //nips.cc/public/guides/Code Submission Policy) for more details. The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc. The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why. At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable). Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted. 6. Experimental Setting/Details

Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [Yes] Justification: We provide dataset construction and model training details in Section A.4. Guidelines:

The answer NA means that the paper does not include experiments. The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them. The full details can be provided either with the code, in appendix, or as supplemental material. 7. Experiment Statistical Significance

Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [Yes] Justification: We provide standard deviations for the average number of reactions in the solved routes in Table 3. The general performance experiments for all methods (solve rate) are deterministic because they are statistics over the entire test set, following previous papers. Guidelines:

The answer NA means that the paper does not include experiments. The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper. The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions).

The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.) The assumptions made should be given (e.g., Normally distributed errors). It should be clear whether the error bar is the standard deviation or the standard error of the mean. It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified. For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates). If error bars are reported in tables or plots, The authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text.

8. Experiments Compute Resources

Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?

Answer: [Yes]

Justification: Compute details are described in Sections A.4 and A.7.

Guidelines:

The answer NA means that the paper does not include experiments. The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage. The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute. The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn t make it into the paper).

9. Code Of Ethics

Question: Does the research conducted in the paper conform, in every respect, with the Neur IPS Code of Ethics https://neurips.cc/public/Ethics Guidelines?

Answer: [Yes]

Justification: This work does not involve human subjects or privacy concerns. We do not anticipate any particular negative impacts from this work.

Guidelines:

The answer NA means that the authors have not reviewed the Neur IPS Code of Ethics. If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics. The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction).

10. Broader Impacts

Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?

Answer: [Yes]

Justification: This work directly addresses challenges in chemistry that can advance health and sustainability as briefly described in Section 1.

Guidelines:

The answer NA means that there is no societal impact of the work performed. If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact.

Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations. The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster. The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology. If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML).

11. Safeguards

Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?

Answer: [NA]

Justification: We do not anticipate any significant risks associated with releasing data and models in this work.

Guidelines:

The answer NA means that the paper poses no such risks. Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters. Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images. We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.

12. Licenses for existing assets

Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?

Answer: [Yes]

Justification: We cite all originators of data or code that was used.

Guidelines:

The answer NA means that the paper does not use existing assets. The authors should cite the original paper that produced the code package or dataset. The authors should state which version of the asset is used and, if possible, include a URL. The name of the license (e.g., CC-BY 4.0) should be included for each asset. For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided.

If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset. For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided. If this information is not available online, the authors are encouraged to reach out to the asset s creators.

13. New Assets

Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?

Answer: [Yes]

Justification: We introduce new benchmark targets using a small number of molecules from the Pistachio database (license for this subset: CC BY-SA-NC). We describe how they were extracted in Section A.6.

Guidelines:

The answer NA means that the paper does not release new assets. Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc. The paper should discuss whether and how consent was obtained from people whose asset is used. At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file.

14. Crowdsourcing and Research with Human Subjects

Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?

Answer: [NA]

Justification: This paper does not involve crowdsourcing or human subjects.

Guidelines:

The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper. According to the Neur IPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector.

15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects

Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?

Answer: [NA]

Justification: This paper does not involve crowdsourcing or human subjects.

Guidelines:

The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.

Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper. We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the Neur IPS Code of Ethics and the guidelines for their institution. For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.