# gflownet_assisted_biological_sequence_editing__7b70bde6.pdf

GFlow Net Assisted Biological Sequence Editing

Pouya M. Ghari (University of California Irvine), Alex M. Tseng (Genentech), Gökcen Eraslan (Genentech), Romain Lopez (Genentech, Stanford University), Tommaso Biancalani (Genentech), Gabriele Scalia (Genentech), Ehsan Hajiramezanali (Genentech)

Work has been done while interning at Genentech. Corresponding author: hajiramezanali.ehsan@gene.com. 38th Conference on Neural Information Processing Systems (Neur IPS 2024).

Abstract

Editing biological sequences has extensive applications in synthetic biology and medicine, such as designing regulatory elements for nucleic-acid therapeutics and treating genetic disorders. The primary objective in biological-sequence editing is to determine the optimal modifications to a sequence which augment certain biological properties while adhering to a minimal number of alterations to ensure predictability and potentially support safety. In this paper, we propose GFNSeq Editor, a novel biological-sequence editing algorithm which builds on the recently proposed area of generative flow networks (GFlow Nets). Our proposed GFNSeq Editor identifies elements within a starting seed sequence that may compromise a desired biological property. Then, using a learned stochastic policy, the algorithm makes edits at these identified locations, offering diverse modifications for each sequence to enhance the desired property. The number of edits can be regulated through specific hyperparameters. We conducted extensive experiments on a range of real-world datasets and biological applications, and our results underscore the superior performance of our proposed algorithm compared to existing state-of-the-art sequence editing methods.

1 Introduction

Editing biological sequences has a multitude of applications in biology, medicine, and biotechnology. For instance, gene editing serves as a tool to elucidate the role of individual gene products in diseases [28] and offers the potential to rectify genetic mutations in afflicted tissues and cells for therapeutic interventions [9]. The primary objective in biological-sequence editing is to enhance specific biological attributes of a starting seed sequence, while minimizing the number of edits. This reduction in the number of alterations not only has the potential to improve safety but also facilitates the predictability and precision of modification outcomes. Existing methodologies that leverage generative modeling in the context of biological sequences have predominantly concentrated on de novo generation of sequences with desired properties [52, 61, 3]. A common feature of these approaches is generating entirely new sequences from scratch. As a result, there is an inherent risk of deviating significantly from naturally occurring sequences, compromising safety (e.g., the risk of designing sequences that might trigger an immune response) and predictability (e.g., obtaining misleading predictions from models that are trained on genomic sequences due to out-of-distribution effects). Despite the paramount importance of editing biological sequences, there has been a noticeable scarcity of research leveraging generative modeling to address this aspect specifically. Generative flow networks (GFlow Nets) [5, 6], a generative approach recognized for their ability to sequentially generate new objects, have shown remarkable performance in generating novel biological sequences from scratch [19, 30].
Drawing inspiration from the emerging field of GFlow Nets, this paper introduces a novel biological-sequence editing algorithm: GFNSeq Editor. GFNSeq Editor assesses the potential for significant property enhancement within a given seed sequence by iteratively identifying and subsequently editing specific positions in the input sequence. More precisely, using the trained flow function, GFNSeq Editor first identifies positions in the seed sequence that require editing. Then, it constructs a stochastic policy using the flow function to select a substitution from the available options for the identified positions. Our stochastic approach empowers GFNSeq Editor to generate a diverse set of edited sequences for each input sequence, which, due to the diverse nature of biological targets, is an important consideration in biological sequence design [34, 19]. In summary, this paper makes the following contributions:

- We introduce GFNSeq Editor, a novel sequence-editing method which identifies and edits positions within a given sequence. GFNSeq Editor generates diverse edits for each input sequence based on a stochastic policy.
- We theoretically analyze the properties of the sequences edited through GFNSeq Editor, deriving lower and upper bounds on the property of edited sequences. Additionally, we demonstrate that the lower and upper bounds for the number of edits performed by GFNSeq Editor can be controlled through the adjustment of hyperparameters (Subsection 4.3).
- We conduct experiments across various DNA and protein sequence editing tasks, showcasing GFNSeq Editor's remarkable efficiency in enhancing properties with a reduced number of edits when compared to existing state-of-the-art methods (Subsection 5.1).
- We highlight the versatility of GFNSeq Editor, which can be employed not only for sequence editing but also alongside biological-sequence generation models to produce novel sequences with improved properties and increased diversity (Subsection 5.2).
- We demonstrate the usage of GFNSeq Editor for sequence length reduction, allowing the creation of new, relatively shorter sequences by combining pairs of long and short sequences (Subsection 5.3).

2 Related Works

De Novo Sequence Design. The generation of biological sequences has been tackled using a diverse range of methods, including reinforcement learning [1], Bayesian optimization [51], deep generative models for search and sampling [18], generative adversarial networks [61], diffusion models [3], model-based optimization approaches [52, 7], adaptive evolutionary strategies [16, 49], likelihood-free inference [55], surrogate-based black-box optimization [10], and GFlow Nets [19]. It is important to note that all these sequence-generation methods generate sequences from scratch. However, ab initio generation carries the risk of deviating too significantly from naturally occurring sequences, which can compromise safety and predictability. In contrast, our proposed method enhances a target property while maintaining similarity to the seed sequences (e.g., naturally occurring sequences), thus improving predictability and potentially enhancing safety.

Sequence Editing. Traditional approaches commonly employed for biological sequence editing are evolution-based methods, where over many iterations a starting seed sequence is randomly mutated, retaining only the best sequences (i.e., those with the highest desired property) for the next round [2, 46, 50, 41]. These approaches have several important limitations.
First, they require the evaluation of numerous candidate sequences at every iteration. This computational demand can become prohibitively expensive, particularly for lengthy sequences. Additionally, evolution-based methods heavily rely on evaluations provided by a proxy model capable of assessing the properties of unseen sequences; the efficacy of these methods is thus limited by the reliability of the underlying proxy. Moreover, these methods may require repeated rounds of interactions with the lab [41], which can be costly and time-consuming.

Figure 1: An example of editing the DNA sequence ATGTCCGC (the seed sequence x with property y). The goal is to make a limited number of edits to maximize the property ŷ. Each token in the sequence in this example is called a base and can be any of [A, C, T, G]. The editor function E accepts the initial sequence as an input and determines that the second and seventh bases require editing (highlighted in red in the figure). Then, E modifies the bases at these identified locations to improve the property value, producing the edited sequence ACGTCCAC (x̂ with property ŷ).

Beyond evolution-based methods, a handful of optimization-based methods have been proposed by [42, 48, 21]. By treating sequence editing as an optimization task, Ledidi [42] learns to perturb specific positions within a given sequence. Utilizing Bayesian optimization, La MBO [48] generates new sequences by optimizing a batch of starting seed sequences. Building upon the La MBO framework, MOGFN-AL [21] leverages GFlow Nets to generate candidates in each round of the Bayesian optimization loop, improving computational efficiency compared to La MBO. Akin to evolution-based models, these optimization-based methods require the evaluation of unseen sequences. Consequently, their effectiveness is contingent on the quality of the proxy model, which can compromise their performance if the proxy model lacks sufficient generalizability for unseen sequences. Furthermore, both evolution-based and optimization-based methods perform local searches given either a single seed sequence or a batch of seed sequences. Thus, these methods face issues related to low sample efficiency. In contrast, GFNSeq Editor relies on a pre-trained flow function that amortizes the search cost over the learning process, allocating probability mass across the entire space to facilitate exploration and diversity. Furthermore, GFNSeq Editor can be employed for editing without necessitating the evaluation of unseen sequence properties. Theoretical analysis presented in this paper establishes that the bounds on edited-sequence rewards, property improvement, and the number of edits can be effectively regulated through GFNSeq Editor's hyperparameters. Therefore, GFNSeq Editor offers increased reliability and operational suitability in comparison to counterparts lacking a robust theoretical analysis. We provide an extensive overview of the related literature, with additional discussion available in Appendix F.

3 Preliminaries and Problem Statement

Let x be a biological sequence with property y. For example, x may be a DNA sequence, and y may be the likelihood it binds to a particular protein of interest. The present paper considers the problem of searching for edits in x to improve y. To this end, the goal is to learn an editor function E(·) which accepts a sequence x and outputs the edited sequence E(x) = x̂ with property ŷ.
The editor function E(·) should maximize ŷ, while at the same time minimizing the number of edits between x and x̂. To achieve this goal, we propose GFNSeq Editor. GFNSeq Editor first identifies positions in a given biological sequence such that editing those positions leads to considerable improvement in the property of the sequence. Then, the learned editor function E edits these identified locations (Figure 1). GFNSeq Editor uses a trained GFlow Net [5, 6] to identify positions that require editing and subsequently generate edits for those positions. The following subsections present preliminaries on GFlow Nets.

3.1 Generative Flow Networks

Generative Flow Networks (GFlow Nets) [5, 6] learn a stochastic policy π(·) to sequentially construct a discrete object x. Let X be the space of discrete objects x. It is assumed that the space X is compositional, meaning that an object x can be constructed using a sequence of actions taken from an action set A. At each step t, given a partially constructed object s_t, the GFlow Net samples an action a_{t+1} from the set A using the stochastic policy π(· | s_t). Then, the GFlow Net appends a_{t+1} to s_t to obtain s_{t+1}. In this context, s_t can be viewed as the state at step t. The above procedure continues until reaching a terminating state, which yields the fully constructed object x. To construct an object x, the GFlow Net starts from an initial empty state s_0 and applies actions sequentially, and all fully constructed objects must end in a special final state s_f. Therefore, the trajectory of states to construct an object x can be written as τ_x = (s_0 → s_1 → · · · → x → s_f). Let T be the set of all possible trajectories. Furthermore, let R(·) : X → ℝ₊ be a non-negative reward function defined on X. The goal of the GFlow Net is to learn a stochastic policy π(·) such that π(x) ∝ R(x). This means that the GFlow Net learns a stochastic policy π(·) to generate an object x with a probability proportional to its reward. As described later, to obtain the policy π(·), the GFlow Net uses a trajectory flow F : T → ℝ₊. The trajectory flow F(τ) assigns a probability mass to the trajectory τ. The edge flow from state s to state s' is defined as F(s → s') = Σ_{τ : s → s' ∈ τ} F(τ). Moreover, the state flow is defined as F(s) = Σ_{τ : s ∈ τ} F(τ). The trajectory flow F(·) induces a probability measure P_F(·) over completed trajectories that can be expressed as P_F(τ) = F(τ)/Z, where Z = Σ_{τ ∈ T} F(τ) represents the total flow. The probability of visiting state s can be written as P_F(s) = Σ_{τ ∈ T : s ∈ τ} F(τ)/Z. Then, the forward transition probability from state s to state s' can be obtained as P_F(s' | s) = F(s → s')/F(s). The trajectory flow F(·) is called a consistent flow if for any state s it satisfies Σ_{s'' : s'' → s} F(s'' → s) = Σ_{s' : s → s'} F(s → s'), which states that the in-flow and out-flow of state s are equal. [5] shows that if F(·) is a consistent flow such that the terminal flow is set to the reward (i.e., F(x → s_f) = R(x)), then the policy π(·) defined as π(s' | s) = P_F(s' | s) satisfies π(x) = R(x)/Z, which means that the policy π(·) samples an object x with probability proportional to its reward.

3.2 Training GFlow Net Models

In order to learn the policy π(·), a GFlow Net model approximates the trajectory flow with a flow function F_θ(·), where θ denotes the learnable parameters of the flow function.
To learn a flow function that satisfies the consistency condition, [5] formulates the flow-matching loss as follows:

$$\mathcal{L}_{\mathrm{FM}}(s; \theta) = \left( \log \frac{\sum_{s'': s'' \rightarrow s} F_\theta(s'' \rightarrow s)}{\sum_{s': s \rightarrow s'} F_\theta(s \rightarrow s')} \right)^{2}.$$ (1)

Moreover, as an alternative objective function, [31] introduces trajectory balance:

$$\mathcal{L}_{\mathrm{TB}}(\tau; \theta) = \left( \log \frac{Z_\theta \prod_{s \rightarrow s' \in \tau} P_{F_\theta}(s' \mid s)}{R(x)} \right)^{2},$$ (2)

where Z_θ is a learnable parameter. The trajectory-balance objective function in (2) can accelerate training GFlow Nets and provide robustness to long trajectories. Given a training dataset, optimization techniques such as stochastic gradient descent can be applied to the objective functions in (1) and (2) to train the GFlow Net model. We use trajectory balance in this paper due to its well-documented performance. Furthermore, it is worth noting that generating sequences in an autoregressive fashion using a GFlow Net involves only one path to generate a particular sequence. In such cases, generating biological sequences with a GFlow Net can be viewed as a Soft-Q-Learning [15, 13, 33] and path consistency learning (PCL) [35] problem.

4 Sequence Editing with GFlow Net

To edit a given sequence x, we propose identifying sub-optimal positions of x such that editing them can lead to considerable improvement in the sequence property. Assume that the flow function F_θ(·) is trained on available offline training data. GFNSeq Editor uses the trained GFlow Net's flow function F_θ(·) to identify sub-optimal positions of x, and subsequently replaces the sub-optimal parts with newly sampled edits based on the stochastic policy π(·).

4.1 Sub-Optimal-Position Identification

This Subsection provides intuition on how GFNSeq Editor uses a pre-trained flow function F_θ(·) to identify sub-optimal positions in a sequence x to edit. Let x_t and x_{:t} denote the t-th element and the first t elements in the sequence x, respectively. For example, in the DNA sequence x = ATGTCCGC, we have x_2 = T and x_{:2} = AT. GFNSeq Editor constructs edited sequences token by token, and for each position t + 1, it examines whether x_{t+1} should be edited or not. Using the flow function F_θ(·), given x_{:t}, the GFlow Net evaluates the average reward obtained by appending any possible token to x_{:t}. In this context, each token can be viewed as an action. Let x_{:t} + a denote the sequence x_{:t} expanded by appending the token a. Let A represent the available action set. For each a ∈ A, using the state flow F_θ(x_{:t} + a), the value of action a given x_{:t} can be evaluated. As discussed in Section 3, the state flow F_θ(x_{:t} + a) is proportional to the total reward of all possible sequences that have x_{:t} + a as their prefix. Therefore, F_θ(x_{:t} + a_1) > F_θ(x_{:t} + a_2) indicates that taking action a_1 instead of action a_2 can lead to better candidates for the final sequence. We can leverage this property of the flow function F_θ(·) to examine whether x_{t+1} is sub-optimal or not. If the reward resulting from having x_{t+1} in the seed sequence is evaluated by F_θ(·) to be relatively small compared to other possible actions, then x_{t+1} is considered sub-optimal. In particular, x_{t+1} is identified as sub-optimal if:

$$F_\theta(x_{:t} + x_{t+1}) < \delta \max_{a \in \mathcal{A}} F_\theta(x_{:t} + a),$$ (3)

where 0 ≤ δ ≤ 1 is a hyperparameter. A larger δ increases the likelihood that the algorithm identifies x_{t+1} as sub-optimal. From (3), it can be inferred that x_{t+1} is identified as sub-optimal if its associated out-flow is considerably smaller than the out-flow associated with the best possible action in A. This means that the flow function F_θ(·) suggests that replacing x_{t+1} with other actions can lead to a remarkable improvement in the sequence property.
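To make the identification rule (3) concrete, the following is a minimal Python sketch. It assumes a trained flow model `flow_model` that, given a partial sequence, returns a tensor of non-negative state flows F_θ(prefix + a), one entry per action; the interface and names are illustrative and not the paper's implementation.

```python
import torch

def suboptimal_positions(flow_model, x, delta):
    """Flag positions of a seed sequence x (a list of token ids) whose state flow
    is small relative to the best alternative action, following rule (3).
    flow_model(prefix) is assumed to return a 1-D tensor of non-negative flows
    F_theta(prefix + a), one entry per action a in the vocabulary."""
    flagged = []
    for t in range(len(x)):
        flows = flow_model(x[:t])          # F_theta(x_{:t} + a) for all actions a
        if flows[x[t]] < delta * flows.max():
            flagged.append(t)              # token at this position judged sub-optimal
    return flagged
```

The stochastic variant actually used by GFNSeq Editor adds Gaussian noise to this threshold, as formalized in condition (4) below.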
4.2 Sequence Editing with GFNSeq Editor

Using the flow function F_θ(·), GFNSeq Editor iteratively identifies and edits positions in a seed sequence. Subsection 4.1 presented a simple rule, (3), for determining whether a position x_{t+1} in a sequence should be edited to improve the target property value. Based on this intuition, we now modify (3) to formally define the sub-optimal-position identification function D(·) used by GFNSeq Editor. Let x̂_{:t} denote the first t elements of the edited sequence. Assume that x_t ∈ A for all t, meaning that x_t is always in the available action set. At each step t of the algorithm, D(·) accepts x̂_{:t−1} and evaluates whether appending x_t (from the seed sequence) to the edited partial sequence x̂_{:t−1} is detrimental to the performance. In particular, modifying (3), the sub-optimal identifier function D(·) checks the following condition:

$$\frac{F_\theta(\hat{x}_{:t-1} + x_t)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} < \delta \max_{a \in \mathcal{A}} \frac{F_\theta(\hat{x}_{:t-1} + a)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} + \nu,$$ (4)

where ν ∼ N(0, σ²) is a Gaussian random variable with variance σ². The variance σ² is a hyperparameter. The relation between σ and the algorithm performance will be analyzed in Section 4.3. The inclusion of additive noise ν on the right-hand side of (4) introduces a degree of randomness into the process of identifying sub-optimal positions. This, in turn, fosters exploration in the editing process. The sub-optimal-position-identifier function D(·) determines whether x_t is sub-optimal as follows:

$$D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = \begin{cases} 1 & \text{if (4) is met} \\ 0 & \text{otherwise} \end{cases}.$$ (5)

If D(x_t, x̂_{:t−1}; δ, σ) = 0, at step t the algorithm appends x_t from the original sequence x to x̂_{:t−1}. Otherwise, if D(x_t, x̂_{:t−1}; δ, σ) = 1, the algorithm samples an action a according to the following policy:

$$\pi(a \mid \hat{x}_{:t-1}) = (1-\lambda) \frac{F_\theta(\hat{x}_{:t-1} + a)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} + \lambda \mathbb{1}_{a = x_t},$$ (6)

where 0 ≤ λ < 1 is a regularization coefficient and 1_{a=x_t} denotes the indicator function, which is 1 if a = x_t and 0 otherwise. The regularization parameter λ allows tuning the sampling process to favor the original sequence (a larger λ leads to a smaller number of edits). The policy in (6) constitutes a trade-off between increasing the target property and decreasing the distance between the edited sequence x̂ and the original sequence x. Let x̃_t be the action sampled by the policy π in (6). In summary, the t-th element in the edited sequence can be written as:

$$\hat{x}_t = D(x_t, \hat{x}_{:t-1}; \delta, \sigma)\, \tilde{x}_t + \left(1 - D(x_t, \hat{x}_{:t-1}; \delta, \sigma)\right) x_t.$$ (7)

Therefore, at each step t, the edited sequence is updated as x̂_{:t} = x̂_{:t−1} + x̂_t. The process continues until step T, where T = |x| denotes the length of the original sequence x. Note that x̂_{:0} is an empty sequence. Algorithm 1 summarizes GFNSeq Editor.

Algorithm 1 GFNSeq Editor: Sequence Editor using GFlow Net
1: Input: Sequence x with length T, flow function F_θ(·), and parameters δ, λ and σ.
2: Initialize x̂_{:0} as an empty sequence.
3: for t = 1, . . . , T do
4:   Check whether x_t is sub-optimal by obtaining D(x_t, x̂_{:t−1}; δ, σ) according to (5).
5:   if D(x_t, x̂_{:t−1}; δ, σ) = 1 then
6:     Sample x̂_t according to the policy π(· | x̂_{:t−1}) in (6).
7:   else
8:     Assign x̂_t = x_t.
9:   end if
10: end for
11: Output: Edited sequence x̂.
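Putting (4)-(7) together, Algorithm 1 amounts to a single left-to-right pass over the seed sequence. The snippet below is an illustrative Python sketch under the same assumed `flow_model(prefix)` interface as above; it is not the authors' released code.

```python
import torch

def gfn_seq_edit(flow_model, x, delta, lam, sigma):
    """Sketch of Algorithm 1. x is the seed sequence as a list of token ids;
    flow_model(prefix) is assumed to return a 1-D tensor of non-negative flows
    F_theta(prefix + a) over the action vocabulary (an illustrative interface)."""
    edited = []
    for t in range(len(x)):
        flows = flow_model(edited)                      # F_theta(x_hat_{:t-1} + a)
        probs = flows / flows.sum()                     # normalized state flows
        nu = sigma * torch.randn(()).item()             # Gaussian noise of condition (4)
        if probs[x[t]].item() < delta * probs.max().item() + nu:   # D(.) = 1
            # Policy (6): mixture of normalized flows and a point mass on the seed token.
            policy = (1.0 - lam) * probs
            policy[x[t]] += lam
            edited.append(torch.multinomial(policy, 1).item())     # sampled replacement
        else:                                           # D(.) = 0: keep the seed token
            edited.append(x[t])
    return edited
```

Because the replacement token is sampled from (6) rather than chosen greedily, repeated calls on the same seed sequence yield a diverse set of edited sequences.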
4.3 Analysis

This Subsection analyzes the reward and properties of the edited sequence as well as the number of edits performed by GFNSeq Editor. Specifically, the bounds for the reward of edited sequences, the property improvement, and the number of edits are determined by the algorithm's hyperparameters σ, δ, and λ. The following theorem specifies the lower bound for the reward of edited sequences.

Theorem 4.1. Let T be the length of the original sequence x. The expected reward of the sequence x̂ edited by GFNSeq Editor given x is bounded from below as:

$$\mathbb{E}[R(\hat{x}) \mid x] \geq \left[\left(1 - \Phi\!\left(\frac{1-\delta}{\sigma}\right)\right)(1-\lambda)\right]^{T} R_{F,T},$$ (8)

where Φ(·) denotes the cumulative distribution function (CDF) of the standard normal distribution and R_{F,T} represents the expected reward of a sequence with length T generated using the flow function F_θ(·). The proof of Theorem 4.1 is deferred to Appendix A.

The following theorem obtains an upper bound on the expected property improvement of the proposed GFNSeq Editor. The property improvement of a sequence x is defined as PI = ŷ − y, where ŷ denotes the property of the edited sequence.

Theorem 4.2. Let S_x be the set of all sequences with length T whose properties are larger than that of x (i.e., y). Assume that S_x is a non-empty set. The expected property improvement obtained by applying GFNSeq Editor to x is bounded from above as

$$\mathbb{E}[\mathrm{PI} \mid x] \leq \left(1 - \Phi\!\left(-\frac{\delta}{\sigma}\right)\right) \sum_{w \in S_x} (p_w - y),$$ (9)

where p_w denotes the property of the sequence w. The proof of Theorem 4.2 can be found in Appendix B.

Theorems 4.1 and 4.2 demonstrate that an increase in δ results in an increase in both the lower bound on the reward and the upper bound on the property improvement. While a higher value of σ corresponds to a larger lower bound for the reward, an increase in σ diminishes the upper bound on the property improvement. The following theorem obtains an upper bound on the number of edits performed by the proposed GFNSeq Editor.

Theorem 4.3. The expected distance between the sequence x̂ edited by GFNSeq Editor and the original sequence x is bounded from above as:

$$\mathbb{E}[\mathrm{lev}(x, \hat{x})] \leq (1-\lambda)\left(1 - \Phi\!\left(-\frac{\delta}{\sigma}\right)\right) T,$$ (10)

where lev(·, ·) is the Levenshtein distance between two sequences. The proof of Theorem 4.3 is available in Appendix C. The following theorem specifies a lower bound on the number of edits.

Theorem 4.4. Assume there exists ε > 0 such that the flow function F_θ(·) satisfies

$$\max_{a \in \mathcal{A}} \frac{F_\theta(\hat{x}_{:t-1} + a)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} \leq 1 - \epsilon, \quad \forall t,$$ (11)

meaning that the probability of choosing each action is always at most 1 − ε. The expected distance between the sequence x̂ edited by GFNSeq Editor and the original sequence x is bounded from below as:

$$\mathbb{E}[\mathrm{lev}(x, \hat{x})] \geq \epsilon (1-\lambda) \left(1 - \Phi\!\left(\frac{1-\delta}{\sigma}\right)\right) T.$$ (12)

The proof of Theorem 4.4 can be found in Appendix D. Theorems 4.3 and 4.4 show that as δ increases, both the lower and upper bounds on the distance increase. In contrast, an increase in λ leads to a decrease in both the lower and upper bounds on the distance. Furthermore, Theorem 4.1 demonstrates that a reduction in λ results in a larger lower bound for the reward. Therefore, Theorems 4.1 and 4.3 reveal a trade-off between the expected number of edits and the lower bound for the expected reward. While it is preferable to select hyperparameters δ and λ that reduce the expected number of edits, an increase in the number of edits corresponds to a larger lower bound for the reward.
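The bounds above can be evaluated numerically to guide hyperparameter selection. The sketch below plugs a candidate (δ, λ, σ) into the edit-count bounds of Theorems 4.3 and 4.4; the slack ε of condition (11) and the example values in the call are assumptions for illustration only, not numbers from the paper.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def edit_count_bounds(delta, lam, sigma, seq_len, eps=0.05):
    """Expected-edit bounds of Theorems 4.3 and 4.4 for one hyperparameter setting.
    eps is the slack assumed to hold in condition (11); its value is illustrative."""
    upper = (1.0 - lam) * (1.0 - normal_cdf(-delta / sigma)) * seq_len                # Theorem 4.3
    lower = eps * (1.0 - lam) * (1.0 - normal_cdf((1.0 - delta) / sigma)) * seq_len   # Theorem 4.4
    return lower, upper

# Example values (not taken from the paper): a 200-bp CRE-like sequence.
print(edit_count_bounds(delta=0.6, lam=0.1, sigma=0.2, seq_len=200))
```

Sweeping δ and λ through such a calculation makes the trade-off discussed above explicit: settings that loosen the edit budget also raise the reward lower bound of Theorem 4.1.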
5 Experiments

We conducted extensive experiments to assess the performance of GFNSeq Editor in comparison to several state-of-the-art baselines across diverse DNA- and protein-sequence editing tasks. We evaluate on the TFbinding, AMP, and CRE datasets. The TFbinding and CRE datasets consist of DNA sequences with lengths of 8 and 200, respectively. The task in both datasets is to edit sequences to increase their measured activities (binding activity for TFbinding and reporter expression for CRE). The vocabulary for both TFbinding and CRE is the four DNA bases, {A, C, G, T}. The AMP dataset comprises positive samples, representing anti-microbial peptides (AMPs), and negative samples, which are non-AMPs. The vocabulary consists of 20 amino acids. The primary objective is to edit the non-AMP samples in such a way that the edited versions attain the characteristics exhibited by AMP samples. Additional information about the datasets can be found in Appendix E.1.1. To evaluate the performance of sequence editing methods, we compute the following metrics:

- Property Improvement (PI): The PI for a given sequence x with label y is calculated as the average enhancement in property across edits, expressed as PI = (1/n_e) Σ_{i=1}^{n_e} (ŷ_i − y), where n_e is the number of edited sequences associated with the original sequence x and ŷ_i denotes the property of the i-th edited sequence x̂_i. To evaluate the performance of editing methods, for each dataset we leverage an oracle to obtain ŷ_i given x̂_i. More details about the oracles can be found in Appendix E.
- Edit Percentage (EP): The average Levenshtein distance between x and the edited sequences, normalized by the length of x, expressed as (1/(n_e T)) Σ_{i=1}^{n_e} lev(x, x̂_i).
- Diversity: For each sequence x, the diversity among edited sequences is obtained as (2/(n_e(n_e − 1))) Σ_{i=1}^{n_e−1} Σ_{j=i+1}^{n_e} lev(x̂_i, x̂_j).
- GMDPI: The geometric mean of diversity and PI. This metric highlights algorithms that exhibit strong performance in both aspects simultaneously.

We compared GFNSeq Editor to several baselines, including Directed Evolution (DE) [46], Ledidi [42], La MBO [48], MOGFN-AL [21], GFlow Net-AL [19], and Seq2Seq. To perform Directed Evolution for sequence editing, we select a set of positions uniformly at random within a given sequence and then apply the directed-evolution algorithm to edit these positions. The implementation of the directed-evolution algorithm is the same as that of the Ada Lead framework in [46]. Inspired by graph-to-graph translation for molecular optimization in [23], we implemented another editing baseline, which we call Seq2Seq. For the Seq2Seq baseline, we initially partition the dataset into two subsets: i) sequences with lower target-property values, and ii) sequences with relatively higher target-property values. Subsequently, we create pairs of data samples such that each low-property sequence is paired with its closest counterpart from the high-property sequence set, based on Levenshtein distance. A transformer is then trained to map each low-property sequence to its high-property pair. Essentially, the Seq2Seq baseline maps an input sequence to a similar sequence with a higher property value. Furthermore, we adapted GFlow Net-AL for sequence editing, and named it GFlow Net-E in what follows. In this baseline, the initial segment of the sequence serves as the input, allowing the model to generate the subsequent portion of the sequence. For the TF-binding, AMP, and CRE datasets, GFlow Net-E takes in the initial 70%, 65%, and 60% of elements, respectively, from the input sequence x, and generates the remaining elements using the pre-trained flow function. More details on the baselines can be found in Appendix E.1.
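For reference, the evaluation metrics defined above (PI, EP, diversity, and GMDPI) can be computed per seed sequence as in the following sketch; the use of the `python-Levenshtein` package and the clipping of negative PI before the geometric mean are implementation assumptions, not details stated in the paper.

```python
import itertools
import Levenshtein  # pip install python-Levenshtein (assumed here for lev(., .))

def editing_metrics(x, y, edited, edited_props):
    """PI, EP, diversity, and GMDPI for one seed sequence x (a string) with
    property y, its edited variants `edited`, and their oracle-scored properties."""
    n_e, T = len(edited), len(x)
    pi = sum(p - y for p in edited_props) / n_e
    ep = sum(Levenshtein.distance(x, e) for e in edited) / (n_e * T)   # x100 gives EP(%)
    div = (2.0 / (n_e * (n_e - 1))) * sum(
        Levenshtein.distance(a, b) for a, b in itertools.combinations(edited, 2))
    gmdpi = (max(pi, 0.0) * div) ** 0.5   # geometric mean; negative PI clipped to 0
    return pi, ep, div, gmdpi
```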
To train both the baselines and the proposed GFNSeq Editor, we divide each dataset into training, validation, and test sets with proportions of 72%, 18%, and 10%, respectively. The test set serves the purpose of evaluating the performance of the methods on sequence editing tasks. The flow function F_θ(·) utilized by GFNSeq Editor and the GFlow Net-E baseline is an MLP consisting of two hidden layers, each with a dimension of 2048, and |A| outputs corresponding to actions. Throughout our experiments, we employ the trajectory balance objective to train the flow function. Additional details regarding the training of the flow function can be found in Appendix E.1.

5.1 Sequence Editing

Table 1: Performance of GFNSeq Editor compared to the baselines in terms of property improvement (PI), edit percentage (EP), diversity, and geometric mean of property improvement and diversity (GMDPI) on the TFbinding, AMP, and CRE datasets. EP is selected to be approximately the same for all algorithms (if possible). Higher PI, diversity and GMDPI are preferable.

| Methods | TFbinding PI | TFbinding EP(%) | TFbinding Diversity | TFbinding GMDPI | AMP PI | AMP EP(%) | AMP Diversity | AMP GMDPI | CRE PI | CRE EP(%) | CRE Diversity | CRE GMDPI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DE | 0.12 | 25.00 | 3.01 | 0.60 | 0.11 | 33.82 | 13.67 | 1.23 | 0.63 | 22.93 | 62.07 | 6.25 |
| Ledidi | 0.06 | 27.80 | 1.25 | 0.27 | 0.18 | 34.79 | 11.65 | 1.45 | 1.36 | 22.13 | 50.49 | 8.29 |
| La MBO | 0.05 | 25.00 | 3.14 | 0.40 | 0.12 | 34.33 | 15.61 | 1.36 | 0.79 | 23.35 | 62.95 | 7.05 |
| MOGFN-AL | 0.09 | 25.00 | 2.66 | 0.49 | 0.10 | 35.26 | 7.59 | 0.87 | 2.45 | 22.99 | 10.96 | 5.18 |
| GFlow Net-E | 0.11 | 28.35 | 2.10 | 0.48 | 0.28 | 35.68 | 3.42 | 0.98 | 4.24 | 22.73 | 37.06 | 12.53 |
| Seq2Seq | 0.03 | 41.98 | - | - | 0.21 | 78.05 | - | - | - | - | - | - |
| GFNSeq Editor | 0.14 | 24.27 | 3.84 | 0.73 | 0.33 | 34.49 | 14.34 | 2.17 | 9.90 | 21.90 | 40.41 | 20.00 |

Figure 2: Property improvement of AMP (left) and CRE (right) with respect to edit percentage.

Table 1 presents the performance of GFNSeq Editor and other baselines on the TFbinding, AMP, and CRE datasets. We set GFNSeq Editor and all the baselines except for Seq2Seq to create 10 edited sequences for each input sequence. The Seq2Seq implementation closely resembles a deterministic machine translator and is limited to producing just one edited sequence per input, resulting in a diversity score of zero. (Seq2Seq also relies on identifying pairs of similar sequences for training; we were unable to identify similar pairs for CRE, possibly because of the limited number of training samples relative to the lengthy nature of the sequences, i.e., sequences with a length of 200, so Seq2Seq results are not reported for CRE.) Additionally, Figure 2 shows the property improvement achieved by GFNSeq Editor, DE, Ledidi, La MBO, and MOGFN-AL across a range of edit percentages. As evident from Table 1 and Figure 2, GFNSeq Editor outperforms all baselines, achieving substantial property improvements with a controlled number of edits. This superior performance is attributed to GFNSeq Editor's utilization of a pre-trained flow function from GFlow Net, enabling it to achieve significantly higher property improvements than DE, Ledidi, La MBO, and MOGFN-AL, which rely on local search techniques by optimizing either a given single sequence or a batch of sequences. Specifically, the flow function F_θ(·) is trained to sample sequences with probability proportional to their reward and, as a result, employing the policy in (6) for editing enables GFNSeq Editor to leverage global information contained in F_θ(·) about the entire space of sequences. Furthermore, GFNSeq Editor achieves larger property improvement than GFlow Net-E. GFNSeq Editor identifies and edits sub-optimal positions within a seed sequence using (4), while GFlow Net-E only edits the tail of the input seed sequence. This indicates the effectiveness of the sub-optimal-position identifier function of GFNSeq Editor.

Ablation study. We further study the property improvement achieved by GFNSeq Editor along with the edit percentage across various choices of the hyperparameters δ and λ.
Figure 3 illustrates that an increase in δ generally corresponds to an increase in both property improvement and edit percentage, whereas, in most cases, an increase in λ results in a decrease in property improvement and edit percentage. Furthermore, in Figure 7 in Appendix E.3, we illustrate the impact of changing σ on property improvement and edit diversity for GFNSeq Editor. This figure highlights that increasing σ results in decreased property improvement and enhanced diversity. These results corroborate the theoretical analyses outlined in Theorems 4.1, 4.2 and 4.3 in Section 4.3.

Figure 3: Studying the effect of hyperparameters δ and λ on the performance of GFNSeq Editor over AMP (left) and CRE (right) datasets. The marker values are edit percentages.

5.2 Assisting Sequence Generation

Table 2: Performance of DM, GFlow Net and the combination of DM with GFNSeq Editor for generating novel sequences.

| Algorithms | AMP Property | AMP Diversity | CRE Property | CRE Diversity |
| --- | --- | --- | --- | --- |
| DM | 0.66 | 23.86 | 1.75 | 107.38 |
| GFlow Net | 0.74 | 17.86 | 28.20 | 83.88 |
| DM+GFNSeq Editor | 0.73 | 23.78 | 26.42 | 103.10 |

In addition to editing sequences, we investigate the ability of GFNSeq Editor to be used alongside a sequence generative model to enhance the generation of novel sequences. This highlights the versatility of the proposed GFNSeq Editor. In this Subsection, we utilize a pre-trained diffusion model (DM) for sequence generation, with further details available in Appendix E.2. The sequences generated by the DM are passed to GFNSeq Editor to improve their target property. Given that GFNSeq Editor utilizes a trained GFlow Net model, this combination of a DM and GFNSeq Editor can be regarded as an ensemble approach, effectively leveraging both the DM and the GFlow Net for sequence generation. Table 2 presents the property and diversity metrics for sequences generated by the DM, the GFlow Net, and the combined DM+GFNSeq Editor across the AMP and CRE datasets, with each method generating 1,000 sequences. As observed from Table 2, the GFlow Net excels at producing sequences with higher property values compared to the DM, while the DM exhibits greater sequence diversity than the GFlow Net. Sequences generated by DM+GFNSeq Editor maintain similar property levels to the GFlow Net on its own, while their diversity is in line with that of the DM. This highlights the effectiveness of DM+GFNSeq Editor in harnessing the benefits of both the GFlow Net and the DM.

Figure 4: CDF of generated sequence properties for AMP (left) and CRE (right). A right-shifted curve indicates that the model is generating more sequences that are high in the target property.

Moreover, we show the CDF of the property for sequences generated by the DM, the GFlow Net, and DM+GFNSeq Editor in Figure 4. As shown, the CDF of DM+GFNSeq Editor aligns with both the DM and the GFlow Net. Specifically, for the AMP dataset, DM+GFNSeq Editor generates more sequences with properties higher than 0.78 compared to the GFlow Net, while reducing the number of low-property generated sequences compared to the DM alone.
In the case of the CRE dataset, the results in Figure 4 indicate that as δ increases, the CDF of DM+GFNSeq Editor becomes more akin to that of the GFlow Net. This is expected, as an increase in δ leads to a greater number of edits.

5.3 Sequence Combination

Figure 5: GFNSeq Editor effectively reduces the length of AMP sequence inputs (right) while keeping their properties intact (left).

GFNSeq Editor possesses the capability to combine multiple sequences, yielding a novel sequence that closely resembles its parent sequences. This capability proves invaluable in several applications, for example when it is important to shorten relatively lengthy sequences while retaining desired properties (see, e.g., [54, 59]). GFNSeq Editor accomplishes this by combining a longer sequence with a shorter one. The resultant sequence maintains high similarity with the longer one to retain its desired properties, while also resembling a realistic, relatively shorter sequence to ensure safety and predictability. Algorithm 2 in Appendix E.5 describes using GFNSeq Editor to combine two sequences with the goal of shortening the longer one. We evaluate GFNSeq Editor's performance in combining pairs of long and short sequences using the AMP dataset as a test case. In this context, a long sequence is defined as one with a length exceeding 30, while a short sequence has a length shorter than 20. Each initial pair consists of a long AMP sequence and its closest short sequence with an AMP property exceeding 0.7. Table 5 and Figure 5 in Appendix E.5 present the results of sequence combination for sequence length reduction. As indicated in Table 5, GFNSeq Editor not only enhances the properties of the initial long sequences, but also significantly shortens them by more than 63%. Additionally, the sequences generated by GFNSeq Editor resemble both the initial long and short sequences, with an average Levenshtein similarity of approximately 65% to long sequences and 55% to short sequences.

6 Conclusions

This paper introduces GFNSeq Editor, a generative model for sequence editing built upon GFlow Net. Given an input seed sequence, GFNSeq Editor identifies and edits positions within the input sequence to enhance its property. This paper also offers a theoretical analysis of the properties of edited sequences and the number of edits performed by GFNSeq Editor. Experimental evaluations using real-world DNA and protein datasets demonstrate that GFNSeq Editor outperforms state-of-the-art baselines in terms of property enhancement while maintaining a similar number of edits. Nevertheless, akin to many machine learning algorithms, GFNSeq Editor does have its limitations. It relies on a well-trained GFlow Net model, necessitating the availability of a high-quality trained GFlow Net for optimal performance.

References

[1] Angermueller, C., Dohan, D., Belanger, D., Deshpande, R., Murphy, K., and Colwell, L. Model-based reinforcement learning for biological sequence design. In International conference on learning representations, 2019. [2] Arnold, F. H. Design by directed evolution. Accounts of chemical research, 31(3):125 131, 1998. [3] Avdeyev, P., Shi, C., Tan, Y., Dudnyk, K., and Zhou, J. Dirichlet diffusion score model for biological sequence generation. ar Xiv preprint ar Xiv:2305.10699, 2023. [4] Barrera, L. A., Vedenko, A., Kurland, J. V., Rogers, J. M., Gisselbrecht, S. S., Rossin, E. J., Woodard, J., Mariani, L., Kock, K. H., Inukai, S., et al. Survey of variation in human transcription factors reveals prevalent DNA binding changes.
Science, 351(6280):1450 1454, 2016. [5] Bengio, E., Jain, M., Korablyov, M., Precup, D., and Bengio, Y. Flow network based generative models for non-iterative diverse candidate generation. In Advances in Neural Information Processing Systems, volume 34, pp. 27381 27394, 2021. [6] Bengio, Y., Lahlou, S., Deleu, T., Hu, E. J., Tiwari, M., and Bengio, E. Gflownet foundations. Journal of Machine Learning Research, 24(210):1 55, 2023. [7] Chen, C. S., Zhang, Y., Liu, X., and Coates, M. Bidirectional learning for offline modelbased biological sequence design. In Proceedings of the International Conference on Machine Learning, 2023. [8] Chen, Y. and Mauch, L. Order-preserving GFlownets. In International Conference on Learning Representations, 2024. [9] Cox, D. B. T., Platt, R. J., and Zhang, F. Therapeutic genome editing: prospects and challenges. Nature medicine, 21(2):121 131, 2015. [10] Dadkhahi, H., Rios, J., Shanmugam, K., and Das, P. Fourier representations for black-box optimization over categorical variables. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 10156 10165, 2022. [11] Deleu, T., Góis, A., Emezue, C., Rankawat, M., Lacoste-Julien, S., Bauer, S., and Bengio, Y. Bayesian structure learning with generative flow networks. In Uncertainty in Artificial Intelligence, pp. 518 528. PMLR, 2022. [12] Gosai, S. J., Castro, R. I., Fuentes, N., Butts, J. C., Kales, S., Noche, R. R., Mouri, K., Sabeti, P. C., Reilly, S. K., and Tewhey, R. Machine-guided design of synthetic cell type-specific cis-regulatory elements. bio Rxiv, pp. 2023 08, 2023. [13] Grau-Moya, J., Leibfried, F., and Vrancx, P. Soft q-learning with mutual-information regularization. In International Conference on Learning Representations, 2019. [14] Guo, S., Chu, J., Zhu, L., and Li, T. Dynamic backtracking in gflownet: Enhancing decision steps with reward-dependent adjustment mechanisms. ar Xiv preprint ar Xiv:2404.05576, 2024. [15] Haarnoja, T., Tang, H., Abbeel, P., and Levine, S. Reinforcement learning with deep energybased policies. In Proceedings of the International Conference on Machine Learning, pp. 1352 1361, 2017. [16] Hansen, N. The cma evolution strategy: a comparing review. Towards a new evolutionary computation: Advances in the estimation of distribution algorithms, pp. 75 102, 2006. [17] Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840 6851, 2020. URL https://github.com/ hojonathanho/diffusion. [18] Hoffman, S. C., Chenthamarakshan, V., Wadhawan, K., Chen, P.-Y., and Das, P. Optimizing molecules using efficient queries from property evaluations. Nature Machine Intelligence, 4(1): 21 31, 2022. [19] Jain, M., Bengio, E., Hernandez-Garcia, A., Rector-Brooks, J., Dossou, B. F. P., Ekbote, C. A., Fu, J., Zhang, T., Kilgour, M., Zhang, D., Simine, L., Das, P., and Bengio, Y. Biological sequence design with GFlow Nets. In Proceedings of the 39th International Conference on Machine Learning, volume 162, pp. 9786 9801, Jul 2022. [20] Jain, M., Deleu, T., Hartford, J., Liu, C.-H., Hernandez-Garcia, A., and Bengio, Y. Gflownets for ai-driven scientific discovery. Digital Discovery, 2(3):557 577, 2023. [21] Jain, M., Raparthy, S. C., Hernandez-Garcia, A., Rector-Brooks, J., Bengio, Y., Miret, S., and Bengio, E. Multi-objective gflownets. In Proceedings of the International Conference on Machine Learning, 2023. [22] Jang, H., Kim, M., and Ahn, S. 
Learning energy decompositions for partial inference in GFlownets. In International Conference on Learning Representations, 2024. [23] Jin, W., Yang, K., Barzilay, R., and Jaakkola, T. Learning multimodal graph-to-graph translation for molecule optimization. In International Conference on Learning Representations, 2019. [24] Kim, H., Kim, M., Choi, S., and Park, J. Genetic-guided gflownets: Advancing in practical molecular optimization benchmark. ar Xiv preprint ar Xiv:2402.05961, 2024. [25] Kim, M., Ko, J., Yun, T., Zhang, D., Pan, L., Kim, W. C., Park, J., Bengio, E., and Bengio, Y. Learning to scale logits for temperature-conditional GFlow Nets. In Proceedings of the International Conference on Machine Learning, volume 235, pp. 24248 24270, Jul 2024. [26] Kim, M., Yun, T., Bengio, E., Zhang, D., Bengio, Y., Ahn, S., and Park, J. Local search GFlownets. In International Conference on Learning Representations, 2024. [27] Koziarski, M., Abukalam, M., Shah, V., Vaillancourt, L., Schuetz, D. A., Jain, M., van der Sloot, A. M., Bourgey, M., Marinier, A., and Bengio, Y. Towards DNA-encoded library generation with GFlownets. In ICLR 2024 Workshop on Generative and Experimental Perspectives for Biomolecular Design, 2024. [28] Li, H., Yang, Y., Hong, W., Huang, M., Wu, M., and Zhao, X. Applications of genome editing technology in the targeted therapy of human diseases: mechanisms, advances and prospects. Signal transduction and targeted therapy, 5(1):1, 2020. [29] Li, W., Li, Y., Li, Z., HAO, J., and Pang, Y. DAG matters! GFlownets enhanced explainer for graph neural networks. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=jgmu Rz M-sb6. [30] Madan, K., Rector-Brooks, J., Korablyov, M., Bengio, E., Jain, M., Nica, A. C., Bosc, T., Bengio, Y., and Malkin, N. Learning gflownets from partial episodes for improved convergence and stability. In International Conference on Machine Learning, pp. 23467 23483. PMLR, 2023. [31] Malkin, N., Jain, M., Bengio, E., Sun, C., and Bengio, Y. Trajectory balance: Improved credit assignment in GFlownets. In Advances in Neural Information Processing Systems, 2022. [32] Malkin, N., Lahlou, S., Deleu, T., Ji, X., Hu, E. J., Everett, K. E., Zhang, D., and Bengio, Y. GFlownets and variational inference. In The Eleventh International Conference on Learning Representations, 2023. [33] Mohammadpour, S., Bengio, E., Frejinger, E., and Bacon, P.-L. Maximum entropy GFlow Nets with soft Q-learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 238, pp. 2593 2601, May 2024. [34] Mullis, M. M., Rambo, I. M., Baker, B. J., and Reese, B. K. Diversity, ecology, and prevalence of antimicrobials in nature. Frontiers in microbiology, pp. 2518, 2019. [35] Nachum, O., Norouzi, M., Xu, K., and Schuurmans, D. Bridging the gap between value and policy based reinforcement learning. In Proceedings of the International Conference on Neural Information Processing Systems, pp. 2772 2782, 2017. [36] Nishikawa-Toomey, M., Deleu, T., Subramanian, J., Bengio, Y., and Charlin, L. Bayesian learning of causal structure and mechanisms with gflownets and variational bayes. ar Xiv preprint ar Xiv:2211.02763, 2022. [37] Niu, P., Wu, S., Fan, M., and Qian, X. GFlow Net training by policy gradients. In Proceedings of the International Conference on Machine Learning, volume 235, pp. 38344 38380, Jul 2024. [38] Pan, L., Zhang, D., Courville, A., Huang, L., and Bengio, Y. 
Generative augmented flow networks. ar Xiv preprint ar Xiv:2210.03308, 2022. [39] Pan, L., Zhang, D., Jain, M., Huang, L., and Bengio, Y. Stochastic generative flow networks. ar Xiv preprint ar Xiv:2302.09465, 2023. [40] Pirtskhalava, M., Amstrong, A. A., Grigolava, M., Chubinidze, M., Alimbarashvili, E., Vishnepolsky, B., Gabrielian, A., Rosenthal, A., Hurt, D. E., and Tartakovsky, M. DBAASP v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Research, 49(D1):D288 D297, 11 2020. [41] Ren, Z., Li, J., Ding, F., Zhou, Y., Ma, J., and Peng, J. Proximal exploration for model-guided protein sequence design. In Proceedings of the International Conference on Machine Learning, volume 162, pp. 18520 18536, Jul 2022. [42] Schreiber, J., Lu, Y. Y., and Noble, W. S. Ledidi: Designing genomic edits that induce functional activity. bio Rxiv, 2020. doi: 10.1101/2020.05.21.109686. URL https://www.biorxiv.org/ content/early/2020/05/25/2020.05.21.109686. [43] Shen, M. W., Bengio, E., Hajiramezanali, E., Loukas, A., Cho, K., and Biancalani, T. Towards understanding and improving gflownet training. ar Xiv preprint ar Xiv:2305.07170, 2023. [44] Sidorczuk, K., Gagat, P., Pietluch, F., Kała, J., Rafacz, D., B akała, L., Słowik, J., Kolenda, R., Rödiger, S., Fingerhut, L. C. H. W., Cooke, I. R., Mackiewicz, P., and Burdukiewicz, M. Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data. Briefings in Bioinformatics, 23(5):bbac343, Sep 2022. [45] Silva, T., Carvalho, L. M., Souza, A. H., Kaski, S., and Mesquita, D. Embarrassingly parallel GFlow Nets. In Proceedings of the International Conference on Machine Learning, volume 235, pp. 45406 45431, Jul 2024. [46] Sinai, S., Wang, R., Whatley, A., Slocum, S., Locane, E., and Kelsic, E. D. Adalead: A simple and robust adaptive greedy search algorithm for sequence design. ar Xiv preprint ar Xiv:2010.02141, 2020. [47] Song, Y., Sohl-Dickstein, J., Brain, G., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In ar Xiv preprint ar Xiv:2011.13456, 2021. [48] Stanton, S., Maddox, W., Gruver, N., Maffettone, P., Delaney, E., Greenside, P., and Wilson, A. G. Accelerating Bayesian optimization for biological sequence design with denoising autoencoders. In Proceedings of the International Conference on Machine Learning, volume 162, pp. 20459 20478, Jul 2022. [49] Swersky, K., Rubanova, Y., Dohan, D., and Murphy, K. Amortized bayesian optimization over discrete spaces. In Conference on Uncertainty in Artificial Intelligence, pp. 769 778. PMLR, 2020. [50] Taskiran, I. I., Spanier, K. I., Christiaens, V., Mauduit, D., and Aerts, S. Cell type directed design of synthetic enhancers. bio Rxiv, pp. 2022 07, 2022. [51] Terayama, K., Sumita, M., Tamura, R., and Tsuda, K. Black-box optimization for automated discovery. Accounts of Chemical Research, 54(6):1334 1346, 2021. [52] Trabucco, B., Kumar, A., Geng, X., and Levine, S. Conservative objective models for effective offline model-based optimization. In International Conference on Machine Learning, pp. 10358 10368. PMLR, 2021. [53] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. u., and Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, 2017. [54] Xu, X., Chemparathy, A., Zeng, L., Kempton, H. R., Shang, S., Nakamura, M., and Qi, L. S. 
Engineered miniature crispr-cas system for mammalian genome regulation and editing. Molecular Cell, 81(20):4333 4345.e4, 2021. [55] Zhang, D., Fu, J., Bengio, Y., and Courville, A. Unifying likelihood-free inference with black-box optimization and beyond. ar Xiv preprint ar Xiv:2110.03372, 2021. [56] Zhang, D., Chen, R. T., Malkin, N., and Bengio, Y. Unifying generative models with gflownets. ar Xiv preprint ar Xiv:2209.02606, 2022. [57] Zhang, D., Malkin, N., Liu, Z., Volokhova, A., Courville, A., and Bengio, Y. Generative flow networks for discrete probabilistic modeling. In International Conference on Machine Learning, pp. 26412 26428. PMLR, 2022. [58] Zhang, D., Pan, L., Chen, R. T., Courville, A., and Bengio, Y. Distributional gflownets with quantile flows. ar Xiv preprint ar Xiv:2302.05793, 2023. [59] Zhao, F., Zhang, T., Sun, X., Zhang, X., Chen, L., Wang, H., Li, J., Fan, P., Lai, L., Sui, T., et al. A strategy for cas13 miniaturization based on the structure and alphafold. Nature Communications, 14(1):5545, 2023. [60] Zimmermann, H., Lindsten, F., van de Meent, J.-W., and Naesseth, C. A. A variational perspective on generative flow networks. ar Xiv preprint ar Xiv:2210.07992, 2022. [61] Zrimec, J., Fu, X., Muhammad, A. S., Skrekas, C., Jauniskis, V., Speicher, N. K., Börlin, C. S., Verendel, V., Chehreghani, M. H., Dubhashi, D., et al. Controlling gene expression with deep generative design of regulatory dna. Nature communications, 13(1):5099, 2022.

A Proof of Theorem 4.1

Let z denote a sequence of length T generated from scratch using the policy π_F(·), defined as

$$\pi_F(a \mid z_{:t}) = \frac{F_\theta(z_{:t} + a)}{\sum_{a' \in \mathcal{A}} F_\theta(z_{:t} + a')}.$$ (13)

The expected reward of z can be obtained as

$$R_{F,T} = \mathbb{E}[R(z)] = \sum_{w \in \mathcal{T}_T} \Pr[z = w]\, R(w) = \sum_{w \in \mathcal{T}_T} \prod_{t=1}^{T} \pi_F(w_t \mid w_{:t-1})\, R(w),$$ (14)

where T_T denotes the set of sequences with length T that can be generated by F_θ(·). The probability that GFNSeq Editor outputs an arbitrary sequence w ∈ T_T given x can be expressed as

$$\Pr[\hat{x} = w \mid x] = \prod_{t=1}^{T} \Pr[\hat{x}_t = w_t \mid \hat{x}_{:t-1}, x].$$ (15)

The probability Pr[x̂_t = w_t | x̂_{:t−1}, x] can be obtained as

$$\Pr[\hat{x}_t = w_t \mid \hat{x}_{:t-1}, x] = \Pr[D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 1]\, \pi(w_t \mid w_{:t-1}) + \Pr[D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 0]\, \mathbb{1}_{w_t = x_t} \geq \Pr[D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 1]\, \pi(w_t \mid w_{:t-1}),$$ (16)

where π(·) is defined in (6). According to (6), it can be written that

$$\pi(w_t \mid w_{:t-1}) \geq (1-\lambda) \frac{F_\theta(w_{:t})}{\sum_{a' \in \mathcal{A}} F_\theta(w_{:t-1} + a')} = (1-\lambda)\, \pi_F(w_t \mid w_{:t-1}).$$ (17)

Furthermore, according to (4) and (5), we have D(x_t, x̂_{:t−1}; δ, σ) = 1 if

$$\frac{F_\theta(\hat{x}_{:t-1} + x_t)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} - \delta \max_{a \in \mathcal{A}} \frac{F_\theta(\hat{x}_{:t-1} + a)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} < \nu.$$ (18)

In addition, it can be inferred that

$$\frac{F_\theta(\hat{x}_{:t-1} + x_t)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} - \delta \max_{a \in \mathcal{A}} \frac{F_\theta(\hat{x}_{:t-1} + a)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} \leq 1 - \delta.$$ (19)

Therefore, it can be concluded that if ν > 1 − δ, it is guaranteed that D(x_t, x̂_{:t−1}; δ, σ) = 1. Since ν follows a Gaussian distribution with variance σ², we have ν > 1 − δ with probability 1 − Φ((1−δ)/σ). Hence, it can be written that

$$\Pr[D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 1] \geq 1 - \Phi\!\left(\frac{1-\delta}{\sigma}\right).$$ (20)

Combining (20) and (17) with (16), we get

$$\Pr[\hat{x}_t = w_t \mid \hat{x}_{:t-1}, x] \geq (1-\lambda)\, \pi_F(w_t \mid w_{:t-1}) \left(1 - \Phi\!\left(\frac{1-\delta}{\sigma}\right)\right).$$ (21)

Moreover, combining (21) with (15), we obtain

$$\Pr[\hat{x} = w \mid x] \geq \prod_{t=1}^{T} (1-\lambda)\, \pi_F(w_t \mid w_{:t-1}) \left(1 - \Phi\!\left(\frac{1-\delta}{\sigma}\right)\right).$$ (22)

Using (22), for the expected reward of x̂ given x we can write

$$\mathbb{E}[R(\hat{x}) \mid x] = \sum_{w \in \mathcal{T}_T} \Pr[\hat{x} = w \mid x]\, R(w) \geq \sum_{w \in \mathcal{T}_T} \prod_{t=1}^{T} (1-\lambda)\, \pi_F(w_t \mid w_{:t-1}) \left(1 - \Phi\!\left(\frac{1-\delta}{\sigma}\right)\right) R(w).$$ (23)

Combining (23) with (14), we get

$$\mathbb{E}[R(\hat{x}) \mid x] \geq \left[(1-\lambda)\left(1 - \Phi\!\left(\frac{1-\delta}{\sigma}\right)\right)\right]^{T} R_{F,T},$$ (24)

which proves (8). Moreover, the upper bound of the property improvement achieved by the proposed GFNSeq Editor is analyzed in Appendix B.

B Proof of Theorem 4.2

The expected property improvement of GFNSeq Editor can be obtained as

$$\mathbb{E}[\mathrm{PI} \mid x] = \sum_{w \in \mathcal{T}_T} \Pr[\hat{x} = w \mid x](p_w - y).$$ (25)
Since T_T can be split into the two sets S_x and T_T \ S_x, the expected property improvement of GFNSeq Editor can be obtained as

$$\mathbb{E}[\mathrm{PI} \mid x] = \sum_{w \in S_x} \Pr[\hat{x} = w \mid x](p_w - y) + \sum_{w \in \mathcal{T}_T \setminus S_x} \Pr[\hat{x} = w \mid x](p_w - y).$$ (26)

If w ∈ T_T \ S_x, then p_w ≤ y. Therefore, the expected property improvement of GFNSeq Editor can be bounded from above as

$$\mathbb{E}[\mathrm{PI} \mid x] \leq \sum_{w \in S_x} \Pr[\hat{x} = w \mid x](p_w - y).$$ (27)

The probability that GFNSeq Editor outputs w ∈ S_x can be expressed as

$$\Pr[\hat{x} = w \mid x] = \prod_{t=1}^{T} \Pr[\hat{x}_t = w_t \mid \hat{x}_{:t-1}, x].$$ (28)

The probability Pr[x̂_t = w_t | x̂_{:t−1}, x] can be obtained as

$$\Pr[\hat{x}_t = w_t \mid \hat{x}_{:t-1}, x] = \Pr[D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 1]\, \pi(w_t \mid w_{:t-1}) + \Pr[D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 0]\, \mathbb{1}_{x_t = w_t}.$$ (29)

If x_t ≠ w_t, according to (37) and considering the fact that π(w_t | w_{:t−1}) ≤ 1, the probability in (29) can be bounded from above as

$$\Pr[\hat{x}_t = w_t \mid \hat{x}_{:t-1}, x] \leq 1 - \Phi\!\left(-\frac{\delta}{\sigma}\right).$$ (30)

Otherwise, if x_t = w_t, it can be written that Pr[x̂_t = w_t | x̂_{:t−1}, x] ≤ 1. Since any w ∈ S_x must differ from x in at least one position, combining (28) with (30) we can conclude that

$$\Pr[\hat{x} = w \mid x] \leq 1 - \Phi\!\left(-\frac{\delta}{\sigma}\right).$$ (31)

Combining (27) with (31) proves the Theorem.

C Proof of Theorem 4.3

We obtain the upper bound for the expected distance between the edited sequence x̂ and the original sequence x. Since both x and x̂ have the same length T, the distance lev(x, x̂) can be interpreted as the number of positions at which these two sequences differ. Therefore, in order to obtain lev(x, x̂), it is sufficient to find the number of times that x_t ≠ x̂_t, for all t : 1 ≤ t ≤ T. If D(x_t, x̂_{:t−1}; δ, σ) = 0, then x̂_t = x_t. Furthermore, if D(x_t, x̂_{:t−1}; δ, σ) = 1, then according to (6), we have x̂_t = x_t with probability

$$(1-\lambda)\frac{F_\theta(\hat{x}_{:t-1} + x_t)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} + \lambda.$$ (32)

Therefore, the probability Pr[x̂_t ≠ x_t] can be obtained as

$$\Pr[\hat{x}_t \neq x_t] = \Pr[D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 1](1-\lambda)\left(1 - \frac{F_\theta(\hat{x}_{:t-1} + x_t)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')}\right).$$ (33)

Since F_θ(x̂_{:t−1} + x_t) ≥ 0, the probability Pr[x̂_t ≠ x_t] can be bounded as

$$\Pr[\hat{x}_t \neq x_t] \leq \Pr[D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 1](1-\lambda).$$ (34)

Moreover, if we have

$$\nu \leq \frac{F_\theta(\hat{x}_{:t-1} + x_t)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} - \delta \max_{a \in \mathcal{A}} \frac{F_\theta(\hat{x}_{:t-1} + a)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')},$$ (35)

then D(x_t, x̂_{:t−1}; δ, σ) = 0. Furthermore, the right-hand side of (35) can be bounded from below as

$$\frac{F_\theta(\hat{x}_{:t-1} + x_t)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} - \delta \max_{a \in \mathcal{A}} \frac{F_\theta(\hat{x}_{:t-1} + a)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} \geq -\delta.$$ (36)

Therefore, if ν ≤ −δ, it is ensured that D(x_t, x̂_{:t−1}; δ, σ) = 0. The probability that ν ≤ −δ is Φ(−δ/σ). Hence, we can conclude that

$$\Pr[D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 1] = 1 - \Pr[D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 0] \leq 1 - \Phi\!\left(-\frac{\delta}{\sigma}\right).$$ (37)

Combining (37) with (34), we arrive at

$$\Pr[\hat{x}_t \neq x_t] \leq \left(1 - \Phi\!\left(-\frac{\delta}{\sigma}\right)\right)(1-\lambda).$$ (38)

Moreover, since both x and x̂ have the same length T, the expected Levenshtein distance between x and x̂ can be obtained as

$$\mathbb{E}[\mathrm{lev}(x, \hat{x})] = \sum_{t=1}^{T} \Pr[\hat{x}_t \neq x_t].$$ (39)

Thus, combining (39) with (38), we can write that

$$\mathbb{E}[\mathrm{lev}(x, \hat{x})] \leq \left(1 - \Phi\!\left(-\frac{\delta}{\sigma}\right)\right)(1-\lambda)\, T,$$ (40)

which proves the Theorem.

D Proof of Theorem 4.4

According to (33) and the assumption in (11), it can be written that

$$\Pr[\hat{x}_t \neq x_t] \geq \Pr[D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 1](1-\lambda)\epsilon.$$ (41)

Combining (41) with (20), we get

$$\Pr[\hat{x}_t \neq x_t] \geq \epsilon(1-\lambda)\left(1 - \Phi\!\left(\frac{1-\delta}{\sigma}\right)\right).$$ (42)

Summing (42) over all elements in the sequence proves the theorem.

E Supplementary Experimental Results and Details

This appendix provides a comprehensive overview of the experimental setup in Section 5 and presents additional supplementary experimental results.

E.1 Implementation Details

All training and inference, including GFNSeq Editor, have been conducted using a single Nvidia Quadro P6000.

E.1.1 Datasets

Detailed information about the datasets can be found below:

TFbinding: The dataset is taken from [4] and contains all possible DNA sequences with length 8. The vocabulary is the four DNA bases, {A, C, G, T}.
The goal is to edit a given DNA sequence to increase its binding activity with certain DNA-binding proteins called transcription factors. Higher binding activity is preferable. For training, validation, and testing purposes, 50% of the dataset is set aside. The task entails editing a test dataset consisting of 10% of the samples, while the remaining data is utilized for training and validation.

AMP: The dataset, acquired from DBAASP [40], is curated following the approach outlined by [19]. Peptides (i.e., short proteins) within a sequence-length range of 12 to 60 amino acids are specifically chosen. The dataset comprises a total of 6,438 positive samples, representing anti-microbial peptides (AMPs), and 9,522 negative samples, which are non-AMPs. The vocabulary consists of 20 amino acids. The primary objective is to edit the non-AMP samples in such a way that the edited versions attain the characteristics exhibited by AMP samples. The task primarily centers on editing a subset comprising 10% of the non-AMP samples, designated for use as test samples, with the remaining samples allocated for training and validation purposes.

CRE: The dataset contains putative human cis-regulatory elements (CREs), which are regulatory DNA sequences modulating gene expression. CREs were profiled via massively parallel reporter assays (MPRAs) [12], where the activity is measured as the expression of the reporter gene. For our analysis, we randomly extract 10,000 DNA sequences, each with a length of 200 base pairs, utilizing a vocabulary of the four bases. The overarching objective is to edit the DNA sequences to increase the reporter gene's expression specifically within the K562 cell line, which represents erythroid precursors in leukemia. The task involves editing a subset of 1,000 test samples, while the rest are allocated for training and validation purposes.

E.1.2 Oracles

To evaluate the performance of each sequence editing method in terms of property improvement, it is required to obtain the properties of edited sequences. To this end, we employ an oracle for each dataset. The TFbinding dataset contains all possible 65,792 DNA sequences with a length of 8. Therefore, by looking up the dataset, the true label of each edited sequence can be found. Following [1, 19], the AMP dataset is split into two parts: D1 and D2. The oracle for the AMP dataset is a set of models trained on partition D2 as a simulation of wet-lab experiments. We employed the oracles trained by [19] for the AMP dataset. It is worth noting that the performance of predictive models on AMP datasets can be influenced by negative sampling in the dataset [44]. Furthermore, for the CRE dataset we leverage the Malinois model [12], which is a deep convolutional neural network (CNN) for cell type-informed CRE activity prediction of any arbitrary sequence.

E.1.3 Baselines Implementation

DE and Ledidi. In order to implement the DE and Ledidi baselines, a proxy model is needed so that the baselines can evaluate their candidate edits at each iteration of these algorithms. For each dataset, we train a proxy model on the training split of that dataset. For the TFBinding dataset, we configure a three-layer MLP with hidden dimensions of 64. In the case of AMP, we opt for a four-layer MLP, also with hidden dimensions of 64. Finally, for CRE, we utilize a four-layer MLP with hidden dimensions set to 2048. Across all models, the learning rate is consistently set to 10⁻⁴, ReLU serves as the activation function, and we set the number of epochs to 2,000.
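As a rough illustration of the proxy models described above, the sketch below builds an MLP regressor of the stated depth and width on a flattened one-hot encoding of the sequence; the one-hot input representation and the scalar output head are assumptions, since the text does not specify them.

```python
import torch.nn as nn

def make_proxy(seq_len, vocab_size, hidden=64, n_layers=3):
    """Property-prediction proxy as described above: an n_layers-layer MLP (e.g.,
    three layers of width 64 for TFBinding) applied to a flattened one-hot sequence."""
    dims = [seq_len * vocab_size] + [hidden] * (n_layers - 1)
    blocks = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        blocks += [nn.Linear(d_in, d_out), nn.ReLU()]
    blocks.append(nn.Linear(dims[-1], 1))   # scalar property prediction
    return nn.Sequential(*blocks)

proxy = make_proxy(seq_len=8, vocab_size=4)   # TFBinding-sized example
```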
To implement the DE baseline, we randomly select edit locations based on the desired edit percentages. At each selected location, we apply an edit by choosing the action that maximizes the proxy model's property prediction.

La MBO. We utilize the official implementation of La MBO (https://github.com/samuelstanton/lambo). Test sequences serve as the candidate pool (for the AMP dataset, we removed samples with a length lower than window-size), and all candidate samples in the pool are weighted equally to maintain consistency with the other sequence editing baselines. To ensure a fair comparison, we employ the same proxy as GFNSeq Editor to calculate the property scores. Across all three datasets, we use mlm as the encoder objective, ei as the acquisition function, and DKL SVGP regression as the surrogate. To generate 10 edits per sample, we configure num-gens=10, and window-size is adjusted for each dataset so that the edit percentages closely match the desired values; hence, we set it to 2, 14, and 23 for TFBinding, AMP, and CRE, respectively. Additionally, for all datasets, we set pref-cond=False, pref-alpha=1, and beta-sched=1.

MOGFN-AL. We utilize the official implementation of MOGFN-AL (https://github.com/MJ10/mogfn-al), where sequence tasks are based on the La MBO implementation. Similar to La MBO, test sequences serve as the candidate pool, weighted equally, and the same proxy as GFNSeq Editor and La MBO is employed to calculate property scores. Like La MBO, we use mlm as the encoder objective. Training and validation batch sizes are set to 16 and 64, respectively, with the hyperparameters of the conditional transformer set as follows: num-hid=128, num-layers=3, and num-head=8. For each dataset, we adjust the max-len parameter to closely match the desired edit percentage; thus, for TFBinding, AMP, and CRE, we set it to 6, 30, and 50, respectively. The reward function is modified to match that of GFNSeq Editor. To ensure fair comparisons and adapt La MBO and MOGFN-AL for sequence editing, we do not use active learning settings for either baseline.

Seq2Seq. To implement the Seq2Seq baseline, we use a standard transformer [53] as a translator mapping an input sequence to an output sequence with a superior property. We pair samples in each dataset such that each pair contains a low-property sequence and the most similar higher-property sequence in the dataset. The transformer is trained to map the low-property sequence of each pair to the corresponding high-property sequence, using the standard configuration from the PyTorch transformer module tutorial. Both the embedding dimension of the transformer and the dimension of the two-layer feedforward network in the transformer encoder are set to 200. The number of heads in the multi-head attention layer is 2, and the dropout rate is 0.2. We employ the cross-entropy loss in conjunction with the stochastic gradient descent optimizer. The initial learning rate is set to 5.0 and follows a StepLR schedule.

E.1.4 GFlow Net Training

Both the baseline GFlow Net-E and the proposed GFNSeq Editor use the same trained GFlow Net model. We trained an active-learning-based GFlow Net model following the setting in [19]. In this setting, at each round of active learning, $tK$ candidates generated by the GFlow Net are sampled, and the top $K$ samples according to the proxy's scores are added to the offline dataset. Here, the offline dataset refers to an initial labeled dataset.
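To make the round structure concrete, the following is a schematic sketch of a single active-learning round under the setting just described; `sample_from_gflownet` and `proxy_score` are hypothetical placeholders standing in for the trained GFlow Net sampler and the proxy model, not functions from the implementation of [19].

```python
from typing import Callable, List, Tuple

def active_learning_round(
    sample_from_gflownet: Callable[[int], List[str]],  # hypothetical: draws sequences from the trained GFlow Net policy
    proxy_score: Callable[[str], float],                # hypothetical: proxy model's property prediction for one sequence
    offline_dataset: List[Tuple[str, float]],
    t: int = 5,
    K: int = 100,
) -> List[Tuple[str, float]]:
    """One active-learning round: sample t*K candidates, keep the proxy's top K, append to the offline dataset."""
    candidates = sample_from_gflownet(t * K)
    scored = sorted(((proxy_score(seq), seq) for seq in candidates), reverse=True)
    top_k = [(seq, score) for score, seq in scored[:K]]
    return offline_dataset + top_k
```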
To train the GFlow Net, we employed the same proxy models as those used by the other baseline methods. For all datasets, we set the number of active learning rounds to 1, with $t$ equal to 5 and $K$ equal to 100. We parameterize the flow using an MLP comprising two hidden layers, each with a dimension of 2048, and $|\mathcal{A}|$ outputs corresponding to the individual actions. Throughout our experiments, we employ the trajectory balance objective for training. The Adam optimizer with $(\beta_1, \beta_2) = (0.9, 0.999)$ is used during training. The learning rate for $\log Z$ in the trajectory balance loss is set to $10^{-3}$ for all experiments. The numbers of training steps for TFBinding, AMP, and CRE are 5,000, $10^6$, and $10^4$, respectively. The remaining hyperparameters were configured in accordance with the settings established in [19].

E.2 Diffusion Model Training

We trained our diffusion models on the full AMP or CRE sequence datasets. The sequences were one-hot encoded, yielding 20-vectors for protein sequences and 4-vectors for DNA sequences. We employed the variance-preserving stochastic differential equation (VP-SDE) [47], with a variance schedule of $\beta(t) = 0.9t + 0.1$ and a time horizon of $T = 1$ (i.e., $t \in [0, 1)$). This amounts to adding Gaussian noise over continuous time. For our discrete-time diffusion model, we defined a discrete-time Gaussian noising process following [17], with $\beta_t = (1 \times 10^{-4}) + (1 \times 10^{-5})\,t$ and a time horizon of $T = 1000$ (i.e., $t \in [0, 1000]$). Our denoising network was based on a transformer architecture. The time embedding was computed as $[\sin(2\pi \frac{t}{T} z), \cos(2\pi \frac{t}{T} z)]$, where $z$ is a 30-vector of Gaussian-distributed parameters that are not trainable. The time embeddings were passed through two dense layers with a sigmoid in between, mapping to a 256-vector of time representations. Each input in a batch was concatenated with the time embedding and a sinusoidal positional embedding (defined in [53]) of dimension 64. This concatenation was passed to a linear layer mapping it to 128 dimensions, then to a standard transformer encoder with 3 layers, 8 attention heads, a hidden dimension of 128, and an MLP dimension of 64, and finally to a linear layer projecting back to the input dimension. We trained our diffusion models with a learning rate of 0.001 for 100 epochs, at which point the loss had converged for all models. We also employed empirical loss weighting, where the loss of each input in a batch is divided by the L2 norm of the true Stein score. We trained our diffusion models on a single Nvidia Quadro P6000. When generating samples from a continuous-time diffusion model, we used the predictor-corrector algorithm defined in [47], with 1,000 time steps from $T$ to 0. We then rounded all outputs to the nearest integer to recover the one-hot encoded sample.

E.3 Supplementary Results for Sequence Editing

Figure 6: GFNSeq Editor shifts the distribution of non-AMP inputs to the known AMPs.

Figure 7: Studying the effect of hyperparameter σ on the diversity and performance of GFNSeq Editor over AMP (left) and CRE (right) datasets.

In Figure 6, we illustrate the distribution of input non-AMP sequences, the sequences edited by GFNSeq Editor, and the AMP samples from the AMP dataset.
It is evident from Figure 6 that GFNSeq Editor shifts the property distribution of input non-AMP sequences towards that of AMP sequences. Moreover, Figure 7 illustrates the impact of changing σ on property improvement and edit diversity for GFNSeq Editor. As can be seen, increasing σ results in decreased property improvement and enhanced diversity. It is worth noting that GFNSeq Editor is capable of performing edits even when certain portions of the input sequence are masked and cannot be modified. Table 3 showcases the performance of GFNSeq Editor compared to Ledidi on the CRE dataset, with the first 100 elements of the input sequences masked. As depicted in Table 3, GFNSeq Editor achieves significantly greater property improvement than Ledidi while utilizing a lower edit percentage.

Table 3: Performance of GFNSeq Editor and Ledidi with 100 elements of each sequence masked for editing, on the CRE dataset.
Algorithms      PI     EP (%)   Diversity   PI     EP (%)   Diversity
Ledidi          0.52   18.69    38.34       0.26   14.39    37.45
GFNSeq Editor   4.79   17.89    32.30       4.05   14.19    25.52

E.4 Supplementary Results for Sequence Generation

This subsection compares the performance of GFNSeq Editor on the sequence generation task with that of GFlow Net and a diffusion model (DM) on the CRE dataset. We relax the hyperparameters to allow a larger number of edits, setting δ = 0.4, λ = 0.1, and σ = 0.001 for GFNSeq Editor. The results are presented in Table 4. GFlow Net and DM each generate 1,000 sequences; GFNSeq Editor also generates 1,000 sequences by editing each of the 1,000 samples in the test dataset.

Table 4: Performance of GFNSeq Editor, GFlow Net, and DM on generating new sequences for the CRE dataset.
Algorithms      Property   Diversity   Distance (%)
DM              1.75       107.38      63.59
GFlow Net       28.20      83.88       54.41
GFNSeq Editor   29.25      87.32       47.34

As can be seen, GFNSeq Editor achieves a higher property than both GFlow Net and DM. It is useful to note that the experimental study by [19] has shown that GFlow Net outperforms state-of-the-art sequence design methods. For each sequence generated by GFlow Net and DM, the distance to the test set is measured as the distance between the generated sequence and its closest counterpart in the test set. On average, the distance between sequences generated by GFlow Net and the test set is 54.34%, while for DM it is 63.59%. GFNSeq Editor achieves superior performance by editing, on average, 47.34% of a sequence in the test dataset. The distance between the test set and the sequences generated by GFlow Net and DM cannot be controlled, whereas, as studied in Figures 3 and 7, the amount of editing performed by GFNSeq Editor can be controlled through the hyperparameters δ, λ, and σ.

E.5 Supplementary Discussion and Results for Sequence Combination

Algorithm 2 presents the GFNSeq Editor procedure for combining two sequences in order to obtain a new sequence whose length is that of the shorter sequence. In Figure 5, we depict the distributions of input sequence lengths and properties, alongside the lengths and properties of the outputs generated by GFNSeq Editor. This scenario pertains to the combination of a long AMP sequence with a short AMP sequence, as detailed in Subsection 5.3. As depicted in Figure 5, the edited sequences produced by GFNSeq Editor exhibit property distributions akin to those of the long input AMP sequences. Simultaneously, these edited sequences are considerably shorter than the original long input sequences. This highlights GFNSeq Editor's effectiveness in shortening lengthy AMP sequences while preserving their inherent properties.
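For concreteness, the sketch below illustrates how the combination procedure of Algorithm 2 (reproduced further below) could be implemented in Python. The `flow` callable, the explicit vocabulary argument, and the simplified form of the sub-optimality check are assumptions standing in for the trained flow function $F_\theta$ and the detector $D$ of Equation (5); it is an illustrative sketch under those assumptions rather than the reference implementation.

```python
import random
from typing import Callable, List

def combine_sequences(
    x1: List[str],
    x2: List[str],
    flow: Callable[[List[str], str], float],  # stand-in for the trained flow F_theta(x_hat_{:t-1} + a)
    vocab: List[str],
    delta: float = 0.4,
    lam: float = 0.1,
    sigma: float = 0.001,
) -> List[str]:
    """Sketch of Algorithm 2: build a sequence of length min(T1, T2) from two parent sequences."""
    t_min = min(len(x1), len(x2))
    longer_parent = x1 if len(x1) > len(x2) else x2
    x_hat: List[str] = []
    for t in range(t_min):
        # Line 4: greedily keep the parent token with the larger flow.
        a1, a2 = x1[t], x2[t]
        x_t = a1 if flow(x_hat, a1) >= flow(x_hat, a2) else a2
        # Lines 5-6: noisy sub-optimality check, a simplified stand-in for the detector D of Eq. (5):
        # the token is flagged when its normalized flow falls more than delta times the best action's
        # normalized flow below that best action, up to Gaussian noise of scale sigma.
        flows = {a: flow(x_hat, a) for a in vocab}
        total = sum(flows.values()) or 1.0
        nu = random.gauss(0.0, sigma)
        if flows[x_t] / total < delta * max(flows.values()) / total + nu:
            # Lines 7-12: sample a replacement from the flow-based policy, with a lambda bonus
            # on the token contributed by the longer parent.
            probs = [(1.0 - lam) * flows[a] / total + (lam if a == longer_parent[t] else 0.0) for a in vocab]
            x_hat.append(random.choices(vocab, weights=probs, k=1)[0])
        else:
            # Lines 13-14: keep the greedy choice.
            x_hat.append(x_t)
    return x_hat
```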
Furthermore, Table 6 provides results for combining pairs of AMP sequences as well as pairs consisting of an AMP sequence and a non-AMP sequence. In both cases, GFNSeq Editor generates a sequence with a length matching that of the longer sequence. When combining two AMP sequences, GFNSeq Editor produces new sequences with higher properties than their parent sequences, maintaining an average resemblance of over 60% to each parent. Additionally, GFNSeq Editor can be applied to combine a non-AMP sequence with an AMP sequence, offering the advantage of rendering the edited non-AMP sequence more akin to a genuine AMP sequence. The results in Table 6 demonstrate that GFNSeq Editor substantially enhances the properties of non-AMP sequences, surpassing the properties of their AMP parents. Furthermore, on average, the edited sequences bear a 35% resemblance to their AMP parents.

Table 6: Performance of GFNSeq Editor for sequence combination over the AMP dataset in terms of property improvements of the first (PI-S1) and second (PI-S2) sequences, edit percentages of the first (EP-S1) and second (EP-S2) sequences, and diversity.
Seq1      Seq2   PI-S1   PI-S2   EP-S1 (%)   EP-S2 (%)   Diversity
AMP       AMP    0.05    0.06    36.10       39.47       7.29
non-AMP   AMP    0.41    0.04    41.41       65.39       12.77

Algorithm 2 GFNSeq Editor for combining two sequences to shorten the length of the longer sequence.
1: Input: $x_1$ and $x_2$ with lengths $T_1$ and $T_2$, flow function $F_\theta(\cdot)$, and parameters $\delta$, $\lambda$, and $\sigma$.
2: Initialize $\hat{x}_{:0}$ as an empty sequence and $T_{\min} = \min\{T_1, T_2\}$.
3: for $t = 1, \ldots, T_{\min}$ do
4:   Assign $x_t = \arg\max_{\{x_{1,t}, x_{2,t}\}} \{F_\theta(\hat{x}_{:t-1} + x_{1,t}), F_\theta(\hat{x}_{:t-1} + x_{2,t})\}$.
5:   Check whether $x_t$ is sub-optimal by obtaining $D(x_t, \hat{x}_{:t-1}; \delta, \sigma)$ according to (5).
6:   if $D(x_t, \hat{x}_{:t-1}; \delta, \sigma) = 1$ then
7:     Sample $\hat{x}_t$ according to the policy $\pi(\cdot \mid \hat{x}_{:t-1})$ defined as follows:
8:     if $T_1 > T_2$ then
9:       $\pi(a \mid \hat{x}_{:t-1}) = (1-\lambda)\,\frac{F_\theta(\hat{x}_{:t-1} + a)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} + \lambda\,\mathbb{1}_{a = x_{1,t}}$.
10:    else
11:      $\pi(a \mid \hat{x}_{:t-1}) = (1-\lambda)\,\frac{F_\theta(\hat{x}_{:t-1} + a)}{\sum_{a' \in \mathcal{A}} F_\theta(\hat{x}_{:t-1} + a')} + \lambda\,\mathbb{1}_{a = x_{2,t}}$.
12:    end if
13:  else
14:    Assign $\hat{x}_t = x_t$.
15:  end if
16: end for
17: Output: Edited sequence $\hat{x}$.

Table 5: Performance of GFNSeq Editor for sequence reduction on the AMP dataset in terms of variation in property, edit percentage of long sequences (EPLS), edit percentage of short sequences (EPSS), and percentage of length reduction in the long sequences.
Input Property   Output Property   EPLS (%)   EPSS (%)   Sequence Reduction (%)
0.65             0.67              35.96      44.65      63.23

F Supplementary Related Works

GFlow Nets, initially proposed by [5], were introduced as a reinforcement-learning (RL) algorithm designed to expand upon maximum-entropy RL, effectively handling scenarios with multiple paths leading to a common state. However, recent studies have redefined and generalized their scope, describing them as a general framework for amortized inference with neural networks [32, 20, 60, 56]. There has been a recent surge of interest in employing GFlow Nets across various domains; noteworthy examples include their utilization in molecule discovery [5], Bayesian structure learning [11, 36], and graph explainability [29]. Recognizing their significance, several studies have emerged to enhance the learning efficiency of GFlow Nets [6, 32, 30, 43] since the introduction of the flow matching learning objective by [5]. Moreover, GFlow Nets have demonstrated adaptability in being jointly trained with energy and reward functions [57]. [38] introduce intrinsic exploration rewards into GFlow Nets, addressing exploration challenges within sparse-reward tasks. A couple of recent studies extend GFlow Nets to stochastic environments, accommodating stochasticity in transition dynamics [39] and rewards [58]. Several novel GFlow Net training methodologies have recently been proposed in [22, 26, 37, 14]. The application of GFlow Nets when a predefined reward function is not accessible is explored in [8]. Distributed training of GFlow Nets is discussed in [45].
Accelerating GFlow Net training is investigated in [25]. Moreover, [27] employs GFlow Nets for designing DNA-encoded libraries. To reduce the need for expensive reward evaluations, [24] proposes a new GFlow Net-based method for molecular optimization. The aforementioned works have primarily focused on theoretical developments of GFlow Nets and their application in molecular generation, without directly addressing the challenges associated with sequence design or editing. In a departure from this trend, and inspired by Bayesian optimization, [19] proposed a new active learning algorithm based on GFlow Nets, i.e., GFlow Net-AL, to design novel biological sequences. GFlow Net-AL [19] utilizes the epistemic uncertainty of the surrogate model within its reward function, guiding the GFlow Net towards the optimization of promising yet less-explored regions within the state space. This approach fosters the generation of a diverse set of de novo sequences from scratch and token by token. Unlike GFNSeq Editor, it lacks the capability to edit input seed sequences and combine multiple sequences. This distinction underscores the unique contribution of GFNSeq Editor in addressing the sequence editing problem, positioning it as a valuable addition to the existing literature on GFlow Nets.

G Societal Impact

Biological sequence optimization and design hold transformative potential for biotechnology and health, offering enhanced therapeutic solutions and a vast range of applications. Techniques that enable refining sequences can lead to advancements like elucidating the role of individual gene products, rectifying genetic mutations in afflicted tissues, and optimizing properties of peptides, antibodies, and nucleic-acid therapeutics. However, the dual-edged nature of such breakthroughs must be acknowledged, as the same research might be misappropriated for unintended purposes. Our method can be instrumental in refining diagnostic procedures and uncovering the genetic basis of diseases, which promises a deeper grasp of genetic factors in disease. Yet, we must proceed with caution, as these advancements may unintentionally amplify health disparities for marginalized communities. As researchers, we emphasize the significance of weighing the potential societal benefits against unintended consequences, while remaining optimistic about our work's predominant inclination towards beneficial outcomes.

Neur IPS Paper Checklist

1. Claims Question: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? Answer: [Yes] Justification: The abstract and introduction explain that the paper studies biological sequence editing using GFlow Nets. The abstract and introduction summarize the contributions and main results of the paper. Guidelines: The answer NA means that the abstract and introduction do not include the claims made in the paper. The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers.
The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings. It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper. 2. Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: In the conclusion (Section 6) and Appendix G, the paper discusses its limitations. Guidelines: The answer NA means that the paper has no limitation, while the answer No means that the paper has limitations, but those are not discussed in the paper. The authors are encouraged to create a separate "Limitations" section in their paper. The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be. The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated. The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon. The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size. If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness. While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren't acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations. 3. Theory Assumptions and Proofs Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [Yes] Justification: The paper provides the full set of assumptions and complete, correct proofs for all theoretical results. Each theorem is proven, with the proofs detailed in the appendices. The main text appropriately references these sections, ensuring that readers can easily locate and verify the proofs. Guidelines: The answer NA means that the paper does not include theoretical results. All the theorems, formulas, and proofs in the paper should be numbered and cross-referenced. All assumptions should be clearly stated or referenced in the statement of any theorems. The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition. Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material.
Theorems and Lemmas that the proof relies upon should be properly referenced. 4. Experimental Result Reproducibility Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [Yes] Justification: In Section 5 and Appendix E.1, we provide details about the implementation and training of the baselines and the proposed algorithm. Guidelines: The answer NA means that the paper does not include experiments. If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not. If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable. Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general, releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed. While Neur IPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example: (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm. (b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully. (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset). (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results. 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The data is already public, as we stated in the experiments. We will release the code as well upon company approval. Guidelines: The answer NA means that the paper does not include experiments requiring code. Please see the Neur IPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details. While we encourage the release of code and data, we understand that this might not be possible, so No is an acceptable answer.
Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark). The instructions should contain the exact command and environment needed to run to reproduce the results. See the Neur IPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details. The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc. The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why. At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable). Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted. 6. Experimental Setting/Details Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [Yes] Justification: The paper provides detailed information about all the training and test details in Section 5 and Appendix E.1. Guidelines: The answer NA means that the paper does not include experiments. The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them. The full details can be provided either with the code, in appendix, or as supplemental material. 7. Experiment Statistical Significance Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [Yes] Justification: The paper presents Figures 4, 5, and 6, which show the distribution of results generated by the proposed algorithm, illustrating the statistical significance of the experiments. Guidelines: The answer NA means that the paper does not include experiments. The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper. The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions). The method for calculating the error bars should be explained (closed-form formula, call to a library function, bootstrap, etc.). The assumptions made should be given (e.g., Normally distributed errors). It should be clear whether the error bar is the standard deviation or the standard error of the mean. It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar rather than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified. For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g., negative error rates). If error bars are reported in tables or plots, the authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text. 8.
Experiments Compute Resources Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: In Appendix E.1, we specify that all training and inference, including for GFNSeq Editor, were conducted using a single Nvidia Quadro P6000. Guidelines: The answer NA means that the paper does not include experiments. The paper should indicate the type of compute workers (CPU or GPU, internal cluster, or cloud provider), including relevant memory and storage. The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute. The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn't make it into the paper). 9. Code Of Ethics Question: Does the research conducted in the paper conform, in every respect, with the Neur IPS Code of Ethics (https://neurips.cc/public/EthicsGuidelines)? Answer: [Yes] Justification: We believe that the research conducted in this paper conforms, in every respect, with the Neur IPS Code of Ethics. Guidelines: The answer NA means that the authors have not reviewed the Neur IPS Code of Ethics. If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics. The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction). 10. Broader Impacts Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed? Answer: [Yes] Justification: The societal impact discussion is provided in Appendix G. Guidelines: The answer NA means that there is no societal impact of the work performed. If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact. Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations. The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate deepfakes faster. The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology.
If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML). 11. Safeguards Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)? Answer: [NA] Justification: We believe that the paper poses no such risks. Guidelines: The answer NA means that the paper poses no such risks. Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters. Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images. We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort. 12. Licenses for existing assets Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected? Answer: [Yes] Justification: In Section 5 and Appendix E.1, we properly cited the original papers that produced the code packages or datasets we used. Guidelines: The answer NA means that the paper does not use existing assets. The authors should cite the original paper that produced the code package or dataset. The authors should state which version of the asset is used and, if possible, include a URL. The name of the license (e.g., CC-BY 4.0) should be included for each asset. For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided. If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset. For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided. If this information is not available online, the authors are encouraged to reach out to the asset's creators. 13. New Assets Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets? Answer: [NA] Justification: The paper does not release new assets. Guidelines: The answer NA means that the paper does not release new assets. Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc. The paper should discuss whether and how consent was obtained from people whose asset is used. At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file. 14.
Crowdsourcing and Research with Human Subjects Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)? Answer: [NA] Justification: The paper does not involve crowdsourcing nor research with human subjects. Guidelines: The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper. According to the Neur IPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector. 15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained? Answer: [NA] Justification: The paper does not involve crowdsourcing nor research with human subjects. Guidelines: The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper. We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the Neur IPS Code of Ethics and the guidelines for their institution. For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.