Published as a conference paper at ICLR 2023

A NON-MONOTONIC SELF-TERMINATING LANGUAGE MODEL

Eugene Choi (eugene.choi@nyu.edu), Cheolhyoung Lee (cheolhyoung.lee@nyu.edu), Kyunghyun Cho (kyunghyun.cho@nyu.edu)
New York University; Prescient Design, Genentech; CIFAR Fellow

ABSTRACT

Recent large-scale neural autoregressive sequence models have shown impressive performance on a variety of natural language generation tasks. However, their generated sequences often exhibit degenerate properties such as non-termination, undesirable repetition, and premature termination when generated with decoding algorithms such as greedy search, beam search, top-k sampling, and nucleus sampling. In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm. We first define an incomplete probable decoding algorithm, which includes greedy search, top-k sampling, and nucleus sampling, beyond the incomplete decoding algorithm originally put forward by Welleck et al. (2020). We then propose a non-monotonic self-terminating language model, which significantly relaxes the constraint of monotonically increasing termination probability in the self-terminating language model originally proposed by Welleck et al. (2020), to address the issue of non-terminating sequences when using incomplete probable decoding algorithms. We prove that our proposed model prevents non-terminating sequences when using not only incomplete probable decoding algorithms but also beam search. We empirically validate our model on sequence completion tasks with various architectures.

1 INTRODUCTION

Autoregressive neural sequence models (Bengio et al., 2000) have been widely used for various natural language generation tasks such as language modeling (Brown et al., 2020; Chowdhery et al., 2022), machine translation (Bahdanau et al., 2014), and conversational dialogue modeling (Vinyals & Le, 2015). Furthermore, large-scale autoregressive neural sequence models have shown an unprecedented ability to generate fluent, human-like text (Vaswani et al., 2017; Brown et al., 2020). Despite their success, autoregressive neural sequence models exhibit undesirable behaviors: non-termination (Welleck et al., 2020), degenerate repetition (Welleck et al., 2019; Holtzman et al., 2020), and premature termination (Koehn & Knowles, 2017; Stahlberg & Byrne, 2019). In this paper, we focus on how to prevent non-termination when using a given decoding algorithm.

Non-termination is the problem that a language model generates infinitely long sequences with positive probability under a given decoding algorithm. Welleck et al. (2020) pointed out that this issue comes from a discrepancy between the distribution of the language model and the distribution induced by an incomplete decoding algorithm. They formalized this disparity with the notion of inconsistency, in which the decoding algorithm generates non-terminating sequences from the language model with positive probability. To avoid this inconsistency, they proposed a self-terminating (ST) language model that uses a new parametrization for its classifier rather than the usual softmax parametrization. They proved that the ST language model is consistent with respect to greedy search, beam search, top-k sampling (Fan et al., 2018), and nucleus sampling (Holtzman et al., 2020). The ST language model increases the termination probability of each sequence monotonically to 1, but this parametrization is not appropriate for learning natural language.
As an illustrative example, suppose there are two sequences in our dataset: "I am a boy" vs. "I am a boy, and you are a girl." Our language model trained on this dataset may or may not terminate after the former. Once our model decides not to end there, it should dramatically reduce the termination probability in order to continue. The ST language model, which monotonically increases the termination probability, cannot capture such a case, where one sequence is a prefix of another.

We thus propose a non-monotonic self-terminating (NMST) language model, which guarantees consistency with respect to greedy search, beam search, top-k sampling, and nucleus sampling without monotonically increasing the termination probability. The NMST language model encourages the termination probability of each sequence to converge to 1 through the NMST parametrization, but without monotonicity. Even under this relaxation, the proposed NMST language model provably prevents any non-terminating sequence resulting from greedy search, beam search, top-k sampling, and nucleus sampling, which we refer to as incomplete probable decoding algorithms.

We conduct experiments validating the effectiveness of our NMST language models on sequence completion tasks, as was done in earlier studies. We test the NMST parametrization with various architectures. Specifically, we train an RNN (Elman, 1990) and an LSTM (Hochreiter & Schmidhuber, 1997) on WikiText-2 (Merity et al., 2016). We additionally finetune GPT-2 (Radford et al., 2019) on WikiText-103 (Merity et al., 2016). Across all these setups, the NMST parametrization effectively prevents non-terminating sequences, especially when compared to the softmax parametrization. Furthermore, our NMST parametrization achieves better (lower) perplexities than the ST parametrization, confirming the importance of relaxing the monotonic termination probability.

2 NOTATIONS AND BACKGROUND

2.1 NOTATIONS FOR AUTOREGRESSIVE NEURAL SEQUENCE MODELS

Sequences, vocabulary, and ⟨eos⟩. We view an instance (e.g., a sentence or a paragraph) as a sequence y = (y_1, y_2, ..., y_T), where each y_t is an element from a pre-defined finite set of discrete tokens, referred to as a vocabulary V. V includes a special symbol ⟨eos⟩ that only appears at the end of a sequence. Every sequence y must end with ⟨eos⟩. We write the length of y as |y|, so y_{|y|} = ⟨eos⟩. We call y a non-terminating sequence, |y| = ∞, if y_t ≠ ⟨eos⟩ for all t.

Embedding vectors. Each token v ∈ V is not itself a numerical vector, so we use an embedding vector u_v ∈ R^m to represent v and to capture the notion of similarity between discrete tokens in a continuous embedding space (Bengio et al., 2000; Mikolov et al., 2013a;b; Levy & Goldberg, 2014).

Autoregressive neural sequence models. Bengio et al. (2000) proposed an autoregressive neural sequence model parametrized by θ ∈ R^k. They factorized p_θ(y|x) into a product of the conditional probabilities of each token given all previous tokens and an input, in a predefined order:

p_θ(y|x) = ∏_{t=1}^{T} p_θ(y_t | y_{<t}, x).

To estimate the non-termination ratio r_nt (equation 11), we use r_nt(L) = q_{S(p_θ)}(|y| > L) with a sufficiently large threshold L.

Sequence completion is the task of predicting a continuation ŷ given a c-length context x = (x_1, x_2, ..., x_c) by using a decoding algorithm S with a language model p_θ (i.e., ŷ ∼ q_{S(p_θ)}(y|x)).

² We provide the proof in Appendix C.
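As a rough Python sketch of the behavior described above, the termination probability of each sequence can be pushed to converge to 1 without being forced to increase monotonically by bounding the ⟨eos⟩ probability below with a curve 1 − (1 − ϵ)^t that rises to 1. The functional form and names below (`nmst_eos_probability`, `eos_logit`) are illustrative assumptions for exposition, not the paper's exact parametrization.

```python
import math


def nmst_eos_probability(eos_logit: float, t: int, eps: float = 1e-5) -> float:
    """Illustrative (assumed) non-monotonic self-terminating parametrization.

    The <eos> probability is bounded below by 1 - (1 - eps)^t, a floor that
    converges to 1 as t grows, while sigmoid(eos_logit) moves the probability
    anywhere above that floor, so the sequence of probabilities over t need
    not be monotone.
    """
    sigmoid = 1.0 / (1.0 + math.exp(-eos_logit))
    floor = 1.0 - (1.0 - eps) ** t          # lower bound; tends to 1 as t grows
    return floor + (1.0 - floor) * sigmoid  # always lies in [floor, 1)


if __name__ == "__main__":
    eps = 5e-4
    # Fluctuating logits: p(<eos>) is not monotone in t, but always exceeds
    # the rising floor, so it is driven toward 1 in the limit.
    for t, logit in enumerate([-2.0, 1.0, -3.0, 0.5, -1.0], start=1):
        print(f"t={t}  floor={1.0 - (1.0 - eps) ** t:.6f}  "
              f"p(eos)={nmst_eos_probability(logit, t, eps):.6f}")
```

With a fluctuating ⟨eos⟩ logit, the printed probabilities move up and down yet always stay above the rising floor, which is the relaxation (convergence without monotonicity) that distinguishes NMST from ST.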
Figure 2: Non-termination ratios r_nt(L) as a function of L, in log-log scale, for (a) RNN and (b) LSTM trained on WikiText-2 when using greedy search. We report the mean (curve) ± st.dev. (shaded area) across 10 random experiments; curves are shown for ϵ ∈ {5.0 × 10^{-4}, 1.0 × 10^{-4}, 5.0 × 10^{-5}, 1.0 × 10^{-5}}. For all configurations, both ST+ (non-red dashed), proposed by Welleck et al. (2020), and our NMST+ (non-red solid) are consistent with respect to greedy search, since r_nt(L) goes to 0 as L increases. However, the softmax parametrization (VA+, red dotted) is inconsistent with respect to greedy search, since its r_nt(L) does not converge to 0 as L → ∞.

In this section, we use greedy search, defined in equation 8, to generate ŷ given x. Our main theoretical finding, Theorem 3, is that the proposed NMST language model is consistent with respect to not only greedy search but also top-k sampling, nucleus sampling, and beam search. We thus present results using decoding algorithms other than greedy search at the end, in Section 5 and Appendix F.

4.1 WIKITEXT-2

WikiText-2 (Merity et al., 2016) consists of 2 million words from 600 Wikipedia articles. With word tokenization, we regard the first 10 tokens of each sequence as a context x and its remaining part as a ground truth y. We train an RNN with tanh activations (Elman, 1990) and an LSTM (Hochreiter & Schmidhuber, 1997) on WikiText-2. Both the RNN and the LSTM have 2 layers, with 256 and 512 hidden units per layer, respectively. We perform 10 random runs with a batch size of 32 for 70 epochs. We use AdamW (Loshchilov & Hutter, 2017) with an initial learning rate of 0.001, β1 = 0.9, β2 = 0.99, weight decay of 0.01, learning rate decay, and early stopping. We further describe our models and training strategies for the WikiText-2 experiments in Appendix D. Unlike VA+{RNN, LSTM}, ST+{RNN, LSTM} and NMST+{RNN, LSTM} need an additional hyperparameter ϵ. We explore ϵ ∈ {1.0 × 10^{-5}, 5.0 × 10^{-5}, 1.0 × 10^{-4}, 5.0 × 10^{-4}}.

We present the average (± st.dev.) non-termination ratios r_nt(L) across 10 random runs as a function of L for all considered setups on WikiText-2 in Figure 2, using greedy search. From equation 11, a language model is consistent with respect to greedy search if lim_{L→∞} r_nt(L) = 0. As L increases, we observe that r_nt(L) of VA+{RNN, LSTM} fails to converge toward 0, while r_nt(L) of ST+{RNN, LSTM} and NMST+{RNN, LSTM} all reach 0. In other words, the RNN and LSTM become consistent with respect to greedy search after replacing the original softmax parametrization with either the proposed NMST parametrization or the ST parametrization.

Table 1 shows the average (± st.dev.) validation perplexities across 10 random experiments for all variants of the RNN and LSTM trained on WikiText-2. We observe that NMST+{RNN, LSTM} have better validation perplexities than ST+{RNN, LSTM} for every ϵ. We demonstrate this more clearly in Appendix E.1 by plotting the evolution of the mean validation perplexities as we vary ϵ. Although our NMST+ guarantees the consistency of the RNN and LSTM with respect to greedy search with a better validation perplexity than ST+, we need to select ϵ carefully: as ϵ increases, the lower bound of p^nmst_θ(y_t = ⟨eos⟩ | y_{<t}, x) rises more quickly, and the validation perplexity degrades (see Appendix E.1).

To validate consistency with respect to decoding algorithms other than greedy search, we use top-{2, 4} sampling, nucleus-{0.2, 0.4} sampling, and beam search with a width of {2, 4} (beam-{2, 4}) to generate sequences from NMST+GPT-2 finetuned on WikiText-103 with ϵ = 1.0 × 10^{-5}.
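Throughout these experiments the key measured quantity is the non-termination ratio r_nt(L). As a rough illustration of how it can be estimated, the sketch below greedily decodes a continuation for each context and counts the fraction that fail to produce ⟨eos⟩ within L steps; the `next_token_probs` interface is a hypothetical stand-in for a trained language model, not the paper's released code.

```python
from typing import Callable, Dict, List, Sequence

EOS = "<eos>"
NextTokenProbs = Callable[[Sequence[str]], Dict[str, float]]


def greedy_decode(next_token_probs: NextTokenProbs,
                  context: Sequence[str], max_steps: int) -> List[str]:
    """Greedy search: at each step append the single most probable token."""
    prefix = list(context)
    for _ in range(max_steps):
        probs = next_token_probs(prefix)   # maps token -> probability
        token = max(probs, key=probs.get)  # |V_t| = 1 under greedy search
        prefix.append(token)
        if token == EOS:
            break
    return prefix


def non_termination_ratio(next_token_probs: NextTokenProbs,
                          contexts: Sequence[Sequence[str]], L: int) -> float:
    """Estimate r_nt(L): the fraction of decoded continuations in which
    <eos> never appears within L steps, i.e. |y| > L."""
    non_terminated = sum(
        1 for ctx in contexts
        if greedy_decode(next_token_probs, ctx, max_steps=L)[-1] != EOS
    )
    return non_terminated / max(len(contexts), 1)
```

Because greedy search keeps only the single most probable token at each step, a model that never ranks ⟨eos⟩ first within the window is counted as non-terminating, which is the behavior of the VA+ curves in Figure 2.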
The choice of ϵ = 1.0 × 10^{-5} is based on the validation perplexities in Table 2. Since the validation perplexity does not depend on the decoding algorithm, we focus on the average (± st.dev.) non-termination ratios r_nt(L) across 10 random runs with L = 1,000 for each decoding algorithm in Table 4. We also present r_nt(L) of VA+GPT-2 and ST+GPT-2 with ϵ = 1.0 × 10^{-5} as baselines.

Table 4 shows that our NMST+GPT-2 has the lowest r_nt(L) with L = 1,000 for all decoding algorithms, compared to VA+GPT-2 and the ST+GPT-2 proposed by Welleck et al. (2020). In other words, NMST+ effectively prevents non-terminating sequences within 1,000 time steps regardless of the decoding algorithm. Comparing with greedy search in Table 2 (r_nt(L) when ϵ = 1.0 × 10^{-5}), we observe that r_nt(L) decreases for all setups. As we discussed in Section 2.3, non-terminating sequences originate from the choice of ⟨eos⟩ ∉ V_t ⊊ V for all t, where V is the vocabulary and V_t is the proper subset of V considered by a decoding algorithm at the t-th step. Decoding algorithms other than greedy search are more likely to have ⟨eos⟩ in V_t, and hence a lower r_nt(L), since their |V_t| is greater than or equal to the |V_t| = 1 of greedy search for all t. In the case of top-{2, 4} sampling, we obtain r_nt(L) = 0.0 for VA+GPT-2. Even without NMST+, VA+ can avoid non-terminating sequences if we choose a proper decoding algorithm. We emphasize, however, that NMST+GPT-2 with ϵ = 1.0 × 10^{-5} has a competitive validation perplexity against VA+GPT-2 in Table 2 and that it is guaranteed to terminate regardless of the choice of decoding algorithm. We also empirically demonstrate the consistency of NMST+{RNN, LSTM} trained on WikiText-2 with respect to other decoding algorithms in Appendix F.

6 CONCLUSION

Non-termination is a degenerate behavior we often observe when generating text from a well-trained language model. To prevent this, Welleck et al. (2020) proposed a self-terminating language model that encourages the termination probability of each sequence, i.e., the conditional probability of ⟨eos⟩ given a t-prefix and a context, to increase monotonically toward 1 as t increases. In this paper, we theoretically demonstrate that a monotonically increasing termination probability is not a necessary condition for avoiding non-terminating sequences. We then propose a non-monotonic self-terminating language model in which the termination probability of each sequence converges to 1, but not monotonically. Our non-monotonic self-terminating language models successfully address the issue of non-termination and achieve perplexities that are comparable to vanilla language models and better than the original self-terminating language models.

REPRODUCIBILITY STATEMENT

To ensure the reproducibility of our paper, we provide our code at https://github.com/nyu-dl/non-monotonic-self-terminating-lm.

ACKNOWLEDGMENTS

This work was supported by 42dot, Hyundai Motor Company (under the project "Uncertainty in Neural Sequence Modeling"), Samsung Advanced Institute of Technology (under the project "Next Generation Deep Learning: From Pattern Recognition to AI"), and NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science. This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise.

REFERENCES

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate.
arXiv preprint arXiv:1409.0473, 2014.

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 2000.

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.

Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022.

Jeffrey L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.

Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical neural story generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 889–898, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/P18-1082. URL https://aclanthology.org/P18-1082.

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv, abs/1904.09751, 2020.

Philipp Koehn and Rebecca Knowles. Six challenges for neural machine translation. arXiv preprint arXiv:1706.03872, 2017.

Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. Advances in Neural Information Processing Systems, 27, 2014.

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.

Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843, 2016.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013a.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 2013b.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.

Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909, 2015.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.

Felix Stahlberg and Bill Byrne. On NMT search errors and model errors: Cat got your tongue? arXiv preprint arXiv:1908.10090, 2019.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.

Oriol Vinyals and Quoc Le. A neural conversational model. arXiv preprint arXiv:1506.05869, 2015.
Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, and Jason Weston. Neural text generation with unlikelihood training. arXiv preprint arXiv:1908.04319, 2019.

Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, and Kyunghyun Cho. Consistency of a recurrent language model with respect to incomplete decoding. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 5553–5568, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.448. URL https://aclanthology.org/2020.emnlp-main.448.

A DEFINITIONS OF COMMON DECODING ALGORITHMS AND THEIR CHARACTERISTICS

In this section, we present mathematical definitions of top-k sampling (Fan et al., 2018), nucleus sampling (Holtzman et al., 2020), greedy search, and beam search. We then demonstrate whether they are incomplete probable decoding algorithms.

A.1 TOP-K SAMPLING

At each step t, top-k sampling selects the subset of the k most probable tokens in the vocabulary V. Top-k sampling generates decoded sequences from a language model p_θ as follows:

Definition A.1 (Top-k sampling (Fan et al., 2018)). Top-k sampling S_top-k generates a sequence from a language model p_θ given a context x by recursively sampling ŷ_t from

q_{S_top-k(p_θ)}(y_t = v | ŷ_{<t}, x) = p_θ(v | ŷ_{<t}, x) / Σ_{v' ∈ V_t} p_θ(v' | ŷ_{<t}, x) if v ∈ V_t, and 0 otherwise,

where V_t is the set of the k most probable tokens under p_θ(· | ŷ_{<t}, x).

Let S be any incomplete probable decoding algorithm. From equations 6 and 7, ⟨eos⟩ ∈ V_t and q_{S(p^nmst_θ)}(y_t = ⟨eos⟩ | y_{<t}, x) > 0.

Let P_{>t_{1/2}}(ρ) be the set of the k highest-scoring sequences continued from ρ by S_beam-k. From equation 23, we have p^nmst_θ(⟨eos⟩ | ρ, x) > p^nmst_θ(v | ρ, x) for all v ∈ V \ {⟨eos⟩}. Hence, V_{t_{1/2}}(ρ) in equation 17 includes ⟨eos⟩. Let z = (z_1, z_2, ..., z_l) be any subsequence with z_1 ≠ ⟨eos⟩. Then we have p^nmst_θ(ρ ∘ z | ρ, x) = ∏_{i=1}^{l} p^nmst_θ(z_i | ρ ∘ z_{<i}, x) ≤ p^nmst_θ(z_1 | ρ, x) < p^nmst_θ(⟨eos⟩ | ρ, x), and every ρ' ∈ P_{>t_{1/2}}(ρ) \ {ρ ∘ ⟨eos⟩} starts with ρ ∘ v for some v ∈ V \ {⟨eos⟩}. By the same argument, we add at least one sequence ending with ⟨eos⟩ to P_{>t_{1/2}}(ρ) at each step. This means that P_{>t_{1/2}}(ρ) has k sequences ending with ⟨eos⟩ within t_{1/2} + k steps. Note that the final set P satisfies

P ⊆ ∪_{ρ ∈ P_{t_{1/2}}} P_{>t_{1/2}}(ρ). (26)

Equation 26 implies that every sequence in P has length at most t_{1/2} + k. We thus obtain

q_{S_beam-k(p^nmst_θ)}(|y| = ∞ | x) ≤ q_{S_beam-k(p^nmst_θ)}(|y| > t_{1/2} + k | x) = 0. (27)

Taking the expectation of equation 27 over x, we see that q_{S_beam-k(p^nmst_θ)}(|y| = ∞) = 0. That is, p^nmst_θ is consistent with respect to beam search.

³ If there is no such ρ, all k sequences in P_{t_{1/2}} end with ⟨eos⟩. This means that S_beam-k returns a finite sequence, so that p^nmst_θ is consistent with respect to beam search.
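To complement the definitions in Appendix A, the sketch below shows the truncation step shared by top-k and nucleus sampling, and how the resulting candidate set V_t can exclude ⟨eos⟩, which is the mechanism behind non-termination discussed in the main text. The toy distribution and names below are illustrative assumptions, not the paper's code.

```python
from typing import Dict, Set


def top_k_candidates(probs: Dict[str, float], k: int) -> Set[str]:
    """V_t under top-k sampling: the k most probable tokens at step t."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return set(ranked[:k])


def nucleus_candidates(probs: Dict[str, float], mu: float) -> Set[str]:
    """V_t under nucleus sampling: the smallest set of most probable tokens
    whose cumulative probability reaches the threshold mu."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    chosen, cumulative = set(), 0.0
    for token in ranked:
        chosen.add(token)
        cumulative += probs[token]
        if cumulative >= mu:
            break
    return chosen


if __name__ == "__main__":
    # A toy next-token distribution in which <eos> is plausible but never
    # among the very top tokens.
    probs = {"the": 0.4, "a": 0.3, "and": 0.2, "<eos>": 0.1}
    print("<eos>" in top_k_candidates(probs, k=2))       # False: excluded
    print("<eos>" in nucleus_candidates(probs, mu=0.4))  # False: excluded
    # If <eos> falls outside V_t at every step, the induced distribution
    # places positive probability on non-terminating sequences.
```

Because the NMST parametrization eventually makes the ⟨eos⟩ probability exceed every other token's probability (as in the argument above), ⟨eos⟩ is eventually guaranteed to enter V_t under either truncation rule, and to enter the beam under beam search.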
D EXPERIMENTAL DETAILS

In this section, we describe our models and optimization processes used in Section 4.

RNN and LSTM on WikiText-2. We use word tokenization for WikiText-2. We train an RNN with tanh activations (Elman, 1990) and an LSTM (Hochreiter & Schmidhuber, 1997) on WikiText-2. Both the RNN and the LSTM have 2 layers. Each layer has 256 hidden units for the RNN and 512 hidden units for the LSTM. The sizes of the input and output embedding layers are 256 and 512 for the RNN and LSTM, respectively. We use weight tying to share the weights between the input and output embedding layers for both models. We apply dropout (Srivastava et al., 2014) with drop probabilities of 0.3 and 0.5 to the RNN and LSTM, respectively. For each model, we perform 10 random runs with a batch size of 32 for 70 epochs.

To maximize the log-likelihood presented in equation 3, we use AdamW (Loshchilov & Hutter, 2017) with an initial learning rate of 0.001, β1 = 0.9, β2 = 0.99, weight decay of 0.01, and learning rate decay, which halves the learning rate if the validation perplexity does not improve for a training epoch. To avoid overfitting, we additionally use early stopping, which terminates training if the validation perplexity does not improve upon the best score attained so far for 10 consecutive epochs. In most cases, training ends within 50 epochs.

GPT-2 on WikiText-103. We use BPE tokenization⁴ (Sennrich et al., 2015) and the pretrained GPT-2⁵ (Radford et al., 2019) with 124 million parameters, provided by Hugging Face. GPT-2 can handle up to 1,024 tokens. We apply dropout (Srivastava et al., 2014) with a drop probability of 0.1 to GPT-2. We finetune GPT-2 for 300,000 steps while ensuring that all runs continue for at least 250,000 steps. To minimize the number of padding tokens in every batch for computational efficiency, we bucket the dataset into sequences of similar lengths, and each batch contains a maximum of 1,024 total tokens. To maximize the log-likelihood function in equation 3, we use AdamW (Loshchilov & Hutter, 2017) with an initial learning rate of 5.0 × 10^{-5}, β1 = 0.9, β2 = 0.99, weight decay of 0.01, and linear learning rate decay over 500,000 steps.

⁴ https://github.com/huggingface/tokenizers
⁵ https://github.com/huggingface/transformers

E ADDITIONAL PLOTS AND TABLES FOR SECTION 4

In this section, we present additional plots and tables for Section 4.

E.1 ADDITIONAL PLOTS FOR SECTION 4.1

Figure 4: Validation perplexities as a function of ϵ in log-linear scale for all configurations of RNN (left) and LSTM (right) trained on WikiText-2. We present their average (curve) ± st.dev. (shaded area) across 10 random experiments. For all ϵ and architectures, NMST+ has better validation perplexities than ST+. As ϵ increases, the validation perplexities of both NMST+RNN and NMST+LSTM degrade compared to those of VA+RNN and VA+LSTM. We thus need to search for an optimal ϵ to avoid degradation of validation perplexity when applying NMST+ to our language model.

E.2 ADDITIONAL PLOTS FOR SECTION 4.2

Figure 5: We present the average (curve) ± st.dev. (shaded area) of validation perplexities (left) and non-termination ratios r_nt(L) (right) with greedy search across 10 random runs for all considered setups of GPT-2 finetuned on WikiText-103, in log-linear scale. For r_nt(L), we use L = 1,000 because GPT-2 has a context window size of 1,024. For all ϵ, NMST+GPT-2 outperforms ST+GPT-2 in terms of the average validation perplexity. When ϵ is small, r_nt(L) of ST+GPT-2 explodes, meaning that ST+GPT-2 with small ϵ cannot prevent non-terminating sequences. However, our NMST+GPT-2 effectively reduces r_nt(L) compared to VA+GPT-2 for every ϵ, and its validation perplexity degradation is smaller than that of the ST+GPT-2 proposed by Welleck et al. (2020).

E.3 ADDITIONAL TABLES FOR TABLE 3

Table 5: Given a context in a validation instance of WikiText-103, we present example continuations of {VA, ST, NMST}+GPT-2 when using greedy search. We select ϵ = 1.0 × 10^{-5} for {ST, NMST}+GPT-2 because it is optimal in terms of validation perplexities in Table 2.
Unlike {VA, ST}+GPT-2, NMST+GPT-2 improves the quality of the sequence by avoiding repetitive tokens and ending with ⟨eos⟩ when the given context leads VA+GPT-2 to non-terminate within 1,000 steps.

Context: The single made its Irish Singles Chart debut at

VA+: number three on the issue date of November 18, 2005. It debuted at number one on the chart for the issue date of December 7, 2005. The single debuted at number two on the chart for the issue date of January 4, 2006. The single debuted at number one on the Irish Singles Chart for the issue date of March 4, 2006. The single debuted at number two on the Irish Singles Chart for the issue date of June 4, 2006. The single debuted at number one on the Irish Singles Chart for the issue date ...

ST+: number three on the chart issue dated March 16, 2010. It debuted at number two on the UK Singles Chart on the chart issue dated March 16, 2010, and peaked at number one on the chart issue dated March 16, 2010. The single was certified gold by the British Phonographic Industry ( BPI ) for shipments of over 15 @,@ 000 copies. The single debuted at number two on the Irish Singles Chart on the chart issue dated March 16, 2010, and peaked at number one on the chart issue dated March 16, 2010. The single was certified gold by the Irish Recorded Music Association ( IRMA ) for shipments of over 15 @,@ 000 copies. The single was ...

NMST+: number twenty @-@ seven on the week ending March 26, 2010, and peaked at number three on the week ending March 27, 2010. It was certified gold by the Recording Industry Association of Ireland ( RIAA ) for shipment of 500 @,@ 000 copies of the single. The single was certified gold by the Recording Industry Association of Ireland ( RIANZ ) for shipment of 500 @,@ 000 copies of the single. ⟨eos⟩

Context: Despite the expensive reconstructions, both vessels were considered

VA+: to be of sufficient quality to be considered for use in the Grand Fleet. The first, the British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built, British @-@ built ...

ST+: to be of sufficient quality to be considered a part of the Royal Navy, and were assigned to the Channel Fleet. The two ships were transferred to the Mediterranean Fleet in early 1915, and were transferred to the North America and West Indies Station in early 1916. They were transferred to the Mediterranean Fleet in early 1917, and were transferred to the North America and West Indies Station in early 1918. They were transferred to the North America and West Indies Station in early 1918, and were transferred to the Mediterranean Fleet in early 1918. They were transferred to the North America and West Indies Station in late 1918, and ...
NMST+: obsolescent by the time of the Second World War, and were eventually scrapped in favour of the much more powerful British battleship HMS Dreadnought. ⟨eos⟩

Context: On the July 21, 2014 episode of Raw,

VA+: Triple H and Randy Orton were involved in a tag team match, in which Triple H and Orton were accompanied by Triple H and Randy Orton's manager, Mr. Kennedy. Orton was accompanied by Orton's manager, Mr. Kennedy, and Orton's tag team partner, Mr. Kennedy. Triple H and Orton were accompanied by Orton's manager, Mr. Kennedy, and Orton's tag team partner, Mr. Kennedy. Triple H and Orton were accompanied by Orton's manager, Mr. Kennedy, and Orton's manager, Mr. Kennedy. Triple H and Orton were accompanied by Orton's manager, Mr. Kennedy, and Orton's manager, Mr. Kennedy. Triple H and Orton were accompanied by Orton's manager, Mr. Kennedy, and Orton's manager, Mr. Kennedy. Triple H ...

ST+: Triple H defeated Randy Orton to win the World Heavyweight Championship. On the July 27 episode of Raw, Triple H defeated Randy Orton to win the World Heavyweight Championship. On the July 30 episode of Raw, Triple H defeated Randy Orton to win the World Heavyweight Championship. Triple H then defeated Randy Orton to win the World Heavyweight Championship. On the August 2 episode of Raw, Triple H defeated Randy Orton to win the World Heavyweight Championship. Triple H then defeated Randy Orton to win the World Heavyweight Championship. Triple H then defeated Orton to win the World Heavyweight Championship. Triple H then defeated Orton to win the World Heavyweight Championship. Triple H then defeated Orton to win the World Heavyweight Championship. Triple H then defeated Orton to win the World Heavyweight Championship. Triple H then defeated Orton to win the World Heavyweight Championship. Triple H then defeated ...

NMST+: Triple H defeated Randy Orton to win the WWE Championship for the third time. ⟨eos⟩

Context: On the west side of the Naktong,

VA+: the NK 6th Division was reinforced by the NK 7th Division, which was reinforced by the NK 6th Division, which was reinforced by the NK 7th Division, which was reinforced by the NK 6th Division, which was reinforced by the NK 6th Division, which was reinforced by the NK 6th Division, which was reinforced by the NK 6th Division, which was reinforced by the NK 6th Division, which was reinforced by the NK 6th Division, which was reinforced by the NK 6th Division, which was reinforced by the NK 6th Division, which was reinforced by the NK 6th Division, which was reinforced by the NK 6th Division, which was reinforced by the NK ...

ST+: the 2nd Battalion, 27th Infantry Regiment, which had been holding up the North Koreans in the hills east of the Naktong, was ordered to withdraw to the Naktong itself. The 2nd Battalion, 27th Infantry Regiment, which had been holding up the North Koreans in the hills east of the Naktong, was ordered to withdraw to the Naktong itself. The 2nd Battalion, 27th Infantry Regiment, which had been holding up the North Koreans in the hills east of the Naktong, was ordered to withdraw to the Naktong itself. The 2nd Battalion, 27th Infantry Regiment, which had been holding up the North Koreans in the hills east of the Naktong, was ordered to withdraw to the Naktong itself. The 2nd Battalion, 27th Infantry Regiment, which had been ...

NMST+: the Naktong River rises to a height of 1 @,@ 000 metres ( 3 @,@ 300 ft ) above the surrounding terrain.
The Naktong River is the longest river in North and South Korea, and the longest in the whole of North Korea. The Naktong is the longest river in North Korea, and the longest in the whole of North Korea. The river is the longest in the entire country, and the longest in the whole of North Korea. ⟨eos⟩

Context: During spring training in a game against the Boston Red

VA+: Sox on April 4, 2009, Kinsler was placed on the 15 @-@ day disabled list with a strained right hamstring. He returned to the active roster on April 7, and was placed on the 15 @-@ day DL with a strained right hamstring. He was placed on the 15 @-@ day DL on May 4, and was placed on the 15 @-@ day DL on May 5. On May 6, he was placed on the 15 @-@ day DL with a strained right hamstring. He was placed on the 15 @-@ day DL on May 9. On May 10, he was placed on the 15 @-@ day DL with a strained right hamstring. He was placed on the 15 @-@ day DL on May 11. He was placed on the 15 @-@ day DL on May 13. He was placed on the 15 @-@ day DL on May 20. He was placed on the 15 @-@ day DL on May 24. He was placed on the 15 @-@ day DL on May 28. He was placed on the 15 @-@ day DL on June 2. He was placed on ...

ST+: Sox on April 20, 2011, he was placed on the disabled list with a back injury. He returned to the disabled list on May 10, 2011, and was activated on May 17. He was activated on May 20 and made his first appearance on May 21. He was activated on June 2 and made his first appearance on June 4. He was activated on June 8 and made his first appearance on June 10. He was activated on June 15 and made his first appearance on June 17. He was activated on June 20 and made his first appearance on June 23. He was activated on June 29 and made his first appearance on July 1. He was activated on July 1 and made his first appearance on July 4. He was activated on July 6 and made his first appearance on July 10. He was activated on July 14 and made his first appearance on July 16. He was activated on July 20 and made his first appearance on July 23. He was ...

NMST+: Sox on April 16, 2010, the Yankees signed Rivera to a one @-@ year, $ 2 @.@ 5 million contract. He made his major league debut on April 21, 2010, against the Boston Red Sox. He pitched a scoreless inning in the first inning of the first game of the 2010 World Series against the New York Mets. On May 1, 2010, Rivera was traded to the Pittsburgh Pirates in exchange for J. J. Hardy. ⟨eos⟩

E.4 ADDITIONAL PLOTS FOR FIGURE 3

[Plots of p_θ(y_t = ⟨eos⟩ | y_{<t}, x) as a function of t.]
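The plots referenced in E.4 track the ⟨eos⟩ probability assigned at each decoding step. As a rough sketch of how such a trace could be collected (reusing the same kind of hypothetical `next_token_probs` interface as in the earlier sketches, not the paper's code):

```python
from typing import Callable, Dict, List, Sequence

EOS = "<eos>"


def eos_probability_trace(next_token_probs: Callable[[Sequence[str]], Dict[str, float]],
                          context: Sequence[str], max_steps: int) -> List[float]:
    """Record p(y_t = <eos> | y_<t, x) at every step of greedy decoding."""
    prefix, trace = list(context), []
    for _ in range(max_steps):
        probs = next_token_probs(prefix)
        trace.append(probs.get(EOS, 0.0))   # <eos> probability at step t
        token = max(probs, key=probs.get)   # greedy choice
        prefix.append(token)
        if token == EOS:
            break
    return trace
```

Plotting such a trace against t for the VA+, ST+, and NMST+ variants gives the kind of curves these additional plots report.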