Correlation-Sensitive Next-Basket Recommendation

Duc-Trong Le, Hady W. Lauw and Yuan Fang
School of Information Systems, Singapore Management University, Singapore
{ductrong.le.2014, hadywlauw, yfang}@smu.edu.sg

Abstract

Items adopted by a user over time are indicative of the underlying preferences. We are concerned with learning such preferences from observed sequences of adoptions for recommendation. As multiple items are commonly adopted concurrently, e.g., a basket of grocery items or a sitting of media consumption, we deal with a sequence of baskets as input, and seek to recommend the next basket. Intuitively, a basket tends to contain groups of related items that support particular needs. Instead of recommending items independently for the next basket, we hypothesize that incorporating information on pairwise correlations among items would help to arrive at more coherent basket recommendations. Towards this objective, we develop a hierarchical network architecture codenamed Beacon to model basket sequences. Each basket is encoded taking into account the relative importance of items and the correlations among item pairs. This encoding is utilized to infer sequential associations along the basket sequence. Extensive experiments on three public real-life datasets showcase the effectiveness of our approach for the next-basket recommendation problem.

1 Introduction

To cope with the astounding and escalating number of options facing us, involving the selection of products, news, movies, music, points of interest, etc., a recommender system offers the most, if not the only, pragmatic way of finding an item of interest. In the literature, there are several major bases for recommendation. One is personalization, undergirded by user-specific parameters. Another is association among items, i.e., given the items that have been adopted thus far, which other items shall be recommended. Our focus in this work is the latter.

One form of association among items is sequential [Quadrana et al., 2018]. A sequence of items adopted over time carries signals about the underlying preferences that bear clues for future adoptions. For instance, someone who has been listening to a music genre may likely be interested in new songs of that genre. Previous restaurant visits may have a bearing on future dining choices. The essence is thus preference driven by sequentiality, rather than personalization per se.

[Figure 1: Motivating example for correlation-sensitive next-basket recommendation. Baskets at T=1 {Salmon, Wasabi, Japanese Rice} and T=2 {Crab, Pepper, Melted Butter, Garlic}, with two candidate next baskets at T=3: {Fresh Oyster, Fresh Milk, Wasabi} versus {Fresh Oyster, Lemon, Mint Leaf}.]

In many scenarios we adopt more than one item at a time. We listen to a few songs in the same sitting, use several tags to label things, run a few errands in the same trip, purchase multiple products in the same shopping cart, etc. We refer to a collection of items adopted concurrently as a basket. Frequently, some items within a basket are correlated to a certain extent. This is because these items may arise from the same underlying need, e.g., ingredients for the same recipe, or tags describing the same object. Hence we are really dealing not with sequences of items, but rather with sequences of baskets. In this work, we address the problem of next-basket recommendation. Given a sequence of baskets adopted by a user as input, our objective is to predict a set of items that are likely to belong in the next basket.
Figure 1 illustrates this in the context of grocery shopping. In this case, each time step corresponds to a shopping session. In the first session (T = 1), the basket of {Salmon, Wasabi, Japanese Rice} implies a latent intention of making sushi. The second session (T = 2) likely concerns a crab-based recipe, with the combination of {Crab, Pepper, Melted Butter, Garlic}. The sequentiality hints at an underlying preference for a seafood diet. In Figure 1, the problem is to predict the basket at T = 3.

There have been active efforts towards next-basket recommendation. One approach is to rely only on the most recently purchased basket to predict the next basket [Rendle et al., 2010; Wang et al., 2015; Wan et al., 2018]. This may be applicable in short-term dependency scenarios, but it may not capture underlying preferences as well as a method that looks further back into history. Hence, another approach is to capture the long-term sequential dependencies using methods such as recurrent neural networks (RNN) [Yu et al., 2016]. In any case, these existing approaches arrive at the recommended items for the next basket independently, based only on their respective associations with the past basket(s), disregarding the collective associations among the items to be recommended.

We postulate that a basket tends to contain coalitions of related items, rather than independent items. Thus, if the objective is to predict items that belong in the next basket, then we should factor the correlations among those items into our modeling as well as our prediction. For example, while independent recommendations in Figure 1 may capture the long-term preference for seafood (predicting oyster), the other recommended items may be unrelated yet popular items such as milk and wasabi. In contrast, taking into account that purchasing an item, e.g., oysters, tends to inspire the purchase of other correlated items, a correlation-sensitive next-basket recommendation may favor items frequently eaten or purchased together with oysters, e.g., lemon and mint.

Contributions. Towards realizing this intuition, we incorporate information on item correlations for next-basket recommendation. To our best knowledge, we are the first to consider correlations among predicted items for this problem, which is our first contribution. In Section 3, we formalize this problem, and discuss how item correlations may be obtained. As a second contribution, in Section 4 we describe a novel hierarchical network architecture called Basket Sequence Correlation Network (codenamed Beacon), which learns the representation of each basket, leading to the overall representation of a basket sequence that can be used for next-basket prediction. This model is built on a couple of principles in deriving the representations. For one, individual items in a basket are differentially important, depending on their frequencies as well as their efficacies in drawing other items. For another, item pairs in a basket are differentially related, with some having stronger or more exclusive connections. As our final contribution, in Section 5 we conduct extensive experiments on three real-life datasets of different domains. The results show that Beacon's modeling of item correlations produces significant improvements over baselines.

2 Related Work

Here we review several classes of previous work related to sequential as well as basket-oriented recommendations.

Item Sequences.
One class of approaches is concerned with sequential dependencies among individual items. Some rely on Markov chains to model short-term dependencies using either factorization [Rendle et al., 2010] or Euclidean embedding [Chen et al., 2012] techniques. Others model long-term dependencies using RNN [Hidasi et al., 2016; Li et al., 2017; Villatel et al., 2018], convolutional neural networks or CNN [Tang and Wang, 2018], memory networks [Huang et al., 2018], translation-based methods [He et al., 2017], or session graphs [Xiang et al., 2010; Wu et al., 2019; Song et al., 2019]. These works are not comparable to ours, as they operate at the item level and consider neither basket sequences nor next-basket recommendation.

| Symbol | Description |
|---|---|
| $V$ | the set of items $\{1, 2, \ldots, \lvert V\rvert\}$ |
| $S$ | a temporal sequence of baskets $\langle B_1^{(S)}, \ldots, B_{\ell(S)}^{(S)} \rangle$ |
| $C$ | item correlation matrix |
| $\omega$ | item importance parameters |
| $B_t, x_t$ | basket at time $t$ and its binary representation |
| $z_t, b_t$ | intermediate and latent representations of $B_t$ |
| $h_t$ | recurrent hidden output at time $t$ |
| $\Phi, \phi$ | weight and bias parameters in the basket encoder |
| $\Psi, \Psi', \psi$ | weight and bias parameters in the sequence encoder |
| $\Gamma$ | weight parameters in the predictor |
| $s^{(S)}$ | sequential signal given sequence $S$ |
| $y^{(S)}$ | predicted item scores given sequence $S$ |

Table 1: Summary of Main Notations

Basket Sequences. There have been efforts to model basket-level adoptions for sequential recommendation, but in general they do not incorporate item correlation information within their modeling or prediction of baskets. For instance, [Yu et al., 2016] encodes each basket and learns the sequence representation via an RNN-based approach. Later, [Bai et al., 2018] improves this approach by incorporating item attributes. In turn, [Le et al., 2018] makes use of secondary supporting sequences. To showcase the benefit of item correlation information, we will compare to [Yu et al., 2016] and [Le et al., 2018] (focusing on the primary sequence) as baselines. There are also personalized methods [Wang et al., 2015; Ying et al., 2018; Wan et al., 2018], which are not directly comparable, as we learn representations from sequences without the presumption of user-specific parameters. For completeness, we will compare to [Wan et al., 2018], focusing the comparison on the sequential and basket effects alone.

Baskets. In an orthogonal direction to ours is a class of techniques focusing solely on basket-level associations. [Sarwar et al., 2000] relies on association rules. [Pathak et al., 2017] seeks to recommend bundles. [Le et al., 2017; Wang et al., 2018] attempt basket completion, with existing basket items as context to predict the remaining item. [Li et al., 2009] applies random walks on a user-item bipartite graph to generate basket-sensitive item recommendations. There are several works that exploit item-item associations [Ning and Karypis, 2011] and itemset-item associations [Christakopoulou and Karypis, 2014] for similarity-based recommendations, rather than in the complementary manner of ours.

3 Preliminaries

In this section, we formalize our problem and introduce the formulation of the correlation matrix. We summarize the main notations in Table 1, including those to be introduced later.

3.1 Problem Statement

We first introduce some background concepts. Assume a set of items $V = \{1, 2, \ldots, |V|\}$. Several items can form a basket, which is essentially a set of items, denoted as
$B = \{i_1, i_2, \ldots, i_{|B|}\}$, where the $i_k$'s are distinct integers in $\{1, \ldots, |V|\}$. Note that baskets may have variable sizes. In real-world applications, a basket could be derived from products purchased in a retail transaction, websites surfed in a browser session, or places visited in a trip.

In our problem, as the first input, we assume a set of sequences $\mathcal{S}$. Each sequence $S \in \mathcal{S}$ is a temporally ordered list of baskets $S = \langle B_1^{(S)}, \ldots, B_{\ell(S)}^{(S)} \rangle$, such that $B_t^{(S)}$ happens at time $t$ and $\ell(S)$ is the length of the sequence. Hereafter, when the context is clear, for brevity we will omit the basket superscript $(S)$, which indicates that the basket belongs to sequence $S$. Note that sequences may have variable lengths, and they are divided into train and test sets.

As the second input, we assume a correlation matrix $C \in \mathbb{R}^{|V| \times |V|}$. If two items $i$ and $j$ tend to co-occur with each other in a basket, $C_{ij}$ should be higher. We will elaborate on the construction of this matrix in Section 3.2.

As output, for a test sequence $S = \langle B_1, \ldots, B_{\ell(S)} \rangle$, we aim to predict the next basket $B_{\ell(S)+1}$ as the recommendation. Ideally, some, if not most, of the predicted items should be related so as to form a coherent basket. Typically the ground-truth size of $B_{\ell(S)+1}$ is unknown, which is approximated as a basket of a given constant size $K$ [Yu et al., 2016].

3.2 Correlation Matrix

As discussed above, our formulation requires a correlation matrix $C$, which can be constructed based on the co-occurring items in the observed training baskets. Specifically, let $F \in \mathbb{R}^{|V| \times |V|}$ capture the frequency of co-occurrences, such that $F_{ij}$ is the number of times items $i$ and $j$ appear in a common basket, $i \neq j$. As $F$ contains raw counts that can differ significantly due to the varying popularity of items, we normalize $F$ to obtain the final correlation matrix $C$ based on the Laplacian matrix [Kipf and Welling, 2017]:

$$C = D^{-\frac{1}{2}} F D^{-\frac{1}{2}}, \tag{1}$$

where $D$ is the degree matrix such that $D_{ii} = \sum_j F_{ij}$. Note that by definition, $F$ and $C$ are both symmetric. Furthermore, in some cases, the correlation matrix could be too sparse to provide useful associations. We may consider higher-order correlations up to the $N$-th order, i.e., $C + \sum_{n=2}^{N} \mu^{n-1} \, \mathrm{Norm}(C^n)$, where $\mu \in (0, 1)$ is a discount factor for higher orders, and $\mathrm{Norm}(\cdot)$ sets the diagonal to zero and applies the same normalization as in Eq. (1).

4 Basket-Sequence Correlation Networks

In this section, we propose Basket Sequence Correlation Networks (Beacon) for correlation-sensitive next-basket recommendation, and discuss its learning strategy.

4.1 Proposed Framework: Beacon

Our framework Beacon is outlined in Figure 2. It consists of three main components, namely, a correlation-sensitive basket encoder, a basket sequence encoder, and a correlation-sensitive predictor. Taking a basket sequence and the correlation matrix as input, the basket encoder captures intra-basket item correlations and produces correlation-sensitive basket representations. The sequence of basket representations is further fed into a sequence encoder to capture inter-basket sequential associations. The output from the sequence encoder, together with the correlation matrix, is employed by the predictor to produce the correlation-sensitive next basket. We further elaborate on each component in the following.

[Figure 2: Architecture of the proposed framework Beacon: a basket sequence $S$ and the correlation matrix $C$ feed the correlation-sensitive basket encoder, whose outputs pass through the sequence encoder and then the correlation-sensitive score predictor to yield the item scores $y^{(S)}$.]
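As a concrete reference for the correlation matrix $C$ introduced in Section 3.2 and consumed throughout Figure 2, the following minimal Python sketch shows how it could be built from training baskets. This is an illustration rather than the authors' code: the function name and arguments are hypothetical, and it assumes the symmetric normalization of Eq. (1).

```python
import numpy as np

def correlation_matrix(baskets, n_items, order=1, mu=0.85):
    """Build the normalized correlation matrix C from training baskets.

    baskets: iterable of lists of item indices; order and mu follow the
    higher-order construction of Section 3.2. A sketch assuming the
    symmetric normalization D^{-1/2} F D^{-1/2} of Eq. (1).
    """
    # Raw co-occurrence counts F (diagonal left at zero by construction).
    F = np.zeros((n_items, n_items))
    for basket in baskets:
        for i in basket:
            for j in basket:
                if i != j:
                    F[i, j] += 1.0

    def normalize(M):
        # Zero the diagonal, then apply D^{-1/2} M D^{-1/2} as in Eq. (1).
        M = M.copy()
        np.fill_diagonal(M, 0.0)
        d = M.sum(axis=1)
        d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
        return d_inv_sqrt[:, None] * M * d_inv_sqrt[None, :]

    C = normalize(F)
    # Optional higher orders: C + sum_{n=2}^{N} mu^{n-1} * Norm(C^n).
    result = C.copy()
    Cn = C.copy()
    for n in range(2, order + 1):
        Cn = Cn @ C
        result += (mu ** (n - 1)) * normalize(Cn)
    return result
```

For instance, `correlation_matrix(train_baskets, n_items, order=5, mu=0.85)` would mirror the higher-order setting reported for Ta Feng in Section 5.1.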
Correlation-Sensitive Basket Encoder

Given a basket $B_t$ at time $t$, we can convert it to a binary vector $x_t \in \{0, 1\}^{|V|}$, whereby its $i$-th element is 1 if and only if $i \in B_t$. There are two primary factors that trigger the presence of an item in the basket $B_t$: not only the item's self-importance, but also its correlative associations with other items in $B_t$. Simultaneously accounting for the two factors may enhance the representation of $B_t$. Thus, we propose the following intermediate representation $z_t \in \mathbb{R}^{|V|}$ for the basket $B_t$:

$$z_t = x_t \odot \omega + x_t C, \tag{2}$$

where $\odot$ denotes the Hadamard (i.e., element-wise) product, $\omega \in \mathbb{R}^{|V|}$ entails the learnable item importance parameters, and $C$ is the input correlation matrix. Generally, not all correlative associations are useful; weak correlations are more likely to be noise that adversely impacts the basket representation. Therefore, we introduce $\eta \in \mathbb{R}^+$, a learnable scalar parameter to filter out weak correlations, into the intermediate representation:

$$z_t = x_t \odot \omega + \mathrm{ReLU}(x_t C - \eta \mathbf{1}), \tag{3}$$

where $\mathbf{1}$ is a vector of ones and ReLU is applied in an element-wise manner. Subsequently, $z_t$ is fed into a fully-connected layer to infer a latent $L$-dimensional basket representation $b_t \in \mathbb{R}^L$, as follows:

$$b_t = \mathrm{ReLU}(z_t \Phi + \phi), \tag{4}$$

where $\Phi \in \mathbb{R}^{|V| \times L}$ and $\phi \in \mathbb{R}^L$ are weight and bias parameters to be learned, respectively.

Basket Sequence Encoder

The sequence encoder employs an RNN to capture the sequential associations in basket sequences. Given a basket sequence $S = \langle B_1, \ldots, B_{\ell(S)} \rangle$ with corresponding latent basket representations $\langle b_1, \ldots, b_{\ell(S)} \rangle$, the recurrent $H$-dimensional hidden output $h_t \in \mathbb{R}^H$ at time $t$ is computed by:

$$h_t = \tanh(b_t \Psi + h_{t-1} \Psi' + \psi), \tag{5}$$

where $\Psi \in \mathbb{R}^{L \times H}$, $\Psi' \in \mathbb{R}^{H \times H}$ and $\psi \in \mathbb{R}^H$ are weight and bias parameters to be learnt. As shown in Figure 2, while Beacon adopts LSTM units [Le et al., 2018], it is flexible enough to plug in other recurrent units, e.g., GRU [Hidasi et al., 2016].

Correlation-Sensitive Score Predictor

The predictor aims to derive a score for each item based on both the inter-basket sequential associations and the intra-basket correlative associations. Let $h_{\ell(S)}$ be the last hidden output of sequence $S$ via the sequence encoder. The sequential signal $s^{(S)} \in \mathbb{R}^{|V|}$ for item recommendation given sequence $S$ can then be estimated by the following:

$$s^{(S)} = \sigma(h_{\ell(S)} \Gamma), \tag{6}$$

where $\sigma$ is the sigmoid function applied in an element-wise manner, and $\Gamma \in \mathbb{R}^{H \times |V|}$ is a weight matrix to be learned. In order to recommend a basket with correlated items, we further aggregate the sequential signal with item importance and correlative associations. Similar to Eq. (2), a straightforward solution is $s^{(S)} \odot \omega + s^{(S)} C$. However, in this formulation, the intra-basket correlative associations often dominate and mask the inter-basket sequential associations. Thus, we adopt the following predictor, such that the trade-off between correlative and sequential associations can be tuned:

$$y^{(S)} = \alpha \, (s^{(S)} \odot \omega + s^{(S)} C) + (1 - \alpha) \, s^{(S)}, \tag{7}$$

where $\alpha \in [0, 1]$ is a hyperparameter to control the balance between correlative and sequential associations, and $y^{(S)} \in \mathbb{R}^{|V|}$ contains the predicted scores such that its $i$-th element, $y_i^{(S)}$, indicates the score of item $i$.
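Reading Eqs. (3)-(7) end to end, one scoring pass could look like the following NumPy sketch. This is a simplified illustration under stated assumptions: the plain tanh cell of Eq. (5) stands in for the LSTM actually used, all parameters are assumed pre-trained, and the parameter-dictionary layout (e.g., the key `Psi2` for $\Psi'$) is hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_scores(baskets_onehot, C, params, alpha=0.5):
    """Score all items for the next basket, given a sequence of binary
    basket vectors x_1..x_T (each of shape [|V|]).

    params: dict with omega [|V|], eta (scalar), Phi [|V|, L], phi [L],
    Psi [L, H], Psi2 [H, H], psi [H], Gamma [H, |V|]. A sketch only:
    Eq. (5)'s plain tanh cell replaces the LSTM used in the paper.
    """
    w, eta = params["omega"], params["eta"]
    h = np.zeros(params["Psi2"].shape[0])
    for x in baskets_onehot:
        # Eq. (3): item importance plus thresholded correlations.
        z = x * w + relu(x @ C - eta)
        # Eq. (4): latent basket representation.
        b = relu(z @ params["Phi"] + params["phi"])
        # Eq. (5): recurrent update over the basket sequence.
        h = np.tanh(b @ params["Psi"] + h @ params["Psi2"] + params["psi"])
    # Eq. (6): sequential signal from the last hidden state.
    s = sigmoid(h @ params["Gamma"])
    # Eq. (7): blend correlative and sequential associations.
    return alpha * (s * w + s @ C) + (1 - alpha) * s

# The recommended next basket is then the top-K items by score, e.g.:
# next_basket = np.argsort(-scores)[:K]
```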
Next-Basket Recommendation. Given a test basket sequence $S = \langle B_1, \ldots, B_{\ell(S)} \rangle$, we recommend the next basket $B_{\ell(S)+1}$ based on the predicted scores $y^{(S)}$. The scores indicate how likely each item is to form part of the next basket, accounting for both intra-basket correlative and inter-basket sequential associations. Since the size of the next basket is unknown and is often non-critical in a recommendation setting [Yu et al., 2016], in practice we form the next basket by taking the $K$ items with the highest scores in $y^{(S)}$, where $K$ is a small constant such as 5 or 10.

4.2 Learning Strategy

For each training sequence $S$, we remove its last basket to obtain $S' = \langle B_1, \ldots, B_{\ell(S)-1} \rangle$. The goal is to make sure that the predicted scores $y^{(S')}$ based on $S'$ align well with the ground-truth next basket $B_{\ell(S)}$. To this end, we favor the adopted items in the ground-truth basket $B_{\ell(S)}$, and at the same time penalize the other, negative items in $V \setminus B_{\ell(S)}$. In particular, we formulate the following loss for sequence $S$, where we try to maximize the scores of the adopted items (first term), and minimize the scores of negative items with respect to the minimum score among adopted items (second term). Intuitively, the second term encourages the negative items to be ranked lower than all of the adopted items in $y^{(S')}$:

$$\mathcal{L}(S) = -\frac{1}{|B_{\ell(S)}|} \sum_{i \in B_{\ell(S)}} \log \sigma\big(y_i^{(S')}\big) - \frac{1}{|V \setminus B_{\ell(S)}|} \sum_{j \in V \setminus B_{\ell(S)}} \log\Big(1 - \sigma\big(y_j^{(S')} - y_m^{(S')}\big)\Big), \tag{8}$$

where $m = \arg\min_{i \in B_{\ell(S)}} y_i^{(S')}$ is the adopted item with the minimum predicted score. Given the set of training basket sequences $\mathcal{S}_{\text{train}}$, we seek to minimize the total loss to learn our parameter set $\Theta = (\omega, \eta, \Phi, \phi, \Psi, \Psi', \psi, \Gamma)$:

$$\Theta^* = \arg\min_{\Theta} \sum_{S \in \mathcal{S}_{\text{train}}} \mathcal{L}(S). \tag{9}$$

Complexity Analysis. According to Eq. (3) and Eq. (4), the complexity of the basket encoder is $O(|V|^2 + |V| \cdot L)$. In the sequence encoder, the complexity of an LSTM unit is $O(H^2 + H \cdot L)$ [Hochreiter and Schmidhuber, 1997]. Moreover, the correlation-sensitive predictor has complexity $O(|V| \cdot H + |V|^2)$. Thus, given a set of training sequences $\mathcal{S}_{\text{train}}$ with an average sequence length of $\bar{S}$, and considering that $H$ and $L$ are generally much smaller than $|V|$, the overall complexity of Beacon over a training epoch can be simplified to $O(|\mathcal{S}_{\text{train}}| \cdot \bar{S} \cdot |V|^2)$.
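Continuing the sketch above, the per-sequence loss of Eq. (8) could be computed as follows. Again, this is a hedged illustration of the formula rather than the authors' implementation, with hypothetical function and argument names.

```python
import numpy as np

def basket_loss(scores, target_basket, n_items):
    """Eq. (8): encourage adopted items to score high and negatives to
    rank below the weakest adopted item. scores: [|V|], from Eq. (7)."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = np.asarray(sorted(target_basket))
    neg = np.setdiff1d(np.arange(n_items), pos)
    # First term: maximize the scores of adopted items.
    pos_term = -np.mean(np.log(sigmoid(scores[pos])))
    # Second term: push negatives below the minimum adopted score y_m.
    y_min = scores[pos].min()
    neg_term = -np.mean(np.log(1.0 - sigmoid(scores[neg] - y_min)))
    return pos_term + neg_term
```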
5 Experiments

We investigate the efficacy of Beacon for the next-basket recommendation task, particularly through comparing with a series of classic and state-of-the-art baselines, and conducting both quantitative and qualitative analyses on our model.

5.1 Setup

Datasets. We conduct experiments on three publicly available real-life datasets from three different domains, as follows. Ta Feng (https://www.kaggle.com/chiranjivdas09/ta-feng-grocery-dataset) is a grocery shopping dataset containing transactions from Nov 2000 to Feb 2001. Each transaction is a basket of purchased items. Each sequence is a user's chronological ordering of baskets. Delicious (https://grouplens.org/datasets/hetrec-2011) consists of users' sequences of bookmarks. Each bookmark is associated with a basket of tag assignments. Foursquare (http://www.ntu.edu.sg/home/gaocong/datacode.htm) has users' chronological check-ins from Aug 2010 to Jul 2011 [Yuan et al., 2013]. We define a basket as the set of check-ins within the same day.

Preprocessing. To ensure sufficient information about each user and item for modeling, we require that each user adopts at least $n$ items and each item is adopted by at least $n$ users, with $n$ being 10, 5, 5 for Ta Feng, Delicious and Foursquare, respectively. To give a sense of the extent of reduction, only 5.9% were removed out of a total of 817,741 adoptions in Ta Feng. For Delicious, 11.8% out of 430,987 adoptions were removed. For Foursquare, 0.1% of 186,804 adoptions were removed. Additionally, we filter out basket sequences with fewer than 2 baskets. To create the train/validation/test sets, sequences are chronologically split into three non-overlapping periods $(t_{\text{train}}, t_{\text{val}}, t_{\text{test}})$, i.e., (3, 0.5, 0.5) months for Ta Feng, (80, 2, 2) months for Delicious and (10, 0.5, 0.5) months for Foursquare. For the train and validation sets, we generate all subsequences of the basket sequences with more than 3 baskets. Anything longer than 30 baskets is truncated, with the prefix cut off. To facilitate new-item recommendations, as in [Rendle et al., 2010], we do not consider the items just adopted in the immediately preceding time step. The statistics after preprocessing are described in Table 2.

| Dataset | #Sequences | #Items | Avg. length | Avg. basket size |
|---|---|---|---|---|
| Ta Feng | 77,209 | 9,964 | 7.0 | 5.9 |
| Delicious | 61,908 | 6,520 | 21.4 | 3.8 |
| Foursquare | 100,980 | 5,527 | 22.2 | 1.8 |

Table 2: Statistics for the Ta Feng, Delicious and Foursquare datasets

Correlation Matrix. We construct the input correlation matrix according to Section 3.2. Based on the validation set, we choose the first-order correlation for Delicious and Foursquare, whilst adopting the higher-order correlation for Ta Feng with $N = 5$ and $\mu = 0.85$.

Evaluation Metrics. Given a test sequence $S$, we use the preceding baskets $S' = \langle B_1, \ldots, B_{\ell(S)-1} \rangle$ to predict the last basket at time $\ell(S)$. This prediction is then compared to the ground-truth basket $B_{\ell(S)}$ on two well-established metrics. One is the F-measure (F1@K) [Yu et al., 2016], where $K$ is the basket size to be predicted. The second is half-life utility (HLU), a.k.a. the Breese score [Breese et al., 1998]. For both metrics, performances are averaged across test baskets over 10 runs with different random initializations. Comparisons are supported by two-tailed paired-sample Student's t-tests at the 0.05 significance level.

Learning Details. With the objective of minimizing the loss in Eq. (9), our model is trained for 15 epochs with a batch size of 32. We use the RMSProp optimizer with a learning rate of 0.001. The LSTM layer is applied with a 0.3 dropout probability. $\eta$ is initialized by the mean of the non-zero values in $C$. The model is further tuned on the validation set over the latent dimension $L \in \{8, 16, 32, 64\}$ and recurrent hidden unit $H \in \{16, 32, 64\}$ using a grid search. Lastly, we use $\alpha = 0.5$ as the default to control the trade-off between sequential and correlative associations. We will also vary $\alpha$ and study its impact in Section 5.3. For our experiments on an NVIDIA P100 GPU with 16GB memory, each mini-batch takes approximately 0.1 second.
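As a concrete reference for the evaluation protocol above, here is a minimal sketch of the two metrics. F1@K is standard; half-life utility appears in several variants in the literature, so the half-life value and the normalization used below are assumptions rather than the paper's exact protocol.

```python
import numpy as np

def f1_at_k(ranked_items, ground_truth, k):
    """F1@K between the top-K recommendation and the true next basket."""
    topk = set(ranked_items[:k])
    hits = len(topk & set(ground_truth))
    if hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / len(ground_truth)
    return 2 * precision * recall / (precision + recall)

def half_life_utility(ranked_items, ground_truth, beta=5):
    """Half-life utility (Breese score): hits are discounted
    exponentially by rank; beta is the assumed half-life."""
    gt = set(ground_truth)
    util = sum(1.0 / 2 ** ((rank - 1) / (beta - 1))
               for rank, item in enumerate(ranked_items, start=1)
               if item in gt)
    # One common normalization: divide by the utility of a perfect ranking.
    best = sum(1.0 / 2 ** ((r - 1) / (beta - 1))
               for r in range(1, len(gt) + 1))
    return 100.0 * util / best
```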
5.2 Comparison to Baselines

We compare Beacon to a suite of classic and state-of-the-art baselines, as follows.

POP ranks items based on their global popularity.

MC ranks items based on first-order Markov-chain transition probabilities from items in the previous basket.

MCN is similar to MC, but uses denser Markov-chain dependencies derived using neural networks.

DREAM [Yu et al., 2016] is a dynamic recurrent model, where a basket representation is aggregated from item embeddings via a pooling layer. The most recent basket representation is used to generate the next basket (https://github.com/LaceyChen17/DREAM).

BSEQ [Le et al., 2018] captures long-term dependencies. Each basket is encoded directly from a binary vector using a fully-connected layer. Next-basket predictions are based on the sequential signal at the last basket.

triple2vec [Wan et al., 2018] infers the embeddings of items and users from (user $u$, item $i$, item $j$) triplets, where $i$, $j$ co-occur in the same basket. We use the authors' implementation (https://github.com/MengtingWan/grocery) with various initial loyalty values to derive sequence representations for a global user, to focus the comparison on sequential effects.

All baselines, if applicable, are trained as well as tuned on the validation set in the same manner as Beacon, as outlined in Section 5.1.

| Dataset | Model | L | H | F1@5 (%) | F1@10 (%) | HLU |
|---|---|---|---|---|---|---|
| Ta Feng | POP | - | - | 4.66 | 4.02 | 6.64 |
| | MC | - | - | 4.11 | 3.61 | 5.78 |
| | MCN | 8 | - | 4.56 | 4.02 | 6.34 |
| | DREAM | 8 | - | 5.85 | 4.90 | 6.96 |
| | BSEQ | 32 | 16 | 4.48 | 4.04 | 6.34 |
| | triple2vec | 64 | - | 4.66 | 3.88 | 4.85 |
| | Beacon | 8 | 64 | 6.36 | 5.26 | 7.83 |
| Delicious | POP | - | - | 3.88 | 4.04 | 6.05 |
| | MC | - | - | 4.27 | 4.59 | 6.52 |
| | MCN | 32 | - | 4.20 | 4.59 | 6.50 |
| | DREAM | 32 | - | 3.13 | 3.47 | 4.93 |
| | BSEQ | 64 | 32 | 3.86 | 3.97 | 5.95 |
| | triple2vec | 32 | - | 3.76 | 4.04 | 5.16 |
| | Beacon | 64 | 64 | 4.93 | 5.47 | 7.76 |
| Foursquare | POP | - | - | 2.73 | 2.90 | 4.84 |
| | MC | - | - | 3.58 | 3.43 | 5.53 |
| | MCN | 64 | - | 3.09 | 2.89 | 5.08 |
| | DREAM | 64 | - | 2.84 | 3.00 | 4.98 |
| | BSEQ | 64 | 32 | 2.80 | 2.89 | 4.82 |
| | triple2vec | 64 | - | 2.73 | 2.90 | 4.53 |
| | Beacon | 64 | 64 | 3.61 | 3.59 | 6.32 |

Table 3: Performance comparison between Beacon and the baselines on Ta Feng, Delicious and Foursquare. Beacon's improvements over the second-best model are statistically significant.

Table 3 shows the results in terms of F1@5, F1@10 and HLU. For Ta Feng, popularity seems to be an important factor, since POP performs better than MC, MCN, BSEQ and triple2vec. Beyond popularity, DREAM and Beacon show advantages in capturing associations between basket items. Yet, Beacon is the best-performing model. For Delicious, the Markov-based models (MC and MCN) do better than the other baselines. This might imply that items in a test basket are strongly dependent on the most recent basket. The modeling of basket-oriented associations in DREAM and triple2vec does not help to improve performance here. In contrast, Beacon shows a significant improvement over these models across the three measures, which we attribute to the advantage of modeling correlations effectively. For Foursquare, we witness a similar pattern as for Delicious, where Beacon outperforms the baselines significantly.
5.3 Quantitative Model Analysis

We further analyze our model quantitatively in the context of the two research questions listed below.

Are item importance and correlation helpful? Our basket encoder accounts for two primary factors, item importance $\omega$ and correlation $C$, as shown in Eq. (3). To study the contribution of each factor, we compare two simpler variants with Beacon: (i) Beacon_corr-, which ignores item correlation by setting $C$ to a zero matrix; and (ii) Beacon_corr-impt-, which ignores both item importance and correlation by further setting $\omega$ to a vector of ones. We report their results in Table 4.

| Dataset | Model | F1@5 (%) | F1@10 (%) | HLU |
|---|---|---|---|---|
| Ta Feng | Beacon_corr-impt- | 3.87 | 3.44 | 5.13 |
| | Beacon_corr- | 5.78 | 4.86 | 7.18 |
| | Beacon (full) | 6.36 | 5.26 | 7.83 |
| Delicious | Beacon_corr-impt- | 4.02 | 4.43 | 6.38 |
| | Beacon_corr- | 4.67 | 5.10 | 7.15 |
| | Beacon (full) | 4.94 | 5.47 | 7.76 |
| Foursquare | Beacon_corr-impt- | 2.98 | 3.29 | 5.39 |
| | Beacon_corr- | 3.58 | 3.52 | 6.16 |
| | Beacon (full) | 3.61 | 3.59 | 6.32 |

Table 4: Performance comparison between Beacon and its variants without item importance (impt) or correlation (corr). Improvements from the previous row are statistically significant.

Specifically, the full model significantly outperforms Beacon_corr-, demonstrating that item correlation plays a crucial role in next-basket recommendation. Likewise, Beacon_corr- significantly beats Beacon_corr-impt-, implying that item importance is another useful factor. In summary, our model benefits from both factors.

What is the effect of hyperparameter α? According to Eq. (7), $\alpha$ tunes the relative weights of correlative and sequential associations. A higher $\alpha$ emphasizes endogenous effects within baskets, while a lower $\alpha$ favors exogenous effects across baskets. In Figure 3, we plot the performance when varying $\alpha$. There are some minor variations across datasets, but generally the range $\alpha \in [0.2, 0.6]$ tends to do relatively well in most scenarios, indicating that some balance is useful.

[Figure 3: Impact of α on the performance of Beacon, in terms of (a) F1@5 and (b) HLU, with α varied from 0.0 to 1.0 on Ta Feng, Delicious and Foursquare.]

5.4 Qualitative Analysis

Finally, we perform a qualitative analysis on Delicious, where the objective is to recommend a basket of tags for the next bookmark to visit. The other two datasets only contain item IDs and thus cannot be used for the qualitative study. In Table 5, Beacon is compared to the second-best model MC and the popularity-based method POP, illustrating two examples of tag-basket prediction with respect to two bookmarks.

| Target bookmark | Beacon | MC | POP |
|---|---|---|---|
| Manual de jQuery (http://www.desarrolloweb.com/manuales/manual-jquery.html) | *web*, design, *programming*, *javascript*, *tools* | digital, sociales, web, internet, periodismo | art, design, education, video, tools |
| The $300 Million Button (https://articles.uie.com/three_hund_million_button) | twitter, *ux*, *propinquity*, critical, writing | design, peace, education, blog, tips | art, design, education, video, tools |

Table 5: Illustrations of tag-basket prediction (K = 5) by Beacon, MC and POP on Delicious. Italics denote tags relevant to the bookmark.

POP keeps suggesting the same set of tags, as it only leverages global popularity, while MC recommends somewhat general tags with limited relevance. In contrast, Beacon proposes more relevant baskets of correlated tags. The set of tags {web, programming, javascript, tools} is descriptive of jQuery, a JavaScript library. Likewise, the second bookmark refers to a critical discussion on how to increase a site's revenue by maximizing user experience (i.e., ux) with an efficient design (e.g., propinquity between buttons and fields).

6 Conclusion

In this paper, we address the next-basket recommendation problem. Assuming that baskets incorporate correlative dependencies among items, we propose Beacon, which utilizes the correlation information to enhance the representation of individual baskets as well as of the overall basket sequence. Experimental results on three public real-life datasets show the benefit of exploiting correlative dependencies.
Acknowledgments

This research was supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grants (18-C220-SMU-004 and 18-C220-SMU-006).

References

[Bai et al., 2018] Ting Bai, Jian-Yun Nie, Wayne Xin Zhao, Yutao Zhu, Pan Du, and Ji-Rong Wen. An attribute-aware neural attentive model for next basket recommendation. In SIGIR, pages 1201-1204, 2018.

[Breese et al., 1998] John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In UAI, pages 43-52, 1998.

[Chen et al., 2012] Shuo Chen, Josh L. Moore, Douglas Turnbull, and Thorsten Joachims. Playlist prediction via metric embedding. In KDD, pages 714-722, 2012.

[Christakopoulou and Karypis, 2014] Evangelia Christakopoulou and George Karypis. HOSLIM: Higher-order sparse linear method for top-N recommender systems. In PAKDD, pages 38-49, 2014.

[He et al., 2017] Ruining He, Wang-Cheng Kang, and Julian McAuley. Translation-based recommendation. In RecSys, pages 161-169, 2017.

[Hidasi et al., 2016] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based recommendations with recurrent neural networks. In ICLR, 2016.

[Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.

[Huang et al., 2018] Jin Huang, Wayne Xin Zhao, Hongjian Dou, Ji-Rong Wen, and Edward Y. Chang. Improving sequential recommendation with knowledge-enhanced memory networks. In SIGIR, pages 505-514, 2018.

[Kipf and Welling, 2017] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.

[Le et al., 2017] Duc-Trong Le, Hady W. Lauw, and Yuan Fang. Basket-sensitive personalized item recommendation. In IJCAI, pages 2060-2066, 2017.

[Le et al., 2018] Duc-Trong Le, Hady Wirawan Lauw, and Yuan Fang. Modeling contemporaneous basket sequences with twin networks for next-item recommendation. In IJCAI, pages 3414-3420, 2018.

[Li et al., 2009] Ming Li, Benjamin M. Dias, Ian Jarman, Wael El-Deredy, and Paulo J. G. Lisboa. Grocery shopping recommendations based on basket-sensitive random walk. In KDD, pages 1215-1224, 2009.

[Li et al., 2017] Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. Neural attentive session-based recommendation. In CIKM, pages 1419-1428, 2017.

[Ning and Karypis, 2011] Xia Ning and George Karypis. SLIM: Sparse linear methods for top-N recommender systems. In ICDM, pages 497-506, 2011.

[Pathak et al., 2017] Apurva Pathak, Kshitiz Gupta, and Julian McAuley. Generating and personalizing bundle recommendations on Steam. In SIGIR, pages 1073-1076, 2017.

[Quadrana et al., 2018] Massimo Quadrana, Paolo Cremonesi, and Dietmar Jannach. Sequence-aware recommender systems. ACM Computing Surveys (CSUR), 51(4):66, 2018.

[Rendle et al., 2010] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. Factorizing personalized Markov chains for next-basket recommendation. In WWW, pages 811-820, 2010.

[Sarwar et al., 2000] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Analysis of recommendation algorithms for e-commerce. In EC, pages 158-167, 2000.

[Song et al., 2019] Weiping Song, Zhiping Xiao, Yifan Wang, Laurent Charlin, Ming Zhang, and Jian Tang. Session-based social recommendation via dynamic graph attention networks. In WSDM, 2019.

[Tang and Wang, 2018] Jiaxi Tang and Ke Wang.
Personalized top-N sequential recommendation via convolutional sequence embedding. In WSDM, pages 565-573, 2018.

[Villatel et al., 2018] Kiewan Villatel, Elena Smirnova, Jérémie Mary, and Philippe Preux. Recurrent neural networks for long and short-term sequential recommendation. In RecSys, 2018.

[Wan et al., 2018] Mengting Wan, Di Wang, Jie Liu, Paul Bennett, and Julian McAuley. Representing and recommending shopping baskets with complementarity, compatibility and loyalty. In CIKM, pages 1133-1142, 2018.

[Wang et al., 2015] Pengfei Wang, Jiafeng Guo, Yanyan Lan, Jun Xu, Shengxian Wan, and Xueqi Cheng. Learning hierarchical representation model for next-basket recommendation. In SIGIR, pages 403-412, 2015.

[Wang et al., 2018] Shoujin Wang, Liang Hu, Longbing Cao, Xiaoshui Huang, Defu Lian, and Wei Liu. Attention-based transactional context embedding for next-item recommendation. In AAAI, 2018.

[Wu et al., 2019] Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. Session-based recommendation with graph neural networks. In AAAI, 2019.

[Xiang et al., 2010] Liang Xiang, Quan Yuan, Shiwan Zhao, Li Chen, Xiatian Zhang, Qing Yang, and Jimeng Sun. Temporal recommendation on graphs via long- and short-term preference fusion. In KDD, pages 723-732, 2010.

[Ying et al., 2018] Haochao Ying, Fuzhen Zhuang, Fuzheng Zhang, Yanchi Liu, Guandong Xu, Xing Xie, Hui Xiong, and Jian Wu. Sequential recommender system based on hierarchical attention networks. In IJCAI, pages 3926-3932, 2018.

[Yu et al., 2016] Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. A dynamic recurrent model for next basket recommendation. In SIGIR, pages 729-732, 2016.

[Yuan et al., 2013] Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, and Nadia Magnenat Thalmann. Time-aware point-of-interest recommendation. In SIGIR, pages 363-372, 2013.