Rectangular Bounding Process

Xuhui Fan, School of Mathematics & Statistics, University of New South Wales, xuhui.fan@unsw.edu.au
Bin Li, School of Computer Science, Fudan University, libin@fudan.edu.cn
Scott A. Sisson, School of Mathematics & Statistics, University of New South Wales, scott.sisson@unsw.edu.au

Abstract

Stochastic partition models divide a multi-dimensional space into a number of rectangular regions, such that the data within each region exhibit certain types of homogeneity. Due to the nature of their partition strategy, existing partition models may create many unnecessary divisions in sparse regions when trying to describe data in dense regions. To avoid this problem we introduce a new parsimonious partition model, the Rectangular Bounding Process (RBP), which efficiently partitions multi-dimensional spaces by employing a bounding strategy to enclose data points within rectangular bounding boxes. Unlike existing approaches, the RBP possesses several attractive theoretical properties that make it a powerful nonparametric partition prior on a hypercube. In particular, the RBP is self-consistent and as such can be directly extended from a finite hypercube to infinite (unbounded) space. We apply the RBP to regression trees and relational models as a flexible partition prior. The experimental results validate the merit of the RBP in rich yet parsimonious expressiveness compared to the state-of-the-art methods.

1 Introduction

Stochastic partition processes on a product space have found many real-world applications, such as regression trees [5, 18, 22], relational modelling [17, 2, 21], and community detection [26, 16]. By tailoring a multi-dimensional space (or multi-dimensional array) into a number of rectangular regions, the partition model can fit data using these blocks such that the data within each block exhibit certain types of homogeneity.
As one can choose an arbitrarily fine resolution for the partition, the data can be fitted reasonably well. The cost of finer data fitting is that the partition model may induce unnecessary dissections in sparse regions. Compared to the regular-grid partition process [17], the Mondrian process (MP) [32] is more parsimonious for data fitting due to its hierarchical partition strategy; however, the strategy of recursively cutting the space still cannot largely avoid unnecessary dissections in sparse regions. Consider e.g. a regression tree on a multi-dimensional feature space: as data usually lie in some local regions of the entire space, a regular-grid or hierarchical (i.e. kd-tree) partition model would inevitably produce too many cuts in regions where data points rarely lie when it tries to fit data in dense regions (see the illustration in the left panel of Figure 1). It is accordingly challenging for a partition process to balance fitness and parsimony.

Instead of this cutting-based strategy, we propose a bounding-based partition process, the Rectangular Bounding Process (RBP), to alleviate the above limitation. The RBP generates rectangular bounding boxes to enclose data points in a multi-dimensional space. In this way, significant regions of the space can be comprehensively modelled. Each bounding box can be efficiently constructed by an outer product of a number of step functions, each of which is defined on one dimension of the feature space, taking the value 1 on a particular interval and 0 otherwise.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Figure 1: [Left] (a) Regular-grid partition; (b) hierarchical partition; (c) RBP-based partition. [Right] Generation of a bounding box (Step 2 of the generative process for the RBP).
As the bounding boxes are independently generated, the layout of a full partition can be quite flexible, allowing for a simple description of regions with complicated patterns. As a result, the RBP is able to use fewer blocks (thereby providing a more parsimonious expression of the model) than cutting-based partition models while achieving a similar modelling capability.

The RBP has several favourable properties that make it a powerful nonparametric partition prior: (1) Given a budget parameter and the domain size, the expected total volume of the generated bounding boxes is a constant. This helps in setting the process hyperparameters to prefer either a few large bounding boxes or many small bounding boxes. (2) Each individual data point in the domain has equal probability of being covered by a bounding box. This gives an equal tendency towards all possible configurations of the data. (3) The process is self-consistent in the sense that, when restricting a domain to a sub-domain, the probability of the resulting partition on the sub-domain is the same as the probability of generating that partition on the sub-domain directly. This important property enables the RBP to be extended from a hypercube to the infinite multi-dimensional space according to the Kolmogorov extension theorem [6], which is extremely useful for problem settings where the domain changes, such as regression over streaming data.

The RBP can be a beneficial tool in many applications. In this paper we specifically investigate (1) regression trees, where bounding boxes can be viewed as local regions with certain labels, and (2) relational modelling, where bounding boxes can be viewed as communities in a complex network. We develop practical and efficient MCMC methods to infer the model structures.
The experimental results on a number of synthetic and real-world data sets demonstrate that the RBP achieves parsimonious partitions with competitive performance compared to the state-of-the-art methods.

2 Related Work

2.1 Stochastic Partition Processes

Stochastic partition processes divide a multi-dimensional space (for continuous data) or a multi-dimensional array (for discrete data) into blocks such that the data within each region exhibit certain types of homogeneity. In terms of partitioning strategy, state-of-the-art stochastic partition processes can be roughly categorized into regular-grid partitions and flexible axis-aligned partitions (Figure 1).

A regular-grid stochastic partition process is constituted by D separate partition processes, one on each dimension of a D-dimensional array. The resulting orthogonal interactions between dimensions produce regular grids. Typical regular-grid partition models include the infinite relational model (IRM) [17] and mixed-membership stochastic blockmodels [2]. Bayesian plaid models [24, 15, 4] generate plaid-like partitions; however, they neglect the dependence between latent feature values and, more importantly, they are restricted to discrete arrays (in contrast to our continuous space model). Regular-grid partition models are widely used in real-world applications for modelling graph data [14, 33].

To our knowledge, only the Mondrian process (MP) [32, 31] and the rectangular tiling process (RTP) [25] can produce flexible axis-aligned partitions on a product space. The MP recursively generates axis-aligned cuts on a unit hypercube and partitions the space in a hierarchical fashion known as a kd-tree. In contrast to this hierarchical partitioning strategy, the RTP generates a flat partition structure on a two-dimensional array by assigning each entry to an existing block or a new block in sequence, without violating the rectangular restriction on the blocks.
2.2 Bayesian Relational Models

Most stochastic partition processes are developed to target relational modelling [17, 32, 25, 36, 9, 10]. A stochastic partition process generates a partition on a multi-dimensional array, which serves as a prior for the underlying communities in the relational data (e.g. social networks in the 2-dimensional case). After generating the partition, a local model is allocated to each block (or polygon) of the partition to characterize the distribution of relation types in that block. For example, the local model can be a Bernoulli distribution for link prediction in social networks or a discrete distribution for rating prediction in recommender systems. Finally, row and column indexes are sampled to locate a block in the partition, and the local model is used to generate the relational data.

Compared to existing stochastic partition processes for relational modelling, the RBP introduces a very different partition approach: the RBP adopts a bounding-based strategy, while the others are based on cutting-based strategies. This unique feature enables the RBP to directly capture important communities without wasting model parameters on unimportant regions. In addition, the bounding boxes are independently generated by the RBP. This parallel strategy is much more efficient than the hierarchical strategy [32, 9] and the entry-wise growing strategy [25].

2.3 Bayesian Regression Trees

The ensemble of regression/decision trees [3, 11] is a popular tool in regression tasks due to its competitive and robust performance against other models. In the Bayesian framework, Bayesian additive regression trees (BART) [5] and the Mondrian forest [18, 19] are two representative methods. BART adopts an ensemble of trees to approximate the unknown function from the input space to the output label. Through prior regularization, BART keeps the effect of each individual tree small, so that each tree acts as a weak learner in the sense of the boosting literature.
BART has shown promise in nonparametric regression, and several variants of BART have been developed to focus on different scenarios, including heterogeneous BART [29], which allows the observational variance to vary over the space; parallel BART [30], which enables parallel computing for BART; and Dirichlet additive regression trees [22], which impose a sparse Dirichlet prior on the dimensions to address issues of high dimensionality. The Mondrian forest (MF) is built on the idea of an ensemble of Mondrian processes (kd-trees) to partition the space. The MF is distinct from BART in that the MF may be more suitable for streaming data scenarios, as the distribution of trees sampled from the MP stays invariant even if the data domain changes over time.

3 The Rectangular Bounding Process

The goal of the Rectangular Bounding Process (RBP) is to partition the space by attaching rectangular bounding boxes to significant regions, where significance is application dependent. For a hypercubical domain $X \subset \mathbb{R}^D$, with $L^{(d)}$ denoting the length of the $d$-th dimension of $X$, a budget parameter $\tau \in \mathbb{R}_+$ is used to control the expected number of generated bounding boxes in $X$, and a length parameter $\lambda \in \mathbb{R}_+$ is used to control the expected size of the generated bounding boxes. The generative process of the RBP is defined as follows:

1. Sample the number of bounding boxes $K_\tau \sim \mathrm{Poisson}\!\left(\tau \prod_{d=1}^{D} (1 + \lambda L^{(d)})\right)$;
2. For $k = 1, \ldots, K_\tau$ and $d = 1, \ldots, D$, sample the initial position $s_k^{(d)}$ and the length $l_k^{(d)}$ of the $k$-th bounding box (denoted $\square_k$) in the $d$-th dimension:
   (a) Sample the initial position $s_k^{(d)}$ as
       $s_k^{(d)} = 0$, with probability $\frac{1}{1 + \lambda L^{(d)}}$;
       $s_k^{(d)} \sim \mathrm{Uniform}(0, L^{(d)}]$, with probability $\frac{\lambda L^{(d)}}{1 + \lambda L^{(d)}}$.
   (b) Sample the length $l_k^{(d)}$ as
       $l_k^{(d)} = L^{(d)} - s_k^{(d)}$, with probability $e^{-\lambda (L^{(d)} - s_k^{(d)})}$;
       $l_k^{(d)} \sim \text{Trun-Exp}(\lambda, L^{(d)} - s_k^{(d)})$, with probability $1 - e^{-\lambda (L^{(d)} - s_k^{(d)})}$.
3. Sample $K_\tau$ i.i.d. time points uniformly in $(0, \tau]$ and index them so that $t_1 < \ldots < t_{K_\tau}$. Set the cost of $\square_k$ as $m_k = t_k - t_{k-1}$ (with $t_0 = 0$).
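The three steps above can be sketched in Python as follows. This is our own minimal illustration using only the standard library (all function and variable names, e.g. `sample_rbp`, are ours and not from any reference implementation); the truncated exponential is drawn by inverse-CDF sampling.

```python
import math
import random


def sample_poisson(rate, rng):
    """Sample from Poisson(rate) via Knuth's multiplication method."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_rbp(lengths, tau, lam, rng=None):
    """Draw one RBP partition on a box with side lengths `lengths`.

    Returns a list of boxes, each a list of per-dimension intervals
    (s, s + l), together with the list of costs m_k.
    """
    rng = rng or random.Random()
    # Step 1: K_tau ~ Poisson(tau * prod_d (1 + lam * L^(d)))
    rate = tau * math.prod(1.0 + lam * L for L in lengths)
    K = sample_poisson(rate, rng)
    boxes = []
    for _ in range(K):
        box = []
        for L in lengths:
            # Step 2(a): initial position, with an atom at 0
            if rng.random() < 1.0 / (1.0 + lam * L):
                s = 0.0
            else:
                s = rng.uniform(0.0, L)
            # Step 2(b): length, with an atom at the remaining length
            rem = L - s
            if rng.random() < math.exp(-lam * rem):
                l = rem
            else:
                # inverse-CDF draw from Exp(lam) truncated to (0, rem)
                u = rng.random()
                l = -math.log(1.0 - u * (1.0 - math.exp(-lam * rem))) / lam
            box.append((s, s + l))
        boxes.append(box)
    # Step 3: costs are the gaps between ordered uniform times on (0, tau]
    times = sorted(rng.uniform(0.0, tau) for _ in range(K))
    costs = [t - t_prev for t_prev, t in zip([0.0] + times, times)]
    return boxes, costs
```

For example, `sample_rbp([1.0, 2.0], tau=2.0, lam=1.0)` draws a random partition of the rectangle $[0,1] \times [0,2]$ with an expected $2 \cdot 2 \cdot 3 = 12$ bounding boxes.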
Here $\text{Trun-Exp}(\lambda, L^{(d)} - s_k^{(d)})$ refers to an exponential distribution with rate $\lambda$, truncated at $L^{(d)} - s_k^{(d)}$.

The RBP is defined in a measurable space $(\Omega_X, \mathcal{B}_X)$, where $X \in \mathcal{F}(\mathbb{R}^D)$ denotes the domain and $\mathcal{F}(\mathbb{R}^D)$ denotes the collection of all finite boxes in $\mathbb{R}^D$. Each element of $\Omega_X$ denotes a partition $\boxplus_X$ of $X$, comprising a collection of rectangular bounding boxes $\{\square_k\}_k$, where $k \in \mathbb{N}$ indexes the bounding boxes in $\boxplus_X$. A bounding box is defined by the outer product $\square_k := \bigotimes_{d=1}^{D} u_k^{(d)}([0, L^{(d)}])$, where $u_k^{(d)}$ is a step function defined on $[0, L^{(d)}]$, taking the value 1 on $[s_k^{(d)}, s_k^{(d)} + l_k^{(d)}]$ and 0 otherwise (see the right panel of Figure 1). Given a domain $X$ and hyperparameters $\tau$ and $\lambda$, a random partition sampled from the RBP can be represented as $\boxplus_X \sim \mathrm{RBP}(X, \tau, \lambda)$.

We assume that the costs of the bounding boxes are i.i.d. samples from the same exponential distribution, which implies that there exists a homogeneous Poisson process on the time (cost) line. The generating time of each bounding box is uniform in $(0, \tau]$, and the number of bounding boxes follows a Poisson distribution. We represent a random partition as $\boxplus_X := \{\square_k, m_k\}_{k=1}^{K_\tau} \in \Omega_X$.

3.1 Expected Total Volume

Proposition 1. Given a hypercubical domain $X \subset \mathbb{R}^D$, with $L^{(d)}$ denoting the length of the $d$-th dimension of $X$, and the value of $\tau$, the expected total volume of the bounding boxes (i.e. the expected number of boxes times the expected volume of a box) in $X$ sampled from an RBP is the constant $\tau \prod_{d=1}^{D} L^{(d)}$.

The expected length of the interval on which $u_k^{(d)}$ takes the value 1 is $\mathbb{E}(|u_k^{(d)}|) = \mathbb{E}(l_k^{(d)}) = \frac{L^{(d)}}{1 + \lambda L^{(d)}}$. According to the definition of the RBP (the Poisson distribution in Step 1), the expected number of bounding boxes in $X$ is $\tau \prod_{d=1}^{D} (1 + \lambda L^{(d)})$. Thus, the expected total volume of the bounding boxes for a given budget $\tau$ and a domain $X$ is $\tau \prod_{d=1}^{D} L^{(d)}$. Proposition 1 implies that, given $\tau$ and $X$, the RBP generates either many small bounding boxes or a few large bounding boxes.
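The cancellation in Proposition 1 can be checked numerically. The snippet below (our own sketch; the domain lengths and hyperparameter values are arbitrary choices for illustration) multiplies the expected box count by the expected per-box volume and compares the result with $\tau \prod_d L^{(d)}$:

```python
import math

# Hypothetical 2-D domain and hyperparameters, chosen only for illustration.
lengths = [1.0, 2.0]
tau, lam = 3.0, 2.0

# Expected number of boxes: tau * prod_d (1 + lam * L^(d))
expected_count = tau * math.prod(1.0 + lam * L for L in lengths)

# Expected volume of one box: prod_d L^(d) / (1 + lam * L^(d))
expected_volume = math.prod(L / (1.0 + lam * L) for L in lengths)

# The (1 + lam * L) factors cancel, leaving tau * prod_d L^(d),
# which does not depend on lam.
print(expected_count * expected_volume)  # ~ 6.0
print(tau * math.prod(lengths))          # ~ 6.0
```

Changing `lam` trades off `expected_count` against `expected_volume` while leaving their product fixed, which is exactly the "many small boxes vs. few large boxes" trade-off described above.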
This provides practical guidance on how to choose appropriate values of $\lambda$ and $\tau$ when implementing the RBP. Given the lengths $\{L^{(d)}\}_d$ of $X$, an estimate of the lengths of the bounding boxes can help to choose $\lambda$ (i.e. $\lambda = \frac{L^{(d)} - \mathbb{E}(|u_k^{(d)}|)}{L^{(d)} \mathbb{E}(|u_k^{(d)}|)}$). An appropriate value of $\tau$ can then be chosen to determine the expected number of bounding boxes.

3.2 Coverage Probability

Proposition 2. For any data point $x \in X$ (including the boundaries of $X$), the probability of $x^{(d)}$ falling in the interval $[s_k^{(d)}, s_k^{(d)} + l_k^{(d)}]$ of $u_k^{(d)}$ is the constant $\frac{1}{1 + \lambda L^{(d)}}$ (and does not depend on $x^{(d)}$).

As the step functions $\{u_k^{(d)}\}_d$ constructing the $k$-th bounding box $\square_k$ in $X$ are independent, $x$ is covered by $\square_k$ with a constant probability. The property of constant coverage probability is particularly suitable for regression problems. Proposition 2 implies that there is no biased tendency to favour different data regions in $X$: all data points have equal probability of being covered by a bounding box in an RBP partition $\boxplus_X$. Another interesting observation follows from this property: although we have specified a direction for generating the $d$-th dimension of $\square_k$ in the generative process (i.e. the initial position $s_k^{(d)}$ and the terminal position $v_k^{(d)} = s_k^{(d)} + l_k^{(d)}$), the probability of generating $u_k^{(d)}$ is the same if we reverse the direction of the $d$-th dimension, which is $p(s_k^{(d)}, v_k^{(d)}) = \frac{e^{-\lambda l_k^{(d)}}}{1 + \lambda L^{(d)}}\, \lambda^{\mathbb{1}(s_k^{(d)} > 0) + \mathbb{1}(v_k^{(d)} < L^{(d)})}$.
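Proposition 2 is easy to check by simulation. The sketch below (our own code, with arbitrary test values for $L$, $\lambda$, and $x$) repeatedly draws the interval of a single dimension following Step 2 of the generative process, and estimates the probability that a fixed coordinate $x$ is covered; the estimate should be close to $\frac{1}{1 + \lambda L}$ regardless of where $x$ lies:

```python
import math
import random


def sample_interval(L, lam, rng):
    """Draw one (s, l) pair for a single dimension, per Step 2 of the RBP."""
    # Step 2(a): initial position, with an atom at 0
    if rng.random() < 1.0 / (1.0 + lam * L):
        s = 0.0
    else:
        s = rng.uniform(0.0, L)
    # Step 2(b): length, with an atom at the remaining length rem = L - s
    rem = L - s
    if rng.random() < math.exp(-lam * rem):
        l = rem
    else:
        # inverse-CDF draw from Exp(lam) truncated to (0, rem)
        u = rng.random()
        l = -math.log(1.0 - u * (1.0 - math.exp(-lam * rem))) / lam
    return s, l


rng = random.Random(1)
L, lam, x = 1.0, 2.0, 0.37  # arbitrary test values
n = 200_000
hits = 0
for _ in range(n):
    s, l = sample_interval(L, lam, rng)
    if s <= x <= s + l:
        hits += 1

print(hits / n)  # close to 1 / (1 + lam * L) = 1/3
```

Re-running with a different value of `x` yields the same estimate (up to Monte Carlo error), illustrating that the coverage probability does not depend on the location of the data point.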