Journal of Machine Learning Research 17 (2016) 1-42. Submitted 10/14; Revised 12/15; Published 4/16.

Rounding-based Moves for Semi-Metric Labeling

M. Pawan Kumar PAWAN@ROBOTS.OX.AC.UK
Department of Engineering Science, University of Oxford, Parks Road, OX1 3PJ, United Kingdom

Puneet K. Dokania PUNEET.KUMAR@ECP.FR
Center for Visual Computing, Centrale Supélec, 92295 Châtenay-Malabry, France

Editor: Jeff Bilmes

Abstract

Semi-metric labeling is a special case of energy minimization for pairwise Markov random fields. The energy function consists of arbitrary unary potentials, and pairwise potentials that are proportional to a given semi-metric distance function over the label set. Popular methods for solving semi-metric labeling include (i) move-making algorithms, which iteratively solve a minimum st-cut problem; and (ii) the linear programming (LP) relaxation based approach. In order to convert the fractional solution of the LP relaxation to an integer solution, several randomized rounding procedures have been developed in the literature. We consider a large class of parallel rounding procedures, and design move-making algorithms that closely mimic them. We prove that the multiplicative bound of a move-making algorithm exactly matches the approximation factor of the corresponding rounding procedure for any arbitrary distance function. Our analysis includes all known results for move-making algorithms as special cases.

Keywords: semi-metric labeling, move-making algorithms, linear programming relaxation, multiplicative bounds

1. Introduction

A Markov random field (MRF) is a graph whose vertices are random variables, and whose edges specify a neighborhood over the random variables. Each random variable can be assigned a value from a set of labels, resulting in a labeling of the MRF.
The putative labelings of an MRF are quantitatively distinguished from each other by an energy function, which is the sum of potential functions that depend on the cliques of the graph. An important optimization problem associated with the MRF framework is energy minimization, that is, finding a labeling with the minimum energy.

Semi-metric labeling is a special case of energy minimization, which models several useful low-level vision tasks (Boykov et al., 1998, 1999; Szeliski et al., 2008). It is characterized by a finite, discrete label set and a semi-metric distance function over the labels. The energy function in semi-metric labeling consists of arbitrary unary potentials and pairwise potentials that are proportional to the distance between the labels assigned to the neighboring random variables. The problem is known to be NP-hard (Veksler, 1999).

©2016 M. Pawan Kumar and Puneet Dokania.

Two popular approaches for semi-metric labeling are: (i) move-making algorithms (Boykov et al., 1999; Gupta and Tardos, 2000; Kumar and Koller, 2009; Kumar and Torr, 2008; Veksler, 2007), which iteratively improve the labeling by solving a minimum st-cut problem; and (ii) linear programming (LP) relaxation (Chekuri et al., 2001; Koster et al., 1998; Schlesinger, 1976; Wainwright et al., 2005), which is obtained by dropping the integrality constraints in the corresponding integer programming formulation. Move-making algorithms are very efficient due to the availability of fast minimum st-cut solvers (Boykov and Kolmogorov, 2004) and are very popular in the computer vision community. In contrast, the LP relaxation is significantly slower, despite the development of several specialized solvers (Globerson and Jaakkola, 2007; Hazan and Shashua, 2008; Kolmogorov, 2006; Komodakis et al., 2007; Ravikumar et al., 2008; Tarlow et al., 2011; Wainwright et al., 2005; Weiss et al., 2007; Werner, 2007, 2010).
However, when used in conjunction with randomized rounding algorithms, the LP relaxation provides the best known polynomial-time theoretical guarantees for semi-metric labeling (Archer et al., 2004; Chekuri et al., 2001; Kleinberg and Tardos, 1999).

At first sight, the difference between move-making algorithms and the LP relaxation appears to be the standard accuracy vs. speed trade-off. However, we prove that, for any arbitrary semi-metric distance function, it is possible to design move-making algorithms that match the theoretical guarantees of a large class of randomized rounding procedures, which we call parallel rounding. Our proofs are constructive, which allows us to test the rounding-based move-making algorithms empirically. Our experimental results confirm that rounding-based moves provide similar accuracy to the LP relaxation while being significantly faster.

2. Related Work

As our work is concerned only with semi-metric labeling, we will focus on the related work for this special case of energy minimization. For a more general survey, we refer the reader to (Wang et al., 2013), and for a thorough empirical comparison of the various algorithms we refer the reader to (Kappes et al., 2015; Szeliski et al., 2008).

The earliest known theoretical guarantee of a move-making algorithm was provided by Veksler (1999), who presented an analysis of the popular expansion move-making algorithm. The analysis showed that the expansion algorithm and the parallel rounding procedure of Kleinberg and Tardos (1999) provide matching guarantees for the uniform distance function. Komodakis et al. (2008) also proposed a move-making algorithm based on the primal-dual scheme of optimizing the LP relaxation. The resulting algorithm is similar to the expansion algorithm in terms of its theoretical guarantees, but provides a principled formulation for improving its efficiency in dynamic settings.
Despite the existing analysis of expansion-like algorithms, parallel rounding procedures (Chekuri et al., 2001; Kleinberg and Tardos, 1999) are known to perform better for several distance functions of interest, such as the truncated linear distance, the truncated quadratic distance, and the hierarchically well-separated tree (HST) metric. Kumar and Torr (2008) proposed the range expansion move-making algorithm that matches the guarantees of the rounding procedure of Chekuri et al. (2001) for the truncated linear and quadratic distances. Kumar and Koller (2009) proposed a hierarchical move-making algorithm that matches the guarantees of the rounding procedure of Kleinberg and Tardos (1999) for the HST metric. However, their analysis is restricted to special cases of semi-metric labeling, whereas our work does not make any assumptions on the form of the semi-metric distance function. We note that, recently, a more general algorithm has also been proposed for non-convex priors (Ajanthan et al., 2014). However, this generalized algorithm does not provide any strong theoretical guarantees on the quality of the solution, which is the key focus of our work.

While our analysis focuses on parallel rounding, it is also worth noting that Archer et al. (2004) have proposed a serial rounding procedure for metric labeling (which is a special case of semi-metric labeling where the distance function also satisfies the triangle inequality). The theoretical guarantee of serial rounding depends on the tree decomposability of a distance function. Given an MRF with n random variables, the tree decomposability of any metric distance function is guaranteed to be O(log n). This worst-case theoretical guarantee of O(log n) over all metric distance functions can be matched by a mixture-of-tree algorithm that can be derived from the work of Andrew et al. (2011). Specifically, Andrew et al.
(2011) describe an approximation algorithm for capacitated metric labeling that builds on the cut-based decomposition method of Räcke (2008). Ignoring the capacity constraints, which are not present in the standard metric labeling problem, results in an approximation of the original MRF using a mixture of O(m log n) tree-structured MRFs. Here, m is the number of edges in the MRF, and n is the number of random variables. Solving the metric labeling problem over each tree-structured MRF optimally using belief propagation (Pearl, 1988), and choosing the best labeling in terms of the original problem, provides the O(log n) guarantee. We refer the reader to (Andrew et al., 2011) for details. Note that while the mixture-of-tree algorithm provides matching worst-case guarantees to serial rounding, it does not provide a matching guarantee for a given distance function. In other words, the guarantee of the mixture-of-tree algorithm does not depend on the tree decomposability of the distance function. The existence or the impossibility of a matching combinatorial algorithm for serial rounding remains an open question.

3. Our Contributions

Our paper can be thought of as a generalization of (Kumar and Koller, 2009; Kumar and Torr, 2008; Veksler, 1999). Indeed, the move-making algorithms proposed in this paper are simple extensions of the expansion, the range expansion and the hierarchical move-making algorithms. The novelty of our work lies in our analysis, which improves on the state of the art in two significant aspects. First, we do not place any assumptions on the form of the distance function, except that it is a semi-metric. This is in contrast to previous works that only focus on special cases such as the uniform metric, truncated convex models or the HST metric. Second, we show that the matching guarantees provided by our move-making algorithms and the parallel rounding procedures are tight, that is, they cannot be improved through a more careful analysis.
A preliminary version of this article has appeared as (Kumar, 2014). There are two significant differences between the preliminary version and the current paper. First, we provide detailed proofs of all the theorems, which were omitted from the preliminary version due to a lack of space. Second, we provide an experimental comparison of the rounding-based move-making algorithms with the state-of-the-art methods for semi-metric labeling, including those that are directly related to the LP relaxation, such as TRW-S (Kolmogorov, 2006), and those that are indirectly related to the LP relaxation, such as (Andrew et al., 2011). Finally, we note that since the publication of the preliminary version, we have also extended rounding-based move-making algorithms to special cases of high-order potentials. This includes the interval move-making algorithm for truncated max-of-convex potentials (Pansari and Kumar, 2015) and the hierarchical move-making algorithm for parsimonious labeling (Dokania and Kumar, 2015).

4. Preliminaries

4.1 Semi-metric Labeling

The problem of semi-metric labeling is defined over an undirected graph G = (X, E). The vertices X = {X1, X2, ..., Xn} are random variables, and the edges E specify a neighborhood relationship over the random variables. Each random variable can be assigned a value from the label set L = {l1, l2, ..., lh}. We assume that we are also provided with a semi-metric distance function d : L × L → R+ over the labels. Recall that a semi-metric distance function satisfies the following properties: d(li, lj) ≥ 0 for all li, lj ∈ L, and d(li, lj) = 0 if and only if i = j. Furthermore, a distance function is said to be a metric if, in addition to the above conditions, it also satisfies the triangle inequality, that is, d(li, lj) + d(lj, lk) ≥ d(li, lk) for all li, lj, lk ∈ L. We refer to an assignment of values to all the random variables as a labeling.
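The semi-metric and metric axioms above are mechanical to verify when the distance function is given as an h × h matrix. The following is a minimal sketch with our own function names and matrix encoding (not from the paper):

```python
def is_semimetric(d):
    """Semi-metric axioms: d[i][j] >= 0 for all i, j, and d[i][j] == 0 iff i == j."""
    h = len(d)
    return all(
        d[i][j] >= 0 and ((d[i][j] == 0) == (i == j))
        for i in range(h) for j in range(h)
    )

def is_metric(d):
    """A metric additionally satisfies the triangle inequality
    d[i][j] + d[j][k] >= d[i][k] for all label triples."""
    h = len(d)
    return is_semimetric(d) and all(
        d[i][j] + d[j][k] >= d[i][k]
        for i in range(h) for j in range(h) for k in range(h)
    )

# The truncated linear distance d(l_i, l_j) = min(|i - j|, M), used later in
# Section 6, satisfies both sets of axioms.
h, M = 5, 2
trunc_linear = [[min(abs(i - j), M) for j in range(h)] for i in range(h)]
```

Checks like these are useful for sanity-testing a hand-specified distance matrix before running any of the algorithms discussed below.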
In other words, a labeling is a vector x ∈ L^n, which specifies the label xa assigned to each random variable Xa. The h^n different labelings are quantitatively distinguished from each other by an energy function Q(x), which is defined as follows:

Q(x) = ∑_{Xa ∈ X} θa(xa) + ∑_{(Xa,Xb) ∈ E} wab d(xa, xb).

Here, the unary potentials θa(·) are arbitrary, and the edge weights wab are non-negative. Semi-metric labeling requires us to find a labeling with the minimum energy. It is known to be NP-hard.

4.2 Multiplicative Bound

As semi-metric labeling plays a central role in low-level vision, several approximate algorithms have been proposed in the literature. A common theoretical measure of accuracy for an approximate algorithm is the multiplicative bound. In this work, we are interested in the multiplicative bound of an algorithm with respect to a distance function. Formally, given a distance function d, the multiplicative bound of an algorithm is said to be B if the following condition is satisfied for all possible values of unary potentials θa(·) and non-negative edge weights wab:

∑_{Xa ∈ X} θa(x̂a) + ∑_{(Xa,Xb) ∈ E} wab d(x̂a, x̂b) ≤ ∑_{Xa ∈ X} θa(x*a) + B ∑_{(Xa,Xb) ∈ E} wab d(x*a, x*b). (1)

Here, x̂ is the labeling estimated by the algorithm for the given values of unary potentials and edge weights, and x* is an optimal labeling. Multiplicative bounds are greater than or equal to 1, and are invariant to reparameterizations of the unary potentials. A multiplicative bound B is said to be tight if the above inequality holds as an equality for some value of unary potentials and edge weights.
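The energy Q(x) can be evaluated directly from its definition. A minimal sketch, with our own encoding of the potentials, edges and weights (the paper does not prescribe one):

```python
def energy(x, theta, edges, w, d):
    """Q(x) = sum_a theta_a(x_a) + sum_{(a,b) in E} w_ab * d(x_a, x_b).

    x:     list of 0-based label indices, one per random variable
    theta: theta[a][i] = unary potential of label l_i at variable X_a
    edges: list of index pairs (a, b)
    w:     w[(a, b)] = non-negative edge weight
    d:     distance matrix, d[i][j] = d(l_i, l_j)
    """
    unary = sum(theta[a][x[a]] for a in range(len(x)))
    pairwise = sum(w[(a, b)] * d[x[a]][x[b]] for (a, b) in edges)
    return unary + pairwise
```

With this in hand, checking a multiplicative bound of the form (1) amounts to comparing the energy of an estimated labeling against the energy of an optimal labeling with its pairwise term scaled by B.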
4.3 Linear Programming Relaxation

An overcomplete representation of a labeling can be specified using the following variables: (i) unary variables ya(i) ∈ {0, 1} for all Xa ∈ X and li ∈ L such that ya(i) = 1 if and only if Xa is assigned the label li; and (ii) pairwise variables yab(i, j) ∈ {0, 1} for all (Xa, Xb) ∈ E and li, lj ∈ L such that yab(i, j) = 1 if and only if Xa and Xb are assigned labels li and lj respectively. This allows us to formulate semi-metric labeling as follows:

min_y ∑_{Xa ∈ X} ∑_{li ∈ L} θa(li) ya(i) + ∑_{(Xa,Xb) ∈ E} ∑_{li,lj ∈ L} wab d(li, lj) yab(i, j)
s.t. ∑_{li ∈ L} ya(i) = 1, ∀ Xa ∈ X,
∑_{lj ∈ L} yab(i, j) = ya(i), ∀ (Xa, Xb) ∈ E, li ∈ L,
∑_{li ∈ L} yab(i, j) = yb(j), ∀ (Xa, Xb) ∈ E, lj ∈ L,
ya(i) ∈ {0, 1}, yab(i, j) ∈ {0, 1}, ∀ Xa ∈ X, (Xa, Xb) ∈ E, li, lj ∈ L.

The first set of constraints ensures that each random variable is assigned exactly one label. The second and third sets of constraints ensure that, for binary optimization variables, yab(i, j) = ya(i) yb(j). By relaxing the final set of constraints such that the optimization variables can take any value between 0 and 1 inclusive, we obtain a linear program (LP). The computational complexity of solving the LP relaxation is polynomial in the size of the problem.

4.4 Rounding Procedure

In order to prove theoretical guarantees of the LP relaxation, it is common to use a rounding procedure that can convert a feasible fractional solution y of the LP relaxation to a feasible integer solution ŷ of the integer linear program. Several rounding procedures have been proposed in the literature. In this work, we focus on the randomized parallel rounding procedures proposed by Chekuri et al. (2001) and Kleinberg and Tardos (1999). These procedures have the property that, given a fractional solution y, the probability of assigning a label li ∈ L to a random variable Xa ∈ X is equal to ya(i), that is,

Pr(ŷa(i) = 1) = ya(i). (2)

We will describe the various rounding procedures in detail in Sections 5-7.
For now, we would like to note that our reason for focusing on the parallel rounding procedures of Chekuri et al. (2001) and Kleinberg and Tardos (1999) is that they provide the best known polynomial-time theoretical guarantees for semi-metric labeling. Specifically, we are interested in their approximation factor, which is defined next.

4.5 Approximation Factor

Given a distance function d, the approximation factor for a rounding procedure is said to be F if the following condition is satisfied for all feasible fractional solutions y:

E[∑_{li,lj ∈ L} d(li, lj) ŷa(i) ŷb(j)] ≤ F ∑_{li,lj ∈ L} d(li, lj) yab(i, j), ∀ (Xa, Xb) ∈ E. (3)

Here, ŷ refers to the integer solution, and the expectation is taken with respect to the randomized rounding procedure applied to the feasible solution y.

Given a rounding procedure with an approximation factor of F, an optimal fractional solution y* of the LP relaxation can be rounded to a labeling ŷ that satisfies the following condition:

E[∑_{Xa ∈ X} ∑_{li ∈ L} θa(li) ŷa(i) + ∑_{(Xa,Xb) ∈ E} ∑_{li,lj ∈ L} wab d(li, lj) ŷa(i) ŷb(j)] ≤ ∑_{Xa ∈ X} ∑_{li ∈ L} θa(li) y*a(i) + F ∑_{(Xa,Xb) ∈ E} ∑_{li,lj ∈ L} wab d(li, lj) y*ab(i, j).

The above inequality follows directly from properties (2) and (3). Similar to multiplicative bounds, approximation factors are always greater than or equal to 1, and are invariant to reparameterizations of the unary potentials. An approximation factor F is said to be tight if the above inequality holds as an equality for some value of unary potentials and edge weights.

Approximation factors are closely linked to the integrality gap of the LP relaxation (roughly speaking, the ratio of the optimal value of the integer linear program to the optimal value of the relaxation), which in turn is related to the computational hardness of the semi-metric labeling problem (Manokaran et al., 2008). However, establishing the integrality gap of the LP relaxation for a given distance function is beyond the scope of this work.
We are only interested in designing move-making algorithms whose multiplicative bounds match the approximation factors of the parallel rounding procedures.

4.6 Submodular Energy Function

We will use the following important fact throughout this paper. Given an energy function defined using arbitrary unary potentials, non-negative edge weights and a submodular distance function, an optimal labeling can be computed in polynomial time by solving an equivalent minimum st-cut problem (Flach and Schlesinger, 2006). Recall that a submodular distance function d′ over a label set L = {l1, l2, ..., lh} satisfies the following properties: (i) d′(li, lj) ≥ 0 for all li, lj ∈ L, and d′(li, lj) = 0 if and only if i = j; and (ii) d′(li, lj) + d′(li+1, lj+1) ≤ d′(li, lj+1) + d′(li+1, lj) for all li, lj ∈ L\{lh} (where \ refers to set difference).

5. Complete Rounding and Complete Move

We start with a simple rounding scheme, which we call complete rounding. While complete rounding is not very accurate, it helps illustrate the flavor of our results. We will subsequently consider its generalizations, which have been useful in obtaining the best-known approximation factors for various special cases of semi-metric labeling.

The complete rounding procedure consists of a single stage where we use the set of all unary variables to obtain a labeling (as opposed to the other rounding procedures discussed subsequently). Algorithm 1 describes its main steps. Intuitively, it treats the value of the unary variable ya(i) as the probability of assigning the label li to the random variable Xa. It obtains a labeling by sampling from all the distributions ya = [ya(i), li ∈ L] simultaneously using the same random number r ∈ [0, 1]. It can be shown that using a different random number to sample the distributions ya and yb of two neighboring random variables (Xa, Xb) ∈ E results in an infinite approximation factor. For example, let ya(i) = yb(i) = 1/h for all li ∈ L, where h is the number of labels.
The pairwise variables yab that minimize the energy function are yab(i, i) = 1/h and yab(i, j) = 0 for i ≠ j. For the above feasible solution of the LP relaxation, the RHS of inequality (3) is 0 for any finite F, while the LHS of inequality (3) is strictly greater than 0 if h > 1. However, we will shortly show that using the same random number r for all random variables provides a finite approximation factor.

Algorithm 1 The complete rounding procedure.
input A feasible solution y of the LP relaxation.
1: Pick a real number r uniformly from [0, 1].
2: for all Xa ∈ X do
3: Define Ya(0) = 0 and Ya(i) = ∑_{j=1}^{i} ya(j) for all li ∈ L.
4: Assign the label li ∈ L to the random variable Xa if Ya(i−1) < r ≤ Ya(i).
5: end for

We now turn our attention to designing a move-making algorithm whose multiplicative bound matches the approximation factor of the complete rounding procedure. To this end, we modify the range expansion algorithm, proposed by Kumar and Torr (2008) for truncated convex pairwise potentials, to handle a general semi-metric distance function. Our method, which we refer to as the complete move-making algorithm, considers all putative labels of all random variables, and provides an approximate solution in a single iteration. Algorithm 2 describes its two main steps. First, it computes a submodular overestimation of the given semi-metric distance function by solving the following optimization problem:

d̄ = argmin_{d′,t} t (4)
s.t. d′(li, lj) ≤ t·d(li, lj), ∀ li, lj ∈ L,
d′(li, lj) ≥ d(li, lj), ∀ li, lj ∈ L,
d′(li, lj) + d′(li+1, lj+1) ≤ d′(li, lj+1) + d′(li+1, lj), ∀ li, lj ∈ L\{lh}.

The above problem minimizes the maximum ratio of the estimated distance to the original distance over all pairs of labels, that is, max_{i≠j} d′(li, lj)/d(li, lj). We will refer to the optimal value of problem (4) as the submodular distortion of the distance function d.
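Algorithm 1 takes only a few lines; the crucial detail is the single shared threshold r (a per-variable r would reproduce the infinite approximation factor discussed above). A minimal sketch with our own encoding, drawing r from (0, 1] so that the rule Ya(i−1) < r ≤ Ya(i) always selects exactly one label:

```python
import random

def complete_rounding(y):
    """Algorithm 1 sketch: y[a] is the unary distribution [y_a(1), ..., y_a(h)]
    of variable X_a; every variable is rounded with the SAME threshold r."""
    r = 1.0 - random.random()     # uniform on (0, 1]
    labeling = []
    for ya in y:
        cum = 0.0
        for i, p in enumerate(ya):
            cum += p
            if r <= cum:          # Y_a(i-1) < r <= Y_a(i)
                labeling.append(i)
                break
        else:                     # guard against floating-point round-off
            labeling.append(len(ya) - 1)
    return labeling
```

For an integral feasible solution the output is deterministic, and for two variables with identical fractional distributions the shared r forces them to receive the same label, which is exactly what keeps the approximation factor finite.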
Second, it replaces the original distance function by the submodular overestimation and computes an approximate solution to the original semi-metric labeling problem by solving a single minimum st-cut problem. Note that, unlike the range expansion algorithm (Kumar and Torr, 2008), which uses the readily available submodular overestimation of a truncated convex distance (namely, the corresponding convex distance function), our approach estimates the submodular overestimation via the LP (4). Since the LP (4) can be solved for any arbitrary distance function, it makes complete move-making more generally applicable. The following theorem establishes the theoretical guarantees of the complete move-making algorithm and the complete rounding procedure.

Theorem 1 The tight multiplicative bound of the complete move-making algorithm is equal to the submodular distortion of the distance function. Furthermore, the tight approximation factor of the complete rounding procedure is also equal to the submodular distortion of the distance function.

Algorithm 2 The complete move-making algorithm.
input Unary potentials θa(·), edge weights wab, distance function d.
1: Compute a submodular overestimation d̄ of d by solving problem (4).
2: Using the approach of Flach and Schlesinger (2006), solve the following problem via an equivalent minimum st-cut problem:

x̂ = argmin_{x ∈ L^n} ∑_{Xa ∈ X} θa(xa) + ∑_{(Xa,Xb) ∈ E} wab d̄(xa, xb).

The proof of Theorem 1 is given in Appendix A. The following corollary of the above theorem was previously stated by Chekuri et al. (2001) without a formal proof.

Corollary 2 The complete rounding procedure is tight for submodular distance functions, that is, its approximation factor is equal to 1.

In terms of computational complexity, complete move-making is significantly faster than solving the LP relaxation.
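Problem (4) is itself a small linear program over the h² entries of d′ plus the scalar t, so any off-the-shelf LP solver can compute the submodular distortion. A sketch using scipy.optimize.linprog (our own encoding; the paper does not prescribe a solver):

```python
import numpy as np
from scipy.optimize import linprog

def submodular_distortion(d):
    """Solve problem (4): the tightest submodular overestimate of d.
    Returns (t, d_bar), where t is the submodular distortion."""
    d = np.asarray(d, dtype=float)
    h = d.shape[0]
    n = h * h                      # variables: the d'(i, j) entries, then t
    idx = lambda i, j: i * h + j
    A, b = [], []
    for i in range(h):
        for j in range(h):
            # d'(i, j) <= t * d(i, j)
            row = np.zeros(n + 1); row[idx(i, j)] = 1.0; row[n] = -d[i, j]
            A.append(row); b.append(0.0)
            # d'(i, j) >= d(i, j)
            row = np.zeros(n + 1); row[idx(i, j)] = -1.0
            A.append(row); b.append(-d[i, j])
    for i in range(h - 1):
        for j in range(h - 1):
            # submodularity: d'(i,j) + d'(i+1,j+1) - d'(i,j+1) - d'(i+1,j) <= 0
            row = np.zeros(n + 1)
            row[idx(i, j)] += 1.0; row[idx(i + 1, j + 1)] += 1.0
            row[idx(i, j + 1)] -= 1.0; row[idx(i + 1, j)] -= 1.0
            A.append(row); b.append(0.0)
    c = np.zeros(n + 1); c[n] = 1.0    # minimise t
    res = linprog(c, A_ub=np.vstack(A), b_ub=np.array(b))
    return res.x[n], res.x[:n].reshape(h, h)
```

For the truncated linear distance with h = 5 and M = 2, this recovers the distortion (h − 1)/M = 2 discussed in Section 6, with the linear distance as the overestimate.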
Specifically, given an MRF with n random variables and m edges, and a label set with h labels, the LP relaxation requires at least O(m³h³ log(m²h³)) time, since it consists of O(mh²) optimization variables and O(mh) constraints. In contrast, complete move-making requires O(nmh³ log(m)) time, since the graph constructed using the method of Flach and Schlesinger (2006) consists of O(nh) nodes and O(mh²) arcs. Note that complete move-making also requires us to solve the linear program (4). However, since problem (4) is independent of the unary potentials and the edge weights, it only needs to be solved once beforehand in order to compute the approximate solution for any semi-metric labeling problem defined using the distance function d.

6. Interval Rounding and Interval Moves

Theorem 1 implies that the approximation factor of the complete rounding procedure is very large for distance functions that are highly non-submodular. For example, consider the truncated linear distance function defined as follows over a label set L = {l1, l2, ..., lh}:

d(li, lj) = min{|i − j|, M}.

Here, M is a user-specified parameter that determines the maximum distance. The tightest submodular overestimation of the above distance function is the linear distance function, that is, d̄(li, lj) = |i − j|. This implies that the submodular distortion of the truncated linear metric is (h − 1)/M, and therefore, the approximation factor of the complete rounding procedure is also (h − 1)/M. In order to avoid this large approximation factor, Chekuri et al. (2001) proposed an interval rounding procedure, which captures the intuition that it is beneficial to assign similar labels to as many random variables as possible. Algorithm 3 provides a description of interval rounding. The rounding procedure chooses an interval of at most q consecutive labels (step 2).
It generates a random number r (step 3), and uses it to attempt to assign labels to previously unlabeled random variables from the selected interval (steps 4-7). It can be shown that the overall procedure converges in a polynomial number of iterations with a probability of 1 (Chekuri et al., 2001). Note that if we fix q = h and z = 1, interval rounding becomes equivalent to complete rounding. However, the analyses of Chekuri et al. (2001) and Kleinberg and Tardos (1999) show that other values of q provide better approximation factors for various special cases.

Algorithm 3 The interval rounding procedure.
input A feasible solution y of the LP relaxation.
1: repeat
2: Pick an integer z uniformly from [−q + 2, h]. Define an interval of labels I = {ls, ..., le}, where s = max{z, 1} is the start index and e = min{z + q − 1, h} is the end index.
3: Pick a real number r uniformly from [0, 1].
4: for all unlabeled random variables Xa do
5: Define Ya(0) = 0 and Ya(i) = ∑_{j=s}^{s+i−1} ya(j) for all i ∈ {1, ..., e − s + 1}.
6: Assign the label ls+i−1 ∈ I to Xa if Ya(i−1) < r ≤ Ya(i).
7: end for
8: until all random variables have been assigned a label.

Our goal is to design a move-making algorithm whose multiplicative bound matches the approximation factor of interval rounding for any choice of q. To this end, we propose the interval move-making algorithm that generalizes the range expansion algorithm (Kumar and Torr, 2008), originally proposed for truncated convex distances, to arbitrary distance functions. Algorithm 4 provides its main steps. The central idea of the method is to improve a given labeling x̂ by allowing each random variable Xa to either retain its current label x̂a or to choose a new label from an interval of consecutive labels. In more detail, let I = {ls, ..., le} ⊆ L be an interval of labels of length at most q (step 4). For the sake of simplicity, let us assume that x̂a ∉ I for any random variable Xa. We define Ia = I ∪ {x̂a} (step 5).
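Algorithm 3 above can be sketched as follows, again with our own encoding: labels are 1-based inside the loop to match the paper, the returned indices are 0-based, and r is drawn from (0, 1] so the rule Ya(i−1) < r ≤ Ya(i) is well defined:

```python
import random

def interval_rounding(y, q):
    """Algorithm 3 sketch: repeatedly pick a random interval of at most q
    consecutive labels and a shared threshold r, labeling any still-unlabeled
    variable whose cumulative mass inside the interval reaches r."""
    n, h = len(y), len(y[0])
    labeling = [None] * n
    while any(l is None for l in labeling):
        z = random.randint(-q + 2, h)          # step 2
        s, e = max(z, 1), min(z + q - 1, h)    # interval I = {l_s, ..., l_e}
        r = 1.0 - random.random()              # step 3: uniform on (0, 1]
        for a in range(n):                     # steps 4-7
            if labeling[a] is not None:
                continue
            cum = 0.0
            for i in range(s, e + 1):
                cum += y[a][i - 1]
                if r <= cum:
                    labeling[a] = i - 1        # store 0-based index of l_i
                    break
    return labeling
```

Each outer iteration labels a variable with probability proportional to its mass inside the chosen interval, so the loop terminates with probability 1, matching the convergence claim above.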
For each pair of neighboring random variables (Xa, Xb) ∈ E, we compute a submodular distance function d̄x̂a,x̂b : Ia × Ib → R+ by solving the following linear program (step 6):

d̄x̂a,x̂b = argmin_{d′,t} t (5)
s.t. d′(li, lj) ≤ t·d(li, lj), ∀ li ∈ Ia, lj ∈ Ib,
d′(li, lj) ≥ d(li, lj), ∀ li ∈ Ia, lj ∈ Ib,
d′(li, lj) + d′(li+1, lj+1) ≤ d′(li, lj+1) + d′(li+1, lj), ∀ li, lj ∈ I\{le},
d′(li, le) + d′(li+1, x̂b) ≤ d′(li, x̂b) + d′(li+1, le), ∀ li ∈ I\{le},
d′(le, lj) + d′(x̂a, lj+1) ≤ d′(le, lj+1) + d′(x̂a, lj), ∀ lj ∈ I\{le},
d′(le, le) + d(x̂a, x̂b) ≤ d′(le, x̂b) + d′(x̂a, le).

Similar to problem (4), the above problem minimizes the maximum ratio of the estimated distance to the original distance. However, instead of introducing constraints for all pairs of labels, it only considers pairs of labels li and lj where li ∈ Ia and lj ∈ Ib. Furthermore, it does not modify the distance between the current labels x̂a and x̂b (as can be seen in the last constraint of problem (5)). Given the submodular distance functions d̄x̂a,x̂b, we can compute a new labeling x by solving the following optimization problem via minimum st-cut using the method of Flach and Schlesinger (2006) (step 7):

x = argmin_x ∑_{Xa ∈ X} θa(xa) + ∑_{(Xa,Xb) ∈ E} wab d̄x̂a,x̂b(xa, xb)
s.t. xa ∈ Ia, ∀ Xa ∈ X. (6)

If the energy of the new labeling x is less than that of the current labeling x̂, then we update our labeling to x (steps 8-10). Otherwise, we retain the current estimate of the labeling and consider another interval. The algorithm converges when the energy does not decrease for any interval of length at most q. Note that, once again, the main difference between interval move-making and the range expansion algorithm is the use of an appropriate optimization problem, namely the LP (5), to obtain a submodular overestimation of the given distance function. This allows us to use the interval move-making algorithm for the general semi-metric labeling problem, instead of focusing only on truncated convex models.

Algorithm 4 The interval move-making algorithm.
input Unary potentials θa(·), edge weights wab, distance function d, initial labeling x0.
1: Set the current labeling to the initial labeling, that is, x̂ = x0.
2: repeat
3: for all z ∈ [−q + 2, h] do
4: Define an interval of labels I = {ls, ..., le}, where s = max{z, 1} is the start index and e = min{z + q − 1, h} is the end index.
5: Define Ia = I ∪ {x̂a} for all random variables Xa ∈ X.
6: Obtain submodular overestimates d̄x̂a,x̂b for each pair of neighboring random variables (Xa, Xb) ∈ E by solving problem (5).
7: Obtain a new labeling x by solving problem (6).
8: if the energy of x is less than the energy of x̂ then
9: Update x̂ = x.
10: end if
11: end for
12: until the energy cannot be decreased further.

The following theorem establishes the theoretical guarantees of the interval move-making algorithm and the interval rounding procedure.

Theorem 3 The tight multiplicative bound of the interval move-making algorithm is equal to the tight approximation factor of the interval rounding procedure.

The proof of Theorem 3 is given in Appendix B. While Algorithms 3 and 4 use intervals of consecutive labels, they can easily be modified to use subsets of (potentially non-consecutive) labels. Our analysis could be extended to show that the multiplicative bound of the resulting subset move-making algorithm matches the approximation factor of the subset rounding procedure. However, our reason for focusing on intervals of consecutive labels is that several special cases of Theorem 3 have previously been considered separately in the literature (Gupta and Tardos, 2000; Kumar and Koller, 2009; Kumar and Torr, 2008; Veksler, 1999). Specifically, the following known results are corollaries of the above theorem. Note that, while the following corollaries have been previously proved in the literature, our work is the first to establish the tightness of the theoretical guarantees.
Corollary 4 When q = 1, the multiplicative bound of the interval move-making algorithm (which is equivalent to the expansion algorithm) for the uniform metric distance is 2.

The above corollary follows from the approximation factor of the interval rounding procedure proved by Kleinberg and Tardos (1999), but it was independently proved by Veksler (1999).

Corollary 5 When q = M, the multiplicative bound of the interval move-making algorithm for the truncated linear distance function is 4.

The above corollary follows from the approximation factor of the interval rounding procedure proved by Chekuri et al. (2001), but it was independently proved by Gupta and Tardos (2000).

Corollary 6 When q = √2·M, the multiplicative bound of the interval move-making algorithm for the truncated linear distance function is 2 + √2.

The above corollary follows from the approximation factor of the interval rounding procedure proved by Chekuri et al. (2001), but it was independently proved by Kumar and Torr (2008). Finally, since our analysis does not use the triangle inequality of metric distance functions, we can also state the following corollary for the truncated quadratic distance.

Corollary 7 When q = √M, the multiplicative bound of the interval move-making algorithm for the truncated quadratic distance function is O(√M).

The above corollary follows from the approximation factor of the interval rounding procedure proved by Chekuri et al. (2001), but it was independently proved by Kumar and Torr (2008).

An interval move-making algorithm that uses an interval length of q runs for at most O(h/q) iterations. This follows from a simple modification of the result of Gupta and Tardos (2000) (specifically, Theorem 3.7). Hence, the total time complexity of interval move-making is O(nmhq² log(m)), since each iteration solves a minimum st-cut problem on a graph with O(nq) nodes and O(mq²) arcs.
In other words, interval move-making is at most as computationally complex as complete move-making, which in turn is significantly less complex than solving the LP relaxation. Note that problem (5), which is required for interval move-making, is independent of the unary potentials and the edge weights. Hence, it only needs to be solved once beforehand, for all pairs of labels (x̂a, x̂b) ∈ L × L, in order to obtain a solution for any semi-metric labeling problem defined using the distance function d.

7. Hierarchical Rounding and Hierarchical Moves

We now consider the most general form of parallel rounding that has been proposed in the literature, namely the hierarchical rounding procedure (Kleinberg and Tardos, 1999). The rounding relies on a hierarchical clustering of the labels. Formally, we denote a hierarchical clustering of m levels for the label set L by C = {C(i), i = 1, ..., m}. At each level i, the clustering C(i) = {C(i, j) ⊆ L, j = 1, ..., hi} is mutually exclusive and collectively exhaustive, that is,

∪_j C(i, j) = L,   C(i, j) ∩ C(i, j′) = ∅, ∀ j ≠ j′.

Furthermore, for each cluster C(i, j) at a level i ≥ 2, there exists a unique cluster C(i − 1, j′) in the level i − 1 such that C(i, j) ⊆ C(i − 1, j′). We call the cluster C(i − 1, j′) the parent of the

Algorithm 5 The hierarchical rounding procedure.
input A feasible solution y of the LP relaxation.
1: Define f_a^1 = 1 for all Xa ∈ X.
2: for all i ∈ {2, ..., m} do
3: for all Xa ∈ X do
4: Define z_a^i(j) for all j ∈ {1, ..., hi} as follows: z_a^i(j) = Σ_{k: lk ∈ C(i,j)} ya(k) if p(i, j) = f_a^{i−1}, and z_a^i(j) = 0 otherwise.
5: Define y_a^i(j) for all j ∈ {1, ..., hi} as follows: y_a^i(j) = z_a^i(j) / Σ_{j′=1}^{hi} z_a^i(j′).
6: end for
7: Using a rounding procedure (complete or interval) on y^i = [y_a^i(j), Xa ∈ X, j ∈ {1, ..., hi}], obtain an integer solution ŷ^i.
8: for all Xa ∈ X do
9: Let ka ∈ {1, ..., hi} be the index such that ŷ_a^i(ka) = 1. Define f_a^i = ka.
10: end for
11: end for
12: for all Xa ∈ X do
13: Let lk be the unique label present in the cluster C(m, f_a^m). Assign lk to Xa.
14: end for

cluster C(i, j), and define p(i, j) = j′. Similarly, we call C(i, j) a child of C(i − 1, j′). Without loss of generality, we assume that there exists a single cluster at level 1 that contains all the labels, and that each cluster at level m contains a single label.

Algorithm 5 describes the hierarchical rounding procedure. Given a clustering C, it proceeds in a top-down fashion through the hierarchy while assigning each random variable to a cluster in the current level. Let f_a^i be the index of the cluster assigned to the random variable Xa in the level i. In the first step, the rounding procedure assigns all the random variables to the unique cluster C(1, 1) (step 1). At each step i, it assigns each random variable to a unique cluster in the level i by computing a conditional probability distribution as follows. The conditional probability y_a^i(j) of assigning the random variable Xa to the cluster C(i, j) is proportional to Σ_{lk ∈ C(i,j)} ya(k) if p(i, j) = f_a^{i−1} (steps 3-6). The conditional probability y_a^i(j) = 0 if p(i, j) ≠ f_a^{i−1}, that is, a random variable cannot be assigned to a cluster C(i, j) if it wasn't assigned to the parent of C(i, j) in the previous step. Using a rounding procedure (complete or interval) for y^i, we obtain an assignment of random variables to the clusters at level i (step 7). Once such an assignment is obtained, the values f_a^i are computed for all random variables Xa (steps 8-10). At the end of step m, hierarchical rounding would have assigned each random variable to a unique cluster in the level m. Since each cluster at level m consists of a single label, this provides us with a labeling of the MRF (steps 12-14).

Algorithm 6 The hierarchical move-making algorithm.
input Unary potentials θa(·), edge weights wab, distance function d.
1: for all j ∈ {1, ..., h} do
2: Let lk be the unique label in the cluster C(m, j). Define x_a^{m,j} = lk for all Xa ∈ X.
3: end for
4: for all i ∈ {2, ..., m} do
5: for all j ∈ {1, ..., h_{m−i+1}} do
6: Define L_a^{m−i+1,j} = {x_a^{m−i+2,j′} | p(m − i + 2, j′) = j, j′ ∈ {1, ..., h_{m−i+2}}}.
7: Using a move-making algorithm (complete or interval), compute the labeling x^{m−i+1,j} under the constraint x_a^{m−i+1,j} ∈ L_a^{m−i+1,j}.
8: end for
9: end for
10: The final solution is x^{1,1}.

Our goal is to design a move-making algorithm whose multiplicative bound matches the approximation factor of the hierarchical rounding procedure for any choice of hierarchical clustering C. To this end, we propose the hierarchical move-making algorithm, which extends the hierarchical graph cuts approach for hierarchically well-separated tree (HST) metrics proposed by Kumar and Koller (2009). Algorithm 6 provides its main steps. In contrast to hierarchical rounding, the move-making algorithm traverses the hierarchy in a bottom-up fashion while computing a labeling for each cluster in the current level. Let x^{i,j} be the labeling corresponding to the cluster C(i, j). At the first step, when considering the level m of the clustering, all the random variables are assigned the same label. Specifically, x_a^{m,j} is equal to the unique label contained in the cluster C(m, j) (steps 1-3). At step i, the algorithm computes the labeling x^{m−i+1,j} for each cluster C(m − i + 1, j) by using the labelings computed in the previous step. Specifically, it restricts the label assigned to a random variable Xa in the labeling x^{m−i+1,j} to the subset of labels that were assigned to it by the labelings corresponding to the children of C(m − i + 1, j) (step 6). Under this restriction, the labeling x^{m−i+1,j} is computed by approximately minimizing the energy using a move-making algorithm (step 7). Implicit in our description is the assumption that we will use a move-making algorithm (complete or interval) in step 7 of Algorithm 6 whose multiplicative bound matches the approximation factor of the rounding procedure (complete or interval) used in step 7 of Algorithm 5.
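The bottom-up control flow of Algorithm 6 can be sketched as follows. This is a toy illustration rather than the paper's implementation: the data layout is ours, and step 7's move-making subroutine is replaced by brute-force minimization, which is only feasible for tiny instances (the paper uses complete or interval moves solved via minimum st-cuts).

```python
from itertools import product

def energy(x, theta, w, d):
    """Energy of labeling x: unary terms plus weighted pairwise distances."""
    val = sum(theta[a][x[a]] for a in range(len(x)))
    for (a, b), wab in w.items():
        val += wab * d(x[a], x[b])
    return val

def brute_force(theta, w, d, candidates):
    """Stand-in for the move-making subroutine of step 7: exact
    minimization over the restricted label sets `candidates`."""
    best = None
    for x in product(*candidates):
        e = energy(x, theta, w, d)
        if best is None or e < best[1]:
            best = (x, e)
    return best[0]

def hierarchical_moves(theta, w, d, levels, parent):
    """Bottom-up pass of Algorithm 6. `levels[i]` lists the clusters at
    level i (level 0 = root, last level = singleton label clusters);
    `parent[i][j]` is the index of cluster j's parent at level i - 1."""
    n, m = len(theta), len(levels)
    # leaf labelings: every variable takes the cluster's single label
    lab = {(m - 1, j): [c[0]] * n for j, c in enumerate(levels[m - 1])}
    for i in range(m - 2, -1, -1):                 # traverse bottom-up
        for j in range(len(levels[i])):
            # candidate labels per variable: the labels assigned by the
            # labelings of this cluster's children (step 6)
            cand = [set() for _ in range(n)]
            for jc in range(len(levels[i + 1])):
                if parent[i + 1][jc] == j:
                    for a in range(n):
                        cand[a].add(lab[(i + 1, jc)][a])
            lab[(i, j)] = list(brute_force(theta, w, d,
                                           [sorted(c) for c in cand]))
    return lab[(0, 0)]
```

On a two-variable chain with four labels and a two-level hierarchy, the root subproblem only ever sees the labels chosen by the child subproblems, which is exactly the restriction of step 6.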
Note that, unlike the hierarchical graph cuts approach (Kumar and Koller, 2009), the hierarchical move-making algorithm can be used for any arbitrary clustering, and not just the one specified by an HST metric. The following theorem establishes the theoretical guarantees of the hierarchical move-making algorithm and the hierarchical rounding procedure.

Theorem 8 The tight multiplicative bound of the hierarchical move-making algorithm is equal to the tight approximation factor of the hierarchical rounding procedure.

The proof of the above theorem is given in Appendix C. The following known result is its corollary.

Corollary 9 The multiplicative bound of the hierarchical move-making algorithm is O(1) for an HST metric distance.

The above corollary follows from the approximation factor of the hierarchical rounding procedure proved by Kleinberg and Tardos (1999), but it was independently proved by Kumar and Koller (2009). It is worth noting that the above result was also used to obtain an approximation factor of O(log h) for the general metric labeling problem by Kleinberg and Tardos (1999), and a matching multiplicative bound of O(log h) by Kumar and Koller (2009).

The time complexity of the hierarchical move-making algorithm depends on two factors: (i) the hierarchy, which defines the subproblems; and (ii) whether we use a complete or an interval move-making algorithm to solve the subproblems. In what follows, we analyze its worst-case time complexity over all possible hierarchies. Clearly, since complete move-making is more expensive than interval move-making, the worst-case complexity of hierarchical move-making is achieved when we use complete move-making to solve each subproblem. We begin by noting that each subproblem is defined using a smaller label set. For simplicity, let us assume that the clusters of the hierarchy are balanced, and that each subproblem is solved over h′ ≤ h labels.
It follows that the total number of subproblems that we would be required to solve is O((h/h′)(log(h)/log(h′) − 1)). Next, observe that the complexity of the complete move-making algorithm is cubic in the number of labels. Thus, it follows that, in order to maximize the total time complexity of hierarchical move-making, we need to set h′ = h. In other words, the hierarchical move-making algorithm is at most as computationally complex as the complete move-making algorithm. Recall that the complete move-making complexity is O(nmh^3 log(m)). Hence, hierarchical move-making is significantly faster than solving the LP relaxation.

8. Experiments

We demonstrate the efficacy of rounding-based moves by comparing them to several state of the art methods using both synthetic and real data.

8.1 Synthetic Experiments

We generated two synthetic data sets to conduct our experiments. The first data set consists of random grid MRFs of size 100 × 100, where each random variable can take one of 10 labels. The unary potentials were sampled from a uniform distribution over [0, 10]. The edge weights were sampled from a uniform distribution over [0, 3]. We considered four types of pairwise potentials: (i) the truncated linear metric, where the truncation is sampled from a uniform distribution over [1, 5]; (ii) the truncated quadratic semi-metric, where the truncation is sampled from a uniform distribution over [1, 25]; (iii) random metrics, generated by computing shortest paths on graphs whose vertices correspond to the labels and whose edge lengths are uniformly distributed over [1, 10]; and (iv) random semi-metrics, where the distance between two labels is sampled from a uniform distribution over [1, 10]. For each type of pairwise potentials, we generated 500 different MRFs. The second data set is similar to the first one, except that it is defined on a smaller grid of size 20 × 20.
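The random metrics of type (iii) can be generated exactly as described, by drawing edge lengths and taking all-pairs shortest paths; a sketch (our code, assuming a complete graph over the labels, since the paper does not specify the graph topology):

```python
import random

def random_metric(h, seed=0, low=1.0, high=10.0):
    """Random metric over h labels: draw the edge lengths of a complete
    graph uniformly from [low, high], symmetrize, and take all-pairs
    shortest-path distances (Floyd-Warshall). The result satisfies the
    triangle inequality by construction."""
    rng = random.Random(seed)
    d = [[0.0 if i == j else rng.uniform(low, high) for j in range(h)]
         for i in range(h)]
    for i in range(h):                 # symmetrize the edge lengths
        for j in range(i + 1, h):
            d[j][i] = d[i][j]
    for k in range(h):                 # Floyd-Warshall relaxation
        for i in range(h):
            for j in range(h):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d
```

Dropping the shortest-path step yields the random semi-metrics of type (iv), which need not satisfy the triangle inequality.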
The smaller grid size allows us to test the mixture-of-tree algorithm (Andrew et al., 2011), which is closely related to the serial rounding procedure of Archer et al. (2004). We generated 5 different grids by sampling their edge weights from a uniform distribution over [5, 100]. For each set of edge weights, we use the cut-based decomposition method of Räcke (2008) to approximate the original grid graph by a mixture of tree-structured graphs. We generated three types of unary potentials by uniformly sampling from the ranges [0, 100], [100, 1000] and [1000, 10000] respectively. Similar to the first data set, we considered four types of pairwise potentials. For each pair of unary potential type and pairwise potential type, we generated 100 different MRFs. This provided us with 1200 MRFs for each set of edge weights, and a total of 6000 MRFs to perform our experiments.

8.1.2 METHODS

We report results obtained by the following state of the art methods on the first data set: (i) belief propagation (BP) (Pearl, 1988); (ii) sequential tree-reweighted message passing (TRW) (Kolmogorov, 2006), which optimizes the dual of the LP relaxation and provides results comparable to other LP relaxation based approaches; (iii) the expansion algorithm (EXP) (Boykov et al., 1999); and (iv) the swap algorithm (SWAP) (Boykov et al., 1998). We compare the above methods to the hierarchical move-making algorithm (HIER), where a set of hierarchies is obtained by approximating a given semi-metric as a mixture of r-HST metrics using the method proposed by Fakcharoenphol et al. (2003). We refer the reader to (Fakcharoenphol et al., 2003; Kumar and Koller, 2009) for details. Each subproblem of the hierarchical move-making algorithm is solved by interval move-making with interval length q = 1 (which corresponds to the expansion algorithm).
In addition, for the truncated linear and truncated quadratic cases, we present results of interval move-making (INT) using the optimal interval length reported by Kumar and Torr (2008). For the second data set, we report the results for all the above methods, as well as the mixture-of-tree (MOT) algorithm. The smaller size of the grid in the second data set allows us to obtain a mixture of tree-structured MRFs using the method of Räcke (2008), which was not possible for the larger 100 × 100 grid used in the first data set.

8.1.3 RESULTS

Figure 1 shows the results for the first data set. In terms of energy, TRW is the most accurate. However, it is slow, as it optimizes the dual of the LP relaxation. The labelings obtained by BP have high energy values. The standard move-making algorithms, EXP and SWAP, are fast due to the use of efficient minimum st-cut solvers. However, they are not as accurate as TRW. For the truncated linear and quadratic pairwise potentials, INT provides labelings with energy comparable to those of TRW, and is also computationally efficient. However, for general metrics and semi-metrics, it is not obvious how to obtain the optimal interval length. The HIER method is more generally applicable, as there exist standard methods to approximate a semi-metric with a mixture of r-HST metrics (Fakcharoenphol et al., 2003). It provides very accurate labelings (comparable to TRW), and is efficient in practice as it relies on solving each subproblem using an iterative move-making algorithm.

Figure 1: Results for the first synthetic data set (four panels: truncated linear, truncated quadratic, metric, and semi-metric pairwise potentials; methods: BP, TRW, EXP, SWAP, INT, HIER). The x-axis shows the time in seconds, while the y-axis shows the energy value. The dashed line shows the value of the dual of the LP obtained by TRW. Best viewed in color.
Figure 2 shows the results for the second data set. The results of all the methods are similar to those on the first data set. However, the additional MOT algorithm performs very poorly compared to all other methods. This may be explained by the fact that, while MOT provides the same worst-case guarantees as serial rounding over all possible metric distance functions, it does not match the accuracy of serial rounding for a given distance function.

Figure 2: Results for the second synthetic data set (four panels: truncated linear, truncated quadratic, metric, and semi-metric pairwise potentials). The x-axis shows the time in seconds, while the y-axis shows the energy value. The dashed line shows the value of the dual of the LP obtained by TRW. Best viewed in color.

8.2 Dense Stereo Correspondence

Given two epipolar rectified images of the same scene, the problem of dense stereo correspondence requires us to obtain a correspondence between the pixels of the images. This problem can be modeled as semi-metric labeling, where the random variables represent the pixels of one of the images, and the labels represent the disparity values. A disparity label li for a random variable Xa representing a pixel (ua, va) of an image indicates that its corresponding pixel lies in location (ua + i, va). For the above problem, we use the unary potentials and edge weights that are specified by Szeliski et al. (2008). We use two types of pairwise potentials: (i) truncated linear with the truncation set at 4; and (ii) truncated quadratic with the truncation set at 16.

8.2.2 METHODS

We report results on all the baseline methods that were used in the synthetic experiments, namely BP, TRW, EXP, and SWAP. Since the pairwise potentials are either truncated linear or truncated quadratic, we report results for the interval move-making algorithm INT, which uses the optimal value of the interval length.
We also show the results obtained by the hierarchical move-making algorithm (HIER), where once again the hierarchies are obtained by approximating the semi-metric as a mixture of r-HST metrics.

8.2.3 RESULTS

Figures 3-8 show the results for various standard pairs of images. Note that, similar to the synthetic experiments, TRW is the most accurate in terms of energy, but it is computationally inefficient. The results obtained by BP are not accurate. The standard move-making algorithms, EXP and SWAP, are fast but not as accurate as TRW. Among the rounding-based move-making algorithms, INT is slower, as it solves a minimum st-cut problem on a large graph at each iteration. In contrast, HIER uses an interval length of 1 for each subproblem and is therefore more efficient. The energy obtained by HIER is comparable to that of TRW.

9. Discussion

For any general distance function that can be used to specify the semi-metric labeling problem, we proved that the approximation factor of a large family of parallel rounding procedures is matched by the multiplicative bound of move-making algorithms. This generalizes previously known results on the guarantees of move-making algorithms in two ways: (i) in contrast to previous results (Kumar and Koller, 2009; Kumar and Torr, 2008; Veksler, 1999) that focused on special cases of distance functions, our results are applicable to arbitrary semi-metric distance functions; and (ii) the guarantees provided by our theorems are tight. Our experiments confirm that the rounding-based move-making algorithms provide accuracy similar to that of the LP relaxation, while being significantly faster due to the use of efficient minimum st-cut solvers.

Several natural questions arise. What is the exact characterization of the rounding procedures for which it is possible to design matching move-making algorithms? Can we design rounding-based move-making algorithms for other combinatorial optimization problems?
Answering these questions

Figure 3: Results for the tsukuba image pair with truncated linear pairwise potentials (panels: image 1, image 2, ground truth, and the labelings obtained by each method). BP: time 9.1s, energy 686350; TRW: 55.8s, 654128; SWAP: 4.4s, 668031; EXP: 3.3s, 657005; INT: 87.2s, 656945; HIER: 34.6s, 654557.

Figure 4: Results for the tsukuba image pair with truncated quadratic pairwise potentials. BP: 29.9s, 1586856; TRW: 115.9s, 1415343; SWAP: 7.1s, 1562459; EXP: 5.1s, 1546777; INT: 275.6s, 1533114; HIER: 40.7s, 1499134.

Figure 5: Results for the venus image pair with truncated linear pairwise potentials. BP: 16.4s, 3003629; TRW: 105.2s, 2943481; SWAP: 7.7s, 2954819; EXP: 11.5s, 2953157; INT: 273.1s, 2959133; HIER: 105.7s, 2946177.

Figure 6: Results for the venus image pair with truncated quadratic pairwise potentials. BP: 54.3s, 4183829; TRW: 223.0s, 3080619; SWAP: 22.8s, 3240891; EXP: 30.3s, 3326685; INT: 522.3s, 3216829; HIER: 113s, 3210882.

Figure 7: Results for the teddy image pair with truncated linear pairwise potentials. BP: 47.5s, 1771965; TRW: 317.7s, 1605057; SWAP: 35.2s, 1606891; EXP: 26.5s, 1603057; INT: 878.5s, 1606558; HIER: 313.7s, 1596279.
Figure 8: Results for the teddy image pair with truncated quadratic pairwise potentials. BP: time 178.6s, energy 4595612; TRW: 512.0s, 1851648; SWAP: 48.5s, 1914655; EXP: 41.9s, 1911774; INT: 2108.6s, 1890418; HIER: 363.2s, 1873082.

will not only expand our theoretical understanding, but also result in the development of efficient and accurate algorithms.

Acknowledgments

This work is partially funded by the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013)/ERC Grant agreement number 259112.

Appendix A. Proof of Theorem 1

We first establish the theoretical property of the complete move-making algorithm using the following lemma.

Lemma 10 The tight multiplicative bound of the complete move-making algorithm is equal to the submodular distortion of the distance function.

Proof The submodular distortion of a distance function d is obtained by computing its tightest submodular overestimation d̄ as follows:

d̄ = argmin_{d′} t   (7)
s.t. d′(li, lj) ≤ t · d(li, lj), ∀ li, lj ∈ L,
d′(li, lj) ≥ d(li, lj), ∀ li, lj ∈ L,
d′(li, lj) + d′(li+1, lj+1) ≤ d′(li, lj+1) + d′(li+1, lj), ∀ li, lj ∈ L\{lh}.

In order to prove the theorem, it is important to note that the definition of a submodular distance function implies the following:

d′(li, lj) + d′(li′, lj′) ≤ d′(li, lj′) + d′(li′, lj), ∀ i′ > i, j′ > j.

A simple proof of the above claim can be found in (Flach and Schlesinger, 2006). We denote the submodular distortion of d by B. By definition, it follows that

d(li, lj) ≤ d̄(li, lj) ≤ B · d(li, lj), ∀ li, lj ∈ L.   (8)

We denote an optimal labeling of the original semi-metric labeling problem by x*, that is,

x* = argmin_{x ∈ L^n} Σ_{Xa ∈ X} θa(xa) + Σ_{(Xa,Xb) ∈ E} wab d(xa, xb).   (9)

As the semi-metric labeling problem is NP-hard, an optimal labeling x* cannot be computed efficiently using any known algorithm.
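The two properties that d̄ must satisfy, submodularity and the bounded overestimation of inequality (8), are easy to check numerically; a small helper sketch (our code, 0-indexed label matrices):

```python
def is_submodular(d):
    """Check d(i, j) + d(i+1, j+1) <= d(i, j+1) + d(i+1, j) for all i, j,
    the constraint used in problem (7)."""
    h = len(d)
    return all(d[i][j] + d[i + 1][j + 1] <= d[i][j + 1] + d[i + 1][j] + 1e-12
               for i in range(h - 1) for j in range(h - 1))

def distortion(d, d_bar):
    """Largest ratio d_bar/d over pairs of distinct labels, provided d_bar
    is a valid overestimate of d (returns None otherwise)."""
    h = len(d)
    ratio = 1.0
    for i in range(h):
        for j in range(h):
            if i == j:
                continue
            if d_bar[i][j] < d[i][j] - 1e-12:
                return None            # not an overestimate of d
            ratio = max(ratio, d_bar[i][j] / d[i][j])
    return ratio
```

For example, on three labels the linear distance |i − j| is a submodular overestimate of the uniform metric (truncated linear with truncation 1) with ratio 2, consistent with the bound of 2 in Corollary 4.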
In order to obtain an approximate solution x̂, the complete move-making algorithm replaces the original distance function d by its submodular overestimation d̄, that is,

x̂ = argmin_{x ∈ L^n} Σ_{Xa ∈ X} θa(xa) + Σ_{(Xa,Xb) ∈ E} wab d̄(xa, xb).   (10)

Since the pairwise potentials in the above problem are submodular, the approximate solution x̂ can be obtained by solving a single minimum st-cut problem using the method of Flach and Schlesinger (2006). Using inequality (8), it follows that

Σ_{Xa ∈ X} θa(x̂a) + Σ_{(Xa,Xb) ∈ E} wab d(x̂a, x̂b)
≤ Σ_{Xa ∈ X} θa(x̂a) + Σ_{(Xa,Xb) ∈ E} wab d̄(x̂a, x̂b)
≤ Σ_{Xa ∈ X} θa(x*a) + Σ_{(Xa,Xb) ∈ E} wab d̄(x*a, x*b)
≤ Σ_{Xa ∈ X} θa(x*a) + B Σ_{(Xa,Xb) ∈ E} wab d(x*a, x*b).

The above inequality proves that the multiplicative bound of the complete move-making algorithm is at most B. In order to prove that it is exactly equal to B, we need to construct an example for which the bound is tight. To this end, let lk and lk′ be two labels in the set L such that k < k′ and d̄(lk, lk′)/d(lk, lk′) = B. Since B is the minimum possible value of the maximum ratio of the estimated distance d̄ to the original distance d, such a pair of labels must exist (otherwise, the submodular distortion could be reduced further). Let us assume that there exists an lj ∈ L such that j < k. Other cases (where j > k′ or k < j < k′) can be handled similarly. Note that, since d̄ is submodular, it follows that

d̄(lk, lj) + d̄(lj, lk′) ≥ d̄(lk, lk′).   (11)

We define a semi-metric labeling problem over two random variables Xa and Xb, connected by an edge with weight wab = 1. The unary potentials are defined as follows:

θa(li) = 0 if i = k; (d̄(lk, lk′) + d̄(lk, lj) − d̄(lj, lk′))/2 if i = j; ∞ otherwise,
θb(li) = 0 if i = k′; (d̄(lk, lk′) − d̄(lk, lj) + d̄(lj, lk′))/2 if i = j; ∞ otherwise.

For the above semi-metric labeling problem, it can be verified that an optimal solution x* of problem (9) is the following: x*a = lk and x*b = lk′. Furthermore, using inequality (11), it can be shown that the following is an optimal solution of problem (10): x̂a = lj and x̂b = lj.
In other words, x̂ is a valid approximate labeling provided by the complete move-making algorithm. The labelings x* and x̂ satisfy the following equality:

Σ_{Xa ∈ X} θa(x̂a) + Σ_{(Xa,Xb) ∈ E} wab d(x̂a, x̂b) = Σ_{Xa ∈ X} θa(x*a) + B Σ_{(Xa,Xb) ∈ E} wab d(x*a, x*b).

Therefore, the tight multiplicative bound of the complete move-making algorithm is exactly equal to the submodular distortion of the distance function d.

We now turn our attention to the complete rounding procedure for the LP relaxation. Before we can establish its tight approximation factor, we need to compute the expected distance between the labels assigned to a pair of neighboring random variables. Recall that, in our notation, we denote a feasible solution of the LP relaxation by y. For any feasible solution y, we define ya as the vector whose elements are the unary variables of y for the random variable Xa ∈ X, that is,

ya = [ya(i), li ∈ L].   (12)

Similarly, we define yab as the vector whose elements are the pairwise variables of y for the neighboring random variables (Xa, Xb) ∈ E, that is,

yab = [yab(i, j), li, lj ∈ L].   (13)

Furthermore, using ya, we define Ya as

Ya = [Ya(i), li ∈ L], where Ya(i) = Σ_{k=1}^{i} ya(k).

In other words, if ya is interpreted as a probability distribution over the labels of Xa, then Ya is the corresponding cumulative distribution. Given a feasible solution y, we denote the integer solution obtained using the complete rounding procedure by ŷ. The distance between the two labels encoded by the vectors ŷa and ŷb will be denoted by d̂(ŷa, ŷb). In other words, if f̂a and f̂b are the indices of the labels assigned to Xa and Xb (that is, ŷa(f̂a) = 1 and ŷb(f̂b) = 1), then d̂(ŷa, ŷb) = d(l_f̂a, l_f̂b). The following shorthand notation will be useful for our analysis:

D1(i) = ½ (d(li, l1) + d(li, lh) − d(li+1, l1) − d(li+1, lh)), ∀ i ∈ {1, ..., h − 1},   (14)
D2(i, j) = ½ (d(li, lj+1) + d(li+1, lj) − d(li, lj) − d(li+1, lj+1)), ∀ i, j ∈ {1, ..., h − 1}.
Using the above notation, we can state the following lemma on the expected distance of the rounded solution for two neighboring random variables.

Lemma 11 Let y be a feasible solution of the LP relaxation, Ya and Yb be the cumulative distributions of ya and yb, and ŷ be the integer solution obtained by the complete rounding procedure for y. Then, the following equation holds true:

E(d̂(ŷa, ŷb)) = Σ_{i=1}^{h−1} Ya(i) D1(i) + Σ_{j=1}^{h−1} Yb(j) D1(j) + Σ_{i=1}^{h−1} Σ_{j=1}^{h−1} |Ya(i) − Yb(j)| D2(i, j).

Proof We define f̂a and f̂b to be the indices of the labels assigned to Xa and Xb by the rounded integer solution ŷ. In other words, ŷa(i) = 1 if and only if i = f̂a, and ŷb(j) = 1 if and only if j = f̂b. We define binary variables za(i) and zb(j) as follows:

za(i) = 1 if i ≥ f̂a, and 0 otherwise;   zb(j) = 1 if j ≥ f̂b, and 0 otherwise.

For complete rounding, it can be verified that

E(za(i)) = Ya(i).   (15)

Furthermore, we also define binary variables zab(i, j) that indicate whether i and j are contained within the interval defined by f̂a and f̂b. Formally,

zab(i, j) = 1 if min{i, j} ≥ min{f̂a, f̂b} and max{i, j} < max{f̂a, f̂b}, and 0 otherwise.

For complete rounding, it can be verified that

E(zab(i, j)) = |Ya(i) − Yb(j)|.   (16)

Using the result of Flach and Schlesinger (2006), we know that

d̂(ŷa, ŷb) = Σ_{i=1}^{h−1} za(i) D1(i) + Σ_{j=1}^{h−1} zb(j) D1(j) + Σ_{i=1}^{h−1} Σ_{j=1}^{h−1} zab(i, j) D2(i, j).

The proof of the lemma follows by taking the expectation of the LHS and the RHS of the above equation, and simplifying the RHS using the linearity of expectation and equations (15) and (16).

In order to state the next lemma, we require the definition of uncrossing pairwise variables. Given unary variables ya and yb, the pairwise variable vector y′ab is called uncrossing with respect to ya and yb if it satisfies the following properties:

Σ_{j=1}^{h} y′ab(i, j) = ya(i), ∀ i ∈ {1, 2, ..., h},
Σ_{i=1}^{h} y′ab(i, j) = yb(j), ∀ j ∈ {1, 2, ..., h},
y′ab(i, j) ≥ 0, ∀ i, j ∈ {1, 2, ..., h},
min{y′ab(i, j′), y′ab(i′, j)} = 0, ∀ i, j, i′, j′ ∈ {1, 2, ..., h}, i < i′, j < j′.
(17)

The following lemma establishes a connection between the expected distance between the labels assigned by complete rounding and the pairwise cost specified by uncrossing pairwise variables.

Lemma 12 Let y be a feasible solution of the LP relaxation, and ŷ be the integer solution obtained by the complete rounding procedure for y. Furthermore, let y′ab be uncrossing pairwise variables with respect to ya and yb. Then, the following equation holds true:

E(d̂(ŷa, ŷb)) = Σ_{i=1}^{h} Σ_{j=1}^{h} d(li, lj) y′ab(i, j).

Proof We define Ya and Yb to be the cumulative distributions corresponding to ya and yb respectively. We claim that the uncrossing property (17) implies the following condition:

Σ_{i=1}^{i′} Σ_{j=1}^{j′} y′ab(i, j) = min{Ya(i′), Yb(j′)}, ∀ i′, j′ ∈ {1, ..., h}.   (18)

To prove this claim, assume that Ya(i′) < Yb(j′). The other cases can be handled similarly. Since y′ab satisfies the constraints of the LP relaxation, it follows that

Σ_{i=1}^{h} Σ_{j=1}^{j′} y′ab(i, j) = Yb(j′),   Σ_{i=1}^{i′} Σ_{j=1}^{h} y′ab(i, j) = Ya(i′).   (19)

Since the LHS of equality (18) is less than or equal to the LHS of both of the above equations, it follows that

Σ_{i=1}^{i′} Σ_{j=1}^{j′} y′ab(i, j) ≤ min{Ya(i′), Yb(j′)}.   (20)

Therefore, there must exist a k > i′ and a k′ ≤ j′ such that y′ab(k, k′) ≠ 0. Otherwise, the LHS in the above inequality would be exactly equal to Yb(j′), which would result in a contradiction. By the uncrossing property (17), we know that min{y′ab(i, j), y′ab(k, k′)} = 0 if i ≤ i′ and j > j′. Therefore, y′ab(i, j) = 0 for all i ≤ i′ and j > j′, which proves the claim. Combining equations (19) and (20), we get the following:

Σ_{i=1}^{i′} Σ_{j=j′+1}^{h} y′ab(i, j) + Σ_{i=i′+1}^{h} Σ_{j=1}^{j′} y′ab(i, j) = |Ya(i′) − Yb(j′)|, ∀ i′, j′ ∈ {1, ..., h}.

By solving for y′ab using the above equations, we get

Σ_{i=1}^{h} Σ_{j=1}^{h} d(li, lj) y′ab(i, j) = Σ_{i=1}^{h−1} Ya(i) D1(i) + Σ_{j=1}^{h−1} Yb(j) D1(j) + Σ_{i=1}^{h−1} Σ_{j=1}^{h−1} |Ya(i) − Yb(j)| D2(i, j).

Using the previous lemma, this proves that

E(d̂(ŷa, ŷb)) = Σ_{i=1}^{h} Σ_{j=1}^{h} d(li, lj) y′ab(i, j).

Our next lemma establishes that uncrossing pairwise variables are optimal for submodular distance functions.
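The objects in Lemmas 11 and 12 can be checked numerically. The sketch below is our own construction, not code from the paper: it builds uncrossing pairwise variables via the standard north-west-corner rule (which satisfies property (17)), computes the exact expected distance under complete rounding with a single shared threshold, and evaluates the closed form of Lemma 11; the three quantities agree.

```python
def cumulative(y):
    """Running sums of a distribution y: Ya(i) = sum_{k <= i} ya(k)."""
    out, s = [], 0.0
    for v in y:
        s += v
        out.append(s)
    return out

def uncrossing(ya, yb):
    """North-west-corner coupling of the marginals ya and yb; it has the
    correct row and column sums and satisfies the uncrossing property."""
    h = len(ya)
    y = [[0.0] * h for _ in range(h)]
    i = j = 0
    ra, rb = ya[0], yb[0]
    while i < h and j < h:
        m = min(ra, rb)
        y[i][j] += m
        ra -= m
        rb -= m
        if ra <= 1e-12:
            i += 1
            ra = ya[i] if i < h else 0.0
        if rb <= 1e-12:
            j += 1
            rb = yb[j] if j < h else 0.0
    return y

def expected_rounded_distance(ya, yb, d):
    """Exact E[d] under complete rounding: a single threshold r ~ U(0,1]
    shared by both variables; each variable takes the first label whose
    cumulative mass reaches r. Integrates exactly between breakpoints."""
    Ya, Yb = cumulative(ya), cumulative(yb)
    pts = sorted(set([0.0] + Ya + Yb))
    val = 0.0
    for lo, hi in zip(pts, pts[1:]):
        if hi - lo < 1e-15:
            continue
        mid = 0.5 * (lo + hi)
        fa = next((i for i, c in enumerate(Ya) if c >= mid), len(Ya) - 1)
        fb = next((j for j, c in enumerate(Yb) if c >= mid), len(Yb) - 1)
        val += (hi - lo) * d[fa][fb]
    return val

def lemma11_formula(ya, yb, d):
    """Closed form of Lemma 11 in terms of D1, D2 and the cumulatives."""
    h = len(ya)
    Ya, Yb = cumulative(ya), cumulative(yb)
    D1 = [0.5 * (d[i][0] + d[i][h - 1] - d[i + 1][0] - d[i + 1][h - 1])
          for i in range(h - 1)]
    D2 = [[0.5 * (d[i][j + 1] + d[i + 1][j] - d[i][j] - d[i + 1][j + 1])
           for j in range(h - 1)] for i in range(h - 1)]
    return (sum(Ya[i] * D1[i] for i in range(h - 1))
            + sum(Yb[j] * D1[j] for j in range(h - 1))
            + sum(abs(Ya[i] - Yb[j]) * D2[i][j]
                  for i in range(h - 1) for j in range(h - 1)))
```

For the linear distance on three labels with ya = (0.5, 0.2, 0.3) and yb = (0.1, 0.4, 0.5), all three evaluations give 0.6, as Lemmas 11 and 12 predict.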
Lemma 13 Let y′ab be uncrossing pairwise variables with respect to the unary variables ya and yb, and let d : L × L → R+ be a submodular distance function. Then, the following condition holds true:

y′ab = argmin_{yab} Σ_{i=1}^{h} Σ_{j=1}^{h} d(li, lj) yab(i, j)   (21)
s.t. Σ_{j=1}^{h} yab(i, j) = ya(i), ∀ i ∈ {1, ..., h},
Σ_{i=1}^{h} yab(i, j) = yb(j), ∀ j ∈ {1, ..., h},
yab(i, j) ≥ 0, ∀ i, j ∈ {1, ..., h}.

Proof We prove the lemma by contradiction. Suppose that the optimal solution to the above problem is ȳab, which is not uncrossing. Let min{ȳab(i, j′), ȳab(i′, j)} = λ ≠ 0, where i < i′ and j < j′. Since d is submodular, it implies that

d(li, lj) + d(li′, lj′) ≤ d(li, lj′) + d(li′, lj).

Therefore, the objective function of problem (21) can be reduced further by the following modification:

ȳab(i, j′) ← ȳab(i, j′) − λ,   ȳab(i′, j) ← ȳab(i′, j) − λ,
ȳab(i, j) ← ȳab(i, j) + λ,   ȳab(i′, j′) ← ȳab(i′, j′) + λ.

The resulting contradiction proves our claim that the uncrossing pairwise variables y′ab are an optimal solution of problem (21).

Using the above lemmas, we will now obtain the tight approximation factor of the complete rounding procedure.

Lemma 14 The tight approximation factor of the complete rounding procedure is equal to the submodular distortion of the distance function.

Proof We denote a feasible fractional solution of the LP relaxation by y and the rounded solution by ŷ. Consider a pair of neighboring random variables (Xa, Xb) ∈ E. We define uncrossing pairwise variables y′ab with respect to ya and yb. Using Lemmas 12 and 13, the approximation factor of the complete rounding procedure can be shown to be at most B as follows:

E(d̂(ŷa, ŷb)) = Σ_{i=1}^{h} Σ_{j=1}^{h} d(li, lj) y′ab(i, j)
≤ Σ_{i=1}^{h} Σ_{j=1}^{h} d̄(li, lj) y′ab(i, j)
≤ Σ_{i=1}^{h} Σ_{j=1}^{h} d̄(li, lj) yab(i, j)
≤ B Σ_{i=1}^{h} Σ_{j=1}^{h} d(li, lj) yab(i, j).

In order to prove that the approximation factor of complete rounding is exactly B, we need an example where the above inequality holds as an equality. The key to obtaining a tight example lies in the Lagrangian dual of problem (7).
In order to specify its dual, we need three types of dual variables. The first type, denoted by α(i, j), corresponds to the constraint d′(li, lj) ≤ t · d(li, lj). The second type, denoted by β(i, j), corresponds to the constraint d′(li, lj) ≥ d(li, lj). The third type, denoted by γ(i, j), corresponds to the constraint d′(li, lj) + d′(li+1, lj+1) ≤ d′(li, lj+1) + d′(li+1, lj). Using the above variables, the dual of problem (7) is given by

max Σ_{i=1}^{h} Σ_{j=1}^{h} d(li, lj) β(i, j)   (22)
s.t. Σ_{i=1}^{h} Σ_{j=1}^{h} d(li, lj) α(i, j) = 1,
β(i, j) = α(i, j) − γ(i, j−1) − γ(i−1, j) + γ(i−1, j−1) + γ(i, j), ∀ i, j ∈ {1, ..., h},
α(i, j) ≥ 0, β(i, j) ≥ 0, γ(i, j) ≥ 0, ∀ i, j ∈ {1, ..., h}.

We claim that the above dual problem has an optimal solution (α*, β*, γ*) that satisfies the following property:

min{β*(i, j′), β*(i′, j)} = 0, ∀ i, i′, j, j′ ∈ {1, ..., h}, i < i′, j < j′.   (23)

We refer to an optimal dual solution β* that satisfies the above property as uncrossing dual variables, as the property is analogous to that of uncrossing pairwise variables. The above claim, namely, the existence of an uncrossing optimal dual solution, can be proved by contradiction as follows. Suppose there exists no optimal solution that satisfies the above property. Then, consider the following problem, which is the dual of the problem of finding the tightest submodular overestimate of the submodular function d̄:

max Σ_{i=1}^{h} Σ_{j=1}^{h} d̄(li, lj) β(i, j)   (24)
s.t. Σ_{i=1}^{h} Σ_{j=1}^{h} d̄(li, lj) α(i, j) = 1,
β(i, j) = α(i, j) − γ(i, j−1) − γ(i−1, j) + γ(i−1, j−1) + γ(i, j), ∀ i, j ∈ {1, ..., h},
α(i, j) ≥ 0, β(i, j) ≥ 0, γ(i, j) ≥ 0, ∀ i, j ∈ {1, ..., h}.

By strong duality, problem (24) has an optimal value of 1, since d̄ is itself submodular and is therefore its own tightest submodular overestimate. However, the optimal solution of problem (22), which is also a feasible solution for problem (24), provides a value strictly greater than 1. This results in a contradiction, which proves our claim. The optimal dual variables that satisfy property (23) allow us to construct an example which proves that the approximation factor B of the complete rounding procedure is tight.
Specifically, we define
\[
y_{ab}(i,j) = \frac{\alpha^*(i,j)}{\sum_{i'=1}^{h}\sum_{j'=1}^{h} \alpha^*(i',j')}, \; \forall i,j \in \{1,\dots,h\},
\]
\[
y_a(i) = \sum_{j=1}^{h} y_{ab}(i,j), \; \forall i \in \{1,\dots,h\}, \qquad
y_b(j) = \sum_{i=1}^{h} y_{ab}(i,j), \; \forall j \in \{1,\dots,h\}.
\]
Note that the pairwise variables $y_{ab}$ must minimize the pairwise potential corresponding to the unary variables $y_a$ and $y_b$, that is,
\[
y_{ab} = \mathop{\mathrm{argmin}}_{y_{ab}} \sum_{l_i,l_j \in L} d(l_i,l_j)\, y_{ab}(i,j) \quad (25)
\]
\[
\text{s.t.} \quad \sum_{l_j \in L} y_{ab}(i,j) = y_a(i), \; \forall l_i \in L, \qquad
\sum_{l_i \in L} y_{ab}(i,j) = y_b(j), \; \forall l_j \in L, \qquad
y_{ab}(i,j) \geq 0, \; \forall l_i,l_j \in L.
\]
If the above statement were not true, then the value of the dual problem (22) could be increased further.

We also define the following pairwise variables:
\[
\bar{y}_{ab}(i,j) = \frac{\beta^*(i,j)}{\sum_{i'=1}^{h}\sum_{j'=1}^{h} \beta^*(i',j')}, \; \forall i,j \in \{1,\dots,h\},
\]
\[
\bar{y}_a(i) = \sum_{j=1}^{h} \bar{y}_{ab}(i,j), \; \forall i \in \{1,\dots,h\}, \qquad
\bar{y}_b(j) = \sum_{i=1}^{h} \bar{y}_{ab}(i,j), \; \forall j \in \{1,\dots,h\}.
\]
It can be verified that $\bar{y}_a(i) = y_a(i)$ and $\bar{y}_b(j) = y_b(j)$ for all $i,j \in \{1,\dots,h\}$. The above condition follows from the constraints of problem (22). Due to the uncrossing property of $\beta^*$, the pairwise variables $\bar{y}_{ab}$ are uncrossing with respect to $y_a$ and $y_b$. By Lemma 12, this implies that
\[
E(\hat{d}(\hat{y}_a,\hat{y}_b)) = \sum_{i=1}^{h}\sum_{j=1}^{h} d(l_i,l_j)\,\bar{y}_{ab}(i,j),
\]
where $\hat{y}_a$ and $\hat{y}_b$ are the integer solutions obtained by the complete rounding procedure. By strong duality, it follows that
\[
E(\hat{d}(\hat{y}_a,\hat{y}_b)) = B \sum_{i=1}^{h}\sum_{j=1}^{h} d(l_i,l_j)\, y_{ab}(i,j). \quad (26)
\]
The existence of an example that satisfies properties (25) and (26) implies that the tight approximation factor of the complete rounding procedure is $B$. Lemmas 10 and 14 together prove Theorem 1.

Appendix B. Proof of Theorem 3

We begin by establishing the theoretical properties of the interval move-making algorithm. Recall that, given an interval of labels $I = \{l_s,\dots,l_e\}$ of at most $q$ consecutive labels and a labeling $\hat{x}$, we define $I_a = I \cup \{\hat{x}_a\}$ for all random variables $X_a \in X$. In order to use the interval move-making algorithm, we compute a submodular distance function $d_{\hat{x}_a,\hat{x}_b} : I_a \times I_b \rightarrow \mathbb{R}^+$ for all pairs of neighboring random variables $(X_a,X_b) \in E$ as follows:
\[
d_{\hat{x}_a,\hat{x}_b} = \mathop{\mathrm{argmin}}_{d'} t \quad (27)
\]
s.t.
\[
d'(l_i,l_j) \leq t\, d(l_i,l_j), \; \forall l_i \in I_a, l_j \in I_b,
\]
\[
d'(l_i,l_j) \geq d(l_i,l_j), \; \forall l_i \in I_a, l_j \in I_b,
\]
\[
d'(l_i,l_j) + d'(l_{i+1},l_{j+1}) \leq d'(l_i,l_{j+1}) + d'(l_{i+1},l_j), \; \forall l_i,l_j \in I \setminus \{l_e\},
\]
\[
d'(l_i,l_e) + d'(l_{i+1},\hat{x}_b) \leq d'(l_i,\hat{x}_b) + d'(l_{i+1},l_e), \; \forall l_i \in I \setminus \{l_e\},
\]
\[
d'(l_e,l_j) + d'(\hat{x}_a,l_{j+1}) \leq d'(l_e,l_{j+1}) + d'(\hat{x}_a,l_j), \; \forall l_j \in I \setminus \{l_e\},
\]
\[
d'(l_e,l_e) + d'(\hat{x}_a,\hat{x}_b) \leq d'(l_e,\hat{x}_b) + d'(\hat{x}_a,l_e).
\]
For any interval $I$ and labeling $x$, we define the following sets:
\[
V(x,I) = \{X_a \mid X_a \in X, x_a \in I\},
\]
\[
A(x,I) = \{(X_a,X_b) \mid (X_a,X_b) \in E, x_a \in I, x_b \in I\},
\]
\[
B_1(x,I) = \{(X_a,X_b) \mid (X_a,X_b) \in E, x_a \in I, x_b \notin I\},
\]
\[
B_2(x,I) = \{(X_a,X_b) \mid (X_a,X_b) \in E, x_a \notin I, x_b \in I\},
\]
\[
B(x,I) = B_1(x,I) \cup B_2(x,I).
\]
In other words, $V(x,I)$ is the set of all random variables whose label belongs to the interval $I$. Similarly, $A(x,I)$ is the set of all neighboring random variables such that the labels assigned to both the random variables belong to the interval $I$. The set $B(x,I)$ contains all neighboring random variables such that only one of the two labels assigned to the two random variables belongs to the interval $I$. Given the set of all intervals $\mathcal{I}$ and a labeling $\hat{x}$, we define the following for all $x_a, x_b \in L$:
\[
D(x_a,x_b;\hat{x}_a,\hat{x}_b) = \sum_{I \in \mathcal{I},\, (X_a,X_b) \in A(x,I)} d_{\hat{x}_a,\hat{x}_b}(x_a,x_b)
+ \sum_{I \in \mathcal{I},\, (X_a,X_b) \in B_1(x,I)} d_{\hat{x}_a,\hat{x}_b}(x_a,\hat{x}_b)
+ \sum_{I \in \mathcal{I},\, (X_a,X_b) \in B_2(x,I)} d_{\hat{x}_a,\hat{x}_b}(\hat{x}_a,x_b).
\]
Using the above notation, we are ready to state the following lemma on the theoretical guarantee of the interval move-making algorithm.

Lemma 15 The tight multiplicative bound of the interval move-making algorithm is equal to
\[
\frac{1}{q} \max_{x_a,x_b,\hat{x}_a,\hat{x}_b \in L,\, x_a \neq x_b} \frac{D(x_a,x_b;\hat{x}_a,\hat{x}_b)}{d(x_a,x_b)}.
\]
Proof We denote an optimal labeling by $x^*$ and the estimated labeling by $\hat{x}$. Let $t \in [1,q]$ be a uniformly distributed random integer. Using $t$, we define the following set of non-overlapping intervals:
\[
\mathcal{I}_t = \{[1,t], [t+1,t+q], [t+q+1,t+2q], \dots, [\cdot,h]\}.
\]
For each interval $I \in \mathcal{I}_t$, we define a labeling $x^I$ as follows:
\[
x^I_a = \begin{cases} x^*_a & \text{if } x^*_a \in I, \\ \hat{x}_a & \text{otherwise.} \end{cases}
\]
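The set definitions above translate directly into code. The following sketch (with a hypothetical three-variable chain as input) computes $V$, $A$, $B_1$ and $B_2$ for one interval of labels:

```python
# Sketch of the sets V, A, B1, B2 for an interval of labels.
# The small chain graph and labeling below are hypothetical examples.

def interval_sets(labeling, edges, interval):
    """Split variables/edges by membership of their labels in `interval`."""
    I = set(interval)
    V = {a for a, lab in labeling.items() if lab in I}
    A = [(a, b) for a, b in edges if labeling[a] in I and labeling[b] in I]
    B1 = [(a, b) for a, b in edges if labeling[a] in I and labeling[b] not in I]
    B2 = [(a, b) for a, b in edges if labeling[a] not in I and labeling[b] in I]
    return V, A, B1, B2

x = {0: 1, 1: 2, 2: 5}  # labeling: random variable -> label
E = [(0, 1), (1, 2)]    # neighborhood structure (a chain)
V, A, B1, B2 = interval_sets(x, E, range(1, 4))  # interval I = {1, 2, 3}
print(V, A, B1, B2)
```

Here variables 0 and 1 carry labels inside the interval, so $V = \{0, 1\}$, the edge $(0,1)$ lands in $A$, and the edge $(1,2)$ lands in $B_1$ because only its first endpoint's label lies in $I$.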
Since $\hat{x}$ is the labeling obtained after the interval move-making algorithm converges, it follows that
\[
\sum_{X_a \in X} \theta_a(\hat{x}_a) + \sum_{(X_a,X_b) \in E} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(\hat{x}_a,\hat{x}_b)
\leq \sum_{X_a \in X} \theta_a(x^I_a) + \sum_{(X_a,X_b) \in E} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(x^I_a,x^I_b).
\]
By canceling out the common terms, we can simplify the above inequality as
\[
\sum_{X_a \in V(x^*,I)} \theta_a(\hat{x}_a)
+ \sum_{(X_a,X_b) \in A(x^*,I)} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(\hat{x}_a,\hat{x}_b)
+ \sum_{(X_a,X_b) \in B_1(x^*,I)} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(\hat{x}_a,\hat{x}_b)
+ \sum_{(X_a,X_b) \in B_2(x^*,I)} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(\hat{x}_a,\hat{x}_b)
\]
\[
\leq \sum_{X_a \in V(x^*,I)} \theta_a(x^*_a)
+ \sum_{(X_a,X_b) \in A(x^*,I)} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(x^*_a,x^*_b)
+ \sum_{(X_a,X_b) \in B_1(x^*,I)} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(x^*_a,\hat{x}_b)
+ \sum_{(X_a,X_b) \in B_2(x^*,I)} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(\hat{x}_a,x^*_b).
\]
We now sum the above inequality over all the intervals $I \in \mathcal{I}_t$. Note that the resulting LHS is at least equal to the energy of the labeling $\hat{x}$ when the distance function between the random variables $(X_a,X_b)$ is $d_{\hat{x}_a,\hat{x}_b}$. This implies that
\[
\sum_{X_a \in X} \theta_a(\hat{x}_a) + \sum_{(X_a,X_b) \in E} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(\hat{x}_a,\hat{x}_b)
\leq \sum_{X_a \in X} \theta_a(x^*_a)
+ \sum_{I \in \mathcal{I}_t} \Bigg( \sum_{(X_a,X_b) \in A(x^*,I)} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(x^*_a,x^*_b)
+ \sum_{(X_a,X_b) \in B_1(x^*,I)} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(x^*_a,\hat{x}_b)
+ \sum_{(X_a,X_b) \in B_2(x^*,I)} w_{ab}\, d_{\hat{x}_a,\hat{x}_b}(\hat{x}_a,x^*_b) \Bigg).
\]
Taking the expectation on both sides of the above inequality with respect to the uniformly distributed random integer $t \in [1,q]$ proves that the multiplicative bound of the interval move-making algorithm is at most equal to
\[
\frac{1}{q} \max_{x_a,x_b,\hat{x}_a,\hat{x}_b \in L,\, x_a \neq x_b} \frac{D(x_a,x_b;\hat{x}_a,\hat{x}_b)}{d(x_a,x_b)}.
\]
A tight example with two random variables $X_a$ and $X_b$ with $w_{ab} = 1$ can be constructed similarly to the one shown in Lemma 10.

We now turn our attention to the interval rounding procedure. Let $y$ be a feasible solution of the LP relaxation, and $\hat{y}$ the integer solution obtained using interval rounding. Once again, we define the unary variable vector $y_a$ and the pairwise variable vector $y_{ab}$ as specified in equations (12) and (13), respectively. As in the previous appendix, we denote the expected distance between $\hat{y}_a$ and $\hat{y}_b$ by $\hat{d}(\hat{y}_a,\hat{y}_b)$.
Given an interval $I = \{l_s,\dots,l_e\}$ of at most $q$ consecutive labels, we define a vector $Y^I_a$ for each random variable $X_a$ as follows:
\[
Y^I_a(i) = \sum_{j=s}^{s+i-1} y_a(j), \; \forall i \in \{1,\dots,e-s+1\}.
\]
In other words, $Y^I_a$ is the cumulative distribution of $y_a$ within the interval $I$. Furthermore, for each pair of neighboring random variables $(X_a,X_b) \in E$, we define
\[
Z^I_{ab} = \max\{Y^I_a(e-s+1), Y^I_b(e-s+1)\} - \min\{Y^I_a(e-s+1), Y^I_b(e-s+1)\},
\]
\[
Z^I_a(i) = \min\{Y^I_a(i), Y^I_b(e-s+1)\}, \; \forall i \in \{1,\dots,e-s+1\},
\]
\[
Z^I_b(j) = \min\{Y^I_b(j), Y^I_a(e-s+1)\}, \; \forall j \in \{1,\dots,e-s+1\}.
\]
The following shorthand notation will be useful to concisely specify the exact form of $\hat{d}(\hat{y}_a,\hat{y}_b)$:
\[
D^I_0 = \max_{l_i,l_j :\, |\{l_i,l_j\} \cap I| = 1} d(l_i,l_j),
\]
\[
D^I_1(i) = \frac{1}{2}\Big(d(l_{s+i-1},l_s) + d(l_{s+i-1},l_e) - d(l_{s+i},l_s) - d(l_{s+i},l_e)\Big), \; \forall i \in \{1,\dots,e-s\},
\]
\[
D^I_2(i,j) = \frac{1}{2}\Big(d(l_{s+i-1},l_{s+j}) + d(l_{s+i},l_{s+j-1}) - d(l_{s+i-1},l_{s+j-1}) - d(l_{s+i},l_{s+j})\Big), \; \forall i,j \in \{1,\dots,e-s\}.
\]
In other words, $D^I_0$ is the maximum distance between two labels such that exactly one of the two labels lies in the interval $I$. The terms $D^I_1$ and $D^I_2$ are analogous to the terms defined in equation (14). Using the above notation, we can state the following lemma on the expected distance of the rounded solution for two neighboring random variables.

Lemma 16 Let $y$ be a feasible solution of the LP relaxation, and $\hat{y}$ the integer solution obtained by the interval rounding procedure for $y$ using the set of intervals $\mathcal{I}$. Then, the following inequality holds true:
\[
E(\hat{d}(\hat{y}_a,\hat{y}_b)) \leq \frac{1}{q} \sum_{I=\{l_s,\dots,l_e\} \in \mathcal{I}} \left( Z^I_{ab} D^I_0
+ \sum_{i=1}^{e-s} Z^I_a(i) D^I_1(i)
+ \sum_{j=1}^{e-s} Z^I_b(j) D^I_1(j)
+ \sum_{i=1}^{e-s}\sum_{j=1}^{e-s} |Z^I_a(i) - Z^I_b(j)|\, D^I_2(i,j) \right).
\]
Proof We begin the proof by establishing the probability of a random variable $X_a$ being assigned a label in an iteration of the interval rounding procedure. The total number of intervals in the set $\mathcal{I}$ is $h+q-1$. Out of all the intervals, each label $l_i$ is present in exactly $q$ intervals. Thus, the probability of choosing an interval that contains the label $l_i$ is $q/(h+q-1)$.
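Both counting claims are easy to verify by enumeration. The sketch below (illustrative only; $h$, $q$ and the uniform unary distribution are example choices) lists the $h+q-1$ clipped intervals of at most $q$ consecutive labels, checks that every label lies in exactly $q$ of them, and confirms that the per-iteration probability of $X_a$ receiving a label works out to $q/(h+q-1)$:

```python
from fractions import Fraction

h, q = 7, 3
# All intervals of (at most) q consecutive labels, clipped to {1, ..., h}:
# start positions s = 2-q, ..., h give h + q - 1 intervals in total.
intervals = [set(range(max(1, s), min(h, s + q - 1) + 1))
             for s in range(2 - q, h + 1)]
assert len(intervals) == h + q - 1
for label in range(1, h + 1):
    assert sum(label in I for I in intervals) == q  # each label lies in q intervals

# For any unary distribution y_a, the probability that X_a is assigned a
# label in one iteration is sum_i y_a(i) * q/(h+q-1) = q/(h+q-1).
y_a = [Fraction(1, h)] * h  # example: uniform distribution
p_assign = sum(y_a[i - 1] * Fraction(sum(i in I for I in intervals), len(intervals))
               for i in range(1, h + 1))
print(p_assign)  # equals q / (h + q - 1)
```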
Once an interval containing $l_i$ is chosen, the probability of assigning the label $l_i$ to $X_a$ is $y_a(i)$. Thus, the probability of assigning the label $l_i$ to $X_a$ is $y_a(i)q/(h+q-1)$. Summing over all labels $l_i$, we observe that the probability of assigning some label to $X_a$ in an iteration of the interval rounding procedure is $q/(h+q-1)$. Now consider two neighboring random variables $(X_a,X_b) \in E$. In the current iteration of interval rounding, given an interval $I$, the probability that exactly one of the two random variables is assigned a label in $I$ is exactly $Z^I_{ab}$. In this case, we have to assume that the expected distance between the two variables is the maximum possible, that is, $D^I_0$. If both variables are assigned a label in $I$, then a slight modification of Lemma 11 gives the expected distance as
\[
\sum_{i=1}^{e-s} Z^I_a(i) D^I_1(i) + \sum_{j=1}^{e-s} Z^I_b(j) D^I_1(j) + \sum_{i=1}^{e-s}\sum_{j=1}^{e-s} |Z^I_a(i) - Z^I_b(j)|\, D^I_2(i,j).
\]
Thus, the expected distance between the labels of $X_a$ and $X_b$ accrued in a single iteration is equal to
\[
\frac{1}{h+q-1} \sum_{I=\{l_s,\dots,l_e\} \in \mathcal{I}} \left( Z^I_{ab} D^I_0
+ \sum_{i=1}^{e-s} Z^I_a(i) D^I_1(i)
+ \sum_{j=1}^{e-s} Z^I_b(j) D^I_1(j)
+ \sum_{i=1}^{e-s}\sum_{j=1}^{e-s} |Z^I_a(i) - Z^I_b(j)|\, D^I_2(i,j) \right),
\]
where the factor $1/(h+q-1)$ corresponds to the probability of choosing an interval from the set $\mathcal{I}$. In order to compute $\hat{d}(\hat{y}_a,\hat{y}_b)$, we need to divide the above term by the probability that at least one of the two random variables is assigned a label in the current iteration. Since the two cumulative distributions $Y^I_a$ and $Y^I_b$ can be arbitrarily close to each other without being exactly equal, we can only lower bound the probability that at least one of $X_a$ and $X_b$ is assigned a label in the current iteration by the probability that $X_a$ is assigned a label in the current iteration.
Since the probability that $X_a$ is assigned a label is $q/(h+q-1)$, it follows that
\[
E(\hat{d}(\hat{y}_a,\hat{y}_b)) \leq \frac{1}{q} \sum_{I=\{l_s,\dots,l_e\} \in \mathcal{I}} \left( Z^I_{ab} D^I_0
+ \sum_{i=1}^{e-s} Z^I_a(i) D^I_1(i)
+ \sum_{j=1}^{e-s} Z^I_b(j) D^I_1(j)
+ \sum_{i=1}^{e-s}\sum_{j=1}^{e-s} |Z^I_a(i) - Z^I_b(j)|\, D^I_2(i,j) \right).
\]
Using the above lemma, we can now establish the theoretical property of the interval rounding procedure.

Lemma 17 The tight approximation factor of the interval rounding procedure is equal to
\[
\frac{1}{q} \max_{x_a,x_b,\hat{x}_a,\hat{x}_b \in L,\, x_a \neq x_b} \frac{D(x_a,x_b;\hat{x}_a,\hat{x}_b)}{d(x_a,x_b)}.
\]
Proof Similar to Lemma 14, the key to proving the above lemma lies in the dual of problem (27). Given the current labels $\hat{x}_a$ and $\hat{x}_b$ of the random variables $X_a$ and $X_b$ respectively, as well as an interval $I = \{l_s,\dots,l_e\}$, problem (27) computes the tightest submodular overestimation $d_{\hat{x}_a,\hat{x}_b} : I_a \times I_b \rightarrow \mathbb{R}^+$, where $I_a = I \cup \{\hat{x}_a\}$ and $I_b = I \cup \{\hat{x}_b\}$. Similar to problem (22), the dual of problem (27) consists of three types of dual variables. The first type, denoted by $\alpha(i,j;\hat{x}_a,\hat{x}_b,I)$, corresponds to the constraint $d'(l_i,l_j) \leq t\, d(l_i,l_j)$. The second type, denoted by $\beta(i,j;\hat{x}_a,\hat{x}_b,I)$, corresponds to the constraint $d'(l_i,l_j) \geq d(l_i,l_j)$. The third type, denoted by $\gamma(i,j;\hat{x}_a,\hat{x}_b,I)$, corresponds to the submodularity constraints $d'(l_i,l_j) + d'(l_{i+1},l_{j+1}) \leq d'(l_i,l_{j+1}) + d'(l_{i+1},l_j)$. Let $(\alpha^*(\hat{x}_a,\hat{x}_b,I), \beta^*(\hat{x}_a,\hat{x}_b,I), \gamma^*(\hat{x}_a,\hat{x}_b,I))$ denote an optimal solution of the dual. A modification of Lemma 14 can be used to show that there must exist an optimal solution such that $\beta^*(\hat{x}_a,\hat{x}_b,I)$ is uncrossing. We consider the values of $\hat{x}_a$ and $\hat{x}_b$ for which we obtain the tight multiplicative bound of the interval move-making algorithm.
For these values of $\hat{x}_a$ and $\hat{x}_b$, we define
\[
y_{ab}(i,j;\hat{x}_a,\hat{x}_b) = \frac{\sum_{I \in \mathcal{I}} \alpha^*(i,j;\hat{x}_a,\hat{x}_b,I)}{\sum_{i',j'} \sum_{I \in \mathcal{I}} \alpha^*(i',j';\hat{x}_a,\hat{x}_b,I)}, \; \forall i,j \in \{1,\dots,h\},
\]
\[
y_a(i;\hat{x}_a,\hat{x}_b) = \sum_{j=1}^{h} y_{ab}(i,j;\hat{x}_a,\hat{x}_b), \; \forall i \in \{1,\dots,h\}, \qquad
y_b(j;\hat{x}_a,\hat{x}_b) = \sum_{i=1}^{h} y_{ab}(i,j;\hat{x}_a,\hat{x}_b), \; \forall j \in \{1,\dots,h\},
\]
\[
\bar{y}_{ab}(i,j;\hat{x}_a,\hat{x}_b) = \frac{\sum_{I \in \mathcal{I}} \beta^*(i,j;\hat{x}_a,\hat{x}_b,I)}{\sum_{i',j'} \sum_{I \in \mathcal{I}} \beta^*(i',j';\hat{x}_a,\hat{x}_b,I)}, \; \forall i,j \in \{1,\dots,h\},
\]
\[
\bar{y}_a(i;\hat{x}_a,\hat{x}_b) = \sum_{j=1}^{h} \bar{y}_{ab}(i,j;\hat{x}_a,\hat{x}_b), \; \forall i \in \{1,\dots,h\}, \qquad
\bar{y}_b(j;\hat{x}_a,\hat{x}_b) = \sum_{i=1}^{h} \bar{y}_{ab}(i,j;\hat{x}_a,\hat{x}_b), \; \forall j \in \{1,\dots,h\}.
\]
The constraints of the dual of problem (27) ensure that
\[
y_a(i;\hat{x}_a,\hat{x}_b) = \bar{y}_a(i;\hat{x}_a,\hat{x}_b), \; \forall i \in \{1,\dots,h\}, \qquad
y_b(j;\hat{x}_a,\hat{x}_b) = \bar{y}_b(j;\hat{x}_a,\hat{x}_b), \; \forall j \in \{1,\dots,h\}.
\]
Given the distributions $y_a(\hat{x}_a,\hat{x}_b)$ and $y_b(\hat{x}_a,\hat{x}_b)$, we claim that the pairwise variables $y_{ab}(\hat{x}_a,\hat{x}_b)$ must minimize the pairwise potentials corresponding to these distributions, that is,
\[
y_{ab}(\hat{x}_a,\hat{x}_b) = \mathop{\mathrm{argmin}}_{y_{ab}} \sum_{l_i,l_j \in L} d(l_i,l_j)\, y_{ab}(i,j)
\]
\[
\text{s.t.} \quad \sum_{l_j \in L} y_{ab}(i,j) = y_a(i;\hat{x}_a,\hat{x}_b), \; \forall l_i \in L, \qquad
\sum_{l_i \in L} y_{ab}(i,j) = y_b(j;\hat{x}_a,\hat{x}_b), \; \forall l_j \in L, \qquad
y_{ab}(i,j) \geq 0, \; \forall l_i,l_j \in L.
\]
If the above claim were not true, then we could further increase the value of at least one of the duals of problem (27) corresponding to some interval $I$. Furthermore, since $\bar{y}_{ab}(\hat{x}_a,\hat{x}_b)$ is constructed using $\beta^*(\hat{x}_a,\hat{x}_b,I)$, which is uncrossing, a modification of Lemma 12 can be used to show that
\[
\hat{d}(\hat{y}_a(\hat{x}_a,\hat{x}_b), \hat{y}_b(\hat{x}_a,\hat{x}_b)) = \sum_{l_i,l_j \in L} d(l_i,l_j)\,\bar{y}_{ab}(i,j;\hat{x}_a,\hat{x}_b).
\]
Here $\hat{y}_a(\hat{x}_a,\hat{x}_b)$ and $\hat{y}_b(\hat{x}_a,\hat{x}_b)$ are the rounded solutions obtained by the interval rounding procedure applied to $y_a(\hat{x}_a,\hat{x}_b)$ and $y_b(\hat{x}_a,\hat{x}_b)$. By duality, it follows that
\[
\frac{\sum_{l_i,l_j \in L} d(l_i,l_j)\,\bar{y}_{ab}(i,j;\hat{x}_a,\hat{x}_b)}{\sum_{l_i,l_j \in L} d(l_i,l_j)\, y_{ab}(i,j;\hat{x}_a,\hat{x}_b)}
\leq \frac{1}{q} \max_{x_a,x_b,\hat{x}_a,\hat{x}_b \in L,\, x_a \neq x_b} \frac{D(x_a,x_b;\hat{x}_a,\hat{x}_b)}{d(x_a,x_b)}.
\]
Strong duality implies that there must exist a set of variables for which the above inequality holds as an equality. This proves the lemma. Lemmas 15 and 17 together prove Theorem 3.

Appendix C.
Proof of Theorem 8

Proof The proof of the above theorem proceeds in a similar fashion to the previous two theorems, namely, by computing the duals of the problems that are used to obtain the tightest submodular overestimation of the given distance function. In what follows, we provide a brief sketch of the proof and omit the details.

We start with the hierarchical move-making algorithm. Recall that this algorithm computes a labeling $x^{i,j}$ for each cluster $C(i,j) \subseteq L$, where $C(i,j)$ denotes the $j$-th cluster at the $i$-th level. In order to obtain a labeling $x^{i,j}$, it makes use of the labelings computed at the $(i+1)$-th level. Specifically, let
\[
L^{i,j}_a = \{x^{i+1,j'}_a \mid p(i+1,j') = j, \; j' \in \{1,\dots,h_{i+1}\}\}.
\]
In other words, $L^{i,j}_a$ is the set of labels that were assigned to the random variable $X_a$ by the labelings corresponding to the children of the current cluster $C(i,j)$. In order to compute the labeling $x^{i,j}$, the algorithm has to choose one label from the set $L^{i,j}_a$. This is achieved using either the complete move-making algorithm or the interval move-making algorithm. The algorithm is initialized by defining $x^{m,j}_a = k$ for all $X_a \in X$, where $k$ is the unique label in the cluster $C(m,j)$. The algorithm terminates by computing $x^{1,1}$, which is the final labeling.

Let the multiplicative bound corresponding to the computation of $x^{i,j}$ be denoted by $B_{i,j}$. From the arguments of Theorems 1 and 3, it follows that there must exist dual variables $(\alpha^*(i,j), \beta^*(i,j), \gamma^*(i,j))$ that provide an approximation factor exactly equal to $B_{i,j}$ and such that $\beta^*(i,j)$ is uncrossing. For each cluster $C(i,j)$, we define variables $\delta_a(k;i,j)$ for $k \in L^{i,j}_a$ as follows:
\[
\delta_a(k;i,j) = \frac{\sum_{k' \in L^{i,j}_b} \alpha^*(k,k';i,j)}{\sum_{l \in L^{i,j}_a} \sum_{l' \in L^{i,j}_b} \alpha^*(l,l';i,j)}.
\]
Using the above variables, we define unary variables $y_a(k;i,j)$ for each $k \in L^{i,j}_a$ as follows:
\[
y_a(k;1,1) = \delta_a(k;1,1), \qquad
y_a(k;i,j) = y_a(x^{i-1,p(i,j)}_a;\, i-1, p(i,j))\, \delta_a(k;i,j).
\]
Similarly, we define variables $\delta_b(k';i,j)$ for each $k' \in L^{i,j}_b$ as follows:
\[
\delta_b(k';i,j) = \frac{\sum_{k \in L^{i,j}_a} \alpha^*(k,k';i,j)}{\sum_{l \in L^{i,j}_a} \sum_{l' \in L^{i,j}_b} \alpha^*(l,l';i,j)}.
\]
Using the above variables, we define unary variables $y_b(k';i,j)$ for each $k' \in L^{i,j}_b$ as follows:
\[
y_b(k';1,1) = \delta_b(k';1,1), \qquad
y_b(k';i,j) = y_b(x^{i-1,p(i,j)}_b;\, i-1, p(i,j))\, \delta_b(k';i,j).
\]
Since each cluster in the $m$-th level contains a single unique label, it follows that
\[
y_a = [y_a(x^{m,j}_a; m,j),\, j = 1,\dots,h], \qquad
y_b = [y_b(x^{m,j}_b; m,j),\, j = 1,\dots,h],
\]
are valid unary variables for the LP relaxation. It can be shown that applying the hierarchical rounding procedure on the variables $y_a$ and $y_b$ provides an approximation factor that is exactly equal to the multiplicative bound of the hierarchical move-making algorithm.

References

T. Ajanthan, R. Hartley, M. Salzmann, and H. Li. Iteratively reweighted graph cut for multi-label MRFs with non-convex priors. CoRR, 2014.
M. Andrews, M. Hajiaghayi, H. Karloff, and A. Moitra. Capacitated metric labeling. In SODA, 2011.
A. Archer, J. Fakcharoenphol, C. Harrelson, R. Krauthgamer, K. Talwar, and E. Tardos. Approximate classification via earthmover metrics. In SODA, 2004.
Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. PAMI, 2004.
Y. Boykov, O. Veksler, and R. Zabih. Markov random fields with efficient approximations. In CVPR, 1998.
Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. In ICCV, 1999.
C. Chekuri, S. Khanna, J. Naor, and L. Zosin. Approximation algorithms for the metric labeling problem via a new linear programming formulation. In SODA, 2001.
P. Dokania and M. P. Kumar. Parsimonious labeling. In ICCV, 2015.
J. Fakcharoenphol, S. Rao, and K. Talwar. A tight bound on approximating arbitrary metrics by tree metrics. In STOC, 2003.
B. Flach and D. Schlesinger. Transforming an arbitrary minsum problem into a binary one.
Technical report, TU Dresden, 2006.
A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIPS, 2007.
A. Gupta and E. Tardos. A constant factor approximation algorithm for a class of classification problems. In STOC, 2000.
T. Hazan and A. Shashua. Convergent message-passing algorithms for inference over general graphs with convex free energy. In UAI, 2008.
J. Kappes, B. Andres, F. Hamprecht, C. Schnoerr, S. Nowozin, D. Batra, S. Kim, B. Kausler, T. Kroeger, J. Lellmann, N. Komodakis, B. Savchynskyy, and C. Rother. A comparative study of modern inference techniques for structured discrete energy minimization problems. IJCV, 2015.
J. Kleinberg and E. Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. In STOC, 1999.
V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. PAMI, 2006.
N. Komodakis, N. Paragios, and G. Tziritas. MRF optimization via dual decomposition: Message-passing revisited. In ICCV, 2007.
N. Komodakis, G. Tziritas, and N. Paragios. Performance vs computational efficiency for optimizing single and dynamic MRFs: Setting the state of the art with primal-dual strategies. CVIU, 2008.
A. Koster, C. van Hoesel, and A. Kolen. The partial constraint satisfaction problem: Facets and lifting theorems. Operations Research Letters, 1998.
M. P. Kumar. Rounding-based moves for metric labeling. In NIPS, 2014.
M. P. Kumar and D. Koller. MAP estimation of semi-metric MRFs via hierarchical graph cuts. In UAI, 2009.
M. P. Kumar and P. Torr. Improved moves for truncated convex models. In NIPS, 2008.
R. Manokaran, J. Naor, P. Raghavendra, and R. Schwartz. SDP gaps and UGC hardness for multiway cut, 0-extension and metric labeling. In STOC, 2008.
P. Pansari and M. P. Kumar. Truncated max-of-convex models. CoRR, 2015.
J. Pearl.
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
H. Racke. Optimal hierarchical decompositions for congestion minimization in networks. In STOC, 2008.
P. Ravikumar, A. Agarwal, and M. Wainwright. Message-passing for graph-structured linear programs: Proximal projections, convergence and rounding schemes. In ICML, 2008.
M. Schlesinger. Syntactic analysis of two-dimensional visual signals in noisy conditions. Kibernetika, 1976.
R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. PAMI, 2008.
D. Tarlow, D. Batra, P. Kohli, and V. Kolmogorov. Dynamic tree block coordinate ascent. In ICML, 2011.
O. Veksler. Efficient graph-based energy minimization methods in computer vision. PhD thesis, Cornell University, 1999.
O. Veksler. Graph cut based optimization for MRFs with truncated convex priors. In CVPR, 2007.
M. Wainwright, T. Jaakkola, and A. Willsky. MAP estimation via agreement on trees: Message passing and linear programming. Transactions on Information Theory, 2005.
C. Wang, N. Komodakis, and N. Paragios. Markov random field modeling, inference and learning in computer vision and image understanding: A survey. CVIU, 2013.
Y. Weiss, C. Yanover, and T. Meltzer. MAP estimation, linear programming and belief propagation with convex free energies. In UAI, 2007.
T. Werner. A linear programming approach to max-sum problem: A review. PAMI, 2007.
T. Werner. Revisiting the linear programming relaxation approach to Gibbs energy minimization and weighted constraint satisfaction. PAMI, 2010.