# Rounding-based Moves for Metric Labeling

**M. Pawan Kumar**, Ecole Centrale Paris & INRIA Saclay (pawan.kumar@ecp.fr)

**Abstract.** Metric labeling is a special case of energy minimization for pairwise Markov random fields. The energy function consists of arbitrary unary potentials, and pairwise potentials that are proportional to a given metric distance function over the label set. Popular methods for solving metric labeling include (i) move-making algorithms, which iteratively solve a minimum st-cut problem; and (ii) the linear programming (LP) relaxation based approach. In order to convert the fractional solution of the LP relaxation to an integer solution, several randomized rounding procedures have been developed in the literature. We consider a large class of parallel rounding procedures, and design move-making algorithms that closely mimic them. We prove that the multiplicative bound of a move-making algorithm exactly matches the approximation factor of the corresponding rounding procedure for any arbitrary distance function. Our analysis includes all known results for move-making algorithms as special cases.

## 1 Introduction

A Markov random field (MRF) is a graph whose vertices are random variables, and whose edges specify a neighborhood over the random variables. Each random variable can be assigned a value from a set of labels, resulting in a labeling of the MRF. The putative labelings of an MRF are quantitatively distinguished from each other by an energy function, which is the sum of potential functions that depend on the cliques of the graph. An important optimization problem associated with the MRF framework is energy minimization, that is, finding a labeling with the minimum energy.

Metric labeling is a special case of energy minimization, which models several useful low-level vision tasks [3, 4, 18]. It is characterized by a finite, discrete label set and a metric distance function over the labels. The energy function in metric labeling consists of arbitrary unary potentials and pairwise potentials that are proportional to the distance between the labels assigned to them. The problem is known to be NP-hard [20]. Two popular approaches for metric labeling are: (i) move-making algorithms [4, 8, 14, 15, 21], which iteratively improve the labeling by solving a minimum st-cut problem; and (ii) the linear programming (LP) relaxation [5, 13, 17, 22], which is obtained by dropping the integrality constraints in the corresponding integer programming formulation. Move-making algorithms are very efficient due to the availability of fast minimum st-cut solvers [2] and are very popular in the computer vision community. In contrast, the LP relaxation is significantly slower, despite the development of specialized solvers [7, 9, 11, 12, 16, 19, 22, 23, 24, 25]. However, when used in conjunction with randomized rounding algorithms, the LP relaxation provides the best known polynomial-time theoretical guarantees for metric labeling [1, 5, 10].

At first sight, the difference between move-making algorithms and the LP relaxation appears to be the standard accuracy vs. speed trade-off. However, for some special cases of distance functions, it has been shown that appropriately designed move-making algorithms can match the theoretical guarantees of the LP relaxation [14, 15, 20]. In this paper, we extend this result to a large class of randomized rounding procedures, which we call parallel rounding.
In particular, we prove that for any arbitrary (semi-)metric distance function, there exist move-making algorithms that match the theoretical guarantees provided by parallel rounding. The proofs, the various corollaries of our theorems (which cover all previously known guarantees), and our experimental results are deferred to the accompanying technical report.

## 2 Preliminaries

**Metric Labeling.** The problem of metric labeling is defined over an undirected graph $\mathbf{G} = (\mathbf{X}, \mathbf{E})$. The vertices $\mathbf{X} = \{X_1, X_2, \dots, X_n\}$ are random variables, and the edges $\mathbf{E}$ specify a neighborhood relationship over the random variables. Each random variable can be assigned a value from the label set $\mathbf{L} = \{l_1, l_2, \dots, l_h\}$. We assume that we are also provided with a metric distance function $d : \mathbf{L} \times \mathbf{L} \rightarrow \mathbb{R}^+$ over the labels. We refer to an assignment of values to all the random variables as a labeling. In other words, a labeling is a vector $\mathbf{x} \in \mathbf{L}^n$, which specifies the label $x_a$ assigned to each random variable $X_a$. The $h^n$ different labelings are quantitatively distinguished from each other by an energy function $Q(\mathbf{x})$, which is defined as follows:

$$Q(\mathbf{x}) = \sum_{X_a \in \mathbf{X}} \theta_a(x_a) + \sum_{(X_a, X_b) \in \mathbf{E}} w_{ab}\, d(x_a, x_b).$$

Here, the unary potentials $\theta_a(\cdot)$ are arbitrary, and the edge weights $w_{ab}$ are non-negative. Metric labeling requires us to find a labeling with the minimum energy. It is known to be NP-hard.

**Multiplicative Bound.** As metric labeling plays a central role in low-level vision, several approximate algorithms have been proposed in the literature. A common theoretical measure of accuracy for an approximate algorithm is the multiplicative bound. In this work, we are interested in the multiplicative bound of an algorithm with respect to a distance function. Formally, given a distance function $d$, the multiplicative bound of an algorithm is said to be $B$ if the following condition is satisfied for all possible values of unary potentials $\theta_a(\cdot)$ and non-negative edge weights $w_{ab}$:

$$\sum_{X_a \in \mathbf{X}} \theta_a(\hat{x}_a) + \sum_{(X_a, X_b) \in \mathbf{E}} w_{ab}\, d(\hat{x}_a, \hat{x}_b) \leq \sum_{X_a \in \mathbf{X}} \theta_a(x^*_a) + B \sum_{(X_a, X_b) \in \mathbf{E}} w_{ab}\, d(x^*_a, x^*_b). \quad (1)$$

Here, $\hat{\mathbf{x}}$ is the labeling estimated by the algorithm for the given values of unary potentials and edge weights, and $\mathbf{x}^*$ is an optimal labeling. Multiplicative bounds are greater than or equal to 1, and are invariant to reparameterizations of the unary potentials. A multiplicative bound $B$ is said to be tight if the above inequality holds as an equality for some value of unary potentials and edge weights.

**Linear Programming Relaxation.** An overcomplete representation of a labeling can be specified using the following variables: (i) unary variables $y_a(i) \in \{0, 1\}$ for all $X_a \in \mathbf{X}$ and $l_i \in \mathbf{L}$ such that $y_a(i) = 1$ if and only if $X_a$ is assigned the label $l_i$; and (ii) pairwise variables $y_{ab}(i, j) \in \{0, 1\}$ for all $(X_a, X_b) \in \mathbf{E}$ and $l_i, l_j \in \mathbf{L}$ such that $y_{ab}(i, j) = 1$ if and only if $X_a$ and $X_b$ are assigned labels $l_i$ and $l_j$ respectively. This allows us to formulate metric labeling as follows:

$$\min_{\mathbf{y}} \ \sum_{X_a \in \mathbf{X}} \sum_{l_i \in \mathbf{L}} \theta_a(l_i)\, y_a(i) + \sum_{(X_a, X_b) \in \mathbf{E}} \sum_{l_i, l_j \in \mathbf{L}} w_{ab}\, d(l_i, l_j)\, y_{ab}(i, j),$$
$$\text{s.t.} \quad \sum_{l_i \in \mathbf{L}} y_a(i) = 1, \quad \forall X_a \in \mathbf{X},$$
$$\sum_{l_j \in \mathbf{L}} y_{ab}(i, j) = y_a(i), \quad \forall (X_a, X_b) \in \mathbf{E},\ l_i \in \mathbf{L},$$
$$\sum_{l_i \in \mathbf{L}} y_{ab}(i, j) = y_b(j), \quad \forall (X_a, X_b) \in \mathbf{E},\ l_j \in \mathbf{L},$$
$$y_a(i) \in \{0, 1\},\ y_{ab}(i, j) \in \{0, 1\}, \quad \forall X_a \in \mathbf{X},\ (X_a, X_b) \in \mathbf{E},\ l_i, l_j \in \mathbf{L}.$$

By relaxing the final set of constraints such that the optimization variables can take any value between 0 and 1 inclusive, we obtain a linear program (LP). The computational complexity of solving the LP relaxation is polynomial in the size of the problem.
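To make the notation concrete, the following minimal sketch evaluates the energy $Q(\mathbf{x})$ defined above for a given labeling; the array layout and function name are our own illustrative choices, not part of the paper.

```python
# A minimal sketch of evaluating the energy Q(x) defined above, assuming the
# MRF is stored as dense numpy arrays; all names here are illustrative.
import numpy as np

def energy(x, theta, edges, weights, dist):
    """Q(x) = sum_a theta_a(x_a) + sum_{(a,b) in E} w_ab * d(x_a, x_b).

    x:       length-n integer array; x[a] is the label index assigned to X_a
    theta:   n x h array of unary potentials theta_a(l_i)
    edges:   list of (a, b) index pairs specifying E
    weights: non-negative edge weights w_ab, aligned with `edges`
    dist:    h x h array encoding the distance d(l_i, l_j)
    """
    unary = theta[np.arange(len(x)), x].sum()
    pairwise = sum(w * dist[x[a], x[b]] for (a, b), w in zip(edges, weights))
    return unary + pairwise
```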
**Rounding Procedure.** In order to prove theoretical guarantees of the LP relaxation, it is common to use a rounding procedure that can convert a feasible fractional solution $\mathbf{y}$ of the LP relaxation to a feasible integer solution $\hat{\mathbf{y}}$ of the integer linear program. Several rounding procedures have been proposed in the literature. In this work, we focus on the randomized parallel rounding procedures proposed in [5, 10]. These procedures have the property that, given a fractional solution $\mathbf{y}$, the probability of assigning a label $l_i \in \mathbf{L}$ to a random variable $X_a \in \mathbf{X}$ is equal to $y_a(i)$, that is,

$$\Pr(\hat{y}_a(i) = 1) = y_a(i). \quad (2)$$

We will describe the various rounding procedures in detail in sections 3-5. For now, we would like to note that our reason for focusing on the parallel rounding of [5, 10] is that they provide the best known polynomial-time theoretical guarantees for metric labeling. Specifically, we are interested in their approximation factor, which is defined next.

**Approximation Factor.** Given a distance function $d$, the approximation factor for a rounding procedure is said to be $F$ if the following condition is satisfied for all feasible fractional solutions $\mathbf{y}$:

$$\mathbb{E}\Bigg[\sum_{l_i, l_j \in \mathbf{L}} d(l_i, l_j)\, \hat{y}_a(i)\, \hat{y}_b(j)\Bigg] \leq F \sum_{l_i, l_j \in \mathbf{L}} d(l_i, l_j)\, y_{ab}(i, j), \quad \forall (X_a, X_b) \in \mathbf{E}. \quad (3)$$

Here, $\hat{\mathbf{y}}$ refers to the integer solution, and the expectation is taken with respect to the randomized rounding procedure applied to the feasible solution $\mathbf{y}$. Given a rounding procedure with an approximation factor of $F$, an optimal fractional solution $\mathbf{y}^*$ of the LP relaxation can be rounded to a labeling $\hat{\mathbf{y}}$ that satisfies the following condition:

$$\mathbb{E}\Bigg[\sum_{X_a \in \mathbf{X}} \sum_{l_i \in \mathbf{L}} \theta_a(l_i)\, \hat{y}_a(i) + \sum_{(X_a, X_b) \in \mathbf{E}} \sum_{l_i, l_j \in \mathbf{L}} w_{ab}\, d(l_i, l_j)\, \hat{y}_a(i)\, \hat{y}_b(j)\Bigg] \leq \sum_{X_a \in \mathbf{X}} \sum_{l_i \in \mathbf{L}} \theta_a(l_i)\, y^*_a(i) + F \sum_{(X_a, X_b) \in \mathbf{E}} \sum_{l_i, l_j \in \mathbf{L}} w_{ab}\, d(l_i, l_j)\, y^*_{ab}(i, j).$$

The above inequality follows directly from properties (2) and (3). Similar to multiplicative bounds, approximation factors are always greater than or equal to 1, and are invariant to reparameterizations of the unary potentials. An approximation factor $F$ is said to be tight if the above inequality holds as an equality for some value of unary potentials and edge weights.

**Submodular Energy Function.** We will use the following important fact throughout this paper. Given an energy function defined using arbitrary unary potentials, non-negative edge weights and a submodular distance function, an optimal labeling can be computed in polynomial time by solving an equivalent minimum st-cut problem [6]. Recall that a submodular distance function $d'$ over a label set $\mathbf{L} = \{l_1, l_2, \dots, l_h\}$ satisfies the following properties: (i) $d'(l_i, l_j) \geq 0$ for all $l_i, l_j \in \mathbf{L}$, and $d'(l_i, l_j) = 0$ if and only if $i = j$; and (ii) $d'(l_i, l_j) + d'(l_{i+1}, l_{j+1}) \leq d'(l_i, l_{j+1}) + d'(l_{i+1}, l_j)$ for all $l_i, l_j \in \mathbf{L} \setminus \{l_h\}$ (where $\setminus$ refers to set difference).
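Since the submodularity condition above is used repeatedly in what follows, here is a small sketch that checks it for a distance matrix over the ordered label set; this is a hypothetical helper of ours, not from the paper.

```python
# Check conditions (i) and (ii) above for a distance matrix d over the
# ordered label set {l_1, ..., l_h}; a hypothetical utility, not from the paper.
import numpy as np

def is_submodular_distance(d, tol=1e-9):
    h = d.shape[0]
    off_diag = ~np.eye(h, dtype=bool)
    # Condition (i): d(l_i, l_j) >= 0, with equality if and only if i = j.
    if not np.allclose(np.diag(d), 0.0) or (d[off_diag] <= 0).any():
        return False
    # Condition (ii): d(l_i, l_j) + d(l_{i+1}, l_{j+1})
    #                   <= d(l_i, l_{j+1}) + d(l_{i+1}, l_j) for all i, j < h.
    lhs = d[:-1, :-1] + d[1:, 1:]
    rhs = d[:-1, 1:] + d[1:, :-1]
    return bool((lhs <= rhs + tol).all())
```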
## 3 Complete Rounding and Complete Move

We start with a simple rounding scheme, which we call complete rounding. While complete rounding is not very accurate, it helps illustrate the flavor of our results. We will subsequently consider its generalizations, which have been useful in obtaining the best-known approximation factors for various special cases of metric labeling.

The complete rounding procedure consists of a single stage where we use the set of all unary variables to obtain a labeling (as opposed to the other rounding procedures discussed subsequently). Algorithm 1 describes its main steps. Intuitively, it treats the value of the unary variable $y_a(i)$ as the probability of assigning the label $l_i$ to the random variable $X_a$. It obtains a labeling by sampling from all the distributions $\mathbf{y}_a = [y_a(i), l_i \in \mathbf{L}]$ simultaneously using the same random number.

It can be shown that using a different random number to sample the distributions $\mathbf{y}_a$ and $\mathbf{y}_b$ of two neighboring random variables $(X_a, X_b) \in \mathbf{E}$ results in an infinite approximation factor. For example, let $y_a(i) = y_b(i) = 1/h$ for all $l_i \in \mathbf{L}$, where $h$ is the number of labels. The pairwise variables $\mathbf{y}_{ab}$ that minimize the energy function are $y_{ab}(i, i) = 1/h$ and $y_{ab}(i, j) = 0$ for $i \neq j$. For the above feasible solution of the LP relaxation, the RHS of inequality (3) is 0 for any finite $F$, while the LHS of inequality (3) is strictly greater than 0 if $h > 1$. However, we will shortly show that using the same random number $r$ for all random variables provides a finite approximation factor.

**Algorithm 1** The complete rounding procedure.
*input*: A feasible solution $\mathbf{y}$ of the LP relaxation.
1. Pick a real number $r$ uniformly from $[0, 1]$.
2. **for all** $X_a \in \mathbf{X}$ **do**
3. Define $Y_a(0) = 0$ and $Y_a(i) = \sum_{j=1}^{i} y_a(j)$ for all $l_i \in \mathbf{L}$.
4. Assign the label $l_i \in \mathbf{L}$ to the random variable $X_a$ if $Y_a(i-1) < r \leq Y_a(i)$.
5. **end for**

We now turn our attention to designing a move-making algorithm whose multiplicative bound matches the approximation factor of the complete rounding procedure. To this end, we modify the range expansion algorithm, proposed in [15] for truncated convex pairwise potentials, to handle a general (semi-)metric distance function. Our method, which we refer to as the complete move-making algorithm, considers all putative labels of all random variables, and provides an approximate solution in a single iteration. Algorithm 2 describes its two main steps. First, it computes a submodular overestimation $\overline{d}$ of the given distance function by solving the following optimization problem:

$$\overline{d} = \operatorname*{argmin}_{d'} \ t \quad (4)$$
$$\text{s.t.} \quad d'(l_i, l_j) \leq t\, d(l_i, l_j), \quad \forall l_i, l_j \in \mathbf{L},$$
$$d'(l_i, l_j) \geq d(l_i, l_j), \quad \forall l_i, l_j \in \mathbf{L},$$
$$d'(l_i, l_j) + d'(l_{i+1}, l_{j+1}) \leq d'(l_i, l_{j+1}) + d'(l_{i+1}, l_j), \quad \forall l_i, l_j \in \mathbf{L} \setminus \{l_h\}.$$

The above problem minimizes the maximum ratio of the estimated distance to the original distance over all pairs of labels, that is, $\max_{i \neq j} d'(l_i, l_j)/d(l_i, l_j)$. We will refer to the optimal value of problem (4) as the submodular distortion of the distance function $d$. Second, it replaces the original distance function by the submodular overestimation and computes an approximate solution to the original metric labeling problem by solving a single minimum st-cut problem. Note that, unlike the range expansion algorithm [15], which uses the readily available submodular overestimation of a truncated convex distance (namely, the corresponding convex distance function), our approach estimates the submodular overestimation via the LP (4). Since the LP (4) can be solved for any arbitrary distance function, it makes complete move-making generally applicable.

**Algorithm 2** The complete move-making algorithm.
*input*: Unary potentials $\theta_a(\cdot)$, edge weights $w_{ab}$, distance function $d$.
1. Compute a submodular overestimation $\overline{d}$ of $d$ by solving problem (4).
2. Using the approach of [6], solve the following problem via an equivalent minimum st-cut problem:
$$\hat{\mathbf{x}} = \operatorname*{argmin}_{\mathbf{x} \in \mathbf{L}^n} \sum_{X_a \in \mathbf{X}} \theta_a(x_a) + \sum_{(X_a, X_b) \in \mathbf{E}} w_{ab}\, \overline{d}(x_a, x_b).$$

The following theorem establishes the theoretical guarantees of the complete move-making algorithm and the complete rounding procedure.

**Theorem 1.** *The tight multiplicative bound of the complete move-making algorithm is equal to the submodular distortion of the distance function. Furthermore, the tight approximation factor of the complete rounding procedure is also equal to the submodular distortion of the distance function.*
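Problem (4) is a plain linear program, so it can be written down directly with an off-the-shelf LP solver. The sketch below does this with scipy; the variable layout and function name are our own, and a production implementation would also check `res.success`.

```python
# A sketch of the submodular-overestimation LP, problem (4), using scipy.
# Variables are stacked as [t, d'(0,0), ..., d'(h-1,h-1)]; this layout is ours.
import numpy as np
from scipy.optimize import linprog

def submodular_overestimate(d):
    """Return (t, d') where t is the submodular distortion of d and d' is the
    corresponding submodular overestimate, for labels ordered l_1 < ... < l_h."""
    h = d.shape[0]
    nv = 1 + h * h
    idx = lambda i, j: 1 + i * h + j   # position of d'(i, j) in the variable vector
    A, b = [], []
    for i in range(h):
        for j in range(h):
            row = np.zeros(nv); row[idx(i, j)] = 1.0; row[0] = -d[i, j]
            A.append(row); b.append(0.0)            # d'(i,j) <= t * d(i,j)
            row = np.zeros(nv); row[idx(i, j)] = -1.0
            A.append(row); b.append(-d[i, j])       # d'(i,j) >= d(i,j)
    for i in range(h - 1):
        for j in range(h - 1):                      # submodularity of d'
            row = np.zeros(nv)
            row[idx(i, j)] += 1.0; row[idx(i + 1, j + 1)] += 1.0
            row[idx(i, j + 1)] -= 1.0; row[idx(i + 1, j)] -= 1.0
            A.append(row); b.append(0.0)
    c = np.zeros(nv); c[0] = 1.0                    # minimize t
    res = linprog(c, A_ub=np.vstack(A), b_ub=np.array(b),
                  bounds=[(0, None)] * nv)
    return res.x[0], res.x[1:].reshape(h, h)
```

As a sanity check, for the truncated linear distance $d(l_i, l_j) = \min\{|i-j|, M\}$ discussed at the start of the next section, the returned distortion $t$ should be $(h-1)/M$, with the linear distance $|i-j|$ as the overestimate.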
In terms of computational complexity, complete move-making is significantly faster than solving the LP relaxation. Specifically, given an MRF with $n$ random variables and $m$ edges, and a label set with $h$ labels, the LP relaxation requires at least $O(m^3h^3 \log(m^2h^3))$ time, since it consists of $O(mh^2)$ optimization variables and $O(mh)$ constraints. In contrast, complete move-making requires $O(nmh^3 \log(m))$ time, since the graph constructed using the method of [6] consists of $O(nh)$ nodes and $O(mh^2)$ arcs. Note that complete move-making also requires us to solve the linear program (4). However, since problem (4) is independent of the unary potentials and the edge weights, it only needs to be solved once beforehand in order to compute the approximate solution for any metric labeling problem defined using the distance function $d$.

## 4 Interval Rounding and Interval Moves

Theorem 1 implies that the approximation factor of the complete rounding procedure is very large for distance functions that are highly non-submodular. For example, consider the truncated linear distance function defined as follows over a label set $\mathbf{L} = \{l_1, l_2, \dots, l_h\}$:

$$d(l_i, l_j) = \min\{|i - j|, M\}.$$

Here, $M$ is a user-specified parameter that determines the maximum distance. The tightest submodular overestimation of the above distance function is the linear distance function, that is, $\overline{d}(l_i, l_j) = |i - j|$. This implies that the submodular distortion of the truncated linear metric is $(h-1)/M$, and therefore, the approximation factor for the complete rounding procedure is also $(h-1)/M$. In order to avoid this large approximation factor, Chekuri et al. [5] proposed an interval rounding procedure, which captures the intuition that it is beneficial to assign similar labels to as many random variables as possible.

Algorithm 3 provides a description of interval rounding. The rounding procedure chooses an interval of at most $q$ consecutive labels (step 2). It generates a random number $r$ (step 3), and uses it to attempt to assign labels from the selected interval to previously unlabeled random variables (steps 4-7). It can be shown that the overall procedure converges in a polynomial number of iterations with probability 1 [5]. Note that if we fix $q = h$ and $z = 1$, interval rounding becomes equivalent to complete rounding. However, the analyses in [5, 10] show that other values of $q$ provide better approximation factors for various special cases.

**Algorithm 3** The interval rounding procedure.
*input*: A feasible solution $\mathbf{y}$ of the LP relaxation.
1. **repeat**
2. Pick an integer $z$ uniformly from $[-q+2, h]$. Define an interval of labels $\mathbf{I} = \{l_s, \dots, l_e\}$, where $s = \max\{z, 1\}$ is the start index and $e = \min\{z+q-1, h\}$ is the end index.
3. Pick a real number $r$ uniformly from $[0, 1]$.
4. **for all** unlabeled random variables $X_a$ **do**
5. Define $Y_a(0) = 0$ and $Y_a(i) = \sum_{j=s}^{s+i-1} y_a(j)$ for all $i \in \{1, \dots, e-s+1\}$.
6. Assign the label $l_{s+i-1} \in \mathbf{I}$ to $X_a$ if $Y_a(i-1) < r \leq Y_a(i)$.
7. **end for**
8. **until** all random variables have been assigned a label.
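The following is a compact sketch of Algorithm 3, assuming the unary variables are given as an $n \times h$ array; the helper names and the 0-indexed label convention are ours.

```python
# A sketch of interval rounding (Algorithm 3). Labels are 0-indexed here,
# while the comments keep the paper's 1-indexed notation.
import numpy as np

def interval_rounding(y, q, rng=None):
    rng = rng or np.random.default_rng()
    n, h = y.shape
    labels = np.full(n, -1)                      # -1 marks an unlabeled variable
    while (labels == -1).any():
        z = rng.integers(-q + 2, h + 1)          # z uniform in [-q+2, h]  (step 2)
        s, e = max(z, 1), min(z + q - 1, h)      # interval I = {l_s, ..., l_e}
        r = rng.uniform()                        # one r for all variables (step 3)
        for a in np.flatnonzero(labels == -1):
            # Y_a(0) = 0, Y_a(i) = sum_{j=s}^{s+i-1} y_a(j)          (step 5)
            Y = np.concatenate(([0.0], np.cumsum(y[a, s - 1:e])))
            i = np.searchsorted(Y, r)            # first i with Y(i-1) < r <= Y(i)
            if 1 <= i <= e - s + 1:              # r may fall past the interval mass
                labels[a] = (s + i - 1) - 1      # label l_{s+i-1}, 0-indexed (step 6)
    return labels
```

Fixing $q = h$ and always drawing $z = 1$ recovers complete rounding (Algorithm 1), matching the remark above.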
Our goal is to design a move-making algorithm whose multiplicative bound matches the approximation factor of interval rounding for any choice of $q$. To this end, we propose the interval move-making algorithm, which generalizes the range expansion algorithm [15], originally proposed for truncated convex distances, to arbitrary distance functions. Algorithm 4 provides its main steps. The central idea of the method is to improve a given labeling $\hat{\mathbf{x}}$ by allowing each random variable $X_a$ to either retain its current label $\hat{x}_a$ or to choose a new label from an interval of consecutive labels. In more detail, let $\mathbf{I} = \{l_s, \dots, l_e\} \subseteq \mathbf{L}$ be an interval of labels of length at most $q$ (step 4). For the sake of simplicity, let us assume that $\hat{x}_a \notin \mathbf{I}$ for any random variable $X_a$. We define $\mathbf{I}_a = \mathbf{I} \cup \{\hat{x}_a\}$ (step 5). For each pair of neighboring random variables $(X_a, X_b) \in \mathbf{E}$, we compute a submodular distance function $d_{\hat{x}_a, \hat{x}_b} : \mathbf{I}_a \times \mathbf{I}_b \rightarrow \mathbb{R}^+$ by solving the following linear program (step 6):

$$d_{\hat{x}_a, \hat{x}_b} = \operatorname*{argmin}_{d'} \ t \quad (5)$$
$$\text{s.t.} \quad d'(l_i, l_j) \leq t\, d(l_i, l_j), \quad \forall l_i \in \mathbf{I}_a,\ l_j \in \mathbf{I}_b,$$
$$d'(l_i, l_j) \geq d(l_i, l_j), \quad \forall l_i \in \mathbf{I}_a,\ l_j \in \mathbf{I}_b,$$
$$d'(l_i, l_j) + d'(l_{i+1}, l_{j+1}) \leq d'(l_i, l_{j+1}) + d'(l_{i+1}, l_j), \quad \forall l_i, l_j \in \mathbf{I} \setminus \{l_e\},$$
$$d'(l_i, l_e) + d'(l_{i+1}, \hat{x}_b) \leq d'(l_i, \hat{x}_b) + d'(l_{i+1}, l_e), \quad \forall l_i \in \mathbf{I} \setminus \{l_e\},$$
$$d'(l_e, l_j) + d'(\hat{x}_a, l_{j+1}) \leq d'(l_e, l_{j+1}) + d'(\hat{x}_a, l_j), \quad \forall l_j \in \mathbf{I} \setminus \{l_e\},$$
$$d'(l_e, l_e) + d(\hat{x}_a, \hat{x}_b) \leq d'(l_e, \hat{x}_b) + d'(\hat{x}_a, l_e).$$

Similar to problem (4), the above problem minimizes the maximum ratio of the estimated distance to the original distance. However, instead of introducing constraints for all pairs of labels, it only considers pairs of labels $l_i$ and $l_j$ where $l_i \in \mathbf{I}_a$ and $l_j \in \mathbf{I}_b$. Furthermore, it does not modify the distance between the current labels $\hat{x}_a$ and $\hat{x}_b$ (as can be seen in the last constraint of problem (5)). Given the submodular distance functions $d_{\hat{x}_a, \hat{x}_b}$, we can compute a new labeling $\mathbf{x}'$ by solving the following optimization problem via minimum st-cut using the method of [6] (step 7):

$$\mathbf{x}' = \operatorname*{argmin}_{\mathbf{x}} \sum_{X_a \in \mathbf{X}} \theta_a(x_a) + \sum_{(X_a, X_b) \in \mathbf{E}} w_{ab}\, d_{\hat{x}_a, \hat{x}_b}(x_a, x_b) \quad \text{s.t.} \quad x_a \in \mathbf{I}_a,\ \forall X_a \in \mathbf{X}. \quad (6)$$

If the energy of the new labeling $\mathbf{x}'$ is less than that of the current labeling $\hat{\mathbf{x}}$, then we update our labeling to $\mathbf{x}'$ (steps 8-10). Otherwise, we retain the current estimate of the labeling and consider another interval. The algorithm converges when the energy does not decrease for any interval of length at most $q$. Note that, once again, the main difference between interval move-making and the range expansion algorithm is the use of an appropriate optimization problem, namely the LP (5), to obtain a submodular overestimation of the given distance function. This allows us to use interval move-making for the general metric labeling problem, instead of focusing only on truncated convex models.
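Problem (5) can be set up in the same way as problem (4): if we order $\mathbf{I}_a$ as $(l_s, \dots, l_e, \hat{x}_a)$ and $\mathbf{I}_b$ as $(l_s, \dots, l_e, \hat{x}_b)$, its submodularity constraints are exactly the consecutive-pair constraints of problem (4) on this augmented grid, with the corner entry pinned to $d(\hat{x}_a, \hat{x}_b)$. The sketch below follows that reading and assumes $\hat{x}_a, \hat{x}_b \notin \mathbf{I}$, as in the text; the names and layout are ours.

```python
# A sketch of problem (5) for one edge (X_a, X_b), mirroring the LP sketch for
# problem (4). `interval` is a Python list of the 0-indexed labels of I; xa and
# xb are the 0-indexed current labels, assumed to lie outside the interval.
import numpy as np
from scipy.optimize import linprog

def interval_overestimate(d, interval, xa, xb):
    Ia, Ib = interval + [xa], interval + [xb]   # I_a = I u {x_a}, I_b = I u {x_b}
    p = len(Ia)
    nv = 1 + p * p                              # variables [t, d'(., .), ...]
    idx = lambda i, j: 1 + i * p + j
    A, b = [], []
    for i in range(p):
        for j in range(p):
            row = np.zeros(nv); row[idx(i, j)] = 1.0; row[0] = -d[Ia[i], Ib[j]]
            A.append(row); b.append(0.0)        # d' <= t * d on I_a x I_b
            row = np.zeros(nv); row[idx(i, j)] = -1.0
            A.append(row); b.append(-d[Ia[i], Ib[j]])   # d' >= d on I_a x I_b
    for i in range(p - 1):
        for j in range(p - 1):                  # submodularity, incl. boundary rows
            row = np.zeros(nv)
            row[idx(i, j)] += 1.0; row[idx(i + 1, j + 1)] += 1.0
            row[idx(i, j + 1)] -= 1.0; row[idx(i + 1, j)] -= 1.0
            A.append(row); b.append(0.0)
    bounds = [(0, None)] * nv
    bounds[idx(p - 1, p - 1)] = (d[xa, xb], d[xa, xb])  # keep d'(x_a, x_b) = d(x_a, x_b)
    c = np.zeros(nv); c[0] = 1.0
    res = linprog(c, A_ub=np.vstack(A), b_ub=np.array(b), bounds=bounds)
    return res.x[0], res.x[1:].reshape(p, p)
```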
**Algorithm 4** The interval move-making algorithm.
*input*: Unary potentials $\theta_a(\cdot)$, edge weights $w_{ab}$, distance function $d$, initial labeling $\mathbf{x}^0$.
1. Set the current labeling to the initial labeling, that is, $\hat{\mathbf{x}} = \mathbf{x}^0$.
2. **repeat**
3. **for all** $z \in [-q+2, h]$ **do**
4. Define an interval of labels $\mathbf{I} = \{l_s, \dots, l_e\}$, where $s = \max\{z, 1\}$ is the start index and $e = \min\{z+q-1, h\}$ is the end index.
5. Define $\mathbf{I}_a = \mathbf{I} \cup \{\hat{x}_a\}$ for all random variables $X_a \in \mathbf{X}$.
6. Obtain submodular overestimates $d_{\hat{x}_a, \hat{x}_b}$ for each pair of neighboring random variables $(X_a, X_b) \in \mathbf{E}$ by solving problem (5).
7. Obtain a new labeling $\mathbf{x}'$ by solving problem (6).
8. **if** the energy of $\mathbf{x}'$ is less than the energy of $\hat{\mathbf{x}}$ **then**
9. Update $\hat{\mathbf{x}} = \mathbf{x}'$.
10. **end if**
11. **end for**
12. **until** the energy cannot be decreased further.

The following theorem establishes the theoretical guarantees of the interval move-making algorithm and the interval rounding procedure.

**Theorem 2.** *The tight multiplicative bound of the interval move-making algorithm is equal to the tight approximation factor of the interval rounding procedure.*

An interval move-making algorithm that uses an interval length of $q$ runs for at most $O(h/q)$ iterations. This follows from a simple modification of the result by Gupta and Tardos [8] (specifically, Theorem 3.7). Hence, the total time complexity of interval move-making is $O(nmhq^2 \log(m))$, since each iteration solves a minimum st-cut problem on a graph with $O(nq)$ nodes and $O(mq^2)$ arcs. In other words, interval move-making is at most as computationally complex as complete move-making, which in turn is significantly less complex than solving the LP relaxation. Note that problem (5), which is required for interval move-making, is independent of the unary potentials and the edge weights. Hence, it only needs to be solved once beforehand for all pairs of labels $(\hat{x}_a, \hat{x}_b) \in \mathbf{L} \times \mathbf{L}$ in order to obtain a solution for any metric labeling problem defined using the distance function $d$.

## 5 Hierarchical Rounding and Hierarchical Moves

We now consider the most general form of parallel rounding that has been proposed in the literature, namely the hierarchical rounding procedure [10]. The rounding relies on a hierarchical clustering of the labels. Formally, we denote a hierarchical clustering of $m$ levels for the label set $\mathbf{L}$ by $\mathcal{C} = \{\mathcal{C}(i), i = 1, \dots, m\}$. At each level $i$, the clustering $\mathcal{C}(i) = \{\mathbf{C}(i, j) \subseteq \mathbf{L}, j = 1, \dots, h_i\}$ is mutually exclusive and collectively exhaustive, that is,

$$\bigcup_j \mathbf{C}(i, j) = \mathbf{L}, \quad \mathbf{C}(i, j) \cap \mathbf{C}(i, j') = \varnothing, \ \forall j \neq j'.$$

Furthermore, for each cluster $\mathbf{C}(i, j)$ at a level $i \geq 2$, there exists a unique cluster $\mathbf{C}(i-1, j')$ in the level $i-1$ such that $\mathbf{C}(i, j) \subseteq \mathbf{C}(i-1, j')$. We call the cluster $\mathbf{C}(i-1, j')$ the parent of the cluster $\mathbf{C}(i, j)$ and define $p(i, j) = j'$. Similarly, we call $\mathbf{C}(i, j)$ a child of $\mathbf{C}(i-1, j')$. Without loss of generality, we assume that there exists a single cluster at level 1 that contains all the labels, and that each cluster at level $m$ contains a single label.

**Algorithm 5** The hierarchical rounding procedure.
*input*: A feasible solution $\mathbf{y}$ of the LP relaxation.
1. Define $f^1_a = 1$ for all $X_a \in \mathbf{X}$.
2. **for all** $i \in \{2, \dots, m\}$ **do**
3. **for all** $X_a \in \mathbf{X}$ **do**
4. Define $z^i_a(j)$ for all $j \in \{1, \dots, h_i\}$ as follows:
$$z^i_a(j) = \begin{cases} \sum_{k : l_k \in \mathbf{C}(i,j)} y_a(k) & \text{if } p(i, j) = f^{i-1}_a, \\ 0 & \text{otherwise.} \end{cases}$$
5. Define $y^i_a(j)$ for all $j \in \{1, \dots, h_i\}$ as follows:
$$y^i_a(j) = \frac{z^i_a(j)}{\sum_{j'=1}^{h_i} z^i_a(j')}.$$
6. **end for**
7. Using a rounding procedure (complete or interval) on $\mathbf{y}^i = [y^i_a(j),\ \forall X_a \in \mathbf{X}, j \in \{1, \dots, h_i\}]$, obtain an integer solution $\hat{\mathbf{y}}^i$.
8. **for all** $X_a \in \mathbf{X}$ **do**
9. Let $k_a \in \{1, \dots, h_i\}$ be such that $\hat{y}^i_a(k_a) = 1$. Define $f^i_a = k_a$.
10. **end for**
11. **end for**
12. **for all** $X_a \in \mathbf{X}$ **do**
13. Let $l_k$ be the unique label present in the cluster $\mathbf{C}(m, f^m_a)$. Assign $l_k$ to $X_a$.
14. **end for**

Algorithm 5 describes the hierarchical rounding procedure. Given a clustering $\mathcal{C}$, it proceeds in a top-down fashion through the hierarchy while assigning each random variable to a cluster in the current level. Let $f^i_a$ be the index of the cluster assigned to the random variable $X_a$ in the level $i$. In the first step, the rounding procedure assigns all the random variables to the unique cluster $\mathbf{C}(1, 1)$ (step 1). At each subsequent level $i$, it assigns each random variable to a unique cluster by computing a conditional probability distribution as follows. The conditional probability $y^i_a(j)$ of assigning the random variable $X_a$ to the cluster $\mathbf{C}(i, j)$ is proportional to $\sum_{l_k \in \mathbf{C}(i,j)} y_a(k)$ if $p(i, j) = f^{i-1}_a$ (steps 3-6). The conditional probability $y^i_a(j) = 0$ if $p(i, j) \neq f^{i-1}_a$; that is, a random variable cannot be assigned to a cluster $\mathbf{C}(i, j)$ if it wasn't assigned to its parent in the previous step. Using a rounding procedure (complete or interval) for $\mathbf{y}^i$, we obtain an assignment of random variables to the clusters at level $i$ (step 7). Once such an assignment is obtained, the values $f^i_a$ are computed for all random variables $X_a$ (steps 8-10). At the end of step $m$, hierarchical rounding has assigned each random variable to a unique cluster in the level $m$. Since each cluster at level $m$ consists of a single label, this provides us with a labeling of the MRF (steps 12-14).
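Steps 3-6 of Algorithm 5 are the only place where the hierarchy interacts with the fractional solution, so they are worth spelling out. The sketch below computes the conditional distribution $\mathbf{y}^i_a$ for one variable; the encoding of the hierarchy (`clusters_i`, `parent_i`) is an assumption of ours.

```python
# A sketch of steps 3-6 of Algorithm 5 for a single variable X_a.
import numpy as np

def conditional_cluster_distribution(y_a, clusters_i, parent_i, f_prev):
    """y_a:        length-h array of unary variables for X_a
    clusters_i:  list over j of label-index lists, the level-i clusters C(i, j)
    parent_i:    parent_i[j] = p(i, j), parent cluster index at level i-1
    f_prev:      f_a^{i-1}, the cluster chosen for X_a at level i-1
    """
    # z_a^i(j) = sum_{l_k in C(i,j)} y_a(k) if p(i,j) = f_a^{i-1}, else 0  (step 4)
    z = np.array([y_a[C].sum() if parent_i[j] == f_prev else 0.0
                  for j, C in enumerate(clusters_i)])
    # The total is positive with probability 1, since the rounding at level
    # i-1 only selects clusters carrying positive mass.
    return z / z.sum()                                               # step 5
```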
Our goal is to design a move-making algorithm whose multiplicative bound matches the approximation factor of the hierarchical rounding procedure for any choice of hierarchical clustering $\mathcal{C}$. To this end, we propose the hierarchical move-making algorithm, which extends the hierarchical graph cuts approach, proposed in [14] for hierarchically well-separated tree (HST) metrics. Algorithm 6 provides its main steps.

**Algorithm 6** The hierarchical move-making algorithm.
*input*: Unary potentials $\theta_a(\cdot)$, edge weights $w_{ab}$, distance function $d$.
1. **for all** $j \in \{1, \dots, h\}$ **do**
2. Let $l_k$ be the unique label in the cluster $\mathbf{C}(m, j)$. Define $x^{m,j}_a = l_k$ for all $X_a \in \mathbf{X}$.
3. **end for**
4. **for all** $i \in \{2, \dots, m\}$ **do**
5. **for all** $j \in \{1, \dots, h_{m-i+1}\}$ **do**
6. Define $\mathbf{L}^{m-i+1,j}_a = \{x^{m-i+2,j'}_a \mid p(m-i+2, j') = j,\ j' \in \{1, \dots, h_{m-i+2}\}\}$.
7. Using a move-making algorithm (complete or interval), compute the labeling $\mathbf{x}^{m-i+1,j}$ under the constraint $x^{m-i+1,j}_a \in \mathbf{L}^{m-i+1,j}_a$.
8. **end for**
9. **end for**
10. The final solution is $\mathbf{x}^{1,1}$.

In contrast to hierarchical rounding, the move-making algorithm traverses the hierarchy in a bottom-up fashion while computing a labeling for each cluster in the current level. Let $\mathbf{x}^{i,j}$ be the labeling corresponding to the cluster $\mathbf{C}(i, j)$. At the first step, when considering the level $m$ of the clustering, all the random variables are assigned the same label. Specifically, $x^{m,j}_a$ is equal to the unique label contained in the cluster $\mathbf{C}(m, j)$ (steps 1-3). At step $i$, the algorithm computes the labeling $\mathbf{x}^{m-i+1,j}$ for each cluster $\mathbf{C}(m-i+1, j)$ by using the labelings computed in the previous step. Specifically, it restricts the label assigned to a random variable $X_a$ in the labeling $\mathbf{x}^{m-i+1,j}$ to the subset of labels that were assigned to it by the labelings corresponding to the children of $\mathbf{C}(m-i+1, j)$ (step 6). Under this restriction, the labeling $\mathbf{x}^{m-i+1,j}$ is computed by approximately minimizing the energy using a move-making algorithm (step 7). Implicit in our description is the assumption that we will use a move-making algorithm (complete or interval) in step 7 of Algorithm 6 whose multiplicative bound matches the approximation factor of the rounding procedure (complete or interval) used in step 7 of Algorithm 5. Note that, unlike the hierarchical graph cuts approach [14], the hierarchical move-making algorithm can be used for any arbitrary clustering, and not just the one specified by an HST metric.

The following theorem establishes the theoretical guarantees of the hierarchical move-making algorithm and the hierarchical rounding procedure.

**Theorem 3.** *The tight multiplicative bound of the hierarchical move-making algorithm is equal to the tight approximation factor of the hierarchical rounding procedure.*

Note that hierarchical move-making solves a series of problems defined on smaller label sets. Since the complexity of complete and interval move-making is superlinear in the number of labels, it can be verified that the hierarchical move-making algorithm is at most as computationally complex as the complete move-making algorithm (which corresponds to the case when the clustering consists of only one cluster that contains all the labels). Hence, hierarchical move-making is significantly faster than solving the LP relaxation.
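To summarize the control flow of Algorithm 6, the structural sketch below runs the bottom-up pass. Here `move_solver` stands in for a complete or interval move subroutine (closing over the potentials and edge weights) that minimizes the energy subject to per-variable label subsets, which the paper realizes via minimum st-cut [6]; the hierarchy encoding, with 0-indexed levels, is our own assumption.

```python
# A structural sketch of Algorithm 6. `clusters[i]` lists the clusters of level
# i+1 (0-indexed levels); `parent[i][j]` is the parent index of cluster j.
def hierarchical_moves(clusters, parent, n, move_solver):
    m = len(clusters)
    # Level m: each singleton cluster yields a constant labeling (steps 1-3).
    labelings = [[C[0]] * n for C in clusters[m - 1]]
    for i in range(m - 2, -1, -1):               # levels m-1 down to 1
        new_labelings = []
        for j in range(len(clusters[i])):
            children = [jj for jj in range(len(clusters[i + 1]))
                        if parent[i + 1][jj] == j]
            # X_a may only keep a label that some child labeling assigned to
            # it (step 6); the move subroutine minimizes under this restriction.
            allowed = [{labelings[jj][a] for jj in children} for a in range(n)]
            new_labelings.append(move_solver(allowed))       # step 7
        labelings = new_labelings
    return labelings[0]                          # the labeling of C(1,1) (step 10)
```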
## 6 Discussion

For any general distance function that can be used to specify the (semi-)metric labeling problem, we proved that the approximation factor of a large family of parallel rounding procedures is matched by the multiplicative bound of move-making algorithms. This generalizes previously known results on the guarantees of move-making algorithms in two ways: (i) in contrast to previous results [14, 15, 20], which focused on special cases of distance functions, our results are applicable to arbitrary semi-metric distance functions; and (ii) the guarantees provided by our theorems are tight. Our experiments (described in the technical report) confirm that the rounding-based move-making algorithms provide accuracy similar to the LP relaxation, while being significantly faster due to the use of efficient minimum st-cut solvers.

Several natural questions arise. What is the exact characterization of the rounding procedures for which it is possible to design matching move-making algorithms? Can we design rounding-based move-making algorithms for other combinatorial optimization problems? Answering these questions will not only expand our theoretical understanding, but also result in the development of efficient and accurate algorithms.

**Acknowledgements.** This work is funded by the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013)/ERC Grant agreement number 259112.

## References

[1] A. Archer, J. Fakcharoenphol, C. Harrelson, R. Krauthgamer, K. Talwar, and E. Tardos. Approximate classification via earthmover metrics. In SODA, 2004.
[2] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. PAMI, 2004.
[3] Y. Boykov, O. Veksler, and R. Zabih. Markov random fields with efficient approximations. In CVPR, 1998.
[4] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. In ICCV, 1999.
[5] C. Chekuri, S. Khanna, J. Naor, and L. Zosin. Approximation algorithms for the metric labeling problem via a new linear programming formulation. In SODA, 2001.
[6] B. Flach and D. Schlesinger. Transforming an arbitrary minsum problem into a binary one. Technical report, TU Dresden, 2006.
[7] A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIPS, 2007.
[8] A. Gupta and E. Tardos. A constant factor approximation algorithm for a class of classification problems. In STOC, 2000.
[9] T. Hazan and A. Shashua. Convergent message-passing algorithms for inference over general graphs with convex free energy. In UAI, 2008.
[10] J. Kleinberg and E. Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. In STOC, 1999.
[11] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. PAMI, 2006.
[12] N. Komodakis, N. Paragios, and G. Tziritas. MRF optimization via dual decomposition: Message-passing revisited. In ICCV, 2007.
[13] A. Koster, C. van Hoesel, and A. Kolen. The partial constraint satisfaction problem: Facets and lifting theorems. Operations Research Letters, 1998.
[14] M. P. Kumar and D. Koller. MAP estimation of semi-metric MRFs via hierarchical graph cuts. In UAI, 2009.
[15] M. P. Kumar and P. Torr. Improved moves for truncated convex models. In NIPS, 2008.
[16] P. Ravikumar, A. Agarwal, and M. Wainwright. Message-passing for graph-structured linear programs: Proximal projections, convergence and rounding schemes. In ICML, 2008.
[17] M. Schlesinger. Syntactic analysis of two-dimensional visual signals in noisy conditions. Kibernetika, 1976.
[18] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. PAMI, 2008.
[19] D. Tarlow, D. Batra, P. Kohli, and V. Kolmogorov. Dynamic tree block coordinate ascent. In ICML, 2011.
[20] O. Veksler. Efficient graph-based energy minimization methods in computer vision. PhD thesis, Cornell University, 1999.
[21] O. Veksler. Graph cut based optimization for MRFs with truncated convex priors. In CVPR, 2007.
[22] M. Wainwright, T. Jaakkola, and A. Willsky. MAP estimation via agreement on trees: Message passing and linear programming. Transactions on Information Theory, 2005.
[23] Y. Weiss, C. Yanover, and T. Meltzer. MAP estimation, linear programming and belief propagation with convex free energies. In UAI, 2007.
[24] T. Werner. A linear programming approach to max-sum problem: A review. PAMI, 2007.
[25] T. Werner. Revisiting the linear programming relaxation approach to Gibbs energy minimization and weighted constraint satisfaction. PAMI, 2010.