Optimizing Ratio of Monotone Set Functions

Chao Qian1, Jing-Cheng Shi2, Yang Yu2, Ke Tang1, Zhi-Hua Zhou2
1UBRI, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
2National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
{chaoqian, ketang}@ustc.edu.cn, {shijc, yuy, zhouzh}@lamda.nju.edu.cn

Abstract

This paper considers the problem of minimizing the ratio of two set functions, i.e., $f/g$. Previous work assumed both functions to be monotone and submodular, while we consider a more general situation where $g$ is not necessarily submodular. We prove that the greedy approach GreedRatio, as a fixed-time algorithm, achieves a $\frac{|X^*|}{(1+(|X^*|-1)(1-\kappa_f))\,\gamma(g)}$-approximation ratio, which also improves the previously known bound for submodular $g$. If more time can be spent, we present the PORM algorithm, an anytime randomized iterative approach that minimizes $f$ and maximizes $g$ simultaneously. We show that PORM using reasonable time has the same general approximation guarantee as GreedRatio, but can achieve better solutions in cases and applications.

1 Introduction

Minimizing the ratio of two set functions, i.e., $f/g$, is useful in many applications. For example, machine learning tasks such as optimizing the F-measure [Rijsbergen and Joost, 1974], linear discriminant analysis [McLachlan, 2004] and normalized cut [Shi and Malik, 2000] all involve ratio optimization. Recently, Bai et al. [2016] studied the ratio minimization problem where both functions are monotone and submodular; we denote this problem as RS minimization. In [Bai et al., 2016], several algorithms with bounded approximation guarantees were proposed for RS minimization. The GreedRatio algorithm iteratively selects the element that minimizes the ratio of its marginal gains. Other methods first connect RS minimization to the problem of minimizing the difference between submodular functions [Iyer and Bilmes, 2012] or to the problem of submodular optimization with submodular constraints [Iyer and Bilmes, 2013], and then apply the existing techniques for those related problems. GreedRatio was also shown to obtain the best empirical performance in the application of F-measure maximization.

(This work was supported by the NSFC (61603367, 61333014), the Jiangsu SF (BK20160066) and the Collaborative Innovation Center of Novel Software Technology and Industrialization.)

In this paper, we consider a more general situation, i.e., the function $g$ can be non-submodular. We first prove that GreedRatio obtains a $\frac{|X^*|}{1+(|X^*|-1)(1-\hat{\kappa}_f(X^*))}\cdot\frac{1}{\gamma_{\emptyset,|X^*|}(g)}$-approximation guarantee (Theorem 1), where $|X^*|$ is the size of an optimal solution $X^*$, $\hat{\kappa}_f(X^*)$ is the curvature of $f$, and $\gamma_{\emptyset,|X^*|}(g)$ is the submodularity ratio of $g$. Note that when $g$ is submodular, our derived bound becomes $\frac{|X^*|}{1+(|X^*|-1)(1-\hat{\kappa}_f(X^*))}$, which improves the previously known bound $\frac{1}{1-e^{\kappa_f-1}}$ [Bai et al., 2016]. In particular, the approximation bound is improved from $\infty$ to $|X^*|$ for $\kappa_f=1$, and from $\frac{e}{e-1}$ to $1$ for $\kappa_f=0$. The greedy nature of GreedRatio results in an efficient fixed-time algorithm, but may also limit its performance. We therefore propose a Pareto optimization [Qian et al., 2015b; 2016] method, PORM, an anytime algorithm that can use more time to find better solutions.
PORM first reformulates the original problem $f/g$ as a bi-objective optimization problem that minimizes $f$ and maximizes $g$ simultaneously, then employs a randomized iterative algorithm to solve it, and finally selects the solution with the smallest ratio from the maintained set of solutions. Compared with the single-bit forward search of GreedRatio, PORM can perform backward search, multi-path search and multi-bit search, which may help alleviate the issue of getting trapped in local optima. Our main theoretical results for PORM are that, within reasonable time, PORM achieves the same general approximation guarantee as GreedRatio (Theorem 2), and that, in a case of F-measure maximization, PORM can escape the local optimum by backward search, multi-path search and multi-bit search to find an optimum, while GreedRatio cannot (Theorem 3). Experimental results on F-measure maximization exhibit the superior performance of PORM.

2 Minimizing Ratio of Monotone Functions

Given a finite set $V=\{v_1,v_2,\ldots,v_n\}$, we study set functions $f:2^V\to\mathbb{R}$ defined on subsets of $V$. A set function $f$ is monotone if for any $X\subseteq Y$, $f(X)\le f(Y)$. Without loss of generality, we assume that monotone functions are normalized, i.e., $f(\emptyset)=0$. A set function $f:2^V\to\mathbb{R}$ is submodular [Nemhauser et al., 1978] if for any $X\subseteq Y$,
$$f(Y)-f(X)\le\sum_{v\in Y\setminus X}\big(f(X\cup\{v\})-f(X)\big). \qquad (1)$$

We then introduce two concepts that will be used in our analysis. The submodularity ratio in Definition 1 characterizes how close a set function $f$ is to submodularity. It is easy to see from Eq. (1) that $f$ is submodular iff $\gamma_{X,k}(f)=1$ for any $X$ and $k$. The curvature in Definition 2 characterizes how close a monotone submodular set function $f$ is to modularity. It is easy to verify that $\hat{\kappa}_f(X)\le\kappa_f(X)\le\kappa_f$. Note that $f$ is modular iff $\kappa_f=0$.

Definition 1 (Submodularity Ratio [Das and Kempe, 2011]). Let $f$ be a non-negative set function. The submodularity ratio of $f$ with respect to a set $X$ and a parameter $k\ge 1$ is
$$\gamma_{X,k}(f)=\min_{L\supseteq X,\ S:|S|\le k,\ S\cap L=\emptyset}\frac{\sum_{v\in S}\big(f(L\cup\{v\})-f(L)\big)}{f(L\cup S)-f(L)}.$$

Definition 2 (Curvature [Conforti and Cornuéjols, 1984; Vondrák, 2010; Iyer et al., 2013]). Let $f$ be a monotone submodular set function. The total curvature of $f$ is
$$\kappa_f=1-\min_{v\in V:f(v)>0}\frac{f(V)-f(V\setminus\{v\})}{f(v)}.$$
The curvature with respect to a set $X\subseteq V$ is
$$\kappa_f(X)=1-\min_{v\in X:f(v)>0}\frac{f(X)-f(X\setminus\{v\})}{f(v)},$$
and an alternative notion is
$$\hat{\kappa}_f(X)=1-\frac{\sum_{v\in X}\big(f(X)-f(X\setminus\{v\})\big)}{\sum_{v\in X}f(v)}.$$

We study the problem in Definition 3 of minimizing the ratio of a monotone submodular function $f$ and a monotone function $g$. We can assume that $\forall v\in V: f(v)>0$, because any $v$ with $f(v)=0$ can simply be added into the final subset without increasing the ratio. Note that we only consider minimization, since maximizing $f/g$ is equivalent to minimizing $g/f$.

Definition 3 (The General Problem). Given a monotone submodular function $f:2^V\to\mathbb{R}^+$ and a monotone function $g:2^V\to\mathbb{R}^+$, the task is:
$$\arg\min_{X\subseteq V}\ f(X)/g(X). \qquad (2)$$
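To make Definitions 1 and 2 concrete, the following is a minimal brute-force sketch (ours, not from the paper) that evaluates $\gamma_{\emptyset,k}$ and $\hat{\kappa}_f$ for a small hypothetical coverage function `f_cover`; it enumerates all subsets, so it is only meant for tiny ground sets.

```python
from itertools import chain, combinations

def powerset(V):
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

def submodularity_ratio(f, V, k):
    # gamma_{emptyset,k}(f): min over sets L and disjoint S with |S| <= k of
    # sum_{v in S} (f(L+{v}) - f(L)) / (f(L+S) - f(L)); equals 1 iff f is submodular.
    ratio = float('inf')
    for L in map(set, powerset(V)):
        rest = [v for v in V if v not in L]
        for s in range(1, k + 1):
            for S in combinations(rest, s):
                denom = f(L | set(S)) - f(L)
                if denom > 0:
                    num = sum(f(L | {v}) - f(L) for v in S)
                    ratio = min(ratio, num / denom)
    return ratio

def curvature_hat(f, X):
    # kappa_hat_f(X) = 1 - sum_{v in X}(f(X) - f(X - {v})) / sum_{v in X} f({v})
    num = sum(f(X) - f(X - {v}) for v in X)
    den = sum(f({v}) for v in X)
    return 1 - num / den

# Hypothetical monotone submodular example: coverage of sets of words.
words = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c', 'd'}}
f_cover = lambda X: len(set().union(*(words[v] for v in X)) if X else set())

V = {1, 2, 3}
print(submodularity_ratio(f_cover, V, k=2))  # 1.0: coverage is submodular
print(curvature_hat(f_cover, V))             # ~0.667: positive, so not modular
```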
Maximizing the F-measure in information retrieval is a special instance of our studied problem with $g$ being submodular, and it will also be studied in this paper. Given a bipartite graph $G(V,W,E)$, where $V$ is a set of objects, $W$ is a set of words, and each edge $(v,w)\in E$ means that the object $v$ contains the word $w$, we define the coverage function $\Gamma:2^V\to 2^W$ by $\Gamma(X)=\{w\in W\mid\exists v\in X,(v,w)\in E\}$ for any $X\subseteq V$; i.e., $\Gamma(X)$ is the set of words contained by the objects in $X$. Then, the information retrieval problem of finding a subset of objects that exactly covers a target set of words $O\subseteq W$ can be formulated as maximizing the F-measure of the coverage on $O$, as shown in Definition 4. It is easy to verify that both $|\Gamma(X)\cap O|$ and $|O|+|\Gamma(X)|$ are monotone and submodular, where $|\cdot|$ denotes the size of a set.

Definition 4 (F-measure Maximization). Given a bipartite graph $G(V,W,E)$ and a target subset $O\subseteq W$, the task is:
$$\arg\max_{X\subseteq V}\ F(X)=\frac{2|\Gamma(X)\cap O|}{|O|+|\Gamma(X)|}.$$

3 The GreedRatio Method

Bai et al. [2016] recently investigated the problem of minimizing the ratio of monotone submodular functions, i.e., where the function $g$ in Definition 3 is also submodular. They proved that the GreedRatio algorithm obtains a $\frac{1}{1-e^{\kappa_f-1}}$-approximation guarantee, and also conducted experiments showing that GreedRatio achieves the best performance on F-measure maximization. As shown in Algorithm 1, GreedRatio iteratively selects the element $v$ that minimizes the ratio of the marginal gains on $f$ and $g$.

Algorithm 1 GreedRatio
Input: monotone submodular functions $f,g:2^V\to\mathbb{R}^+$
Output: a subset $X\subseteq V$
Process:
1: Let $X_0=\emptyset$, $R=V$ and $i=0$.
2: repeat
3:   $v^*\in\arg\min_{v\in R}\frac{f(X_i\cup\{v\})-f(X_i)}{g(X_i\cup\{v\})-g(X_i)}$.
4:   $X_{i+1}=X_i\cup\{v^*\}$.
5:   $R=\{v\mid g(X_{i+1}\cup\{v\})-g(X_{i+1})>0\}$.
6:   $i=i+1$.
7: until $R=\emptyset$
8: return $X_{i^*}$ with $i^*\in\arg\min_i f(X_i)/g(X_i)$

We theoretically analyze the performance of GreedRatio for our studied general problem, i.e., where $g$ is not necessarily submodular. We prove its general approximation bound in Theorem 1, where OPT denotes the optimal function value of Eq. (2). Let $X^*$ be an optimal solution with the minimum size, i.e., $f(X^*)/g(X^*)=\mathrm{OPT}$ and $|X^*|=\min\{|X|\mid f(X)/g(X)=\mathrm{OPT}\}$. The proof idea is that the best single element $v^*$ already achieves the desired approximation bound (as shown in Lemma 2), and the subset output by GreedRatio (i.e., line 8 of Algorithm 1) obviously satisfies $f(X_{i^*})/g(X_{i^*})\le f(v^*)/g(v^*)$.

Lemma 1 [Iyer et al., 2013]. Given a monotone submodular function $f:2^V\to\mathbb{R}^+$, it holds for any $X\subseteq V$ that
$$\sum_{v\in X}f(v)\le\frac{|X|}{1+(|X|-1)(1-\hat{\kappa}_f(X))}\,f(X).$$

Lemma 2. For minimizing the ratio $f/g$ where $f$ is monotone submodular and $g$ is monotone, there exists $v^*\in V$ such that
$$\frac{f(v^*)}{g(v^*)}\le\frac{|X^*|}{1+(|X^*|-1)(1-\hat{\kappa}_f(X^*))}\cdot\frac{1}{\gamma_{\emptyset,|X^*|}(g)}\cdot\mathrm{OPT}.$$

Proof. Let $v^*\in\arg\min_{v\in V}f(v)/g(v)$. By the definition of the submodularity ratio (Definition 1),
$$g(X^*)\le\frac{1}{\gamma_{\emptyset,|X^*|}(g)}\sum_{v\in X^*}g(v)\le\frac{1}{\gamma_{\emptyset,|X^*|}(g)}\cdot\frac{g(v^*)}{f(v^*)}\sum_{v\in X^*}f(v)\le\frac{1}{\gamma_{\emptyset,|X^*|}(g)}\cdot\frac{g(v^*)}{f(v^*)}\cdot\frac{|X^*|\,f(X^*)}{1+(|X^*|-1)(1-\hat{\kappa}_f(X^*))},$$
where the second inequality holds because $f(v)/g(v)\ge f(v^*)/g(v^*)$ for every $v$, and the last inequality is derived by Lemma 1. Since $f(X^*)/g(X^*)=\mathrm{OPT}$, rearranging gives the claimed bound. Thus, the lemma holds.

Theorem 1. For minimizing the ratio $f/g$ where $f$ is monotone submodular and $g$ is monotone, GreedRatio finds a subset $X\subseteq V$ with
$$\frac{f(X)}{g(X)}\le\frac{|X^*|}{1+(|X^*|-1)(1-\hat{\kappa}_f(X^*))}\cdot\frac{1}{\gamma_{\emptyset,|X^*|}(g)}\cdot\mathrm{OPT}.$$

For the special case of $g$ being submodular, $\gamma_{X,k}(g)=1$ for any $X$ and $k$, and thus the bound in Theorem 1 becomes $\frac{|X^*|}{1+(|X^*|-1)(1-\hat{\kappa}_f(X^*))}$, as stated in Corollary 1. This improves the previous bound $\frac{1}{1-e^{\kappa_f-1}}$ [Bai et al., 2016], since
$$\frac{|X^*|}{1+(|X^*|-1)(1-\hat{\kappa}_f(X^*))}\le\frac{1}{1-\hat{\kappa}_f(X^*)}\le\frac{1}{1-\kappa_f}\le\frac{1}{1-e^{\kappa_f-1}},$$
where the first inequality is by $\hat{\kappa}_f(X^*)\in[0,1]$, the second is by $\hat{\kappa}_f(X^*)\le\kappa_f(X^*)\le\kappa_f$, and the third is by $\kappa_f=1-(1-\kappa_f)\le e^{\kappa_f-1}$ (using $e^{-x}\ge 1-x$). In particular, the previously known bound becomes vacuous when $\kappa_f=1$, while our derived bound is at most $|X^*|$. Note that when $f$ is modular (i.e., $\kappa_f=0$), our derived bound shows that GreedRatio finds an optimal solution, while the known bound in [Bai et al., 2016] only gives an $\frac{e}{e-1}$-approximation guarantee.

Corollary 1. For minimizing the ratio $f/g$ where both $f$ and $g$ are monotone and submodular, GreedRatio finds a subset $X\subseteq V$ with
$$\frac{f(X)}{g(X)}\le\frac{|X^*|}{1+(|X^*|-1)(1-\hat{\kappa}_f(X^*))}\cdot\mathrm{OPT}.$$
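To make Algorithm 1 concrete, the following is a minimal Python sketch under the stated assumptions (monotone $f$ and $g$ given as callables on sets). Initializing $R$ with only positive-$g$-gain elements (to avoid zero denominators), the tie-breaking, and the toy instance at the bottom are our own illustrative choices, not from the paper.

```python
def greed_ratio(V, f, g):
    """GreedRatio (Algorithm 1): greedily add the element minimizing the
    ratio of marginal gains, then return the best prefix subset."""
    X, prefixes = set(), []
    # Start from elements that strictly increase g (assumes at least one exists).
    R = {v for v in V if g({v}) - g(set()) > 0}
    while R:
        # line 3: element with the smallest marginal-gain ratio
        v = min(R, key=lambda u: (f(X | {u}) - f(X)) / (g(X | {u}) - g(X)))
        X = X | {v}
        prefixes.append(set(X))
        # line 5: keep only elements that still strictly increase g
        R = {u for u in V if g(X | {u}) - g(X) > 0}
    # line 8: return the prefix with the smallest ratio f/g
    return min(prefixes, key=lambda S: f(S) / g(S))

# Hypothetical F-measure instance: minimizing f/g with f = |O| + |Gamma(X)|
# and g = 2|Gamma(X) & O| is equivalent to maximizing F(X) of Definition 4.
Gamma = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'d'}}
O = {'a', 'b', 'c'}
cov = lambda X: set().union(*(Gamma[v] for v in X)) if X else set()
f = lambda X: len(O) + len(cov(X))
g = lambda X: 2 * len(cov(X) & O)
print(greed_ratio({1, 2, 3}, f, g))  # {1, 2}: covers O exactly, ratio 1.0
```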
4 The PORM Method

The greedy nature of GreedRatio may limit its performance. To alleviate the issue of getting trapped in local optima, we propose a new approach, PORM, adopting the Pareto optimization [Qian et al., 2015a; 2015b] idea. Pareto optimization is a recently emerged framework that uses bi-objective optimization as an intermediate step to solve single-objective optimization problems. It was previously applied to constrained optimization [Qian et al., 2015a; 2015b], where the degree of constraint violation is optimized together with the original objective. Differently, PORM is for unconstrained optimization, where the two objectives are created by splitting the original ratio objective into its numerator $f$ and denominator $g$.

PORM reformulates the original problem Eq. (2) as a bi-objective minimization problem
$$\arg\min_{\mathbf{x}\in\{0,1\}^n}\ \big(f(\mathbf{x}),\,-g(\mathbf{x})\big),$$
that is, PORM minimizes $f$ and maximizes $g$ simultaneously. Note that we use a Boolean vector $\mathbf{x}\in\{0,1\}^n$ to represent a subset $X\subseteq V$, where the $i$-th bit $x_i$ is a binary indicator of the membership of $v_i$ in $X$. In this paper, we do not distinguish between $\mathbf{x}\in\{0,1\}^n$ and its corresponding subset $X$.

In the bi-objective setting, both objective values have to be considered when comparing two solutions $\mathbf{x}$ and $\mathbf{x}'$. $\mathbf{x}$ weakly dominates $\mathbf{x}'$ (i.e., $\mathbf{x}$ is better than $\mathbf{x}'$, denoted as $\mathbf{x}\succeq\mathbf{x}'$) if $f(\mathbf{x})\le f(\mathbf{x}')\wedge g(\mathbf{x})\ge g(\mathbf{x}')$; $\mathbf{x}$ dominates $\mathbf{x}'$ (i.e., $\mathbf{x}$ is strictly better, denoted as $\mathbf{x}\succ\mathbf{x}'$) if $\mathbf{x}\succeq\mathbf{x}'$ and either $f(\mathbf{x})<f(\mathbf{x}')$ or $g(\mathbf{x})>g(\mathbf{x}')$. If neither weakly dominates the other, they are incomparable.

Algorithm 2 PORM
Input: a monotone submodular function $f:\{0,1\}^n\to\mathbb{R}^+$ and a monotone function $g:\{0,1\}^n\to\mathbb{R}^+$
Parameter: the number $T$ of iterations
Output: a solution $\mathbf{x}\in\{0,1\}^n$
Process:
1: Select $\mathbf{x}$ from $\{0,1\}^n$ uniformly at random.
2: Let $P=\{\mathbf{x}\}$ and $t=0$.
3: while $t<T$ do
4:   Select $\mathbf{x}$ from $P$ uniformly at random.
5:   Generate $\mathbf{x}'$ by flipping each bit of $\mathbf{x}$ with probability $1/n$.
6:   if $\nexists\,\mathbf{z}\in P$ such that $\mathbf{z}\succ\mathbf{x}'$ then
7:     $P=(P\setminus\{\mathbf{z}\in P\mid\mathbf{x}'\succeq\mathbf{z}\})\cup\{\mathbf{x}'\}$.
8:     $Q=\{\mathbf{z}\in P\mid|\mathbf{z}|=|\mathbf{x}'|\}$.
9:     $\mathbf{z}_1=\arg\min_{\mathbf{z}\in Q}f(\mathbf{z})$, $\mathbf{z}_2=\arg\max_{\mathbf{z}\in Q}g(\mathbf{z})$, $\mathbf{z}_3=\arg\min_{\mathbf{z}\in Q}f(\mathbf{z})/g(\mathbf{z})$.
10:    $P=(P\setminus Q)\cup\{\mathbf{z}_1,\mathbf{z}_2,\mathbf{z}_3\}$.
11:  end if
12:  $t=t+1$.
13: end while
14: return $\arg\min_{\mathbf{x}\in P}f(\mathbf{x})/g(\mathbf{x})$

The procedure of PORM is described in Algorithm 2. It starts from a random solution (line 1) and then iteratively tries to improve the solutions in the archive $P$ (lines 3-13). In each iteration, a new solution $\mathbf{x}'$ is generated by randomly flipping bits of an archived solution $\mathbf{x}$ selected from the current $P$ (lines 4-5); if $\mathbf{x}'$ is not dominated by any previously archived solution (line 6), it is added into $P$, and those previously archived solutions weakly dominated by $\mathbf{x}'$ are removed from $P$ (line 7). Although the domination-based comparison makes the archive $P$ contain only incomparable solutions, the size of $P$ can still be large, which may reduce the efficiency of PORM. To control the size of $P$ while avoiding the loss of useful information, we keep three informative solutions for each subset size $|\mathbf{x}|\in\{0,1,\ldots,n\}$ (lines 8-10): $\mathbf{z}_1$ and $\mathbf{z}_2$ are the two boundary solutions with the smallest $f$ value and the largest $g$ value, respectively, and $\mathbf{z}_3$ is the solution with the smallest ratio. Note that $\mathbf{z}_3$ can coincide with $\mathbf{z}_1$ or $\mathbf{z}_2$. Thus, $|P|$ is upper bounded by $1+3(n-1)+1=3n-1$.
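To make the procedure concrete, here is a minimal Python sketch of Algorithm 2 under the stated assumptions; representing solutions as frozensets of indices rather than bit vectors is our own simplification, not the paper's implementation.

```python
import random

def porm(n, f, g, T):
    """PORM (Algorithm 2), with frozensets of indices 0..n-1 standing in
    for Boolean vectors x in {0,1}^n."""
    ratio = lambda x: f(x) / g(x) if g(x) > 0 else float('inf')
    weakly_dom = lambda a, b: f(a) <= f(b) and g(a) >= g(b)          # a >= b
    dom = lambda a, b: weakly_dom(a, b) and (f(a) < f(b) or g(a) > g(b))

    x = frozenset(i for i in range(n) if random.random() < 0.5)      # line 1
    P = {x}                                                           # line 2
    for _ in range(T):                                                # line 3
        x = random.choice(list(P))                                    # line 4
        x2 = frozenset(i for i in range(n)                            # line 5:
                       if (i in x) != (random.random() < 1 / n))      # flip bits
        if not any(dom(z, x2) for z in P):                            # line 6
            P = {z for z in P if not weakly_dom(x2, z)} | {x2}        # line 7
            Q = [z for z in P if len(z) == len(x2)]                   # line 8
            keep = {min(Q, key=f), max(Q, key=g), min(Q, key=ratio)}  # line 9
            P = (P - set(Q)) | keep                                   # line 10
    return min(P, key=ratio)                                          # line 14
```

Any pair of callables $f,g$ defined on index subsets can be plugged in, e.g., the $f=|O|+|\Gamma(\cdot)|$ and $g=2|\Gamma(\cdot)\cap O|$ pair from the GreedRatio example above.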
PORM repeats for $T$ iterations. The value of $T$ affects the quality of the produced solution, and will be analyzed in the next section. After the iterations, the solution with the smallest ratio in $P$ is returned (line 14).

Compared with GreedRatio, PORM can escape local optima in three different ways: (1) backward search, which flips a bit value from 1 to 0; (2) multi-path search, which maintains several incomparable solutions in the archive $P$; (3) multi-bit search, which flips more than one 0-bit to 1 simultaneously. These advantages of PORM over GreedRatio will be shown theoretically in the next section.

5 Approximation Guarantee of PORM

We first prove the general approximation bound of PORM in Theorem 2, where $E[T]$ denotes the expected number of iterations. We can see that PORM reaches the same approximation guarantee as GreedRatio. Note that $E[T]$ depends on the $f$ value of the initial solution generated in line 1 of Algorithm 2 (denoted as $f_{init}$) and the minimum $f$ value on single elements, $f_{min}=\min\{f(v)\mid v\in V\}$.

Theorem 2. For minimizing the ratio $f/g$ where $f$ is monotone submodular and $g$ is monotone, PORM with
$$E[T]\le en(3n-1)\left(1+\frac{1}{1-\kappa_f}\Big(1+\log\frac{f_{init}}{f_{min}}\Big)\right)$$
finds a solution $\mathbf{x}$ with
$$\frac{f(\mathbf{x})}{g(\mathbf{x})}\le\frac{|X^*|}{1+(|X^*|-1)(1-\hat{\kappa}_f(X^*))}\cdot\frac{1}{\gamma_{\emptyset,|X^*|}(g)}\cdot\mathrm{OPT}.$$

The proof idea follows the behavior of GreedRatio. The optimization process is divided into two phases: phase (1) starts from an initial random solution and finishes upon finding the special solution $\{0\}^n$ (i.e., $\emptyset$); phase (2) starts after phase (1) and finishes upon finding a solution with the desired approximation guarantee. The expected number of iterations of phase (1), shown in Lemma 4, is derived using Lemma 3, a recently proposed approach for analyzing the hitting time of a random process. Note that $\log$ denotes the natural logarithm throughout.

Lemma 3 [Doerr et al., 2012]. Let $S\subseteq\mathbb{R}^+$ be a finite set of positive numbers with minimum $s_{min}$. Let $\{X_t\}_{t\in\mathbb{N}}$ be a sequence of random variables over $S\cup\{0\}$. Let $\tau$ be the random variable denoting the first point in time $t\in\mathbb{N}$ for which $X_t=0$. Suppose that there exists $\delta>0$ such that $E[X_t-X_{t+1}\mid X_t=s]\ge\delta s$ holds for all $s\in S$. Then for all $s_0\in S$,
$$E[\tau\mid X_0=s_0]\le\frac{1+\log(s_0/s_{min})}{\delta}.$$
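As a quick numeric instance of Lemma 3 (our own illustrative numbers, not from the paper): if a process starts at $s_0=1000$, never takes a positive value below $s_{min}=1$, and loses in expectation at least a $\delta=0.1$ fraction of its current value per step, then
$$E[\tau\mid X_0=1000]\ \le\ \frac{1+\log(1000/1)}{0.1}\ =\ 10\,(1+3\log 10)\ \approx\ 79.1.$$
Lemma 4 below is obtained in exactly this way: there, $X_t$ is the smallest $f$ value in the archive $P$, $s_0=f_{init}$, $s_{min}=f_{min}$, and $\delta=\frac{1-\kappa_f}{en(3n-1)}$.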
Lemma 4. For minimizing the ratio $f/g$ where $f$ is monotone submodular and $g$ is monotone, the expected number of iterations until PORM finds the solution $\{0\}^n$ is at most $\frac{en(3n-1)}{1-\kappa_f}\big(1+\log\frac{f_{init}}{f_{min}}\big)$.

Proof. We apply Lemma 3. Let $X_t=\min\{f(\mathbf{x})\mid\mathbf{x}\in P\}$ after $t$ iterations of PORM. Note that $X_t=0$ implies that the solution $\{0\}^n$ has been found, since $f(\mathbf{x})>0$ for any non-empty subset $\mathbf{x}$. Thus, the variable $\tau$ in Lemma 3 is exactly the number of iterations required by PORM to find $\{0\}^n$.

We investigate $E[X_t-X_{t+1}\mid X_t]$. Let $\hat{\mathbf{x}}$ be a solution with $f(\hat{\mathbf{x}})=X_t$. We first show that $X_t$ cannot increase, i.e., $X_{t+1}\le X_t$. If $\hat{\mathbf{x}}$ is not deleted, $X_t$ obviously does not increase. There are two possible cases for removing $\hat{\mathbf{x}}$ from $P$. If $\hat{\mathbf{x}}$ is deleted in line 7 of Algorithm 2, the newly included solution $\mathbf{x}'$ must weakly dominate $\hat{\mathbf{x}}$, implying $f(\mathbf{x}')\le f(\hat{\mathbf{x}})$. If $\hat{\mathbf{x}}$ is deleted in line 10, then $|\hat{\mathbf{x}}|=|\mathbf{x}'|$ and $f(\mathbf{x}')$ must be smaller than $f(\hat{\mathbf{x}})$, since the solution in $Q$ with the smallest $f$ value is kept. Thus, $X_t$ does not increase.

We then show that $X_t$ can decrease by flipping a single 1-bit of $\hat{\mathbf{x}}$. Let $P_{max}$ denote the largest size of $P$ during the optimization process. In the $(t+1)$-th iteration, $\hat{\mathbf{x}}$ is selected in line 4 with probability at least $\frac{1}{P_{max}}$ due to uniform selection, and in line 5 only the $i$-th bit of $\hat{\mathbf{x}}$ (i.e., $\hat{x}_i$) is flipped with probability $\frac{1}{n}(1-\frac{1}{n})^{n-1}\ge\frac{1}{en}$. If $\hat{x}_i=1$, the newly generated solution is $\mathbf{x}'=\hat{\mathbf{x}}\setminus\{v_i\}$. By the monotonicity of $f$, $f(\mathbf{x}')=f(\hat{\mathbf{x}}\setminus\{v_i\})\le f(\hat{\mathbf{x}})$. If the inequality is strict, $\mathbf{x}'$ now has the smallest $f$ value and will be added into $P$, which leads to $X_{t+1}=f(\hat{\mathbf{x}}\setminus\{v_i\})<X_t$. If $f(\hat{\mathbf{x}}\setminus\{v_i\})=f(\hat{\mathbf{x}})$, obviously $X_{t+1}=X_t$, but we can still write $X_{t+1}=f(\hat{\mathbf{x}}\setminus\{v_i\})$. We thus get
$$E[X_{t+1}\mid X_t]\le\sum_{i:\hat{x}_i=1}\frac{f(\hat{\mathbf{x}}\setminus\{v_i\})}{enP_{max}}+\Big(1-\frac{|\hat{\mathbf{x}}|}{enP_{max}}\Big)X_t,$$
which implies
$$E[X_t-X_{t+1}\mid X_t]=X_t-E[X_{t+1}\mid X_t]\ge\sum_{v\in\hat{\mathbf{x}}}\frac{f(\hat{\mathbf{x}})-f(\hat{\mathbf{x}}\setminus\{v\})}{enP_{max}}.$$
By the definition of curvature (Definition 2), we have
$$1-\hat{\kappa}_f(\hat{\mathbf{x}})=\frac{\sum_{v\in\hat{\mathbf{x}}}\big(f(\hat{\mathbf{x}})-f(\hat{\mathbf{x}}\setminus\{v\})\big)}{\sum_{v\in\hat{\mathbf{x}}}f(v)}\le\frac{\sum_{v\in\hat{\mathbf{x}}}\big(f(\hat{\mathbf{x}})-f(\hat{\mathbf{x}}\setminus\{v\})\big)}{f(\hat{\mathbf{x}})},$$
where the inequality is by Eq. (1), i.e., $f(\hat{\mathbf{x}})=f(\hat{\mathbf{x}})-f(\emptyset)\le\sum_{v\in\hat{\mathbf{x}}}(f(v)-f(\emptyset))=\sum_{v\in\hat{\mathbf{x}}}f(v)$. Thus,
$$E[X_t-X_{t+1}\mid X_t]\ge\frac{(1-\hat{\kappa}_f(\hat{\mathbf{x}}))\,f(\hat{\mathbf{x}})}{enP_{max}}\ge\frac{1-\kappa_f}{en(3n-1)}\,X_t,$$
where the last inequality is by $P_{max}\le 3n-1$, $X_t=f(\hat{\mathbf{x}})$, and $\hat{\kappa}_f(\hat{\mathbf{x}})\le\kappa_f(\hat{\mathbf{x}})\le\kappa_f$. That is, the condition of Lemma 3 holds with $\delta=\frac{1-\kappa_f}{en(3n-1)}$. Noting that $X_0=f_{init}$ and $s_{min}\ge f_{min}$, Lemma 3 gives $E[\tau\mid X_0=f_{init}]\le\frac{en(3n-1)}{1-\kappa_f}\big(1+\log\frac{f_{init}}{f_{min}}\big)$.

Proof of Theorem 2. We analyze phase (2) after finding $\{0\}^n$. Once $\{0\}^n$ is found, it remains in the archive $P$ forever, because it has the smallest $f$ value and no other solution can dominate it. By selecting $\{0\}^n$ in line 4 of Algorithm 2 and flipping only the bit corresponding to the best single element $v^*\in\arg\min_{v\in V}f(v)/g(v)$ in line 5, which happens with probability at least $\frac{1}{P_{max}}\cdot\frac{1}{n}(1-\frac{1}{n})^{n-1}\ge\frac{1}{en(3n-1)}$, the new solution $\mathbf{x}'=\{v^*\}$ is generated. Then, $\mathbf{x}'$ is used to update $P$ (lines 6-11), after which $P$ always contains a solution that either weakly dominates $\mathbf{x}'$ or is incomparable with $\mathbf{x}'$ but has a smaller ratio. That is, $P$ will always contain a solution $\mathbf{x}$ with
$$\frac{f(\mathbf{x})}{g(\mathbf{x})}\le\frac{f(v^*)}{g(v^*)}\le\frac{|X^*|}{1+(|X^*|-1)(1-\hat{\kappa}_f(X^*))}\cdot\frac{1}{\gamma_{\emptyset,|X^*|}(g)}\cdot\mathrm{OPT},$$
where the last inequality is by Lemma 2. Thus, phase (2) needs at most $en(3n-1)$ iterations in expectation. Combining the two phases, the total expected number of iterations for finding a solution with a $\frac{|X^*|}{1+(|X^*|-1)(1-\hat{\kappa}_f(X^*))}\cdot\frac{1}{\gamma_{\emptyset,|X^*|}(g)}$-approximation ratio is
$$E[T]\le en(3n-1)\left(1+\frac{1}{1-\kappa_f}\Big(1+\log\frac{f_{init}}{f_{min}}\Big)\right).$$

Using an illustrative example of F-measure maximization in information retrieval, we then prove in Theorem 3 that GreedRatio gets trapped in a local optimum, while PORM can avoid local optima in three different ways and finally find a global optimum. As shown in Definition 5, this example has a unique global optimal solution $\{1\}^{n-1}0$ (i.e., $\{v_1,\ldots,v_{n-1}\}$). The proof idea is that GreedRatio first selects the object $v_n$ due to its greedy nature and is misled by it, while PORM can avoid $v_n$ by backward search, multi-path search or multi-bit search, as shown in the three parts of the proof.

Definition 5 (An Example of F-measure Maximization). Let $V=\{v_1,v_2,\ldots,v_n\}$. The function $\Gamma$ and the target set $O$ satisfy:
(1) $\forall\,1\le i\ne j\le n-1:\ \Gamma(v_i)\cap\Gamma(v_j)=\emptyset,\ |\Gamma(v_i)|=|\Gamma(v_j)|=n^2$;
(2) $\forall\,1\le i\le n-1:\ |O\cap\Gamma(v_i)|=n^2-1,\ |O|=(n-1)(n^2-1)$;
(3) $\forall\,1\le i\le n-1:\ \Gamma(v_n)\cap\Gamma(v_i)\subseteq O\cap\Gamma(v_i),\ |\Gamma(v_n)\cap\Gamma(v_i)|=n+2,\ |\Gamma(v_n)|=|\Gamma(v_n)\cap O|+1$.
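Definition 5 specifies the instance only through the cardinalities in conditions (1)-(3). The following is one concrete construction (our own; the word names are hypothetical) together with a numeric check of the F-measure values used in the proof below.

```python
# Construct an instance of Definition 5 for a given n (our own construction).
n = 5
O, Gamma = set(), {}
for i in range(1, n):                                 # objects v_1 .. v_{n-1}
    Gamma[i] = {f'w{i}_{j}' for j in range(n * n)}    # n^2 disjoint words each
    O |= {f'w{i}_{j}' for j in range(n * n - 1)}      # all but one word in O
# v_n: n+2 target words from each Gamma(v_i), plus one fresh word outside O
Gamma[n] = {f'w{i}_{j}' for i in range(1, n) for j in range(n + 2)} | {'w_extra'}

cov = lambda X: set().union(*(Gamma[v] for v in X)) if X else set()
F = lambda X: 2 * len(cov(X) & O) / (len(O) + len(cov(X)))

assert len(O) == (n - 1) * (n * n - 1)                # condition (2)
print(F(set(range(1, n))))      # optimum {v_1,...,v_{n-1}}: 1 - 1/(2n^2-1)
print(F(set(range(1, n + 1))))  # adding v_n strictly decreases the F-measure
```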
Theorem 3. For the F-measure maximization example in Definition 5, PORM with $E[T]\le en(3n-1)(2+2\log n)$ finds the optimal solution $\{1\}^{n-1}0$, while GreedRatio cannot.

Proof. By the definition $F(\mathbf{x})=2|\Gamma(\mathbf{x})\cap O|/(|O|+|\Gamma(\mathbf{x})|)$, we derive that for any $\mathbf{x}$ with $|\mathbf{x}|=i\wedge x_n=0$,
$$F(\mathbf{x})=\frac{2(n^2-1)i}{n^3-n^2-n+1+n^2 i},$$
which increases with $i$ and reaches its maximum $1-\frac{1}{2n^2-1}$ at $i=n-1$; and for any $\mathbf{x}$ with $|\mathbf{x}|=i\wedge x_n=1$,
$$F(\mathbf{x})=\frac{4n+2+2(n^2-n-3)i}{n^3-n^2+n+2+(n^2-n-2)i},$$
which increases with $i$ and reaches its maximum $1-\frac{1}{2n^2-2n-1+2/n}$ at $i=n$. Thus, $\{1\}^{n-1}0$ is the unique optimal solution.

GreedRatio first selects the object $v$ with the best marginal-gain ratio, i.e., the largest value of $2|\Gamma(v)\cap O|/(|O|+|\Gamma(v)|-|O|)=2|\Gamma(v)\cap O|/|\Gamma(v)|$. For any $1\le i\le n-1$, we have
$$\frac{2|\Gamma(v_i)\cap O|}{|\Gamma(v_i)|}=\frac{2(n^2-1)}{n^2}<\frac{2(n^2+n-2)}{n^2+n-1}=\frac{2|\Gamma(v_n)\cap O|}{|\Gamma(v_n)|}.$$
Thus, GreedRatio first selects $v_n$. From line 8 of Algorithm 1, it is easy to see that the final subset output by GreedRatio always contains $v_n$, so GreedRatio cannot find the optimal solution $\{v_1,\ldots,v_{n-1}\}$.

For the PORM algorithm, the problem of F-measure maximization is implemented as minimizing $f(\mathbf{x})=|O|+|\Gamma(\mathbf{x})|$ and maximizing $g(\mathbf{x})=|\Gamma(\mathbf{x})\cap O|$ simultaneously. We then prove that PORM can find the optimal solution $\{v_1,\ldots,v_{n-1}\}$ in three different ways.

[Backward search] The idea is that PORM first efficiently finds the full set $V$, after which simply deleting $v_n$ from $V$ produces the optimal solution. Let $G_t=\max\{g(\mathbf{x})\mid\mathbf{x}\in P\}$ after $t$ iterations of Algorithm 2. We first use Lemma 3 to bound the number of iterations (denoted by $T_1$) until $G_t$ reaches the maximum $|O|$. Let $X_t=|O|-G_t$; then the random variable $\tau$ in Lemma 3 is exactly $T_1$, because $X_t=0$ is equivalent to $G_t=|O|$. Since $E[X_t-X_{t+1}\mid X_t]=E[G_{t+1}-G_t\mid G_t]$, we only need to analyze the change of $G_t$. Let $\hat{\mathbf{x}}$ be a solution with $g(\hat{\mathbf{x}})=G_t$. As in the proof of Lemma 4, we can show that $G_t$ does not decrease, and can increase by flipping a single 0-bit of $\hat{\mathbf{x}}$. We then get
$$E[G_{t+1}-G_t\mid G_t]\ge\sum_{i:\hat{x}_i=0}\frac{g(\hat{\mathbf{x}}\cup\{v_i\})-g(\hat{\mathbf{x}})}{enP_{max}}\ge\frac{g(V)-g(\hat{\mathbf{x}})}{enP_{max}}\ge\frac{X_t}{en(3n-1)},$$
where the second inequality is by the submodularity of $g$, i.e., Eq. (1). That is, the condition of Lemma 3 holds with $\delta=\frac{1}{en(3n-1)}$. Note that $X_0=|O|-G_0\le|O|=(n-1)(n^2-1)$ and $s_{min}\ge n^2-1-(n+2)$. By Lemma 3, we get $E[T_1]=E[\tau\mid X_0]\le en(3n-1)(1+2\log n)$.

From Definition 5, a solution $\mathbf{x}$ with $g(\mathbf{x})=|\Gamma(\mathbf{x})\cap O|=|O|$ must contain $v_1,\ldots,v_{n-1}$, so there are only two such solutions: $\{1\}^{n-1}0$ and $\{1\}^n$. To derive an upper bound on the number of iterations for finding the optimal solution $\{1\}^{n-1}0$, we pessimistically assume that it has not yet been found. If $P$ contains $\{1\}^n$, the optimal solution $\{1\}^{n-1}0$ can be generated by selecting $\{1\}^n$ in line 4 and flipping only its last 1-bit in line 5, which happens with probability at least $\frac{1}{P_{max}}\cdot\frac{1}{n}(1-\frac{1}{n})^{n-1}\ge\frac{1}{en(3n-1)}$. Denoting the number of iterations in this phase by $T_2$, we have $E[T_2]\le en(3n-1)$. Therefore, the expected number of iterations for PORM to find the optimal solution $\{1\}^{n-1}0$ by backward search is $E[T]\le E[T_1]+E[T_2]\le en(3n-1)(2+2\log n)$.

[Multi-path search] Let $\mathbf{x}^i$ (where $1\le i\le n-1$) denote any solution with $|\mathbf{x}^i|=i\wedge x^i_n=0$, i.e., with $i$ 1s in the first $n-1$ bits and the last bit 0.
The idea is that PORM first efficiently finds the empty set $\{0\}^n$, and then follows the path $\mathbf{x}^1\to\mathbf{x}^2\to\cdots\to\mathbf{x}^{n-1}$ to produce the optimal solution. Note that although $\mathbf{x}^1$ is worse than the solution $\{0\}^{n-1}1$ (i.e., $\{v_n\}$) on the original objective $f/g$, they are incomparable in the bi-objective setting, so $\mathbf{x}^1$ is kept in $P$, which allows PORM to follow a path different from GreedRatio's to find the optimal solution.

Let $T_1$ denote the number of iterations for finding the solution $\{0\}^n$. Since the $f(\mathbf{x})$ value increases with the number of 1-bits of $\mathbf{x}$, we can derive a better upper bound on $E[T_1]$ than Lemma 4. Let $j$ denote the minimum number of 1-bits among the solutions in $P$. First, $j$ cannot increase, because the smallest $f$ value of the solutions in $P$ cannot increase. Second, $j$ decreases by 1 in each iteration with probability at least $\frac{1}{P_{max}}\cdot\frac{j}{n}(1-\frac{1}{n})^{n-1}\ge\frac{j}{en(3n-1)}$, since it suffices to select a solution with $j$ 1s in line 4 and flip exactly one of its $j$ 1-bits in line 5. Thus, for reaching $j=0$, $E[T_1]\le\sum_{j=1}^{n}\frac{en(3n-1)}{j}\le en(3n-1)(1+\log n)$.

Starting from $\{0\}^n$, let $T_2$ denote the number of iterations for following the path $\mathbf{x}^1\to\mathbf{x}^2\to\cdots\to\mathbf{x}^{n-1}$ (ending at the optimal solution). Note that for $1\le i\le n-1$, $\mathbf{x}^i$ cannot be weakly dominated by any other solution, and there are only two different objective vectors for $|\mathbf{x}|=i$. Thus, by the updating procedure of PORM (lines 6-11 of Algorithm 2), once $\mathbf{x}^i$ is found, it is kept in $P$ forever. The probability of the step $\mathbf{x}^i\to\mathbf{x}^{i+1}$ is at least $\frac{1}{P_{max}}\cdot\frac{n-1-i}{n}(1-\frac{1}{n})^{n-1}\ge\frac{n-1-i}{en(3n-1)}$, since it suffices to select $\mathbf{x}^i$ in line 4 and flip exactly one of its first $n-1-i$ 0-bits. Thus, $E[T_2]\le\sum_{i=0}^{n-2}\frac{en(3n-1)}{n-1-i}\le en(3n-1)(1+\log n)$. Combining the two phases, the total expected number of iterations for finding the optimal solution by multi-path search is $E[T]\le E[T_1]+E[T_2]\le en(3n-1)(2+2\log n)$.

[Multi-bit search] The idea is that flipping two 0s in the first $n-1$ bits of $\{0\}^n$ simultaneously yields the solution $\mathbf{x}^2$, which has a better $f/g$ value than the solution $\mathbf{x}$ with $|\mathbf{x}|=2\wedge x_n=1$ found by GreedRatio; following the path $\mathbf{x}^2\to\mathbf{x}^3\to\cdots\to\mathbf{x}^{n-1}$ then reaches the optimal solution. By the same analysis as for multi-path search, PORM first finds $\{0\}^n$ within at most $en(3n-1)(1+\log n)$ iterations in expectation. After that, selecting $\{0\}^n$ in line 4 and flipping exactly two 0s among its first $n-1$ bits generates $\mathbf{x}^2$ with probability at least $\frac{1}{P_{max}}\cdot\frac{\binom{n-1}{2}}{n^2}(1-\frac{1}{n})^{n-2}\ge\frac{(n-1)(n-2)}{2e(3n-1)n^2}$. Compared with the solution $\mathbf{x}$ with $|\mathbf{x}|=2\wedge x_n=1$,
$$\frac{f(\mathbf{x}^2)}{g(\mathbf{x}^2)}=\frac{(n+1)n^2-n+1}{2n^2-2}\le\frac{(n+1)n^2-n-2}{2n^2-5}=\frac{f(\mathbf{x})}{g(\mathbf{x})}.$$
PORM then easily follows the path $\mathbf{x}^2\to\mathbf{x}^3\to\cdots\to\mathbf{x}^{n-1}$ to find the optimal solution. The total expected number of iterations is at most
$$en(3n-1)(1+\log n)+\frac{2e(3n-1)n^2}{(n-1)(n-2)}+\sum_{i=2}^{n-2}\frac{en(3n-1)}{n-1-i}\le en(3n-1)(2+2\log n).$$
Taking the minimum of the expected numbers of iterations for finding the optimal solution by backward search, multi-path search and multi-bit search, the theorem holds.

6 Empirical Study

We conducted experiments on F-measure maximization to investigate the practical performance of PORM. We compare PORM only with GreedRatio, since the latter was previously the algorithm with the best empirical performance [Bai et al., 2016]. The generalized form of the F-measure is used for evaluation, i.e., $F_p(X)=|\Gamma(X)\cap O|/(p|O|+(1-p)|\Gamma(X)|)$, and we test $p\in\{0.2,0.3,\ldots,0.8\}$. Note that the F-measure in Definition 4 is just $F_{0.5}$.
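This metric can be written directly from the formula above (a one-liner sketch, reusing the hypothetical `Gamma`/`O` conventions of the earlier snippets):

```python
def f_measure_p(X, Gamma, O, p=0.5):
    # F_p(X) = |Gamma(X) & O| / (p|O| + (1-p)|Gamma(X)|);
    # p = 0.5 recovers the F-measure of Definition 4.
    covered = set().union(*(Gamma[v] for v in X)) if X else set()
    return len(covered & O) / (p * len(O) + (1 - p) * len(covered))
```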
For PORM, the number $T$ of iterations is set to $3en^2(2+\log|\Gamma(X_{init})|)$ (where $X_{init}$ is the initial subset generated by PORM), as suggested by Theorem 2; here we have used the lower bound 1 for $\frac{1}{1-\kappa_f}$. We use synthetic (syn-100, syn-1000) as well as real-world (ldc-100, ldc-1000) data sets. For syn-100, a bipartite graph $G(V,W,E)$ with $|V|=|W|=100$ is randomly generated by adding an edge between $v\in V$ and $w\in W$ independently with probability 0.05; the target set $O$ with $|O|=20$ is randomly chosen from $W$. For syn-1000, $|V|=|W|=1000$, $|O|=100$, and each edge is added with probability 0.01. ldc-100 and ldc-1000 contain 100 and 1000 English sentences, respectively, randomly sampled from the LDC2002E18 text data (https://www.ldc.upenn.edu/); their target sets contain 1000 randomly chosen words. For each data set, we generate ten random instances and report the average results. Since PORM is a randomized algorithm, its run is further repeated 10 times independently for each data set instance.

[Figure 1: Ratio of improvement of PORM over GreedRatio, plotted against $p\in\{0.2,\ldots,0.8\}$ for syn-100, syn-1000, ldc-100 and ldc-1000.]

[Figure 2: Performance vs. running time (in units of $n^2$) of PORM against the GreedRatio baseline: (a) on syn-100 with $p=0.8$; (b) on ldc-100 with $p=0.8$.]

Figure 1 shows the percentage by which the solution quality of PORM improves over GreedRatio. We observe that PORM is always better than GreedRatio and can yield more than 3% improvement. Comparing the running time (in the number of function evaluations), GreedRatio takes time on the order of $n^2$, while PORM is set to use $3en^2(2+\log|\Gamma(X_{init})|)$ time according to the theoretical upper bound (i.e., a worst case) needed for PORM to be provably good. We also empirically examine how effective PORM is in practice. Taking GreedRatio as the baseline, we plot the F-measure against the running time of PORM on syn-100 and ldc-100 with $p=0.8$, as shown in Figure 2; the x-axis is in units of $n^2$, the running time of GreedRatio. We observe that PORM takes only about 6% (3/52) and 3% (2/69) of the worst-case running time, respectively, to achieve a better performance. This implies that PORM can be efficient in practice.

7 Conclusion

In this paper, we study the problem of minimizing the ratio $f/g$, where $f$ is monotone submodular and $g$ is monotone. We prove an approximation bound of GreedRatio for this problem, which even improves the previous result for submodular $g$. We then propose a new algorithm, PORM, and prove that PORM achieves the same general approximation guarantee as GreedRatio but has a better ability to avoid local optima. Empirical results on the application of F-measure maximization verify the superior performance of PORM.

References

[Bai et al., 2016] W. Bai, R. Iyer, K. Wei, and J. Bilmes. Algorithms for optimizing the ratio of submodular functions. In Proceedings of the 33rd International Conference on Machine Learning (ICML'16), pages 2751-2759, New York, NY, 2016.

[Conforti and Cornuéjols, 1984] M. Conforti and G. Cornuéjols. Submodular set functions, matroids and the greedy algorithm: Tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Applied Mathematics, 7(3):251-274, 1984.

[Das and Kempe, 2011] A. Das and D. Kempe.
Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proceedings of the 28th International Conference on Machine Learning (ICML'11), pages 1057-1064, Bellevue, WA, 2011.

[Doerr et al., 2012] B. Doerr, D. Johannsen, and C. Winzen. Multiplicative drift analysis. Algorithmica, 64(4):673-697, 2012.

[Iyer and Bilmes, 2012] R. Iyer and J. Bilmes. Algorithms for approximate minimization of the difference between submodular functions, with applications. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI'12), pages 407-417, Catalina Island, CA, 2012.

[Iyer and Bilmes, 2013] R. Iyer and J. Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. In Advances in Neural Information Processing Systems 26 (NIPS'13), pages 2436-2444, Lake Tahoe, NV, 2013.

[Iyer et al., 2013] R. Iyer, S. Jegelka, and J. Bilmes. Curvature and optimal algorithms for learning and minimizing submodular functions. In Advances in Neural Information Processing Systems 26 (NIPS'13), pages 2742-2750, Lake Tahoe, NV, 2013.

[McLachlan, 2004] G. J. McLachlan. Discriminant Analysis and Statistical Pattern Recognition. Wiley-Interscience, 2004.

[Nemhauser et al., 1978] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions I. Mathematical Programming, 14(1):265-294, 1978.

[Qian et al., 2015a] C. Qian, Y. Yu, and Z.-H. Zhou. On constrained Boolean Pareto optimization. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI'15), pages 389-395, Buenos Aires, Argentina, 2015.

[Qian et al., 2015b] C. Qian, Y. Yu, and Z.-H. Zhou. Subset selection by Pareto optimization. In Advances in Neural Information Processing Systems 28 (NIPS'15), pages 1765-1773, Montreal, Canada, 2015.

[Qian et al., 2016] C. Qian, J.-C. Shi, Y. Yu, K. Tang, and Z.-H. Zhou. Parallel Pareto optimization for subset selection. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI'16), pages 1939-1945, New York, NY, 2016.

[Rijsbergen and Joost, 1974] V. Rijsbergen and C. Joost. Foundation of evaluation. Journal of Documentation, 30(4):365-373, 1974.

[Shi and Malik, 2000] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888-905, 2000.

[Vondrák, 2010] J. Vondrák. Submodularity and curvature: The optimal algorithm. RIMS Kokyuroku Bessatsu B, 23:253-266, 2010.