# on_subset_selection_with_general_cost_constraints__d5ea32cb.pdf

On Subset Selection with General Cost Constraints

Chao Qian1, Jing-Cheng Shi2, Yang Yu2, Ke Tang1

1UBRI, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China 2National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China {chaoqian, ketang}@ustc.edu.cn, {shijc, yuy}@lamda.nju.edu.cn

This paper considers the subset selection problem with a monotone objective function and a monotone cost constraint, which relaxes the submodular property of previous studies. We ﬁrst show that the approximation ratio of the generalized greedy algorithm is α

2 (1 1 eα ) (where α is the submodularity ratio); and then propose POMC, an anytime randomized iterative approach that can utilize more time to ﬁnd better solutions than the generalized greedy algorithm. We show that POMC can obtain the same general approximation guarantee as the generalized greedy algorithm, but can achieve better solutions in cases and applications.

1 Introduction The problem of selecting a k-element subset that maximizes a monotone submodular objective f lays in the core of many applications, which has been addressed with gradually relaxed assumptions on the constraint. With cardinality constraints: The subset selection problem with a cardinality constraint |X| k was studied early. It was shown to be NP-hard. The greedy algorithm, which iteratively selects one element with the largest marginal gain, can achieve a (1 1

e)-approximation guarantee [Nemhauser et al., 1978]. This approximation ratio is optimal in general [Nemhauser and Wolsey, 1978]. With linear cost constraints: After that, the problem with a linear cost constraint c(X) B (where c is a linear function) attracted more attentions. The original greedy algorithm meets some trouble: it has an unbounded approximation ratio [Khuller et al., 1999]. The generalized greedy algorithm was then developed to achieve a 1

e)-approximation guarantee [Krause and Guestrin, 2005], which was further improved to (1 1 e) [Lin and Bilmes, 2010]. The generalized greedy algorithm iteratively selects the element with the largest ratio of the marginal gain on f and c, and ﬁnally outputs the better of the found subset and the best single element. By partial enumerations, the generalized greedy algo-

This work was supported by the NSFC (61603367, 61375061, 61672478, U1605251, U1613216), the Jiangsu SF (BK20160066) and the CCF-Tencent Open Research Fund.

rithm can even achieve a (1 1

e)-approximation ratio, but with much more computation time [Krause and Guestrin, 2005]. With monotone submodular cost constraints: Iyer and Bilmes [2013] later considered the problem with a monotone submodular cost constraint. By choosing appropriate surrogate functions for f and c to optimize over, several algorithms with bounded approximation guarantees were proposed. However, in some real applications such as mobile robotic sensing and door-to-door marketing, the cost function can be non-submodular [Herer, 1999], which appeals for more general optimization. With monotone cost constraints: Zhang and Vorobeychik [2016] analyzed the case that the cost function c is only monotone. By introducing the concept of submodularity ratio, they proved that the generalized greedy algorithm achieves a 1

e)-approximation ratio, comparing with the optimal solution with a slightly relaxed budget constraint. Most of the above-mentioned previous studies considered monotone submodular objective functions. Meanwhile, some applications such as sparse regression and dictionary selection involve non-submodular objective functions [Das and Kempe, 2011]. This paper considers the problem of maximizing monotone functions with monotone cost constraints, that is, both the objective function f and the cost function c can be non-submodular. First, we extend the analytical results in [Zhang and Vorobeychik, 2016], and derive that the generalized greedy algorithm obtains a αf

2 (1 1 eαf )-approximation guarantee (Theorem 1), where αf is the submodularity ratio of f. The greedy nature of the generalized greedy algorithm results in an efﬁcient ﬁxed time algorithm, but at the same time limits its performance. We then propose a Pareto optimization [Qian et al., 2015; 2016] method, POMC, which is an anytime algorithm that can use more time to ﬁnd better solutions. POMC ﬁrst reformulates the original constrained optimization problem as a bi-objective optimization problem that maximizes f and minimizes c simultaneously, then employs a randomized iterative algorithm to solve it, and ﬁnally selects the best feasible solution from the maintained set of solutions. We show that POMC can achieve the same general approximation guarantee as the generalized greedy algorithm (Theorem 2). Moreover, in a case of the inﬂuence maximization application, POMC can escape the local optimum, while the generalized greedy algorithm cannot (Theorem 3).

Proceedings of the Twenty-Sixth International Joint Conference on Artiﬁcial Intelligence (IJCAI-17)

Experimental results on sensor placement and inﬂuence maximization with both cardinality and routing constraints exhibit the superior performance of POMC.

2 Preliminaries

The General Problem. Given a ﬁnite set V = {v1, . . . , vn}, we study the functions f : 2V R over subsets of V . Such a set function f is monotone if for any X Y , f(X) f(Y ). Without loss of generality, we assume that monotone functions are normalized, i.e., f( ) = 0. A set function f : 2V R is submodular [Nemhauser et al., 1978] if for any X Y V and v / Y ,

f(X v) f(X) f(Y v) f(Y ); (1)

or for any X Y V ,

f(Y ) f(X) X

v Y \X f(X v) f(X) . (2)

Note that we represent a set {v} with a single element as v. Two concepts will be used in our analysis. The submodularity ratio in Deﬁnition 1 characterizes how close a set function is to submodularity. It is easy to see from Eq. (1) that f is submodular iff αf = 1. Note that an alternative definition based on Eq. (2) was also used in [Das and Kempe, 2011]. The curvature in Deﬁnition 2 characterizes how close a monotone submodular set function f is to modularity. It is easy to verify that 1 1 κf(X) 1 κf 0. But in this paper, we also use it for a general monotone function f as in [Zhang and Vorobeychik, 2016]. Without the submodular property, it only holds that 1 κf(X) αf(1 κf) 0.

Deﬁnition 1 (Submodularity Ratio [Zhang and Vorobeychik, 2016]). The submodularity ratio of a non-negative set function f is deﬁned as αf = min X Y,v / Y f(X v) f(X)

f(Y v) f(Y ) .

Deﬁnition 2 (Curvature [Conforti and Cornu ejols, 1984; Vondr ak, 2010; Iyer et al., 2013]). Let f be a monotone submodular set function. The total curvature of f is

κf = 1 min v V :f(v)>0

f(V ) f(V \ v) f(v).

The curvature with respect to a set X V is

κf(X) = 1 min v X:f(v)>0

f(X) f(X \ v) f(v).

Our studied problem as presented in Deﬁnition 3 is to maximize a monotone objective function f subject to an upper limit on a monotone cost function c. Since the exact computation of c(X) (e.g., a shortest walk to visit a subset of vertices on a graph) may be unsolvable in polynomial time, we assume that only an ψ(n)-approximation function ˆc(X) can be obtained as in [Zhang and Vorobeychik, 2016], where c(X) ˆc(X) ψ(n) c(X). If ψ(n) = 1, ˆc(X) = c(X).

Deﬁnition 3 (The General Problem). Given a monotone objective function f : 2V R+, a monotone cost function c : 2V R+ and a budget B, the task is as follows:

arg max X V f(X) s.t. c(X) B. (3)

Sensor Placement. Sensor placement as presented in Definition 4 is to decide where to take measurements such that

the entropy of selected locations is maximized, where oj denotes a random variable representing the observations collected from location vj by installing a sensor. Note that the entropy H( ) is monotone and submodular. The traditional sensor placement problem [Krause et al., 2008] has an upper limit on the selected number of sensors, that is, c(X) = |X|. However, in mobile robotic sensing domains, both the costs of moving between locations and that of making measurements at locations need to be considered. Let a graph G(V, E) characterize the routing network of all the locations, where ce is the cost of traversing an edge e E and cv is the cost of visiting a node v V . The cost function is usually deﬁned as c(X) = c R(X) + P v X cv [Zhang and Vorobeychik, 2016], where c R(X) is the cost of the shortest walk to visit each node in X at least once (i.e, a variant of the TSP problem). Note that c R(X) is generally nonsubmodular [Herer, 1999] and cannot be exactly computed in polynomial time. These two kinds of cost constraints will be called cardinality and routing constraints, respectively.

Deﬁnition 4 (Sensor Placement). Given n locations V = {v1, . . . , vn}, a cost function c and a budget B, the task is as follows: arg max X V H({oj | vj X}) s.t. c(X) B.

Inﬂuence Maximization. Inﬂuence maximization is to identify a set of inﬂuential users in social networks. Let a directed graph G(V, E) represent a social network, where each node is a user and each edge (u, v) E has a probability pu,v representing the strength of inﬂuence from user u to v. As presented in Deﬁnition 5, the goal is to ﬁnd a subset X of V such that the expected number of nodes activated by propagating from X is maximized. A fundamental propagation model is Independence Cascade (IC). Starting from a seed set X, it uses a set At to record the nodes activated at time t, and at time t + 1, each inactive neighbor v of u At becomes active with probability pu,v; this process is repeated until no nodes get activated at some time. The set of nodes activated by propagating from X is denoted as IC(X), which is a random variable. The objective E[|IC(X)|] is monotone and submodular. The traditional inﬂuence maximization problem [Kempe et al., 2003] is with cardinality constraints, i.e., c(X) = |X|. However, some special applications are with routing constraints, such as door-to-door marketing needs to consider the traversing time cost to visit households of choice.

Deﬁnition 5 (Inﬂuence Maximization). Given a directed graph G(V, E), edge probabilities pu,v ((u, v) E), a cost function c and a budget B, the task is as follows:

arg max X V E[|IC(X)|] s.t. c(X) B.

3 The Generalized Greedy Algorithm

Zhang and Vorobeychik [2016] have recently investigated the problem of maximizing a monotone submodular function with a monotone cost constraint, i.e., the function f in Deﬁnition 3 is submodular. They proposed the generalized greedy algorithm, as shown in Algorithm 1. It iteratively selects one element v such that the ratio of the marginal gain on f and ˆc by adding v is maximized; the better of the found subset X and the best single element v is ﬁnally returned.

Proceedings of the Twenty-Sixth International Joint Conference on Artiﬁcial Intelligence (IJCAI-17)

Algorithm 1 Generalized Greedy Algorithm Input: a monotone objective function f, a monotone approximate cost function ˆc, and a budget B Output: a solution X V with ˆc(X) B Process: 1: Let X = and V = V . 2: repeat 3: v arg maxv V f(X v) f(X)

ˆc(X v) ˆc(X) .

4: if ˆc(X v ) B then X = X v end if 5: V = V \ {v }. 6: until V = 7: Let v arg maxv V :ˆc(v) B f(v). 8: return arg max S {X,v } f(S)

Let Kc = max{|X| | c(X) B}, i.e., the largest size of a feasible solution. Let X be a corresponding solution to

max f(X) | c(X) αˆc B(1+α2 c(Kc 1)(1 κc))

i.e., an optimal solution of the original problem Eq. (3) with a slightly smaller budget constraint. The generalized greedy algorithm was proved to achieve a 1

e)-approximation ratio of f( X). The key of that proof is Lemma 1, that is, the inclusion of the greedily selected element can improve f by at least a quantity proportional to the current distance to the optimum. Note that in their original proof, 1 κc(X) 1 κc is used, which is, however, not true. For a general monotone cost function c(X), it only holds that 1 κc(X) αc(1 κc) by Deﬁnitions 1 and 2. We have corrected their results by replacing αc with α2 c in Eq. (4). Lemma 1. [Zhang and Vorobeychik, 2016] Let Xi be the subset generated after the i-th iteration of Algorithm 1 until the ﬁrst time the budget constraint is violated. For the problem in Deﬁnition 3 with f being submodular, it holds that

f(Xi) f(Xi 1) ˆc(Xi) ˆc(Xi 1)

B (f( X) f(Xi 1)). We extend their results to the general situation, i.e., f is not necessarily submodular. As shown in Theorem 1, the approximation ratio now is αf

2 (1 1 eαf ). The proofs of Lemma 2 and Theorem 1 are similar to that of Lemma 1 and the theorem in [Zhang and Vorobeychik, 2016]. The major difference is that instead of Eq. (1), f(X v) f(X) αf (f(Y v) f(Y )) is used for a general monotone objective function f. Lemma 2. Let Xi be the subset generated after the i-th iteration of Algorithm 1 until the ﬁrst time the budget constraint is violated. For the problem in Deﬁnition 3, it holds that

f(Xi) f(Xi 1) αf ˆc(Xi) ˆc(Xi 1)

B (f( X) f(Xi 1)). Theorem 1. For the problem in Deﬁnition 3, the generalized greedy algorithm ﬁnds a subset X V with

f(X) (αf/2) (1 1/eαf ) f( X).

4 The POMC Approach The greedy nature of the generalized greedy algorithm may limit its performance. To alleviate the issue of getting trapped

Algorithm 2 POMC Algorithm Input: a monotone objective function f, a monotone approximate cost function ˆc, and a budget B Parameter: the number T of iterations Output: a solution x {0, 1}n with ˆc(x) B Process: 1: Let x = {0}n and P = {x}. 2: Let t = 0. 3: while t < T do 4: Select x from P uniformly at random. 5: Generate x by ﬂipping each bit of x with prob. 1/n. 6: if z P such that z x then 7: P = (P \ {z P | x z}) {x }. 8: end if 9: t = t + 1. 10: end while 11: return arg maxx P :ˆc(x) B f(x)

in local optima, we propose a new approach based on Pareto Optimization [Qian et al., 2015] for maximizing a Monotone function with a monotone Cost constraint, called POMC. POMC reformulates the original problem Eq. (3) as a biobjective maximization problem

arg maxx {0,1}n f1(x), f2(x) ,

where f1(x) = , ˆc(x) 2B f(x), otherwise , f2(x) = ˆc(x).

That is, POMC maximizes the objective function f and minimizes the approximate cost function ˆc simultaneously. Setting f1 to is to exclude overly infeasible solutions. Note that we use a Boolean vector x {0, 1}n to represent a subset X V , where the i-th bit xi = 1 means that vi X, and xi = 0 means that vi / X. In this paper, we will not distinguish x {0, 1}n and its corresponding subset X. In the bi-objective setting, both the two objective values have to be considered for comparing two solutions x and x . x weakly dominates x (i.e., x is better than x , denoted as x x ) if f1(x) f1(x ) f2(x) f2(x ); x dominates x (i.e., x is strictly better, denoted as x x ) if x x and either f1(x) > f1(x ) or f2(x) > f2(x ). But if neither x is better than x nor x is better than x, they are incomparable. The procedure of POMC is described in Algorithm 2. Starting from the empty set {0}n (line 1), it repeatedly tries to improve the quality of solutions in the archive P (lines 3-10). In each iteration, a new solution x is generated by randomly ﬂipping bits of an archived solution x selected from the current P (lines 4-5); if x is not dominated by any previously archived solution (line 6), it will be added into P, and meanwhile those solutions worse than x will be removed (line 7). Note that P always contains incomparable solutions. POMC repeats for T iterations. The value of T is a parameter, which could affect the quality of the produced solution. Their relationship will be analyzed in the next section, and we will use the theoretically derived T value in the experiments. After the iterations, the solution having the largest f value and satisfying the budget constraint in P is selected (line 11).

Proceedings of the Twenty-Sixth International Joint Conference on Artiﬁcial Intelligence (IJCAI-17)

Compared with the generalized greedy algorithm, POMC may escape local optima by two different ways: (1) backward search, i.e., ﬂipping one bit value from 1 to 0; (2) multi-bit search, i.e., ﬂipping more than one 0-bits to 1-bits simultaneously. This advantage of POMC over the generalized greedy algorithm will be theoretically shown in the next section.

5 Approximation Guarantee of POMC We ﬁrst prove the general approximation bound of POMC in Theorem 2, where E[T] denotes the expected number of iterations. Let Pmax denote the largest size of P during the run of POMC, and let δˆc = min{ˆc(X v) ˆc(X) | X V, v / X}. Note that we assume that δˆc > 0. The proof relies on Lemma 3, which can be directly derived from Lemma 2. This is because the proof of Lemma 2 does not depend on Xi 1, but only requires that the element added into Xi 1 has the largest ratio of the marginal gain on f and ˆc. Lemma 3. For any X V , let v arg maxv / X f(X v) f(X)

ˆc(X v) ˆc(X) . It holds that

f(X v ) f(X) αf ˆc(X v ) ˆc(X)

B (f( X) f(X)). Theorem 2. For the problem in Deﬁnition 3, POMC with E[T] en BPmax/δˆc ﬁnds a subset X V with

f(X) (αf/2) (1 1/eαf ) f( X).

Proof. The proof is accomplished by analyzing the increase of a quantity Jmax, which is deﬁned as Jmax = max{j [0, B) | x P, ˆc(x) j f(x) (1 (1 αf j Bk)k) f( X) for some k}. The initial value of Jmax is 0, since POMC starts from {0}n. Assume that currently Jmax = i < B. Let x be the corresponding solution with the value i, i.e., ˆc(x) i and

f(x) 1 1 αf i Bk k f( X) for some k.

We ﬁrst show that Jmax cannot decrease. If x is kept in P, Jmax obviously will not decrease. If x is deleted from P (line 7 of Algorithm 2), the newly included solution x must weakly dominate x, i.e., f(x ) f(x) and ˆc(x ) ˆc(x), which makes Jmax i. We then show that Jmax can increase by at least δˆc in each iteration with probability at least 1 en Pmax . By Lemma 3, we know that ﬂipping one speciﬁc 0-bit of x (i.e., adding a speciﬁc element) can generate a new solution x such that

f(x ) f(x) αf ˆc(x ) ˆc(x)

B (f( X) f(x)).

By applying the above two inequalities, we get

f(x ) 1 1 αf i Bk k 1 αf ˆc(x ) ˆc(x)

1 1 αf i+ˆc(x ) ˆc(x)

B(k+1) k+1 f( X),

where the second inequality is by applying the AM-GM inequality. Since ˆc(x ) i + ˆc(x ) ˆc(x), x will be included into P; otherwise, x must be dominated by one solution in P (line 6 of Algorithm 2), and this implies that Jmax has already been larger than i, which contradicts with the assumption Jmax = i. After including x , Jmax i+ˆc(x ) ˆc(x)

i+δˆc. Let Pmax denote the largest size of P during the run of POMC. Thus, Jmax can increase by at least δˆc in one iteration with probability at least 1 Pmax 1

n)n 1 1 en Pmax , where 1 Pmax is a lower bound on the probability of selecting x in line 4 of Algorithm 2 due to uniform selection and 1

n)n 1 is the probability of ﬂipping a speciﬁc bit of x and keeping other bits unchanged in line 5. Then, it needs at most en Pmax expected number of iterations to increase Jmax by at least δˆc. Let v arg maxv / x f(x v) f(x)

ˆc(x v) ˆc(x) . It is easy to see that after at most B δˆc en Pmax expected number of iterations, Jmax + ˆc(x v ) ˆc(x) B. This implies that there exists one solution x in P satisfying that ˆc(x) Jmax < B and

f(x v ) 1 1 αf Jmax+ˆc(x v ) ˆc(x)

Bk k f( X) 1 1 eαf f( X).

Let y = arg maxv V :ˆc(v) B f(v). We then get

f(x v ) =f(x)+(f(x v ) f(x)) f(x)+f(v )/αf f(x) + f(y)/αf (f(x) + f(y))/αf,

where the ﬁrst inequality is by Deﬁnition 1, and the last is by αf [0, 1]. Thus, after at most en BPmax/δˆc iterations in expectation, P must contain a solution x with ˆc(x) B and

f(x) + f(y) αf(1 1/eαf ) f( X).

Note that {0}n will always be in P, since it has the smallest ˆc value and no other solutions can dominate it. Thus, y can be generated in one iteration by selecting {0}n in line 4 of Algorithm 2 and ﬂipping only the corresponding 0-bit in line 5, whose probability is at least 1 Pmax 1

n)n 1 1 en Pmax . That is, y will be generated in at most en Pmax expected iterations. According to the updating procedure of P (lines 6-8), we know that once y is produced, P will always contain a solution z y, i.e., ˆc(z) ˆc(y) B and f(z) f(y). By line 11 of Algorithm 2, the best solution satisfying the budget constraint will be ﬁnally returned. Thus, POMC using E[T] en BPmax/δˆc ﬁnds a solution with the f value at least

max{f(x), f(y)} (αf/2) (1 1/eαf ) f( X).

The above theorem shows that POMC can obtain the same general approximation ratio as the generalized greedy algorithm. By using an illustrative example of inﬂuence maximization with cardinality constraints, we then prove in Theorem 3 that the generalized greedy algorithm will get trapped in local optima, while POMC can ﬁnd the global optimum. As shown in Example 1, it has a unique global optimal solution {v1, . . . , vk 1, vk+1, . . . , v2k 1}. The proof idea is that the generalized greedy algorithm will ﬁrst select vk due to the greedy nature and will be misled by it, while POMC can avoid vk by backward search and multi-bit search, which will be shown in the proof, respectively.

Example 1. The parameters of inﬂuence maximization with cardinality constraints in Deﬁnition 5 are set as: the graph G(V, E) is shown in Figure 1 where each edge has a probability 1 and n = 4k + 5, and the budget B = 2k 2.

Proceedings of the Twenty-Sixth International Joint Conference on Artiﬁcial Intelligence (IJCAI-17)

𝑣𝑘 1 𝑣𝑘 𝑣𝑘+1 𝑣2𝑘 1 𝑣1 𝑣𝑘 2 𝑣𝑘+2

Figure 1: A social network graph.

Lemma 4 (Multiplicative Drift). [Doerr et al., 2012] Let S R+ be a set of positive numbers with minimum smin. Let {Xt}t N be a sequence of random variables over S {0}. Let τ be the random variable that denotes the ﬁrst time for which Xt = 0. If there exists a real number δ > 0 such that E[Xt Xt+1 | Xt = s] δs holds for all s S, then for all s0 S, we have E[τ | X0 = s0] 1 + log(s0/smin) /δ.

Theorem 3. For Example 1, POMC with E[T] = O(Bn log n) ﬁnds the optimal solution, while the generalized greedy algorithm cannot.

Proof. Since each edge probability is 1, the objective f(x) is just the number of nodes reached from X. It is easy to verify that the solution {v1, . . . , vk 1, vk+1, . . . , v2k 1} with the f value n 1 is optimal. For the generalized greedy algorithm, we only need to consider the gain on f since the gain on the cost c is always 1 for cardinality constraints. It ﬁrst selects vk, which has the largest f value 7. By the procedure of Algorithm 1, we know that the output solution must contain vk, which implies that the optimal solution cannot be found. For the POMC algorithm, the problem is implemented as maximizing f(x) and minimizing |x| simultaneously. We then prove that POMC can ﬁnd the optimal solution {v1, . . . , vk 1, vk+1, . . . , v2k 1} by two different ways. [Backward search] The idea is that POMC ﬁrst efﬁciently ﬁnds a solution with the maximum f value n, then reaches the solution {v1, . . . , v2k 1} by deleting elements vi with i 2k, and ﬁnally deleting vk can produce the optimal solution. Let Ft = max{f(x)|x P} after t iterations of POMC. We ﬁrst use Lemma 4 to derive the number of iterations (denoted as T1) until Ft = n. Let Xt = n Ft. Then, the variable τ in Lemma 4 is just T1, because Xt = 0 is equivalent to Ft = n. Since E[Xt Xt+1|Xt] = E[Ft+1 Ft|Ft], we only need to analyze the change of Ft. Let ˆx be the corresponding solution with f(ˆx) = Ft. First, Ft will not decrease, since deleting ˆx in line 7 of Algorithm 2 implies that the newly included solution x ˆx, i.e, f(x ) f(ˆx). We then show that Ft can increase by ﬂipping only one 0-bit of ˆx. Consider that ˆx is selected in line 4 and only ˆxi is ﬂipped in line 5, whose probability is at least 1 Pmax 1

n)n 1. If ˆxi = 0, the newly generated solution x = ˆx vi. By the monotonicity of f, f(x ) = f(ˆx vi) f(ˆx). If the inequality strictly holds, x now has the largest f value and will be added into P, which leads to Ft+1 = f(ˆx vi) > Ft. If f(ˆx vi) = f(ˆx), obviously Ft+1 = Ft = f(ˆx vi). Thus,

E[Xt Xt+1|Xt] = E[Ft+1 Ft|Ft] P i:ˆxi=0 f(ˆx vi) f(ˆx)

v V \ˆ x f(ˆx v) f(ˆx)

en Pmax f(V ) f(ˆx)

en Pmax = n Ft en Pmax = Xt en Pmax ,

where the second inequality is by the submodularity of f, i.e., Eq. (2). Note that X0 n smin = 1. By Lemma 4, we get E[T1] = E[τ | X0] en Pmax(1 + log n). From Example 1, we know that a solution x with f(x) = n must contain v1, . . . , v2k 1. Let xi (0 i n (2k 1)) denote a solution with f(xi) = n and |xi| = 2k 1 + i. P will always contain exactly one xi, because any solution with the f value smaller than n cannot dominate xi and the solutions in P are incomparable. By selecting xi in line 4 and ﬂipping only one of the last i 1-bits, whose probability is at least 1 Pmax i n(1 1

n)n 1, xi 1 will be generated. Since xi 1 xi, xi will be replaced by xi 1. Let T2 denote the number of iterations until P contains x0. Thus, we get E[T2] Pn (2k 1) i=1 en Pmax

i en Pmax(1 + log n). After ﬁnding x0 = {v1, . . . , v2k 1}, the optimal solution can be generated by selecting x0 in line 4 and ﬂipping only the 1-bit corresponding to vk in line 5, whose probability is at least 1 Pmax 1

n)n 1. Denote the number of iterations in this phase as T3. We then have E[T3] en Pmax. Since P only contains incomparable solutions, each value of one objective can correspond to at most one solution in P. The solutions with |x| 2B have the f value , and must be excluded from P. Thus, Pmax 2B. By combining the above three phases, we get that the expected number of iterations for POMC ﬁnding the optimal solution is

E[T] E[T1] + E[T2] + E[T3] = O(Bn log n).

[Multi-bit search] Let xi (2 i 2k 2) denote the best solution with |x| = i. It is easy to verify that xi must contain vk 1 and vk+1, and inserting a speciﬁc element into xi can generate one xi+1. The idea is that ﬂipping the two 0s (corresponding to vk 1 and vk+1) of the solution {0}n simultaneously can ﬁnd x2; then following the path x2 x3 x2k 2 can produce the optimal solution. The probability of generating x2 from {0}n is at least 1 Pmax 1 n2 (1 1

n)n 2 1 en2Pmax . Note that once xi is generated, it will always be in P, since it cannot be dominated by any other solution. The probability of xi xi+1 is at least 1 Pmax 1

n)n 1 1 en Pmax , since it is sufﬁcient to select xi in line 4 and then ﬂip only one speciﬁc 0-bit. Thus,

E[T] en2Pmax + (2k 4) en Pmax = O(Bn2).

Taking the minimum of the expected number of iterations for ﬁnding the optimal solution by backward search and multi-bit search, the theorem holds.

6 Experiments

We empirically compare POMC with the generalized greedy algorithm (denoted as Greedy), the previous best algorithm [Zhang and Vorobeychik, 2016], on sensor placement and inﬂuence maximization with both cardinality and routing constraints. Note that we implement the enhancement of Greedy, that is, elements continue to be added in line 7 of Algorithm 1 until the budget constraint is violated. As POMC is a randomized algorithm, we repeat the run 10 times independently and report the average results.

Proceedings of the Twenty-Sixth International Joint Conference on Artiﬁcial Intelligence (IJCAI-17)

5 6 7 8 9 10 Budget B

POMC Greedy

0.5 0.6 0.7 0.8 0.9 1 Budget B

POMC Greedy

5 6 7 8 9 10 Budget B

OPT POMC Greedy

0.5 0.6 0.7 0.8 0.9 1 Budget B

POMC Greedy

(a) (Berkeley, cardinality) (b) (Berkeley, routing) (c) (Beijing, cardinality) (d) (Beijing, routing) Figure 2: Sensor placement (entropy: the larger the better).

5 10 15 20 25 30 Budget B

Influence Spread

POMC Greedy

(a) (Digg, cardinality)

5 6 7 8 9 10 Budget B

Influence Spread

POMC Greedy

(b) (Synthetic, routing) Figure 3: Inﬂuence Maximization (spread: the larger the better).

0 20 40 60 Running time in Bn

POMC Greedy

(a) Berkeley, cardinality, B = 10

0 10 20 30 Running time in n2

POMC Greedy

(b) Berkeley, routing, B = 1 Figure 4: Performance v.s. running time of POMC.

Sensor Placement. We use two real-world data sets: one (http://db.csail.mit.edu/labdata/labdata. html) is collected from sensors installed at 55 locations of the Intel Berkeley Research lab; the other [Zheng et al., 2013] is air quality data collected from 36 monitoring stations in Beijing. The light (discretized into 5 bins with equal size) and temperature measures are extracted, respectively. The routing network is constructed as a complete graph, where the edge cost corresponds to the physical distance between two locations (normalized to [0,1]) and the node cost is set as 0.1. Note that the entropy is calculated using the observed frequency; the routing cost is approximately computed by the nearest neighbor method [Rosenkrantz et al., 1977]. For cardinality and routing constraints, the budget B is set as {5, 6, . . . , 10} and {0.5, 0.6, . . . , 1}, respectively. The results plotted in Figure 2 show that POMC is better than Greedy in most cases, and never worse. Note that on Figure 2(c) (where OPT denotes the optimal solution by exhaustive enumeration), Greedy has already been nearly optimal. Inﬂuence Maximization. For cardinality constraints, we use a real-world data set (http://www.isi.edu/ lerman/downloads/digg2009.html) collected from the social news website Digg. It contains two tables, the friendship links between users and the user votes on news stories [Hogg and Lerman, 2012]. After preprocessing, we get a directed graph with 3523 nodes and 90244 edges. We use the method in [Barbieri et al., 2012] to estimate the edge probabilities from the user votes. For routing constraints, we use simulated networks. A social network with 400 nodes is constructed by the well-known Barabasi-Albert (BA) model [Albert and Barab asi, 2002], and the edge probability is set as 0.1. A routing network is constructed by the Erdos-Renyi (ER) model [Erd o S and R enyi, 1959], which adds one edge with probability 0.02; the node cost is set as 0.1 and the edge cost is the random Euclidean distance between nodes.

For estimating the inﬂuence spread, i.e., the expected number of active nodes, we simulate the diffusion process 1,000 times independently and use the average as an estimation. The results in Figure 3 show that POMC performs better. Running Time. For the number T of iterations of POMC, we used en BPmax/δˆc suggested by Theorem 2. For cardinality constraints, T is 2e B2n, since Pmax 2B (as in the proof of Theorem 3) and δˆc = 1; Greedy takes the time in the order of Bn. Since 2e B2n is the theoretical upper bound (i.e., a worst case) for POMC being good, we empirically examine how effective POMC is in practice. By selecting Greedy as the baseline, we plot the curve of the entropy over the running time for POMC on the Berkeley data with B = 10, as shown in Figure 4(a). The x-axis is in Bn, the running time of Greedy. We can observe that POMC takes about only 7% (4/55) of the worst-case time to achieve a better performance. This implies that POMC can be efﬁcient in practice. For routing constraints, we set Pmax = n: when the number of solutions in P exceeds n, we delete the solution with the smallest ratio of f and ˆc except the best feasible one. Since at least a node cost 0.1 is increased, δˆc 0.1. We thus used T = 10e Bn2 for POMC. The time of Greedy is in the order of n2. From Figure 4(b), we can observe that POMC also quickly reaches a better performance.

7 Conclusion

In this paper, we study the problem of maximizing monotone functions with monotone cost constraints. We ﬁrst extend the previous analysis to show the approximation ratio of the generalized greedy algorithm. We then propose a new algorithm POMC, and prove that POMC can obtain the same general approximation ratio as the generalized greedy algorithm, but can have a better ability of avoiding local optima. The superior performance of POMC is also empirically veriﬁed.

Proceedings of the Twenty-Sixth International Joint Conference on Artiﬁcial Intelligence (IJCAI-17)

References [Albert and Barab asi, 2002] R. Albert and A.-L. Barab asi. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47, 2002. [Barbieri et al., 2012] N. Barbieri, F. Bonchi, and G. Manco. Topic-aware social inﬂuence propagation models. In Proceedings of the 12th IEEE International Conference on Data Mining (ICDM 12), pages 81 90, Brussels, Belgium, 2012. [Conforti and Cornu ejols, 1984] M. Conforti and G. Cornu ejols. Submodular set functions, matroids and the greedy algorithm: Tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Applied Mathematics, 7(3):251 274, 1984. [Das and Kempe, 2011] A. Das and D. Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proceedings of the 28th International Conference on Machine Learning (ICML 11), pages 1057 1064, Bellevue, WA, 2011. [Doerr et al., 2012] B. Doerr, D. Johannsen, and C. Winzen. Multiplicative drift analysis. Algorithmica, 64(4):673 697, 2012. [Erd o S and R enyi, 1959] P. Erd o S and A. R enyi. On random graphs I. Publ. Math. Debrecen, 6:290 297, 1959. [Herer, 1999] Y. Herer. Submodularity and the traveling salesman problem. European Journal of Operational Research, 114(3):489 508, 1999. [Hogg and Lerman, 2012] T. Hogg and K. Lerman. Social dynamics of digg. EPJ Data Science, 1(5):1 26, 2012. [Iyer and Bilmes, 2013] R. Iyer and J. Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. In Advances in Neural Information Processing Systems 26 (NIPS 13), pages 2436 2444, Lake Tahoe, NV, 2013. [Iyer et al., 2013] R. Iyer, S. Jegelka, and J. Bilmes. Curvature and optimal algorithms for learning and minimizing submodular functions. In Advances in Neural Information Processing Systems 26 (NIPS 13), pages 2742 2750, Lake Tahoe, NV, 2013. [Kempe et al., 2003] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of inﬂuence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 03), pages 137 146, Washington, DC, 2003. [Khuller et al., 1999] S. Khuller, A. Moss, and J. Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39 45, 1999. [Krause and Guestrin, 2005] A. Krause and C. Guestrin. A note on the budgeted maximization of submodular functions. Technical Report No. CMU-CALD-05-103, 2005. [Krause et al., 2008] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efﬁcient algorithms and empirical studies. Journal of Machine Learning Research, 9:235 284, 2008.

[Lin and Bilmes, 2010] H. Lin and J. Bilmes. Multidocument summarization via budgeted maximization of submodular functions. In Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 10), pages 912 920, Los Angeles, CA, 2010. [Nemhauser and Wolsey, 1978] G. L. Nemhauser and L. A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177 188, 1978. [Nemhauser et al., 1978] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions I. Mathematical Programming, 14(1):265 294, 1978. [Qian et al., 2015] C. Qian, Y. Yu, and Z.-H. Zhou. Subset selection by Pareto optimization. In Advances in Neural Information Processing Systems 28 (NIPS 15), pages 1765 1773, Montreal, Canada, 2015. [Qian et al., 2016] C. Qian, J.-C. Shi, Y. Yu, K. Tang, and Z.-H. Zhou. Parallel Pareto optimization for subset selection. In Proceedings of the 25th International Joint Conference on Artiﬁcial Intelligence (IJCAI 16), pages 1939 1945, New York, NY, 2016. [Rosenkrantz et al., 1977] D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis, II. An analysis of several heuristics for the traveling salesman problem. SIAM Journal on Computing, 6(3):563 581, 1977. [Vondr ak, 2010] J. Vondr ak. Submodularity and curvature: The optimal algorithm. RIMS Kokyuroku Bessatsu B, 23:253 266, 2010. [Zhang and Vorobeychik, 2016] H. Zhang and Y. Vorobeychik. Submodular optimization with routing constraints. In Proceedings of the 30th AAAI Conference on Artiﬁcial Intelligence (AAAI 16), pages 819 826, Phoenix, AZ, 2016. [Zheng et al., 2013] Y. Zheng, F. Liu, and H.-P. Hsieh. Uair: When urban air quality inference meets big data. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 13), pages 1436 1444, Chicago, IL, 2013.

Proceedings of the Twenty-Sixth International Joint Conference on Artiﬁcial Intelligence (IJCAI-17)