Non-Monotone DR-Submodular Function Maximization

Tasuku Soma (The University of Tokyo), tasuku_soma@mist.i.u-tokyo.ac.jp
Yuichi Yoshida (National Institute of Informatics, and Preferred Infrastructure, Inc.), yyoshida@nii.ac.jp

We consider non-monotone DR-submodular function maximization, where DR-submodularity (diminishing return submodularity) is an extension of submodularity to functions over the integer lattice based on the diminishing return property. Maximizing non-monotone DR-submodular functions has many applications in machine learning that cannot be captured by submodular set functions. In this paper, we present a 1/(2+ϵ)-approximation algorithm with a running time of roughly O((n/ϵ) log² B), where n is the size of the ground set, B is the maximum value of a coordinate, and ϵ > 0 is a parameter. The approximation ratio is almost tight, and the dependency of the running time on B is exponentially smaller than that of the naive greedy algorithm. Experiments on synthetic and real-world datasets demonstrate that our algorithm outputs almost the best solution compared to other baseline algorithms, whereas its running time is several orders of magnitude faster.

Introduction

Submodular functions have played a key role in various tasks in machine learning, statistics, social science, and economics. A set function f : 2^E → ℝ with a ground set E is submodular if f(X ∪ {e}) − f(X) ≥ f(Y ∪ {e}) − f(Y) for arbitrary sets X, Y ⊆ E with X ⊆ Y and an element e ∈ E \ Y. The importance and usefulness of submodularity in these areas stem from the fact that submodular functions naturally capture the diminishing return property. Various important functions in these areas, such as the entropy function, coverage functions, and utility functions, satisfy this property. See, e.g., (Krause and Golovin 2014; Fujishige 2005).

Recently, maximizing (non-monotone) submodular functions has attracted particular interest in the machine learning community. In contrast to minimizing submodular functions, which can be done in polynomial time, maximizing submodular functions is NP-hard in general. However, a constant-factor approximation can be achieved in various settings. Notably, (Buchbinder et al. 2012) presented a very elegant double greedy algorithm for (unconstrained) submodular function maximization, which was the first algorithm achieving a 1/2-approximation, and this approximation ratio is tight (Feige, Mirrokni, and Vondrák 2011). Applications of non-monotone submodular function maximization include efficient sensor placement (Krause, Singh, and Guestrin 2008), privacy in online services (Krause and Horvitz 2008), and maximum entropy sampling (Ko, Lee, and Queyranne 1995).

The models and applications mentioned so far are built upon submodular set functions. Although set functions are fairly powerful for describing problems such as variable selection, we sometimes face situations that cannot be cast with set functions. For example, in the budget allocation problem (Alon, Gamzu, and Tennenholtz 2012), we would like to decide how much budget should be set aside for each ad source, rather than whether to use the ad source or not. A similar issue arises when we consider models allowing multiple choices of an element in the ground set. To deal with such situations, several generalizations of submodularity have been proposed.
(Soma et al. 2014) devised a general framework for maximizing monotone submodular functions on the integer lattice, and showed that the budget allocation problem and its variants fall into this framework. In their framework, functions are defined over the integer lattice ℤ^E_+ and therefore effectively represent discrete allocations of budget.

Recalling the original motivation for the diminishing return property, one can naturally consider its generalization to the integer lattice: a function f : ℤ^E_+ → ℝ satisfying f(x + χ_e) − f(x) ≥ f(y + χ_e) − f(y) for all x ≤ y and e ∈ E, where χ_e ∈ ℝ^E is the vector with χ_e(e) = 1 and χ_e(a) = 0 for every a ≠ e. Such functions are called diminishing return submodular (DR-submodular) functions (Soma and Yoshida 2015) or coordinate-wise concave submodular functions (Milgrom and Strulovici 2009). DR-submodular functions have found applications in generalized sensor placement (Soma and Yoshida 2015) and (a natural special case of) the budget allocation problem (Soma et al. 2014). As a related notion, a function is said to be lattice submodular if f(x) + f(y) ≥ f(x ∨ y) + f(x ∧ y) for arbitrary x and y, where ∨ and ∧ denote the coordinate-wise maximum and minimum, respectively. Note that DR-submodularity is stronger than lattice submodularity in general (see, e.g., (Soma et al. 2014)). Nevertheless, we consider DR-submodularity to be a natural definition of submodularity, at least for the applications mentioned so far, because the diminishing return property is crucial in these real-world scenarios.

Our contributions

We design a novel polynomial-time approximation algorithm for maximizing (non-monotone) DR-submodular functions. More precisely, we consider the optimization problem

  maximize f(x) subject to 0 ≤ x ≤ B,   (1)

where f : ℤ^E_+ → ℝ_+ is a non-negative DR-submodular function, 0 is the zero vector, and B ∈ ℤ^E_+ is a vector of maximum values for the coordinates. When B is the all-ones vector, this is equivalent to the original (unconstrained) submodular set function maximization. We assume that f is given as an evaluation oracle: when we specify x ∈ ℤ^E_+, the oracle returns the value of f(x).

Our algorithm achieves a 1/(2+ϵ)-approximation for any constant ϵ > 0 in O((|E|/ϵ) log(Δ/δ) log ∥B∥_∞ (θ + log ∥B∥_∞)) time, where δ and Δ are the minimum positive marginal gain and the maximum positive value of f, respectively, ∥B∥_∞ := max_{e∈E} B(e), and θ is the running time of evaluating (the oracle for) f. To our knowledge, this is the first polynomial-time algorithm achieving (roughly) a 1/2-approximation.

We also conduct numerical experiments on the revenue maximization problem using real-world networks. The experimental results show that the solution quality of our algorithm is comparable to that of other algorithms. Furthermore, our algorithm runs several orders of magnitude faster than the other algorithms when B is large.

DR-submodularity is necessary for obtaining polynomial-time algorithms with a meaningful approximation guarantee; if f is only lattice submodular, then we cannot obtain a constant-factor approximation in polynomial time. To see this, it suffices to observe that an arbitrary univariate function is lattice submodular, and therefore finding an (approximate) maximum value must invoke Ω(B) queries. Since representing an integer B requires only log₂ B bits, a running time of Ω(B) is pseudopolynomial rather than polynomial.
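To make the definition concrete, the following minimal Python sketch (an illustration under stated assumptions, not code from the paper; the function and its parameters are made up) constructs a simple non-monotone DR-submodular function, a sum of coordinate-wise concave terms plus a concave penalty on the total, and numerically checks the diminishing-return inequality f(x + χ_e) − f(x) ≥ f(y + χ_e) − f(y) on random pairs x ≤ y.

```python
import random

def make_toy_drsubmodular(weights, caps, lam):
    """f(x) = sum_e w_e * min(x_e, c_e) - lam * (sum_e x_e)^2.

    Each term w_e * min(x_e, c_e) is separable and concave in x_e, and
    -lam * (total)^2 is a concave function of a modular function, so f is
    DR-submodular; the quadratic penalty makes it non-monotone.
    """
    def f(x):
        total = sum(x)
        return sum(w * min(xe, c) for w, xe, c in zip(weights, x, caps)) - lam * total * total
    return f

def dr_inequality_holds(f, x, y, e):
    """Check f(x + chi_e) - f(x) >= f(y + chi_e) - f(y) for x <= y."""
    def bump(v, idx):
        v = list(v)
        v[idx] += 1
        return v
    return f(bump(x, e)) - f(x) >= f(bump(y, e)) - f(y) - 1e-9

if __name__ == "__main__":
    random.seed(0)
    n, B = 5, 10
    f = make_toy_drsubmodular(weights=[3, 5, 2, 4, 1], caps=[4, 3, 6, 2, 5], lam=0.1)
    for _ in range(1000):
        y = [random.randint(0, B - 1) for _ in range(n)]
        x = [random.randint(0, ye) for ye in y]  # x <= y coordinate-wise
        e = random.randrange(n)
        assert dr_inequality_holds(f, x, y, e)
    print("diminishing-return inequality held on all sampled pairs")
```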
Fast simulation of the double greedy algorithm

Naturally, one can reduce problem (1) to the maximization of a submodular set function by simply duplicating each element e in the ground set into B(e) distinct copies and defining a set function over the set of all copies. One can then run the double greedy algorithm (Buchbinder et al. 2012) to obtain a 1/2-approximation. This reduction is simple but has one large drawback: the size of the new ground set is Σ_{e∈E} B(e), which is pseudopolynomial in B. Therefore, this naive double greedy algorithm does not scale to situations where B is large.

For scalability, we need an additional trick that reduces the pseudopolynomial running time to a polynomial one. For monotone submodular function maximization on the integer lattice, (Soma and Yoshida 2015; 2016) provide such a speedup trick, which effectively combines the decreasing threshold technique (Badanidiyuru and Vondrák 2014) with binary search. However, a similar technique does not apply to our setting, because the double greedy algorithm works differently from (single) greedy algorithms for monotone submodular function maximization. The double greedy algorithm examines each element in a fixed order, and marginal gains are used only to decide whether to include the element or not. In contrast, the greedy algorithm chooses each element in decreasing order of marginal gains, and this property is crucial for the decreasing threshold technique.

We resolve this issue by splitting the set of all marginal gains into polynomially many small intervals. For each interval, we approximately execute multiple steps of the double greedy algorithm at once, as long as the marginal gains remain in the interval. Because the marginal gains do not change (much) within the interval, this simulation can be done with polynomially many queries and polynomial-time overhead. To our knowledge, this speedup technique is not known in the literature and is therefore of more general interest.

Very recently, (Ene and Nguyen 2016) pointed out that a DR-submodular function f : {0, 1, . . . , B}^E → ℝ_+ can be expressed as a submodular set function g over a polynomial-sized ground set, namely E × {0, 1, . . . , k − 1}, where k = ⌈log₂(B + 1)⌉. Their idea is to represent x(e) in binary form for each e ∈ E, so that the bits of the binary representations form the new ground set. One might want to apply the double greedy algorithm to g in order to obtain a polynomial-time approximation algorithm. However, this strategy has the following two drawbacks: (i) the value of g(E × {0, 1, . . . , k − 1}) is defined as f(x), where x(e) = 2^k − 1 for every e ∈ E, which means that we have to extend the domain of f; (ii) more crucially, the double greedy algorithm on g may return a large set, such as E × {0, 1, . . . , k − 1}, whose corresponding vector x ∈ ℤ^E_+ may violate the constraint x ≤ B. Although we could resolve these issues by introducing a knapsack constraint, this is not a practical solution because existing algorithms for knapsack constraints (Lee et al. 2009; Chekuri, Vondrák, and Zenklusen 2014) are slow and have approximation ratios worse than 1/2.

Notations

For an integer n ∈ ℕ, [n] denotes the set {1, . . . , n}. For vectors x, y ∈ ℤ^E, we define f(x | y) := f(x + y) − f(y). The ℓ₁-norm and ℓ∞-norm of a vector x ∈ ℤ^E are defined as ∥x∥_1 := Σ_{e∈E} |x(e)| and ∥x∥_∞ := max_{e∈E} |x(e)|, respectively.

Related work

As mentioned above, there have been many efforts to maximize submodular functions on the integer lattice.
Perhaps the work most related to ours is (Gottschalk and Peis 2015), in which the authors considered maximizing lattice submodular functions over the bounded integer lattice and designed a 1/3-approximation pseudopolynomial-time algorithm. Their algorithm is also based on the double greedy algorithm, but does not include a speedup technique such as the one proposed in this paper. In addition, there are several studies on the constrained maximization of submodular set functions (Feige, Mirrokni, and Vondrák 2011; Buchbinder et al. 2014; Buchbinder and Feldman 2016), although we focus on the unconstrained case. Many algorithms for maximizing submodular functions are randomized, but a very recent work (Buchbinder and Feldman 2016) devised a derandomized version of the double greedy algorithm. (Gotovos, Karbasi, and Krause 2015) considered maximizing non-monotone submodular functions in the adaptive setting, a concept introduced in (Golovin and Krause 2011). A continuous analogue of DR-submodular functions is considered in (Bian et al. 2016).

Algorithms

In this section, we present a polynomial-time approximation algorithm for maximizing (non-monotone) DR-submodular functions. We first explain a simple adaptation of the double greedy algorithm for submodular set functions to our setting, which runs in pseudopolynomial time. Then, we show how to achieve a polynomial number of oracle calls. Finally, we provide an algorithm with a polynomial running time (the details are deferred to the full version).

Pseudopolynomial-time algorithm

Algorithm 1 is an immediate extension of the double greedy algorithm for maximizing submodular set functions (Buchbinder et al. 2012) to our setting. We start with x = 0 and y = B, and then, for each e ∈ E, we tighten the gap between x(e) and y(e) until they become exactly the same. Let α = f(χ_e | x) and β = f(−χ_e | y). We note that α + β = (f(x + χ_e) − f(x)) − (f(y) − f(y − χ_e)) ≥ 0 holds by the DR-submodularity of f. Hence, if β < 0, then α > 0 must hold, and we increase x(e) by one. Similarly, if α < 0, then β > 0 must hold, and we decrease y(e) by one. When both are non-negative, we increase x(e) by one with probability α/(α + β), or decrease y(e) by one with the complementary probability β/(α + β).

Algorithm 1 (Pseudopolynomial-time algorithm)
Input: f : ℤ^E_+ → ℝ_+, B ∈ ℤ^E_+. Assumption: f is DR-submodular.
 1: x ← 0, y ← B.
 2: for e ∈ E do
 3:   while x(e) < y(e) do
 4:     α ← f(χ_e | x) and β ← f(−χ_e | y).
 5:     if β < 0 then
 6:       x(e) ← x(e) + 1.
 7:     else if α < 0 then
 8:       y(e) ← y(e) − 1.
 9:     else
10:       x(e) ← x(e) + 1 with probability α/(α + β), and y(e) ← y(e) − 1 with the complementary probability β/(α + β). If α = β = 0, we regard α/(α + β) as 1.
11:     end if
12:   end while
13: end for
14: return x.

Theorem 1. Algorithm 1 is a 1/2-approximation algorithm for (1) with time complexity O(∥B∥_1 θ + ∥B∥_1), where θ is the running time of evaluating f.

We omit the proof, as it is a simple modification of the analysis of the original algorithm.
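For concreteness, the following Python sketch mirrors Algorithm 1 (an illustrative reimplementation, not the authors' code; the toy objective in the demo is hypothetical). It performs exactly the ∥B∥_1 coordinate-tightening steps described above, calling the evaluation oracle twice per step.

```python
import random

def double_greedy(f, B, rng=random):
    """Algorithm 1 (illustrative sketch): pseudopolynomial double greedy.
    `f` is an evaluation oracle for a non-negative DR-submodular function on the
    integer lattice, taking a dict {element: value}; `B` is the box constraint."""
    def marginal(z, e, d):              # f(d * chi_e | z) = f(z + d*chi_e) - f(z)
        z2 = dict(z)
        z2[e] += d
        return f(z2) - f(z)

    x = {e: 0 for e in B}               # lower solution, starts at 0
    y = dict(B)                         # upper solution, starts at B
    for e in B:
        while x[e] < y[e]:              # ||B||_1 iterations in total
            alpha = marginal(x, e, +1)  # f(chi_e | x)
            beta = marginal(y, e, -1)   # f(-chi_e | y)
            if beta < 0:                # then alpha > 0 by DR-submodularity
                x[e] += 1
            elif alpha < 0:             # then beta > 0 by DR-submodularity
                y[e] -= 1
            else:
                p = 1.0 if alpha + beta == 0 else alpha / (alpha + beta)
                if rng.random() < p:
                    x[e] += 1
                else:
                    y[e] -= 1
    return x

if __name__ == "__main__":
    random.seed(0)
    caps = {"a": 6, "b": 4, "c": 8}
    box = {"a": 10, "b": 10, "c": 10}
    # hypothetical non-monotone DR-submodular objective, as in the earlier sketch
    f = lambda z: sum(5 * min(z[e], caps[e]) for e in z) - 0.05 * sum(z.values()) ** 2
    print(double_greedy(f, box))
```

With exact marginals the loop makes Θ(∥B∥_1) oracle calls, which is precisely the pseudopolynomial dependence that the next subsection removes.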
Algorithm with polynomially many oracle calls

In this section, we present an algorithm with a polynomial number of oracle calls. Our strategy is to simulate Algorithm 1 without evaluating the input function f too many times. A key observation is that, at Line 4 of Algorithm 1, we do not need to know the exact values of f(χ_e | x) and f(−χ_e | y); good approximations to them are sufficient to achieve an approximation guarantee close to 1/2. To exploit this observation, we first design an algorithm that outputs (sketches of) approximations to the functions g(b) := f(χ_e | x + bχ_e) and h(b) := f(−χ_e | y − bχ_e). Note that g and h are non-increasing functions of b because of the DR-submodularity of f.

To illustrate the idea, let us consider a non-increasing function φ : {0, 1, . . . , B − 1} → ℝ and suppose for now that φ is non-negative (φ will be either g or h later on). Let δ and Δ be the minimum and maximum positive values of φ, respectively. Then, for each threshold τ of the form δ(1 + ϵ)^k with δ ≤ τ ≤ Δ, we find the minimum b_τ such that φ(b_τ) < τ (we regard φ(B) = −∞). From the non-increasing property of φ, we then have φ(b) ≥ τ for any b < b_τ. Using the set of pairs {(b_τ, τ)}_τ, we can obtain a good approximation to φ. The details are provided in Algorithm 2.

Algorithm 2 (Sketching subroutine for Algorithm 3)
Input: φ : {0, 1, . . . , B − 1} → ℝ, ϵ > 0. Assumption: φ is non-increasing.
 1: S ← ∅. We regard φ(B) = −∞.
 2: Find the minimum b₀ ∈ {0, 1, . . . , B} with φ(b₀) ≤ 0 by binary search.
 3: if b₀ ≥ 1 then  # φ has a positive value.
 4:   Δ ← φ(0) and δ ← φ(b₀ − 1).
 5:   for (τ ← δ; τ ≤ Δ; τ ← (1 + ϵ)τ) do
 6:     Find the minimum b_τ ∈ {0, 1, . . . , B} with φ(b_τ) < τ by binary search.
 7:     S ← S ∪ {(b_τ, τ)}.
 8:   end for
 9:   if b_δ ≠ B then
10:     S ← S ∪ {(B, 0)}.
11:   end if
12: else  # φ is non-positive.
13:   S ← S ∪ {(B, 0)}.
14: end if
15: return S.

Lemma 2. For any φ : {0, 1, . . . , B − 1} → ℝ and ϵ > 0, Algorithm 2 outputs a set of pairs {(b_τ, τ)}_τ from which, for any b ∈ {0, 1, . . . , B − 1}, we can reconstruct in O(log B) time a value v such that v ≤ φ(b) < (1 + ϵ)v if φ(b) > 0, and v = 0 otherwise. The time complexity of Algorithm 2 is O((1/ϵ) log(Δ/δ) · log B · θ) if φ has a positive value, where δ and Δ are the minimum and maximum positive values of φ, respectively, and θ is the running time of evaluating φ, and O(log B · θ) otherwise.

Proof. Let S = {(b_τ, τ)}_τ be the set of pairs output by Algorithm 2. The reconstruction algorithm is as follows: given b ∈ {0, 1, . . . , B − 1}, let (b_τ′, τ′) be the pair with the minimum b_τ′ such that b < b_τ′. Note that such a pair always exists because a pair of the form (B, ·) is always added to S. We then output τ′. The time complexity of this reconstruction is clearly O(log B).

We now show the correctness of the reconstruction. If φ(b) > 0 then, in particular, φ(b) ≥ δ. In this case, τ′ is the maximum value of the form δ(1 + ϵ)^k that is at most φ(b); hence τ′ ≤ φ(b) < (1 + ϵ)τ′. If φ(b) ≤ 0, then (b_τ′, τ′) = (B, 0) and we output zero.

Finally, we analyze the time complexity of Algorithm 2. Each binary search requires O(log B) evaluations of φ. The number of binary searches performed is O(log_{1+ϵ}(Δ/δ)) = O((1/ϵ) log(Δ/δ)) when φ has a positive value, and one otherwise. Hence, we obtain the claimed time complexity.

We can regard Algorithm 2 as providing a value oracle for a function φ̃ : {0, 1, . . . , B − 1} → ℝ_+ that approximates the input function φ : {0, 1, . . . , B − 1} → ℝ.

We now describe our algorithm for maximizing DR-submodular functions. The basic idea is similar to Algorithm 1, but whenever we need f(χ_e | x) and f(−χ_e | y), we use approximations to them instead. Let α and β be the approximations to f(χ_e | x) and f(−χ_e | y), respectively, obtained via Algorithm 2. Then, we increase x(e) by one with probability α/(α + β) and decrease y(e) by one with the complementary probability β/(α + β). The details are given in Algorithm 3.

Algorithm 3 (Algorithm with polynomially many queries)
Input: f : ℤ^E_+ → ℝ_+, B ∈ ℤ^E_+, ϵ > 0. Assumption: f is DR-submodular.
 1: x ← 0, y ← B.
 2: for e ∈ E do
 3:   Define g, h : {0, 1, . . . , B(e) − 1} → ℝ by g(b) = f(χ_e | x + bχ_e) and h(b) = f(−χ_e | y − bχ_e).
 4:   Let g̃ and h̃ be the approximations to g and h, respectively, given by Algorithm 2.
 5:   while x(e) < y(e) do
 6:     α ← g̃(x(e)) and β ← h̃(B(e) − y(e)).
 7:     x(e) ← x(e) + 1 with probability α/(α + β), and y(e) ← y(e) − 1 with the complementary probability β/(α + β). If α = β = 0, we regard α/(α + β) as 1.
 8:   end while
 9: end for
10: return x.
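The following Python sketch puts Algorithms 2 and 3 together (again an illustrative reimplementation under the stated assumptions, not the authors' code; the demo objective is hypothetical). `build_sketch` performs the binary searches of Algorithm 2, `eval_sketch` reconstructs a (1 + ϵ)-approximate marginal, and `fast_double_greedy` runs the double greedy loop of Algorithm 3 on the sketched values; for brevity, the sketch lookup is a max over the stored pairs rather than the O(log B) search used in Lemma 2.

```python
import random

def build_sketch(phi, B, eps):
    """Algorithm 2 (illustrative sketch): summarize a non-increasing phi on
    {0, ..., B-1} by threshold pairs (b_tau, tau), using binary search so that
    only O((1/eps) * log(Delta/delta) * log B) evaluations of phi are needed."""
    def min_b_where(pred):
        # smallest b in {0, ..., B} with pred(phi(b)); phi(B) is treated as -infinity
        lo, hi = 0, B
        while lo < hi:
            mid = (lo + hi) // 2
            if pred(phi(mid)):
                hi = mid
            else:
                lo = mid + 1
        return lo

    pairs = []
    b0 = min_b_where(lambda v: v <= 0)    # first point where phi is non-positive
    if b0 >= 1:                           # phi takes a positive value somewhere
        delta, Delta = phi(b0 - 1), phi(0)
        tau = delta
        while tau <= Delta:
            pairs.append((min_b_where(lambda v: v < tau), tau))
            tau *= 1 + eps
        if pairs[0][0] != B:              # cover the non-positive tail with value 0
            pairs.append((B, 0.0))
    else:                                 # phi is non-positive everywhere
        pairs.append((B, 0.0))
    return pairs

def eval_sketch(pairs, b):
    """Reconstruct v with v <= phi(b) < (1+eps)*v when phi(b) > 0, else v = 0."""
    return max(tau for b_tau, tau in pairs if b < b_tau)

def fast_double_greedy(f, B, eps, rng=random):
    """Algorithm 3 (illustrative sketch): double greedy on sketched marginals."""
    x = {e: 0 for e in B}
    y = dict(B)

    def marginal(z, e, d):                # f(d * chi_e | z)
        z2 = dict(z)
        z2[e] += d
        return f(z2) - f(z)

    for e in B:
        # g(b) = f(chi_e | x + b*chi_e) and h(b) = f(-chi_e | y - b*chi_e),
        # defined with respect to x and y at the moment e is processed
        g = lambda b: marginal({**x, e: x[e] + b}, e, +1)
        h = lambda b: marginal({**y, e: y[e] - b}, e, -1)
        g_sketch = build_sketch(g, B[e], eps)
        h_sketch = build_sketch(h, B[e], eps)
        while x[e] < y[e]:
            alpha = eval_sketch(g_sketch, x[e])
            beta = eval_sketch(h_sketch, B[e] - y[e])
            p = 1.0 if alpha + beta == 0 else alpha / (alpha + beta)
            if rng.random() < p:
                x[e] += 1
            else:
                y[e] -= 1
    return x

if __name__ == "__main__":
    random.seed(0)
    caps = {"a": 6, "b": 4, "c": 8}
    box = {"a": 100, "b": 100, "c": 100}
    # hypothetical non-monotone DR-submodular objective, as in the earlier sketches
    f = lambda z: sum(5 * min(z[e], caps[e]) for e in z) - 0.05 * sum(z.values()) ** 2
    print(fast_double_greedy(f, box, eps=0.1))
```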
We now analyze Algorithm 3. An iteration refers to an iteration of the while loop starting at Line 5; there are ∥B∥_1 iterations in total. For k ∈ {1, . . . , ∥B∥_1}, let x_k and y_k be x and y, respectively, right after the k-th iteration. Note that x_{∥B∥_1} = y_{∥B∥_1} is the output of the algorithm. We define x_0 = 0 and y_0 = B for convenience. Let o be an optimal solution. For k ∈ {0, 1, . . . , ∥B∥_1}, we then define o_k = (o ∨ x_k) ∧ y_k. Note that o_0 = o holds and o_{∥B∥_1} equals the output of the algorithm. We have the following key lemma.

Lemma 3. For every k ∈ [∥B∥_1], we have

  E[f(o_{k−1}) − f(o_k)] ≤ (1 + ϵ)/2 · E[f(x_k) − f(x_{k−1}) + f(y_k) − f(y_{k−1})].   (2)

Proof. Fix k ∈ [∥B∥_1] and let e be the element of interest in the k-th iteration. Let α and β be the values computed in Line 6 during the k-th iteration. We then have

  E[f(x_k) − f(x_{k−1}) + f(y_k) − f(y_{k−1})]
    = α/(α + β) · f(χ_e | x_{k−1}) + β/(α + β) · f(−χ_e | y_{k−1})
    ≥ α/(α + β) · α + β/(α + β) · β = (α² + β²)/(α + β),   (3)

where the inequality uses the guarantee of Lemma 2.

We next establish an upper bound on E[f(o_{k−1}) − f(o_k)]. As o_k = (o ∨ x_k) ∧ y_k and only the coordinate e changes in the k-th iteration, conditioned on the history up to the k-th iteration (which fixes x_{k−1}, y_{k−1}, and o_{k−1}), we obtain

  E[f(o_{k−1}) − f(o_k)] = α/(α + β) · (f(o_{k−1}) − f(o_{k−1} ∨ (x_{k−1} + χ_e))) + β/(α + β) · (f(o_{k−1}) − f(o_{k−1} ∧ (y_{k−1} − χ_e))).   (4)

Claim 4. The right-hand side of (4) is at most (1 + ϵ)αβ/(α + β).

Proof. Note that x_{k−1}(e) ≤ o_{k−1}(e) ≤ y_{k−1}(e) by the definition of o_{k−1}. We consider three cases. If x_{k−1}(e) < o_{k−1}(e) < y_{k−1}(e), then both terms on the right-hand side of (4) are zero. If o_{k−1}(e) = x_{k−1}(e), then the first term satisfies

  f(o_{k−1}) − f(o_{k−1} ∨ (x_{k−1} + χ_e)) = f(o_{k−1}) − f(o_{k−1} + χ_e) ≤ f(y_{k−1} − χ_e) − f(y_{k−1}) = f(−χ_e | y_{k−1}) ≤ (1 + ϵ)β.

Here, the first inequality uses the DR-submodularity of f and the fact that o_{k−1} ≤ y_{k−1} − χ_e, and the second inequality uses the guarantee of Lemma 2. The second term is zero in this case, and hence the right-hand side of (4) is at most (1 + ϵ)αβ/(α + β). If o_{k−1}(e) = y_{k−1}(e), then by a symmetric argument the second term is at most (1 + ϵ)α while the first term is zero, which again gives the bound (1 + ϵ)αβ/(α + β).

We now return to the proof of Lemma 3. By Claim 4 and the inequality 2αβ ≤ α² + β²,

  E[f(o_{k−1}) − f(o_k)] ≤ (1 + ϵ)αβ/(α + β) ≤ (1 + ϵ)/2 · (α² + β²)/(α + β),

which, combined with (3), yields the desired result.

Theorem 5. Algorithm 3 is a 1/(2 + ϵ)-approximation algorithm for (1) with time complexity O((|E|/ϵ) log(Δ/δ) log ∥B∥_∞ · θ + ∥B∥_1 log ∥B∥_∞), where δ and Δ are the minimum positive marginal gain and the maximum positive value, respectively, of f, and θ is the running time of evaluating f.

Proof. Summing up (2) for k ∈ [∥B∥_1], we get

  Σ_{k=1}^{∥B∥_1} E[f(o_{k−1}) − f(o_k)] ≤ (1 + ϵ)/2 · Σ_{k=1}^{∥B∥_1} E[f(x_k) − f(x_{k−1}) + f(y_k) − f(y_{k−1})].

Both sides are telescoping sums, and hence we obtain

  E[f(o_0) − f(o_{∥B∥_1})] ≤ (1 + ϵ)/2 · E[f(x_{∥B∥_1}) − f(x_0) + f(y_{∥B∥_1}) − f(y_0)] ≤ (1 + ϵ)/2 · E[f(x_{∥B∥_1}) + f(y_{∥B∥_1})] = (1 + ϵ) E[f(x_{∥B∥_1})].

The second inequality uses the non-negativity of f, and the last equality uses y_{∥B∥_1} = x_{∥B∥_1}. Because E[f(o_0) − f(o_{∥B∥_1})] = f(o) − E[f(x_{∥B∥_1})], we obtain E[f(x_{∥B∥_1})] ≥ f(o)/(2 + ϵ).

We now analyze the time complexity. We query the input function f only inside Algorithm 2, and the total number of oracle calls is O((|E|/ϵ) log(Δ/δ) log ∥B∥_∞) by Lemma 2. Note that we invoke Algorithm 2 with g and h, whose minimum positive values are at least the minimum positive marginal gain δ of f and whose maximum positive values are at most Δ. The number of iterations is ∥B∥_1, and we need O(log ∥B∥_∞) time to access g̃ and h̃ in each iteration. Hence, the total time complexity is as stated.

Remark 6. We note that even if f is not a non-negative function, the proof of Theorem 5 goes through as long as f(x_0) ≥ 0 and f(y_0) ≥ 0, that is, f(0) ≥ 0 and f(B) ≥ 0.
Hence, given a DR-submodular function f : ℤ^E_+ → ℝ and B ∈ ℤ^E_+, we can obtain a 1/(2 + ϵ)-approximation algorithm for the following problem:

  maximize f(x) − min{f(0), f(B)} subject to 0 ≤ x ≤ B.   (5)

This observation is useful, as the objective function often takes negative values in real-world applications.

Polynomial-time algorithm

In many applications, the running time needed to evaluate the input function is the bottleneck, and hence Algorithm 3 is already satisfactory. However, it is theoretically interesting to reduce the total running time to a polynomial, and we show the following. The proof is deferred to the full version.

Theorem 7. There exists a 1/(2 + 2ϵ)-approximation algorithm for (1) with time complexity Õ((|E|/ϵ) log(Δ/δ) log ∥B∥_∞ (θ + log ∥B∥_∞)), where δ and Δ are the minimum positive marginal gain and the maximum positive value, respectively, of f, and θ is the running time of evaluating f. Here, Õ(T) means O(T log^c T) for some constant c ∈ ℕ.

Experiments

In this section, we present our experimental results and demonstrate the superiority of our algorithm over other baseline algorithms.

Experimental setting

We conducted experiments on a Linux server with an Intel Xeon E5-2690 (2.90 GHz) processor and 256 GB of main memory. All the algorithms were implemented in C# and were run using Mono 4.2.3. We compared the following four algorithms:

- Single Greedy (SG): We start with x = 0. For each element e ∈ E, as long as the marginal gain of adding χ_e to the current solution x is positive, we add it to x. The reason that we do not choose the element with the maximum marginal gain is to reduce the number of oracle calls; our preliminary experiments showed that such a tweak does not improve the solution quality.
- Double Greedy (DG, Algorithm 1).
- Lattice Double Greedy (Lattice-DG): the 1/3-approximation algorithm for maximizing non-monotone lattice submodular functions (Gottschalk and Peis 2015).
- Double Greedy with a polynomial number of oracle calls and error parameter ϵ > 0 (Fast-DGϵ, Algorithm 3).

We measure the efficiency of an algorithm by the number of oracle calls instead of the total running time. Indeed, the running time for evaluating the input function is the dominant factor of the total running time, because objective functions in typical machine learning tasks contain sums over all data points, which is time-consuming. For this reason, we do not consider the polynomial-time algorithm (Theorem 7) here.

Revenue maximization

In this application, we consider revenue maximization on an (undirected) social network G = (V, W), where W = (w_ij)_{i,j∈V} represents the weights of the edges. The goal is to offer a product for free or advertise it to users so that revenue increases through their word-of-mouth effect on others. If we invest x units of cost in a user i ∈ V, the user becomes an advocate of the product (independently of the other users) with probability 1 − (1 − p)^x, where p ∈ (0, 1) is a parameter. This means that, for each unit of cost invested in i, we get an extra chance that user i becomes an advocate with probability p.

[Figure 1: Number of oracle calls of SG, DG, Lattice-DG, and Fast-DGϵ (ϵ = 0.5, 0.05, 0.005) on (a) Adolescent health, (b) Advogato, and (c) Twitter lists.]
Table 1: Objective values obtained by each method (the Fast-DGϵ rows are our methods).

Adolescent health

| Method | B = 10² | B = 10³ | B = 10⁴ | B = 10⁵ | B = 10⁶ |
|---|---|---|---|---|---|
| SG | 280.55 | 2452.16 | 7093.73 | 7331.42 | 7331.50 |
| DG | 280.55 | 2452.16 | 7124.90 | 7332.96 | 7331.50 |
| Lattice-DG | 215.39 | 1699.66 | 6808.97 | 6709.11 | 5734.30 |
| Fast-DG0.5 | 280.55 | 2452.16 | 7101.14 | 7331.36 | 7331.48 |
| Fast-DG0.05 | 280.55 | 2452.16 | 7100.86 | 7331.36 | 7331.48 |
| Fast-DG0.005 | 280.55 | 2452.16 | 7100.83 | 7331.36 | 7331.48 |

Advogato

| Method | B = 10² | B = 10³ | B = 10⁴ | B = 10⁵ | B = 10⁶ |
|---|---|---|---|---|---|
| SG | 993.15 | 8680.87 | 25516.05 | 27325.78 | 27326.01 |
| DG | 993.15 | 8680.87 | 25330.91 | 27329.39 | 27326.01 |
| Lattice-DG | 753.93 | 6123.39 | 24289.09 | 24878.94 | 21674.35 |
| Fast-DG0.5 | 993.15 | 8680.87 | 25520.83 | 27325.75 | 27325.98 |
| Fast-DG0.05 | 993.15 | 8680.87 | 25520.52 | 27325.75 | 27325.98 |
| Fast-DG0.005 | 993.15 | 8680.87 | 25520.47 | 27325.75 | 27325.98 |

Twitter lists

| Method | B = 10² | B = 10³ | B = 10⁴ | B = 10⁵ | B = 10⁶ |
|---|---|---|---|---|---|
| SG | 882.43 | 7713.07 | 22452.61 | 25743.26 | 25744.02 |
| DG | 882.43 | 7713.07 | 22455.97 | 25751.42 | 25744.02 |
| Lattice-DG | 675.67 | 5263.87 | 20918.89 | 20847.48 | 15001.19 |
| Fast-DG0.5 | 882.43 | 7713.07 | 22664.65 | 25743.06 | 25743.88 |
| Fast-DG0.05 | 882.43 | 7713.07 | 22658.58 | 25743.06 | 25743.88 |
| Fast-DG0.005 | 882.43 | 7713.07 | 22658.07 | 25743.06 | 25743.88 |

Let S ⊆ V be the set of users who advocate the product; note that S is a random set. Following a simplified version of the model introduced by (Hartline, Mirrokni, and Sundararajan 2008), the revenue is defined as Σ_{i∈S} Σ_{j∈V∖S} w_ij. Let f : ℤ^V_+ → ℝ be the expected revenue obtained in this model, that is,

  f(x) = Σ_{i∈V} Σ_{j∈V∖{i}} w_ij (1 − (1 − p)^{x(i)}) (1 − p)^{x(j)}.

It is not hard to show that f is a non-monotone DR-submodular function (see the full version for the proof).
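As a concrete reference for this objective, the following Python sketch evaluates the expected revenue in closed form (an illustration, not the authors' C# implementation; the tiny graph, weights, and investment levels are made up). It can be passed as the oracle f to the double greedy sketches given earlier, and the demo shows the non-monotonicity: once everyone is (almost surely) an advocate, there is no one left to buy.

```python
import itertools

def expected_revenue(w, p):
    """f(x) = sum_{i != j} w[i][j] * (1 - (1-p)**x[i]) * (1-p)**x[j]:
    expected revenue when user i advocates with probability 1 - (1-p)**x[i]."""
    def f(x):
        return sum(
            w.get((i, j), 0.0) * (1 - (1 - p) ** x[i]) * (1 - p) ** x[j]
            for i, j in itertools.permutations(x, 2)
        )
    return f

if __name__ == "__main__":
    # hypothetical triangle network with unit edge weights
    w = {}
    for a, b in [("u", "v"), ("v", "w"), ("u", "w")]:
        w[(a, b)] = w[(b, a)] = 1.0
    f = expected_revenue(w, p=0.0001)
    print(f({"u": 0, "v": 0, "w": 0}))                 # 0.0: nobody advocates
    print(f({"u": 100000, "v": 0, "w": 0}))            # about 2.0: u advocates, v and w still buy
    print(f({"u": 100000, "v": 100000, "w": 100000}))  # near 0.0: everyone advocates, nobody buys
```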
In our experiments, we used three networks, Adolescent health (2,539 vertices and 12,969 edges), Advogato (6,541 vertices and 61,127 edges), and Twitter lists (23,370 vertices and 33,101 edges), all taken from (Kunegis 2013). We regard all the networks as undirected. We set p = 0.0001, and set w_ij = 1 when an edge exists between i and j and w_ij = 0 otherwise. We imposed the constraint 0 ≤ x(e) ≤ B for every e ∈ E, where B is chosen from {10², . . . , 10⁶}.

Table 1 shows the objective values obtained by each method. As can be seen, except for Lattice-DG, which is clearly the worst, the choice of method does not much affect the obtained objective value on any of the networks. Notably, even when ϵ is as large as 0.5, the objective values obtained by Fast-DG are almost the same as those of SG and DG.

Figure 1 illustrates the number of oracle calls of each method. The number of oracle calls of DG and Lattice-DG is linear in B, whereas that of Fast-DG grows slowly. Although the number of oracle calls of SG also grows slowly, it is always orders of magnitude larger than that of Fast-DG with ϵ = 0.5 or ϵ = 0.05.

In summary, Fast-DG0.5 achieves almost the best objective value, whereas its number of oracle calls is two or three orders of magnitude smaller than those of the other methods when B is large.

Conclusions

In this paper, we proposed a polynomial-time 1/(2+ϵ)-approximation algorithm for non-monotone DR-submodular function maximization. Our experimental results on the revenue maximization problem showed the superiority of our method over other baseline algorithms. Maximizing a submodular set function under constraints is well studied (Lee et al. 2009; Gupta et al. 2010; Chekuri, Vondrák, and Zenklusen 2014; Mirzasoleiman et al. 2016). An intriguing open question is whether we can obtain polynomial-time algorithms for maximizing DR-submodular functions under constraints such as cardinality constraints, polymatroid constraints, and knapsack constraints.

Acknowledgments

T. S. is supported by JSPS Grant-in-Aid for Research Activity Start-up. Y. Y. is supported by JSPS Grant-in-Aid for Young Scientists (B) (No. 26730009), MEXT Grant-in-Aid for Scientific Research on Innovative Areas (No. 24106003), and JST, ERATO, Kawarabayashi Large Graph Project.

References

Alon, N.; Gamzu, I.; and Tennenholtz, M. 2012. Optimizing budget allocation among channels and influencers. In WWW, 381–388.
Badanidiyuru, A., and Vondrák, J. 2014. Fast algorithms for maximizing submodular functions. In SODA, 1497–1514.
Bian, Y.; Mirzasoleiman, B.; Buhmann, J. M.; and Krause, A. 2016. Guaranteed non-convex optimization: Submodular maximization over continuous domains. CoRR abs/1606.05615.
Buchbinder, N., and Feldman, M. 2016. Deterministic algorithms for submodular maximization problems. In SODA, 392–403.
Buchbinder, N.; Feldman, M.; Naor, J. S.; and Schwartz, R. 2012. A tight linear time (1/2)-approximation for unconstrained submodular maximization. In FOCS, 649–658.
Buchbinder, N.; Feldman, M.; Naor, J. S.; and Schwartz, R. 2014. Submodular maximization with cardinality constraints. In SODA, 1433–1452.
Chekuri, C.; Vondrák, J.; and Zenklusen, R. 2014. Submodular function maximization via the multilinear relaxation and contention resolution schemes. SIAM Journal on Computing 43(6):1831–1879.
Ene, A., and Nguyen, H. L. 2016. A reduction for optimizing lattice submodular functions with diminishing returns. CoRR abs/1606.08362.
Feige, U.; Mirrokni, V. S.; and Vondrák, J. 2011. Maximizing non-monotone submodular functions. SIAM Journal on Computing 40(4):1133–1153.
Fujishige, S. 2005. Submodular Functions and Optimization. Elsevier, 2nd edition.
Golovin, D., and Krause, A. 2011. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research 427–486.
Gotovos, A.; Karbasi, A.; and Krause, A. 2015. Non-monotone adaptive submodular maximization. In AAAI, 1996–2003.
Gottschalk, C., and Peis, B. 2015. Submodular function maximization on the bounded integer lattice. In WAOA, 133–144.
Gupta, A.; Roth, A.; Schoenebeck, G.; and Talwar, K. 2010. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In WINE, 246–257.
Hartline, J.; Mirrokni, V.; and Sundararajan, M. 2008. Optimal marketing strategies over social networks. In WWW, 189–198.
Ko, C. W.; Lee, J.; and Queyranne, M. 1995. An exact algorithm for maximum entropy sampling. Operations Research 684–691.
Krause, A., and Golovin, D. 2014. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press. 71–104.
Krause, A., and Horvitz, E. 2008. A utility-theoretic approach to privacy and personalization. In AAAI, 1181–1188.
Krause, A.; Singh, A.; and Guestrin, C. 2008. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research 9:235–284.
Kunegis, J. 2013. KONECT - the Koblenz network collection. In WWW Companion, 1343–1350.
Lee, J.; Mirrokni, V. S.; Nagarajan, V.; and Sviridenko, M. 2009. Non-monotone submodular maximization under matroid and knapsack constraints. In STOC, 323–332.
Milgrom, P., and Strulovici, B. 2009. Substitute goods, auctions, and equilibrium. Journal of Economic Theory 144(1):212–247.
Mirzasoleiman, B.; Badanidiyuru, A.; and Karbasi, A. 2016. Fast constrained submodular maximization: Personalized data summarization. In ICML.
Soma, T., and Yoshida, Y. 2015. A generalization of submodular cover via the diminishing return property on the integer lattice. In NIPS, 847–855.
Soma, T., and Yoshida, Y. 2016. Maximizing submodular functions with the diminishing return property over the integer lattice. In IPCO, 325–336.
Soma, T.; Kakimura, N.; Inaba, K.; and Kawarabayashi, K. 2014. Optimal budget allocation: Theoretical guarantee and efficient algorithm. In ICML, 351–359.