# Optimal Anonymous Independent Reward Scheme Design

Mengjing Chen¹, Pingzhong Tang¹, Zihe Wang²,³, Shenke Xiao¹ and Xiwang Yang⁴
¹Tsinghua University
²Renmin University of China
³Beijing Key Laboratory of Big Data Management and Analysis Methods
⁴Bytedance
ccchmj@qq.com, kenshinping@gmail.com, wang.zihe@ruc.edu.cn, xsk15@tsinghua.org.cn, yangxiwang@bytedance.com

We consider designing reward schemes that incentivize agents to create high-quality content (e.g., videos, images, text, ideas). The problem is at the center of a real-world application whose goal is to optimize the overall quality of generated content on user-generated content platforms. We focus on anonymous independent reward schemes (AIRS) that take only the quality of an agent's content as input. We prove the general problem is NP-hard. If the cost function is convex, we show the optimal AIRS can be formulated as a convex optimization problem and propose an efficient algorithm to solve it. Next, we explore the optimal linear reward scheme and prove it has a 1/2-approximation ratio, and that the ratio is tight. Lastly, we show the proportional scheme can be arbitrarily bad compared to AIRS.

1 Introduction

User-generated content (UGC) platforms have become a major source for users to acquire information, knowledge, and entertainment. Representative platforms are the video-sharing platform YouTube, the question-and-answer platform Quora, the online encyclopedia Wikipedia, and the lifestyle platforms Instagram and TikTok. According to several measurements [Luca, 2015; Zhang and Sarvary, 2014; Park et al., 2014], the UGC industry is just as important as, if not more important than, the search engine industry. These platforms are fundamentally different from search engines: the content returned by the former is generated by content providers (agents), while the latter returns content mainly from professional authorities (at least on the first few pages). On the one hand, user-generated content provides diversity that is vital for the success of these platforms; on the other hand, the platforms face challenges in maintaining an overall high quality of the massive amount of content generated. From an economic perspective, the platforms need to design an incentive scheme that rewards high-quality content, given a restricted budget [Ghosh and Hummel, 2013; Jain et al., 2014]. This is the central theoretical problem investigated in this paper.

The center designs a reward scheme, namely a reward function that maps a profile of agents' content (each with a real-valued quality score) to a reward profile. The goal is to maximize the overall quality of content on the platform with a fixed budget. The proportional scheme, in which each agent's reward is proportional to her contribution, has been widely studied [Xia et al., 2014; Ghosh and Hummel, 2014; Ghosh and McAfee, 2011; Tullock, 1980]. The proportional scheme has been shown to reach Nash equilibria among agents easily in the full-information setting [Xia et al., 2014]. In this paper, we focus on designing anonymous independent reward schemes (AIRS), in which the reward depends only on the quality of the individual content. Unlike the proportional mechanism, the reward does not depend on the quality of other agents' content. The merit of the independent reward model is that it is easier for an agent to compute her best strategy.
For the sake of fairness, we require the reward function to be anonymous, as is the case on almost all real-world UGC platforms. In our model, the agents have cost functions that are convex in their content quality, capturing the idea that the marginal cost of improving quality increases [Ghosh and McAfee, 2011; Ghosh and Hummel, 2014].¹ Specifically, each agent has a type t indicating her ability, and she incurs a cost c(x)·h(t) to produce content with quality x. When the type t is higher, h(t) is lower. An agent's type can be regarded as her skill, e.g., in editing video clips or designing content. In this paper, we assume the quality of content depends entirely on the producer. We consider the Bayesian setting where each agent's ability is private information and others only know its distribution. In our model, the UGC platform has a limited budget of rewards, such as money or web traffic. Given a reward scheme, an agent chooses how much effort to exert in generating her content so as to maximize her utility. Due to the incomplete information about each agent's ability, we assume the budget constraint is only required to hold in expectation. The problem faced by the designer is then to optimize the sum of all content's quality by designing anonymous independent reward functions, subject to a fixed budget. Our model can be applied whenever a central organization needs to incentivize a community to accomplish some tasks.

¹The previous version of this paper considered the linear cost setting [Chen et al., 2019].

We model the problem of finding an optimal AIRS as an optimization problem in Section 2. In Section 3 we show that this optimization problem is equivalent to a convex optimization problem. Based on an analysis of the structure of the solution, we propose an efficient algorithm in Section 4. In Section 5, we show that the assumption on the cost function is to some extent necessary, since it is NP-hard to compute the optimal AIRS for general cost functions. In Section 6, we explore the linear reward scheme, which is a special case of AIRS. We show the optimal linear reward scheme is a 1/2-approximation to the optimal AIRS and that the ratio is tight. In Section 7 we exhibit the superiority of AIRS over other reward schemes. When agents are independent and identically distributed, the optimal AIRS beats any reward scheme implemented in a symmetric Bayes-Nash equilibrium. For the proportional reward scheme, we show it cannot be better than the optimal AIRS in the full-information setting, and it can be arbitrarily bad compared to the optimal AIRS.

1.1 Related Work

Our work contributes to the literature on the pricing problem [O'Neill et al., 2005; Bjørndal and Jörnsten, 2008; Azizan et al., 2020]. Azizan et al. [2020] consider the same setting but focus on handling agents' non-convex cost functions approximately, whereas we focus on the optimal solution in the convex cost setting. They allow different reward functions for different agents, while we do not.

Our problem of designing optimal reward schemes is related to the principal-agent problem in contract theory. The principal designs compensation schemes (contracts) that incentivize agents to choose actions maximizing the principal's utility [Holmstrom, 1982; Babaioff et al., 2006; Dütting et al., 2021]. The contracts they consider differ across agents, while we design a common reward scheme.
Nonetheless, there is also research on common contract design for multiple agents [Alon et al., 2020; Xiao et al., 2020]. The significant differences between our model and the principal-agent model are: (1) the action spaces are usually finite and discrete in the principal-agent problem, while we consider continuous action spaces; (2) we have a budget constraint in our model, while the principal-agent model does not.

Another line of literature is crowdsourcing, where individuals or organizations seek ideas or collect information from a large group of participants. The objective there is to obtain ideas or solutions to some problems, and only the best one matters [Chawla et al., 2019; Moldovanu and Sela, 2006]. In contrast, we are interested in the sum of the quality of the content.

2 Preliminaries

Let N = {1, 2, . . . , n} be the set of all agents. Each agent i has a type, denoted by a real number t^i, which stands for the ability of the agent to produce content. For each agent, the type is private information and drawn from a set T^i with a probability mass function f^i. The set T^i and the function f^i are publicly known. We assume T^i is discrete and of finite size. These are standard assumptions in compensation scheme design in contract theory. We also define the union of the type spaces T = ∪_i T^i = {t_1, . . . , t_m} and the sum of probabilities f(t) = Σ_{i=1}^n f^i(t). W.l.o.g., we assume t_1 < t_2 < ··· < t_m.²

Every agent posts content (e.g., an article or a short video) on the platform. We use a non-negative number to represent the quality of the content (e.g., the expected number of views or likes), which is what the platform is interested in. We assume an agent can regulate the quality of her content, and her action is choosing that quality. To produce content with quality x ≥ 0, an agent with type t incurs a cost cost(x, t). We consider cost functions of the form cost(x, t) = c(x)·h(t), where h(t) is always positive. We assume an agent with a higher type can produce content of the same quality at a lower cost, i.e., h(t) is decreasing in t. We assume c(x) is convex, strictly increasing, and c(0) = 0.

The platform designs a reward scheme, which is essentially a reward function R: R^n_{≥0} → R^n_{≥0} that maps a quality profile of agents' content to a reward profile. We also define R^i: R^n_{≥0} → R_{≥0} to be the reward function for agent i, i.e., R(x) = (R^1(x), . . . , R^n(x)) where x = (x^1, x^2, . . . , x^n). For agent i, we use x^i ≥ 0 to represent her action of producing content with quality x^i, and we use x^{-i} ∈ R^{n-1}_{≥0} to represent the other agents' actions. An agent's utility is the reward she receives minus the cost of producing her content, i.e., agent i's utility function u^i: R^n_{≥0} × T^i → R, which maps the quality profile of all agents' content and her type to her utility, is defined as

    u^i(x^i, x^{-i}, t^i) = R^i(x^i, x^{-i}) − c(x^i) h(t^i).

In this paper, we analyze the problem of incentivizing high-quality content in the Bayesian information setting. The platform aims to maximize the gross product, defined as the expectation of the overall quality of all content on the platform, within a fixed budget B. We assume the budget constraint is only required to hold in expectation. In addition, we require the reward scheme to satisfy individual rationality, so that an agent never transfers money to the platform, i.e., the reward must be non-negative.
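To make the agent's decision problem concrete, the following minimal Python sketch computes a single agent's best response to a reward function that depends only on her own quality (the AIRS defined next) by grid search over qualities. The function names, the quadratic cost, the step reward, and the grid resolution are illustrative assumptions of ours, not constructs from the paper.

```python
import numpy as np

def best_response(reward, c, h_t, x_grid):
    """Approximate an agent's utility-maximizing quality on a discrete grid.

    reward : callable, reward(x) for a single agent's own quality x
    c      : callable, convex cost-of-quality function with c(0) = 0
    h_t    : float, the agent's cost multiplier h(t) (lower for higher ability)
    x_grid : 1-D array of candidate quality levels
    """
    utilities = np.array([reward(x) - c(x) * h_t for x in x_grid])
    best = int(np.argmax(utilities))
    return x_grid[best], utilities[best]

# Illustrative example: quadratic cost, a step reward paying 1 for quality >= 1.
c = lambda x: x ** 2
reward = lambda x: 1.0 if x >= 1.0 else 0.0
x, u = best_response(reward, c, h_t=0.8, x_grid=np.linspace(0.0, 2.0, 2001))
print(x, u)   # the agent produces quality 1.0 and earns utility about 1 - 0.8 = 0.2
```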
We focus on a relatively simple reward scheme called the Anonymous Independent Reward Scheme, in which the reward each agent receives is determined only by the quality of the content she produces and is independent of the other agents' actions. Any two agents receive the same reward if they create content with the same quality. The AIRS has a simple format and can be easily understood by agents. Besides, it is anonymous and thus raises no price-discrimination issue.

Definition 1 (Anonymous Independent Reward Scheme (AIRS)). A reward scheme is an Anonymous Independent Reward Scheme if
1. the reward each agent receives is determined only by the quality of her content, i.e., for any i, R^i(x^i, x^{-i}) = R^i(x^i, (x')^{-i}) for any x^i ≥ 0 and x^{-i}, (x')^{-i} ∈ R^{n-1}_{≥0}, and
2. the reward function always assigns the same reward to any two agents who produce content with the same quality, i.e., for any i, j, if x^i = x^j, then R^i(x^i, x^{-i}) = R^j(x^j, x^{-j}) for any x^{-i}, x^{-j} ∈ R^{n-1}_{≥0}.

²Note that in this paper, the superscript refers to the agent while the subscript refers to the order of types.

With a slight abuse of notation, we use a function R: R_{≥0} → R_{≥0} to represent the reward function in an AIRS. It takes only the quality of an agent's content as input and specifies the agent's reward. We assume every agent is strategic and will produce content with the quality that maximizes her utility. Given the reward function, it is clear that an agent's action depends only on her type. For convenience, we refer to an agent by her type instead of her index from now on. We formulate the problem as the following optimization problem (P1):

    max_{X_k ⊆ R_{≥0}, G_k ∈ Δ(X_k), R}   Σ_{k∈[m]} f(t_k) ∫_{x∈X_k} x dG_k(x)
    s.t.   R(x) − c(x)h(t_k) ≥ R(y) − c(y)h(t_k),   ∀ k ∈ [m], x ∈ X_k, y ≥ 0,
           Σ_{k∈[m]} f(t_k) ∫_{x∈X_k} R(x) dG_k(x) ≤ B,
           R(x) ≥ 0,   ∀ x ≥ 0.

Here X_k represents the union of the actions taken by all agents of type t_k, Δ(X_k) represents the set of all cumulative probability distributions over X_k, and G_k is one cumulative distribution function over X_k that represents the combined mixed strategies used by agents of type t_k. We use a triple (X, G, R) to represent a solution to this problem. The first constraint states that agents choose best actions to maximize their utilities. Let [m] denote the set {1, 2, . . . , m} throughout the paper. The second constraint is the budget constraint. The last constraint is individual rationality.

3 The Optimal AIRS

Problem P1 is complicated because it involves the design of the mixed strategies G_k and of a reward function R, a huge design space. To overcome these difficulties, we show that, w.l.o.g., we can assume the agents use pure strategies. In addition, we reduce the design of the entire reward function to the design of the rewards at a set of specific values. Finally, we show the optimal AIRS can be found by solving a convex optimization problem.

We first state that an agent with a higher type posts content with (weakly) higher quality.

Lemma 1. Pick two numbers k, l ∈ [m] with k < l. Given a feasible solution (X, G, R) to Problem P1, for any x_k ∈ X_k and x_l ∈ X_l, we have x_k ≤ x_l.

We omit all proofs due to lack of space; they are included in the full version of this paper.

Given a reward scheme, an agent might have multiple best actions and thus can use a mixed strategy. However, we can construct a solution in which agents use only pure strategies.
For any solution (X, G, R) to Problem P1, we define x̄_k = ∫_{x∈X_k} x dG_k(x) and X̄_k = {x̄_k}. We set Ḡ_k(x) = 0 for x < x̄_k and Ḡ_k(x̄_k) = 1. In other words, an agent with type t_k produces content with quality x̄_k deterministically. By Lemma 1, X_k is point-wise weakly larger than X_{k−1}. Then the expectation of any distribution over X_k is weakly larger than the expectation of any distribution over X_{k−1}, i.e., x̄_k ≥ x̄_{k−1}. Therefore it is proper to define R̄(x) = max_y {R(y) − c(y)h(t_k)} + c(x̄_k)h(t_k) for x ∈ [x̄_k, x̄_{k+1}), where x̄_{m+1} denotes infinity. We have the following result.

Lemma 2. Given a feasible solution (X, G, R) to Problem P1, (X̄, Ḡ, R̄) is also a feasible solution to Problem P1, and scheme R̄ achieves the same gross product as scheme R.

Function R̄ is a step function. It is characterized by its breakpoints {x̄_k} and the rewards at these points. This allows us to focus on these breakpoints instead of the whole reward function. We continue by constructing a new reward function R̂ under which every agent's best action does not change, while the reward paid by the platform weakly decreases. We define R̂(0) = 0 and x̄_0 = 0. For x ∈ [x̄_k, x̄_{k+1}), k ∈ [m], we define

    R̂(x) = R̂(x̄_{k−1}) + (c(x̄_k) − c(x̄_{k−1})) h(t_k).   (1)

Lemma 3. Under reward function R̂, x̄_k is a best action for an agent with type t_k. In addition, R̄(x̄_k) ≥ R̂(x̄_k) for k ∈ [m].

Given that R̄ is budget feasible, the above lemma tells us that scheme R̂ is budget feasible. Furthermore, it suffices to find the optimal reward scheme of the form R̂ to solve Problem P1. Figure 1 shows what function R̂ looks like.

Figure 1: Reward function R̂ is a step function with at most m breakpoints.

Function R̂ is fully characterized by the best action profile {x̄_k}. To simplify the problem, we rewrite R̂ and the constraints in terms of the functions c, h and the values x̄_k. By repeatedly applying the definition of R̂ in Eq. (1), we get

    R̂(x̄_k) = c(x̄_k)h(t_k) + Σ_{l=1}^{k−1} c(x̄_l)(h(t_l) − h(t_{l+1})).   (2)

The total reward given to agents can be written as

    Σ_{k∈[m]} R̂(x̄_k) f(t_k)
      = Σ_{k∈[m]} f(t_k) [ c(x̄_k)h(t_k) + Σ_{l=1}^{k−1} c(x̄_l)(h(t_l) − h(t_{l+1})) ]
      = Σ_{k∈[m]} c(x̄_k) [ h(t_k) Σ_{l=k}^{m} f(t_l) − h(t_{k+1}) Σ_{l=k+1}^{m} f(t_l) ].

In the second equality, we define h(t_{m+1}) = 0. For the sake of simplicity, we define α_k:

    α_k = h(t_k) Σ_{l=k}^{m} f(t_l) − h(t_{k+1}) Σ_{l=k+1}^{m} f(t_l).   (3)

Since the function h(·) is decreasing, we have α_k > 0. At this point, we come to our first main result.

Theorem 1. P1 has the same optimal value as the following problem (P2), where x_0 = 0 and α_k is given in Eq. (3):

    max_{x_1,...,x_m}   Σ_{k∈[m]} x_k f(t_k)
    s.t.   Σ_{k∈[m]} c(x_k) α_k ≤ B,
           x_{k−1} ≤ x_k,   ∀ k ∈ [m].

Given P2's optimal solution {x*_k}, we can construct the reward scheme R̂ using Eq. (1). Problem P2 is a convex optimization problem: the objective is a linear function, and the feasible region is convex. Note that every point on the line segment connecting two feasible solutions is still in the feasible region since c(x) is convex.

4 Solution Characterization and Algorithm

In this section, we characterize the optimal solution of Problem P2 by analyzing the KKT conditions and then propose an efficient algorithm to solve it.

4.1 Solution Characterization

First note that the objective function is continuous and the feasible region is bounded and closed, so a global maximum exists. We define the Lagrangian

    L = Σ_{k∈[m]} x_k f(t_k) + λ ( B − Σ_{k∈[m]} α_k c(x_k) ) + Σ_{k∈[m]} µ_k (x_k − x_{k−1}).

The KKT conditions for Problem P2 are:

    f(t_k) − λ α_k c′(x_k) + µ_k − µ_{k+1} = 0,   k ∈ [m],   (4)
    λ ( B − Σ_{k∈[m]} α_k c(x_k) ) = 0,   (5)
    µ_k (x_k − x_{k−1}) = 0,   k ∈ [m],   (6)
    Σ_{k∈[m]} α_k c(x_k) ≤ B,   (7)
    x_{k−1} − x_k ≤ 0,   k ∈ [m],   (8)
    λ ≥ 0,  µ_k ≥ 0,   k ∈ [m].   (9)
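Before analyzing these conditions, note that Problem P2 is small enough to hand to an off-the-shelf convex solver, which gives a convenient sanity check on the characterization developed next. The sketch below uses SciPy's SLSQP routine; the quadratic cost c(x) = x² and the three-type instance are illustrative assumptions of ours, with the α_k derived from f and h via Eq. (3).

```python
import numpy as np
from scipy.optimize import minimize

def solve_p2(f, alpha, c, B):
    """Numerically solve Problem P2: maximize sum_k f(t_k) x_k
    subject to sum_k alpha_k c(x_k) <= B and 0 <= x_1 <= ... <= x_m."""
    m = len(f)
    objective = lambda x: -np.dot(f, x)          # minimize the negative gross product
    budget = {'type': 'ineq',
              'fun': lambda x: B - np.dot(alpha, [c(v) for v in x])}
    monotone = {'type': 'ineq',
                'fun': lambda x: np.diff(np.concatenate(([0.0], x)))}
    res = minimize(objective, x0=np.full(m, 1e-3), method='SLSQP',
                   constraints=[budget, monotone], bounds=[(0.0, None)] * m)
    return res.x, -res.fun

# Illustrative instance: three types, quadratic cost c(x) = x^2.
f = np.array([0.5, 0.3, 0.2])                    # f(t_k): probability mass of each type
h = np.array([1.0, 0.8, 0.6])                    # h(t_k), decreasing in the type
tail = np.concatenate((np.cumsum(f[::-1])[::-1], [0.0]))           # tail[k] = sum_{l >= k} f(t_l)
alpha = h * tail[:-1] - np.concatenate((h[1:], [0.0])) * tail[1:]  # alpha_k as in Eq. (3)
x_opt, gross = solve_p2(f, alpha, c=lambda v: v ** 2, B=1.0)
print(np.round(x_opt, 3), round(gross, 3))
```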
To handle the case where the function c(·) is not differentiable, we consider sub-derivatives, i.e., c′(x) can take any value in [c′₋(x), c′₊(x)], the interval between the left and right derivatives of c at x. In Eq. (4), we set µ_{m+1} = 0. If we sum up Eq. (4) over all k, we get λ > 0. In addition, by Eq. (5), we deduce that there is no surplus in the budget.

To find the solution, it is important to determine the set of indices k where µ_k = 0. We define S = {k | µ_k = 0, k ∈ [m + 1]}. Note that m + 1 ∈ S. There exists a unique number, denoted q, such that {1, . . . , q} ∩ S = {q}. For k ∈ S with k ≠ q, we define pre(k) as the predecessor element, i.e., the element such that {pre(k), pre(k) + 1, . . . , k − 1} ∩ S = {pre(k)}.

The ratio between sums of f and sums of α exhibits a nice structure that helps us determine the set S. We first show a relation between this ratio and the dual variable λ. For convenience, we define avg(l, k) = Σ_{j=l}^{k−1} f(t_j) / Σ_{j=l}^{k−1} α_j. For k ∈ {2, . . . , m + 1}, we define avg(k) = max_{1≤l<k} avg(l, k) and let γ(k) denote a maximizer.

Theorem 2. For any k ∈ S with k > q, we have pre(k) = γ(k). For γ(k) ≤ j ≤ k − 1, the variables x_j take the same value. Furthermore, λ c′(x_j) = avg(k).

For any integer larger than 1, the function γ(·) maps it to a smaller integer, so there exists an integer d such that γ^(d)(m + 1) = γ(γ(· · · γ(m + 1))) = 1. Thus it is convenient to define S_B = {γ^(d)(m + 1), γ^(d−1)(m + 1), . . . , m + 1}. Theorem 2 tells us that for any k > q in S, we have pre(k) = γ(k). Therefore, the set S consists of every element in S_B that is no less than q. The next lemma states a monotonicity property between any two consecutive elements of S_B. We design an O(m) algorithm (Algorithm 1) based on this monotonicity to compute S_B.

Lemma 4. For any k_1, k_2 ∈ S_B with k_1 < k_2, we have avg(γ(k_1), k_1) ≤ avg(γ(k_2), k_2).

Theorem 3. Algorithm 1 computes S_B in O(m) time.

Algorithm 1 An O(m) algorithm to compute S_B
Input: f(t_i), α_i, i ∈ [m]
Initialize empty stacks SB, AG and WT
for each k ∈ [1, m] do
    avg ← f(t_k)/α_k, weight ← α_k
    while SB is not empty do
        if AG.top > avg then
            avg ← (avg · weight + AG.top · WT.top) / (weight + WT.top)
            weight ← weight + WT.top
            Pop(SB), Pop(AG) and Pop(WT)
        else
            break
        end if
    end while
    Push(SB, k + 1), Push(AG, avg), Push(WT, weight)
end for
Push(SB, 1)
return SB

We provide the intuition here. The input can be regarded as a sequence of ratios f(t_k)/α_k with weights α_k. Algorithm 1 searches for a set of breakpoints that irons the given sequence into a non-decreasing sequence consisting of the values avg(γ(k), k). Note that each element can be pushed onto a stack at most once and popped at most once, so the amortized time per element is O(1) and the total running time is O(m).
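For experimentation, here is a direct Python transcription of Algorithm 1 (the stack-based ironing); the function and variable names and the example input are illustrative choices of ours.

```python
def compute_SB(f, alpha):
    """Stack-based ironing (Algorithm 1): returns the breakpoint set S_B.

    f, alpha : sequences of f(t_k) and alpha_k for k = 1..m (0-indexed here).
    The resulting block averages avg(gamma(k), k) are non-decreasing.
    """
    SB, AG, WT = [], [], []              # breakpoints, block averages, block weights
    for k in range(len(f)):
        avg, weight = f[k] / alpha[k], alpha[k]
        # Merge with previous blocks while their average exceeds the current one.
        while SB and AG[-1] > avg:
            avg = (avg * weight + AG[-1] * WT[-1]) / (weight + WT[-1])
            weight += WT[-1]
            SB.pop(); AG.pop(); WT.pop()
        SB.append(k + 2)                 # paper indexing: the block ending at type k+1 stores k+2
        AG.append(avg); WT.append(weight)
    return [1] + SB                      # S_B always contains 1 and m+1

# Illustrative run: m = 4 types.
print(compute_SB(f=[0.4, 0.1, 0.3, 0.2], alpha=[0.8, 0.4, 0.3, 0.1]))   # -> [1, 3, 4, 5]
```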
Given S_B, we still need q to determine S. The next lemma gives a way to determine q based on the value of λ c′(0). The optimal solution of Problem P2 must satisfy x_m > 0 and thus q ≤ m. Then we have λ c′(0) ≤ λ c′(x_m) = avg(γ(m + 1), m + 1), which is an upper bound for λ c′(0).

Lemma 5. If λ c′(0) ∈ (avg(γ(k)), avg(k)] for some k ∈ S_B such that γ(k) > 1 and k ≤ m, we have q = γ(k). If λ c′(0) ∈ [0, avg(k)] for the k ∈ S_B such that γ(k) = 1, we have q = γ(k).

Up to now, we have only made use of constraints (4), (6), (8), and (9) of the KKT conditions. Given a parameter λ, if we set aside constraints (5) and (7), we can determine S and further find {x_k} satisfying (4), (6), (8), and (9). The following lemma shows the influence of λ on the corresponding solution {x_k}.

Lemma 6. When the dual variable λ increases, any {x_k} that satisfies constraints (4), (6), (8), and (9) decreases.

Lemma 6 indicates an approach to finding the true λ. Recall that the optimal solution spends the entire budget. For a guess of λ, we can compute the corresponding {x_k} and compare the total cost Σ_{k∈[m]} α_k c(x_k) with the budget B. If the total cost matches the budget B, the true λ has been found (Figure 2). Suppose there is a surplus in the budget. If c(·) is not differentiable at x_k, we can tune down λ unilaterally. If c′(x_k) = 0, we can increase x_k to a large value while keeping λ fixed. If c′(x_k) > 0, we can tune down λ and increase x_k simultaneously. In all three cases, we can spend more money. When there is a deficit in the budget, we can use similar methods to spend less money. Eventually, the total cost matches the budget B. The next lemma gives a lower bound and an upper bound on λ, so that we can compute an arbitrarily good approximation of the true λ by the bisection method.

Figure 2: Σ_{k∈[m]} α_k c(x_k) is decreasing in λ on [λ_low, λ_high].

Lemma 7. Assume c(y_1) = B / Σ_k α_k and c(y_2) = B / α_m. Then we have avg(γ(m + 1), m + 1) / c′(y_2) ≤ λ ≤ avg(γ(m + 1), m + 1) / c′(y_1).

We conclude this section by summarizing the algorithm to solve Problem P2.

4.2 The Algorithm

We first compute S_B by Algorithm 1. Then we determine the lower bound and the upper bound of λ by Lemma 7. For a given λ, we can compute q by Lemma 5 and the {x_k} by Theorem 2. The true λ satisfies the equation Σ_k α_k c(x_k) = B. Since Σ_k α_k c(x_k) is decreasing in λ, the true λ can be found by the bisection method. In the last step, there are two cases for the true λ. In one case, there is a unique solution profile {x_k}. In the other case, the possible values of Σ_{k∈[m]} α_k c(x_k) constitute a continuous interval, due to multiple solutions of x_k for the given value of c′(x_k); in this case, any solution satisfying the constraint Σ_{k∈[m]} α_k c(x_k) = B is an optimal solution. Let λ_low and λ_high denote the lower bound and the upper bound of λ given by Lemma 7, respectively. We give the time complexity of Algorithm 2 for a given precision in the following theorem.

Theorem 4. Suppose that λ* is the dual variable at the optimal solution of Problem P2. Given ϵ > 0, the running time of Algorithm 2 to find a λ such that |λ − λ*| < ϵ|λ_high − λ_low| is O(m log(1/ϵ)).

Algorithm 2 Algorithm to solve P2
Input: f(t_i), α_i, i ∈ [m], B, c(·), ϵ
for each k ∈ {2, . . . , m + 1} do
    avg(k) ← max_{1≤l<k} avg(l, k)
end for
Compute S_B by Algorithm 1
Compute λ_low and λ_high by Lemma 7; set λ_l ← λ_low, λ_h ← λ_high
while λ_h − λ_l > ϵ(λ_high − λ_low) do
    λ ← (λ_h + λ_l)/2
    S ← {k | λ c′(0) ≤ avg(k), k ∈ S_B}
    if S ≠ S_B then
        S ← S ∪ {max(S_B \ S)}
    end if
    k_0 ← 1
    for each k ∈ S in increasing order do
        x_i ← (c′)^{−1}(avg(k)/λ) for k_0 ≤ i ≤ k − 1
        k_0 ← k
    end for
    if Σ_{k∈[m]} α_k c(x_k) = B then
        break
    else if Σ_{k∈[m]} α_k c(x_k) > B then
        λ_l ← λ
    else
        λ_h ← λ
    end if
end while
return λ, {x_k}
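The bisection of Algorithm 2 can be sketched compactly once the cost is differentiable. The sketch below assumes the quadratic cost c(x) = x², so that (c′)⁻¹(y) = y/2, reuses compute_SB from the previous sketch, and runs on an illustrative instance; it is a sanity-check implementation under these assumptions, not the paper's reference code.

```python
import numpy as np

def solve_p2_bisection(f, alpha, B, eps=1e-12):
    """Bisection of Algorithm 2, specialized to c(x) = x^2 (so c'(x) = 2x)."""
    f, alpha = np.asarray(f, float), np.asarray(alpha, float)
    c = lambda x: x ** 2
    F, A = np.concatenate(([0.0], np.cumsum(f))), np.concatenate(([0.0], np.cumsum(alpha)))
    SB = compute_SB(f, alpha)                       # breakpoints 1 = b_0 < ... < b_r = m + 1
    # Block j covers types b_j .. b_{j+1} - 1 and has average avg(b_j, b_{j+1}).
    blocks = [(s, e, (F[e - 1] - F[s - 1]) / (A[e - 1] - A[s - 1]))
              for s, e in zip(SB[:-1], SB[1:])]

    def profile(lam):
        # Within a block, the KKT conditions force lambda * c'(x_j) = avg(gamma(k), k).
        x = np.empty(len(f))
        for s, e, a in blocks:
            x[s - 1:e - 1] = max(a / (2.0 * lam), 0.0)   # (c')^{-1}, clamped at 0
        return x

    spend = lambda lam: float(np.dot(alpha, c(profile(lam))))
    # Initial bounds on lambda from Lemma 7 (top block average over c' at y_2 and y_1).
    y1, y2 = np.sqrt(B / A[-1]), np.sqrt(B / alpha[-1])
    lo, hi = blocks[-1][2] / (2.0 * y2), blocks[-1][2] / (2.0 * y1)
    lo0, hi0 = lo, hi
    while hi - lo > eps * (hi0 - lo0):
        lam = 0.5 * (lo + hi)
        if spend(lam) > B:
            lo = lam                                 # overspending: the true lambda is larger
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return lam, profile(lam)

# Illustrative instance: the three-type example used earlier.
lam, x = solve_p2_bisection([0.5, 0.3, 0.2], [0.6, 0.28, 0.12], B=1.0)
print(round(lam, 4), np.round(x, 3))   # roughly 0.5176 and [0.805, 1.035, 1.61]
```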
5 NP-Hardness

In the two previous sections, we considered convex cost functions and proposed an efficient algorithm to solve Problem P1. If we relax the specific form cost(x, t) = c(x)·h(t) and the convexity assumption, the problem becomes difficult to solve, even in the full-information setting. We consider the following cost function:

    cost(x, t) = 0 for 0 ≤ x < 1;  t for 1 ≤ x ≤ 1 + t;  +∞ for x > 1 + t.

We show that the decision version of the reward design problem with this cost function is NP-hard: given every agent's type t^i, i ∈ [n], and the above cost function, is there an AIRS that achieves gross product V within budget B? When there are multiple best actions, we assume an agent chooses the highest quality. We call this decision problem the General Cost Problem for convenience.

Theorem 5. The General Cost Problem is NP-hard.

6 The Linear Reward Scheme

This section focuses on a simpler scheme where the reward function is linear, denoted by R(x) = px. Here p is the per-unit price of contribution. We will show that the optimal linear reward function achieves at least 1/2 of the gross product achieved by the optimal AIRS.

Theorem 6. The optimal linear reward scheme is a 1/2-approximation to the optimal AIRS. The ratio is tight.

The proof of the 1/2-approximation is deferred to the appendix. We show the ratio is tight by providing an example. There is only one agent; she has a unique type t with h(t) = 1. The budget is 1. We let

    c(x) = ϵx for x ≤ 1;  (1 + ϵ)x − 1 for x > 1.

We design an AIRS such that

    R(x) = 0 for x < 2/(1 + ϵ);  1 for x ≥ 2/(1 + ϵ).

Under scheme R, the agent will choose x = 2/(1 + ϵ). Now consider the linear reward scheme. If the price p is set larger than 1, the agent's action x must be at most 1 by the budget constraint. If the price p is set at most 1, the agent's action x is still at most 1 by the cost function; when p = 1, the agent's best action is exactly 1. Thus the agent's action x is at most 1. The ratio between the two schemes is 1/(2/(1 + ϵ)) = (1 + ϵ)/2, which approaches 1/2 as ϵ tends to zero.

7 Superiority over Other Schemes

This section demonstrates that the optimal AIRS attains a high gross product compared to other reward schemes. We first show that when agents' types are independent and identically distributed, the optimal AIRS has superiority over other anonymous schemes implemented in a symmetric Bayes-Nash equilibrium. Then we prove that the proportional scheme, which divides the reward according to the proportion of quality, can perform arbitrarily badly in the worst case. Finally, since the Bayes-Nash equilibrium is not known for the proportional reward scheme, we only consider the full-information setting and show that the optimal AIRS beats the proportional reward scheme.

Theorem 7. When agents' types are independent and identically distributed, for any anonymous reward scheme R in which the Nash equilibrium is symmetric, there is an AIRS R′ that achieves at least the same gross product as R.

This theorem indicates that AIRS might be the optimal anonymous reward scheme. Next, we focus on the proportional scheme [Ghosh and Hummel, 2014]. Formally, the utility of agent i in this scheme can be represented as

    u^i(x^i, x^{-i}, t^i) = (x^i / Σ_{j=1}^n x^j) · B − c(x^i) h(t^i).

For completeness, let u^i(x^i, x^{-i}) be 0 if x^j = 0 for all j. This scheme has no guarantee on the gross product compared to the optimal AIRS, even in the full-information setting.

Theorem 8. With two agents, for any ϵ > 0, there exist h(·), c(·) and (t^1, t^2) such that the Nash equilibrium (x_{prop,1}, x_{prop,2}) of the proportional scheme and the actions (x_{*,1}, x_{*,2}) achieved in the optimal AIRS satisfy x_{prop,1} + x_{prop,2} ≤ ϵ (x_{*,1} + x_{*,2}).

Theorem 8 is proved by constructing an instance in which the proportional scheme performs arbitrarily badly compared to the optimal AIRS. The idea is as follows: when there is an agent A with high ability and an agent B with low ability, agent A can obtain a large enough reward with mediocre content, simply because agent B's ability is low.

Though the proportional scheme is not an AIRS, the following theorem shows that there is an AIRS such that agents choose the same actions under the two schemes while this AIRS demands less budget.

Theorem 9. Given agents' types, we can design an AIRS that achieves the same gross product as the proportional scheme. In addition, every agent chooses the same action under the two schemes.
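As a numerical illustration of the comparison in this section, the following sketch simulates the proportional scheme's full-information outcome on a two-agent instance with one high-ability and one low-ability agent, using alternating best responses on a grid. The instance, the grid, and the iteration scheme are illustrative choices of ours, and no convergence guarantee is claimed; the only ingredient taken from the paper is the proportional utility above.

```python
import numpy as np

def proportional_equilibrium(h1, h2, c, B, x_grid, iters=200):
    """Alternating best responses for the two-agent proportional scheme,
    u_i = x_i / (x_1 + x_2) * B - c(x_i) * h_i, in the full-information setting.
    Grid search keeps the sketch simple; convergence is checked, not guaranteed."""
    def best_reply(x_other, h_i):
        denom = x_grid + x_other
        share = np.divide(x_grid, denom, out=np.zeros_like(x_grid), where=denom > 0)
        return x_grid[int(np.argmax(share * B - c(x_grid) * h_i))]
    x1 = x2 = x_grid[-1]
    for _ in range(iters):
        x1_new = best_reply(x2, h1)
        x2_new = best_reply(x1_new, h2)
        if x1_new == x1 and x2_new == x2:
            break
        x1, x2 = x1_new, x2_new
    return x1, x2

# One high-ability agent (small h) and one low-ability agent (large h), quadratic cost.
x1, x2 = proportional_equilibrium(h1=0.1, h2=50.0, c=lambda x: x ** 2, B=1.0,
                                  x_grid=np.linspace(0.0, 5.0, 5001))
print(x1, x2)   # the able agent takes almost the whole budget with only mediocre quality
```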
8 Conclusion

We consider designing anonymous reward schemes for platforms to maximize the overall quality of all content. This paper introduces the anonymous independent reward scheme. We first show the intractability of the general problem. Then, when the cost function is convex, we propose an efficient algorithm. We also give a tight approximation ratio for the optimal linear reward scheme compared to the optimal AIRS. Finally, we show the superiority of AIRS over other anonymous schemes in several settings. Many open problems remain in this research direction. How can one compute the Bayes-Nash equilibrium of the proportional reward scheme in the Bayesian information setting? Would the reward scheme benefit from using rank information? What is the optimal anonymous reward scheme?

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant No. 62172422); the Beijing Outstanding Young Scientist Program (No. BJJWZYJH012019100020098); and the Intelligent Social Governance Platform, Major Innovation & Planning Interdisciplinary Platform for the Double-First Class Initiative, Renmin University of China.

References

[Alon et al., 2020] Tal Alon, Magdalen Dobson, Ariel D. Procaccia, Inbal Talgam-Cohen, and Jamie Tucker-Foltz. Multiagent evaluation mechanisms. Proceedings of the AAAI Conference on Artificial Intelligence, 34(02):1774–1781, 2020.

[Azizan et al., 2020] Navid Azizan, Yu Su, Krishnamurthy Dvijotham, and Adam Wierman. Optimal pricing in markets with nonconvex costs. Operations Research, 68(2):480–496, 2020.

[Babaioff et al., 2006] Moshe Babaioff, Michal Feldman, and Noam Nisan. Combinatorial agency. In Proceedings of the 7th ACM Conference on Electronic Commerce, pages 18–28. ACM, 2006.

[Bjørndal and Jörnsten, 2008] Mette Bjørndal and Kurt Jörnsten. Equilibrium prices supported by dual price functions in markets with non-convexities. European Journal of Operational Research, 190(3):768–789, 2008.

[Chawla et al., 2019] Shuchi Chawla, Jason D. Hartline, and Balasubramanian Sivan. Optimal crowdsourcing contests. Games and Economic Behavior, 113:80–96, 2019.

[Chen et al., 2019] Mengjing Chen, Pingzhong Tang, Zihe Wang, Shenke Xiao, and Xiwang Yang. Optimal mechanisms with budget for user generated contents. arXiv preprint arXiv:1907.04740, 2019.

[Dütting et al., 2021] Paul Dütting, Tim Roughgarden, and Inbal Talgam-Cohen. The complexity of contracts. SIAM Journal on Computing, 50(1):211–254, 2021.

[Ghosh and Hummel, 2013] Arpita Ghosh and Patrick Hummel. Learning and incentives in user-generated content: Multi-armed bandits with endogenous arms. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, pages 233–246. ACM, 2013.

[Ghosh and Hummel, 2014] Arpita Ghosh and Patrick Hummel. A game-theoretic analysis of rank-order mechanisms for user-generated content. Journal of Economic Theory, 154:349–374, 2014.

[Ghosh and McAfee, 2011] Arpita Ghosh and Preston McAfee. Incentivizing high-quality user-generated content. In Proceedings of the 20th International Conference on World Wide Web, pages 137–146. ACM, 2011.

[Holmstrom, 1982] Bengt Holmstrom. Moral hazard in teams. The Bell Journal of Economics, pages 324–340, 1982.

[Jain et al., 2014] Shaili Jain, Yiling Chen, and David C. Parkes. Designing incentives for online question-and-answer forums. Games and Economic Behavior, 86:458–474, 2014.

[Luca, 2015] Michael Luca. User-generated content and social media. In Handbook of Media Economics, volume 1, pages 563–592. Elsevier, 2015.
[Moldovanu and Sela, 2006] Benny Moldovanu and Aner Sela. Contest architecture. Journal of Economic Theory, 126(1):70–96, 2006.

[O'Neill et al., 2005] Richard P. O'Neill, Paul M. Sotkiewicz, Benjamin F. Hobbs, Michael H. Rothkopf, and William R. Stewart Jr. Efficient market-clearing prices in markets with nonconvexities. European Journal of Operational Research, 164(1):269–285, 2005.

[Park et al., 2014] Jaimie Yejean Park, Jiyeon Jang, Alejandro Jaimes, Chin-Wan Chung, and Sung-Hyon Myaeng. Exploring the user-generated content (UGC) uploading behavior on YouTube. In Proceedings of the 23rd International Conference on World Wide Web, pages 529–534, 2014.

[Tullock, 1980] Gordon Tullock. Efficient rent seeking. In Toward a Theory of the Rent-Seeking Society, pages 97–112. College Station, TX: Texas A&M University Press, 1980.

[Xia et al., 2014] Yingce Xia, Tao Qin, Nenghai Yu, and Tie-Yan Liu. Incentivizing high-quality content from heterogeneous users: On the existence of Nash equilibrium. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1), 2014.

[Xiao et al., 2020] Shenke Xiao, Zihe Wang, Mengjing Chen, Pingzhong Tang, and Xiwang Yang. Optimal common contract with heterogeneous agents. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):7309–7316, 2020.

[Zhang and Sarvary, 2014] Kaifu Zhang and Miklos Sarvary. Differentiation with user-generated content. Management Science, 61(4):898–914, 2014.