# Crowdsourcing with Arbitrary Adversaries

Matthäus Kleindessner, Pranjal Awasthi
Department of Computer Science, Rutgers University, Piscataway Township, New Jersey, USA. Correspondence to: Matthäus Kleindessner, Pranjal Awasthi.

*Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).*

**Abstract.** Most existing works on crowdsourcing assume that the workers follow the Dawid-Skene model, or the one-coin model as its special case, where every worker makes mistakes independently of other workers and with the same error probability for every task. We study a significant extension of this restricted model. We allow almost half of the workers to deviate from the one-coin model and, for those workers, we allow their probabilities of making an error to be task-dependent and arbitrarily correlated. In other words, we allow for arbitrary adversaries, whose error probabilities can not only be high, but who can also collude perfectly. In this adversarial scenario, we design an efficient algorithm to consistently estimate the workers' error probabilities.

## 1. Introduction

Crowdsourcing is an omnipresent phenomenon: it has emerged as an integral part of the machine learning pipeline in recent years, and one reason for the great advances in deep learning is the presence of large data sets that have been labeled by the crowd (e.g., Deng et al., 2009; Krizhevsky, 2009). Crowdsourcing is also at the heart of peer grading systems (e.g., Alfaro & Shavlovsky, 2014), which help with rising enrollment at universities, and online rating systems (e.g., Liao et al., 2014), which many of us rely on when choosing the next restaurant, to provide just a few examples.

A crowdsourcing scenario consists of a set of workers and a set of tasks that need to be solved. A data curator utilizing crowdsourcing can aim at estimating various quantities of interest. The first goal might be to estimate the true labels or answers for the tasks at hand. Typically, additional constraints are involved here, such as a worker not being willing to solve too many tasks and the data curator wanting to get high-quality labels at a low price. The canonical example of this case is Amazon Mechanical Turk™; there, one cannot track specific workers as they are fleeting. However, in scenarios such as peer grading or online rating systems, a second goal might be to estimate worker qualities, especially if workers can be reused at a later time.

In a seminal paper, Dawid & Skene (1979) proposed a formal model that involves worker quality parameters for crowdsourcing scenarios in the context of classification. The Dawid-Skene model has become a standard theoretical framework and has led to a flurry of research over the past few years (Liu et al., 2012; Raykar & Yu, 2012; Li et al., 2013; Gao et al., 2016; Zhang et al., 2016; Khetan et al., 2017), in particular in its special symmetric form usually referred to as the one-coin model (Ghosh et al., 2011; Karger et al., 2011a;b; Dalvi et al., 2013; Gao & Zhou, 2013; Karger et al., 2014; Bonald & Combes, 2017; Ma et al., 2017). In its general form for binary classification problems, the Dawid-Skene model assumes that for each worker, the probability of providing the wrong label depends only on the true label of the task, but not on the task itself. Moreover, given the true label, the responses provided by different workers are independent.
The one-coin model additionally assumes that for each worker, the probability of providing the wrong label is the same for both classes. We will formally introduce the one-coin model in Section 2. A discussion of prior work is provided in Section 5 and Appendix A.

The crucial limitation of the Dawid-Skene and one-coin models is the assumption that workers' error probabilities are task-independent. In particular, this excludes the possibility of colluding adversaries (other than those that provide the wrong label all of the time), which might make these models a poor approximation of the real world encountered in applications such as peer grading or online rating. In this paper, we study a significant extension of the one-coin model that allows for arbitrary, highly colluding adversaries. We provide an algorithm for estimating the workers' error probabilities and prove that it asymptotically recovers the true error probabilities. Using our estimates of the error probabilities in weighted majority votes, we also provide strategies to estimate the ground-truth labels of the tasks. Experiments on both synthetic and real data show that our approach clearly outperforms existing methods in the presence of adversaries.

## 2. Setup and problem formulation

We first describe a general model for crowdsourcing with non-adaptive workers and binary classification tasks: there are $n$ workers $w_1, \ldots, w_n$ and an i.i.d. sample of $m$ task-label pairs $((x_i, y_i))_{i=1}^m \sim D^m$, where $D$ is a joint probability distribution over tasks $x \in X$ and corresponding labels $y \in \{-1, +1\}$. There is a variable $g_{ij} \in \{0, 1\}$, $i \in [m]$, $j \in [n]$, indicating whether worker $w_j$ is presented with task $x_i$ (for $k \in \mathbb{N}$, we use $[k]$ to denote the set $\{1, \ldots, k\}$). If $w_j$ is presented with $x_i$, that is $g_{ij} = 1$, then $w_j$ provides an estimate $w_j(x_i) \in \{-1, +1\}$ of the ground-truth label $y_i$. Let $A \in \{-1, 0, +1\}^{m \times n}$ be a matrix that stores all the responses collected from the workers: $A_{ij} = w_j(x_i)$ if $g_{ij} = 1$ and $A_{ij} = 0$ if $g_{ij} = 0$. We assume that each worker $w_j$ follows some (probabilistic or deterministic) strategy such that $w_j(x_i)$ only depends on $x_i$. In particular, given $x_i$, any two different workers' responses $w_j(x_i)$ and $w_k(x_i)$ and the ground-truth label $y_i$ are independent.

Let $\varepsilon_{w_j}(x, y) \in [0, 1]$ be the conditional error probability that, given $x$ and $y$, $w_j(x)$ does not equal $y$, that is,
$$\varepsilon_{w_j}(x, y) := \Pr\nolimits_{w_j \mid (x,y)}[\,w_j(x) \neq y \mid (x, y)\,]. \tag{1}$$
Note that the unconditional probability of $w_j(x)$ being incorrect, before seeing $x$ and $y$, is given by
$$\Pr\nolimits_{(x,y)\sim D,\, w_j}[\,w_j(x) \neq y\,] = \mathbb{E}_{(x,y)\sim D}[\varepsilon_{w_j}(x, y)] =: \varepsilon_{w_j}.$$

Now one may study the following questions:

(i) Given only the matrix $A$, how can we estimate the ground-truth labels $y_1, \ldots, y_m$?

(ii) Given only the matrix $A$, how can we estimate the workers' unconditional error probabilities $\varepsilon_{w_1}, \ldots, \varepsilon_{w_n}$?

(iii) If we can choose $g_{ij}$ (either in advance of collecting workers' responses or adaptively while doing so), how should we choose it such that we can achieve (i) or (ii) with a minimum number of collected responses?

If $\varepsilon_{w_j}(x, y)$ as defined in (1) is constant on $X \times \{-1, +1\}$, that is $\varepsilon_{w_j}(x, y) \equiv \varepsilon_{w_j}$, for all $j \in [n]$, our model boils down to what is usually referred to as the one-coin model (e.g., Szepesvari, 2015), for which (i) to (iii) have been studied extensively (see Section 5 and Appendix A for references and a detailed discussion). With this paper we initiate the study of a significant extension of the one-coin model. We allow almost half of the workers to deviate from the one-coin model, and for such a worker $w_j$, the conditional error probability $\varepsilon_{w_j}(x, y)$ can be a completely arbitrary random variable. In other words, we allow for arbitrary adversaries, whose error probabilities can not only be high but can also be arbitrarily correlated. We mainly study (ii) in this scenario. We then make use of existing results for the one-coin model to answer (i) satisfactorily for our purposes. We do not deal with (iii), but instead assume that $g_{ij}$ has been specified in advance.
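To make this setup concrete, here is a minimal simulation sketch; the sizes, the observation probability, and the particular adversary strategy are illustrative choices only (values of `eps` for the adversary columns are drawn but ignored).

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 2000, 10                      # number of tasks and workers (illustrative)
y = rng.choice([-1, 1], size=m)      # ground-truth labels y_i
g = rng.random((m, n)) < 0.8         # g_ij = 1 iff worker j is presented with task i

# Workers 0..6 follow the one-coin model: task-independent error probability < 1/2.
eps = rng.uniform(0.05, 0.35, size=n)        # eps[7:] is overwritten below and unused
flip = rng.random((m, n)) < eps              # independent errors, one coin per worker

A = np.where(flip, -y[:, None], y[:, None]) * g   # response matrix A; 0 = no response

# Workers 7..9 are colluding adversaries: on every task they see they answer -1,
# so their conditional error probabilities are task-dependent and perfectly correlated.
for j in range(7, n):
    A[:, j] = np.where(g[:, j], -1, 0)
```

With $n = 10$, the requirement of at least $n/2 + 2 = 7$ workers following the one-coin model with error probabilities below one half (introduced below) is met by this sketch.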
## 3. General outline of our approach

In this section we present the general outline of our approach. A key insight is that the unconditional probability of workers $w_j$ and $w_k$ agreeing is given by
$$\Pr\nolimits_{(x,y)\sim D,\, w_j, w_k}[w_j(x) = w_k(x)] = 1 - \varepsilon_{w_j} - \varepsilon_{w_k} + 2\varepsilon_{w_j}\varepsilon_{w_k} + 2\,\mathrm{Cov}_{(x,y)\sim D}[\varepsilon_{w_j}(x, y), \varepsilon_{w_k}(x, y)]. \tag{2}$$
Here $\mathrm{Cov}_{(x,y)\sim D}[\varepsilon_{w_j}(x, y), \varepsilon_{w_k}(x, y)]$ denotes the covariance between the random variables $\varepsilon_{w_j}(x, y)$ and $\varepsilon_{w_k}(x, y)$, that is,
$$\mathrm{Cov}_{(x,y)\sim D}[\varepsilon_{w_j}(x, y), \varepsilon_{w_k}(x, y)] = \mathbb{E}_{(x,y)\sim D}\big[(\varepsilon_{w_j}(x, y) - \varepsilon_{w_j})\,(\varepsilon_{w_k}(x, y) - \varepsilon_{w_k})\big].$$
A proof of (2) can be found in Appendix B. The probability on the left-hand side of (2) can easily be estimated from $A$ by the ratio of the number of tasks that $w_j$ and $w_k$ agreed on to the number of tasks they were both presented with:
$$\Pr[w_j(x) = w_k(x)] \approx \frac{\sum_{i=1}^m g_{ij}\, g_{ik}\, \mathbb{1}\{A_{ij} = A_{ik}\}}{\sum_{i=1}^m g_{ij}\, g_{ik}} =: p_{jk}. \tag{3}$$
This suggests solving the system of equations
$$1 - \varepsilon_j - \varepsilon_k + 2\varepsilon_j\varepsilon_k + 2c_{jk} = p_{jk}, \qquad 1 \le j < k \le n, \tag{4}$$
in the unknowns $\varepsilon_l$, $l \in [n]$, and $c_{jk}$, $1 \le j < k \le n$, in order to obtain estimates of the workers' unconditional error probabilities $\varepsilon_{w_1}, \ldots, \varepsilon_{w_n}$. However, there is a catch: in general, the system (4) is not identifiable and has several solutions.

We will assume that at least $n/2 + 2$ of the workers follow the one-coin model and have error probabilities smaller than one half. A worker $w_j$ following the one-coin model implies
$$\mathrm{Cov}_{(x,y)\sim D}[\varepsilon_{w_j}(x, y), \varepsilon_{w_k}(x, y)] = 0, \qquad k \neq j, \tag{5}$$
and hence under this assumption we can restrict the search for solutions of (4) to $\varepsilon_l$, $l \in [n]$, and $c_{jk}$, $1 \le j < k \le n$, with the property that¹
$$\exists\, L \subseteq [n] \text{ with } |L| \ge n/2 + 2 \text{ such that } \forall j \in L:\ \big(\varepsilon_j < 1/2\ \wedge\ [\,\forall k \neq j: c_{jk} = 0\,]\big). \tag{6}$$

¹ Throughout the paper, we set $c_{jk} = c_{kj}$ if $j > k$. We also assume $p_{jk} = p_{kj}$.

Note that we never assume to know which workers follow the one-coin model; this corresponds to using the existential quantifier for the set $L$ in (6) rather than considering a fixed $L$. We can show that the system (4) has at most one solution with property (6). We also provide evidence that our assumption of $n/2 + 2$ of the workers following the one-coin model and having error probabilities smaller than one half is a necessary condition for guaranteeing the identifiability of system (4).

If the workers satisfy our assumption and the $p_{jk}$ on the right-hand side of (4) are actually the true agreement probabilities, then $\varepsilon_l = \varepsilon_{w_l}$ and $c_{jk} = \mathrm{Cov}[\varepsilon_{w_j}(x, y), \varepsilon_{w_k}(x, y)]$ is the unique solution of (4) that satisfies (6). But if the $p_{jk}$ are not exactly the true agreement probabilities, there might be no solution of (4) with property (6) at all. We prove that if the estimates $p_{jk}$ are not too bad, we can solve (4) together with (6) approximately, and our approximate solution is guaranteed to be close to the true error probabilities $\varepsilon_{w_1}, \ldots, \varepsilon_{w_n}$ and covariances $\mathrm{Cov}[\varepsilon_{w_j}(x, y), \varepsilon_{w_k}(x, y)]$, $j < k$. This answers (ii) from Section 2 and is the main contribution of our paper:

**Main result.** Assume that at least $n/2 + 2$ of the workers follow the one-coin model and have error probabilities not greater than $\gamma_{TR} < \frac{1}{2}$. If $|\Pr[w_j(x) = w_k(x)] - p_{jk}| \le \beta$ for all $j \neq k$ and $\beta$ is sufficiently small, we can compute estimates $\hat\varepsilon_{w_1}, \ldots, \hat\varepsilon_{w_n}$ of $\varepsilon_{w_1}, \ldots, \varepsilon_{w_n}$ such that $|\varepsilon_{w_i} - \hat\varepsilon_{w_i}| \le C(\gamma_{TR}) \cdot \beta^{1/4}$.

We answer (i) from Section 2 and provide two ways to predict the ground-truth labels $y_1, \ldots, y_m$ by taking weighted majority votes over the responses provided by the workers. In these majority votes, the weights depend on our estimates of the true error probabilities $\varepsilon_{w_1}, \ldots, \varepsilon_{w_n}$.
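Continuing the simulation sketch above, the following lines compute the empirical agreement rates $p_{jk}$ from (3) and numerically check identity (2) for a pair of one-coin workers, whose covariance term vanishes; the helper function is an illustrative choice, not part of our algorithm.

```python
import numpy as np

def agreement_estimates(A, g):
    """p_jk as in (3): among the tasks both workers were presented with,
    the fraction on which their responses coincide."""
    n = A.shape[1]
    p = np.full((n, n), np.nan)            # the diagonal p_jj is never used
    for j in range(n):
        for k in range(j + 1, n):
            both = g[:, j] & g[:, k]       # tasks seen by both workers
            p[j, k] = p[k, j] = np.mean(A[both, j] == A[both, k])
    return p

p = agreement_estimates(A, g)

# Identity (2) for two one-coin workers j, k: the covariance term is zero, so the
# agreement probability equals 1 - eps_j - eps_k + 2 * eps_j * eps_k.
j, k = 0, 1
print(p[j, k], 1 - eps[j] - eps[k] + 2 * eps[j] * eps[k])   # should be close
```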
## 4. Details and analysis

### 4.1. Estimating agreement probabilities

If $g_{ij}$ has been specified in advance, we have the following guarantee on the quality of the estimates $p_{jk}$ (see (3)):

**Lemma 1.** Assume $\sum_{i=1}^m g_{ij}\, g_{ik} > 0$ for all $j \neq k$. Let $\delta > 0$ and
$$\beta_{jk} = \min\left\{1,\ \left[\frac{\ln(2n^2/\delta)}{2\sum_{i=1}^m g_{ij}\, g_{ik}}\right]^{1/2}\right\}.$$
Then, with probability at least $1 - \delta$ over the sample $((x_i, y_i))_{i=1}^m$ and the randomness in the workers' strategies,
$$|\Pr[w_j(x) = w_k(x)] - p_{jk}| \le \beta_{jk}, \qquad 1 \le j < k \le n.$$

*Proof.* A straightforward application of Hoeffding's inequality and the union bound yields the result.
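As a small illustration of Lemma 1, the following sketch computes the confidence radii $\beta_{jk}$ from the assignment variables $g_{ij}$; the function is an illustrative helper, not part of Algorithm 1.

```python
import numpy as np

def hoeffding_radii(g, delta):
    """beta_jk = min(1, sqrt(ln(2 n^2 / delta) / (2 * #tasks common to j and k))).
    Requires every pair of workers to share at least one task (Lemma 1)."""
    n = g.shape[1]
    common = g.astype(float).T @ g.astype(float)   # entry (j, k): number of common tasks
    return np.minimum(1.0, np.sqrt(np.log(2 * n**2 / delta) / (2.0 * common)))

beta = hoeffding_radii(g, delta=0.05)
# With probability >= 1 - delta, |Pr[w_j(x) = w_k(x)] - p_jk| <= beta[j, k] for all j < k.
```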
### 4.2. Identifiability and approximate solution

If all workers follow the one-coin model, that is $\varepsilon_{w_j}(x, y) \equiv \varepsilon_{w_j}$ for all $j \in [n]$, we have $\mathrm{Cov}_{(x,y)\sim D}[\varepsilon_{w_j}(x, y), \varepsilon_{w_k}(x, y)] = 0$ for all $1 \le j < k \le n$, and system (4) reduces to
$$1 - \varepsilon_j - \varepsilon_k + 2\varepsilon_j\varepsilon_k = p_{jk}, \qquad 1 \le j < k \le n, \tag{7}$$
in the unknowns $\varepsilon_l$, $l \in [n]$. It is well known that, in general, even (7) is not identifiable. For example, if $p_{jk} = 1$ for all $1 \le j < k \le n$, there are the two solutions $\varepsilon_l = 0$, $l \in [n]$, and $\varepsilon_l = 1$, $l \in [n]$, corresponding to either all perfect or all completely erroneous workers. On the other hand, the system (7) is identifiable if we assume that on average the workers are better than random guessing, that is $\frac{1}{n}\sum_{j=1}^n \varepsilon_{w_j} < \frac{1}{2}$, and that there are at least three informative workers with $\varepsilon_{w_j} \neq \frac{1}{2}$ (Bonald & Combes, 2017). Clearly, these two conditions do not guarantee identifiability of the general system (4). The next lemma shows that even if we additionally assume half of the workers to follow the one-coin model, the system (4) is not identifiable. Here we only state an informal version of the lemma; a detailed version and its proof can be found in Appendix B.

**Lemma 2.** There exists an instance of the system (4), where $n$ is even, that has two different solutions. In both solutions, it holds that $\varepsilon_l < \frac{1}{2}$, $l \in [n]$. Furthermore:

(a) In the first solution, $c_{jk} = 0$ for all $j \in [\frac{n}{2}]$ and $k \neq j$, and $\varepsilon_l$ is small if $l \in [\frac{n}{2}]$ and big if $l \in [n] \setminus [\frac{n}{2}]$.

(b) In the second solution, $c_{jk} = 0$ for all $j \in [n] \setminus [\frac{n}{2}]$ and $k \neq j$, and $\varepsilon_l$ is small if $l \in [n] \setminus [\frac{n}{2}]$ and big if $l \in [\frac{n}{2}]$.

We want to mention that a solution of (4) does not necessarily correspond to actual workers, that is, given $\varepsilon_l$, $l \in [n]$, and $c_{jk}$, $1 \le j < k \le n$, there might be no collection of workers $w_1, \ldots, w_n$ such that $\varepsilon_{w_l} = \varepsilon_l$ and $\mathrm{Cov}[\varepsilon_{w_j}(x, y), \varepsilon_{w_k}(x, y)] = c_{jk}$. By the Bhatia-Davis inequality (Bhatia & Davis, 2010) it holds that $\mathrm{Var}[\varepsilon_{w_j}(x, y)] \le \varepsilon_{w_j} - \varepsilon_{w_j}^2$. Hence, a necessary condition for a solution to correspond to actual workers is that $|c_{jk}| \le (\varepsilon_j - \varepsilon_j^2)^{1/2}(\varepsilon_k - \varepsilon_k^2)^{1/2}$ (in addition to $\varepsilon_l \in [0, 1]$). The two solutions in Lemma 2 correspond to actual workers.

From now on we assume that at least $n/2 + 2$ workers follow the one-coin model and have error probabilities smaller than one half:²

**Assumption A.** There exists $L \subseteq [n]$ with $|L| \ge n/2 + 2$ such that for all $j \in L$, the worker $w_j$ follows the one-coin model with error probability $\varepsilon_{w_j} < 1/2$.

² All results of Section 4.2 hold true if we assume, more generally, the existence of $L \subseteq [n]$ with $|L| \ge n/2 + 2$ such that (5) together with $\varepsilon_{w_j} < \frac{1}{2}$ holds for all $j \in L$.

This corresponds to considering (4) together with the constraint (6). The system (4) together with (6) is identifiable:

**Proposition 1.** There exists at most one solution of system (4) that has property (6).

*Proof.* Assuming there are two solutions $(\varepsilon^{S_1}_l)_{l\in[n]}$, $(c^{S_1}_{jk})_{1\le j<k\le n}$ [...] $G(\gamma, \nu) \to 0$ as $\nu \to 0$. The proof of Proposition 2, which provides an explicit expression for $G(\gamma, \nu)$, can be found in Appendix B.

In a next step, we assume that we are given pairwise different $i_1, i_2, i_3 \in [n]$ such that $w_{i_1}, w_{i_2}, w_{i_3}$ follow the one-coin model with $\varepsilon_{w_{i_1}}, \varepsilon_{w_{i_2}}, \varepsilon_{w_{i_3}} < 1/2$. In this case, assuming that the estimates $p_{jk}$ are close to the true agreement probabilities, we can construct a solution of (4) that is guaranteed to be close to the true error probabilities and covariances (and hence approximately satisfies (6)). This is made precise in the next lemma (its proof can be found in Appendix B).

**Lemma 3.** Let $\gamma_{TR} < 1/2$ and consider the system (4) with $p^{TR}_{jk} \in [0, 1]$ as right-hand side. Assume there exists a solution $(\varepsilon^{TR}_l)_{l\in[n]}$, $(c^{TR}_{jk})_{1\le j<k\le n}$ [...] $B + 4C \ge 0$ and $\varepsilon^S_{i_2} \neq \frac{1}{2}$ [...], then $(\varepsilon^S_l)_{l\in[n]}$, $(c^S_{jk})_{1\le j<k\le n}$ [...] $\to 0$ as $\beta \to 0$.

In Lemma 3, for constructing the solution $(\varepsilon^S_l)_{l\in[n]}$, $(c^S_{jk})_{1\le j<k\le n}$ [...]

### 4.3. Estimating ground-truth labels

[...] then the optimal estimator for the ground-truth label $y_i$ is given by the weighted majority vote (16) with $f(\hat\varepsilon_{w_l})$ replaced by $f(\varepsilon_{w_l}) = \ln\big((1 - \varepsilon_{w_l})/\varepsilon_{w_l}\big)$ (Nitzan & Paroush, 1982; Berend & Kontorovich, 2015; Bonald & Combes, 2017). Hence, a common approach for the one-coin model is to first estimate the true error probabilities and then estimate the ground-truth labels by using the majority vote (16) with $f(\hat\varepsilon_{w_l}) = \ln\big((1 - \hat\varepsilon_{w_l})/\hat\varepsilon_{w_l}\big)$ (Bonald & Combes, 2017; Ma et al., 2017). We propose to use the same majority vote, but restricted to answers from workers that we believe to follow the one-coin model. Using the notation from Section 4.2, this means that we set $f(\hat\varepsilon_{w_l}) = \ln\big((1 - \hat\varepsilon_{w_l})/\hat\varepsilon_{w_l}\big)$ for $l \in Q_{p_0}$ with $\max_{k \in [n] \setminus \{l\}} |c^S_{lk}(p_0)| \le \nu_{p_0}$ and $f(\hat\varepsilon_{w_l}) = 0$ otherwise. Alternatively, we suggest setting $f(\hat\varepsilon_{w_l}) = 1 - 2\hat\varepsilon_{w_l}$ for all $l \in [n]$. With this choice of $f$ we make use of the responses provided by all workers; the same choice has been used for the one-coin model (Dalvi et al., 2013). A third option would be to set $f(\hat\varepsilon_{w_l}) = 1 - 2\hat\varepsilon_{w_l}$ for $l \in Q_{p_0}$ with $\max_{k \in [n] \setminus \{l\}} |c^S_{lk}(p_0)| \le \nu_{p_0}$ and $f(\hat\varepsilon_{w_l}) = 0$ otherwise, but we do not consider this choice any further.
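The following is a minimal sketch of these label-prediction rules, with the weighted majority vote (16) assumed to take the standard form $\hat y_i = \operatorname{sign}\big(\sum_{l} f(\hat\varepsilon_{w_l})\, A_{il}\big)$; the function names, the clipping constant, and the boolean mask standing in for $Q_{p_0}$ and $\nu_{p_0}$ are illustrative choices only.

```python
import numpy as np

def weighted_majority(A, weights):
    """Predict y_i as the sign of the weighted sum of responses on task i.
    Unobserved entries are 0 and contribute nothing; ties go to +1."""
    scores = A @ weights
    return np.where(scores >= 0, 1, -1)

def log_odds_weights(eps_hat, keep):
    """First option: f(eps) = ln((1 - eps)/eps) for workers believed to follow the
    one-coin model (keep[l] = True), and weight 0 for all other workers."""
    e = np.clip(eps_hat, 1e-6, 1 - 1e-6)      # avoid division by zero / log of zero
    return np.where(keep, np.log((1 - e) / e), 0.0)

def linear_weights(eps_hat):
    """Second option: f(eps) = 1 - 2*eps, using the responses of all workers."""
    return 1.0 - 2.0 * eps_hat

def predict_labels(A, eps_hat, c_hat, nu):
    """Keep worker l if all of its estimated covariances are small (|c_lk| <= nu).
    eps_hat, c_hat (with zero diagonal) and the tolerance nu are assumed to be given,
    e.g. as output of the estimation step; they are placeholders here."""
    keep = np.max(np.abs(c_hat), axis=1) <= nu
    return weighted_majority(A, log_odds_weights(eps_hat, keep))
```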
### 4.4. Algorithm

In the interests of clarity, we present our approach as the self-contained Algorithm 1. Choosing $P$ as the set of triples such that the involved pairs of workers have been presented with at least ten or three common tasks might seem somewhat arbitrary here. Indeed, one could introduce two parameters to the algorithm instead. Without optimizing for these parameters, we chose them as ten and three in all our experiments on real data, and hence we state Algorithm 1 as is. Our analysis best applies to the setting of a full matrix $A$ (or variables $g_{ij}$ that are independent Bernoulli random variables with a common success probability, as assumed by Bonald & Combes, 2017, for example). In this case, which we consider in our experiments on synthetic data, choosing $P$ as stated in Algorithm 1 reduces to choosing $P$ as the set of all triples of pairwise different indices. If the number of workers $n$ is small, this is the best one can do. If $n$ is large, it is infeasible to choose $P$ as the set of all triples, since the running time of Algorithm 1 is in $O(n^2(m + |P|))$. If $n$ is large and $A$ is full, one should sample $P$ uniformly at random. For $|P| \ge \ln\delta / \ln(7/8)$, our error guarantee (14) then holds with probability at least $1 - \delta$ (compare with Section 4.2).

## 5. Related work

We briefly survey related work here; a complete discussion can be found in Appendix A. As discussed in Sections 1 and 2, in crowdsourcing one might be interested in estimating ground-truth labels and/or worker qualities given the response matrix $A$, but also in optimal task assignment. In their seminal paper, Dawid & Skene (1979) proposed an EM-based algorithm to address the first two goals. Since then, numerous works have followed, addressing all three goals for the Dawid-Skene and one-coin models (Ghosh et al., 2011; Karger et al., 2011a;b; 2013; 2014; Dalvi et al., 2013; Gao & Zhou, 2013; Gao et al., 2016; Zhang et al., 2016; Bonald & Combes, 2017; Ma et al., 2017). There have also been efforts to study generalizations of the Dawid-Skene model (Jaffe et al., 2016; Khetan & Oh, 2016; Shah et al., 2016) as well as to explicitly deal with adversaries (Raykar & Yu, 2012; Jagabathula et al., 2017). However, none of the prior work can handle a number of arbitrary adversaries almost as large as the number of reliable workers, as we do.

## 6. Experiments

On both synthetic and real data, we compared our proposed Algorithm 1 to straightforward majority voting for predicting labels (referred to as Maj) and the following methods from the literature: the spectral algorithms by Ghosh et al. (2011) (GKM), Dalvi et al. (2013) (RoE and EoR) and Karger et al. (2013) (KOS), the two-stage procedure by [...]

**Algorithm 1**
**Input:** crowdsourced labels stored in $A \in \{-1, 0, +1\}^{m \times n}$, upper bound $0 < \gamma_{TR} < \frac{1}{2}$ on the error probabilities of $n/2 + 2$ workers that follow the one-coin model, confidence parameter $0 < \delta < 1$
**Output:** estimates $(\varepsilon^F_l)_{l \in [n]}$, $(c^F_{jk})_{1 \le j < k \le n}$ [...]
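To give a feel for why triples of workers are the natural unit in Algorithm 1 and Lemma 3, here is a rough sketch of the classical closed-form estimate for a triple of one-coin workers, together with uniform sampling of $P$ as suggested above for large $n$. This is only an illustration of the underlying identity $2p_{jk} - 1 = (1 - 2\varepsilon_j)(1 - 2\varepsilon_k)$, which follows from (7), and not the construction used in Lemma 3 or Algorithm 1; it is only meaningful when all three workers are (close to) one-coin with error probabilities below one half.

```python
import numpy as np

def one_coin_from_triple(p, triple):
    """Closed-form estimates for three one-coin workers with eps < 1/2:
    with q_jk = 2*p_jk - 1 = (1 - 2*eps_j)(1 - 2*eps_k), we have
    (1 - 2*eps_a)^2 = q_ab * q_ac / q_bc, and analogously for b and c."""
    a, b, c = triple
    q = lambda j, k: 2.0 * p[j, k] - 1.0
    eps = {}
    for x, y, z in [(a, b, c), (b, a, c), (c, a, b)]:
        r = q(x, y) * q(x, z) / q(y, z)       # estimate of (1 - 2*eps_x)^2
        eps[x] = 0.5 * (1.0 - np.sqrt(np.clip(r, 0.0, 1.0)))
    return eps

def sample_triples(n, num, rng):
    """Uniformly sampled triples of pairwise different worker indices (the large-n
    regime discussed in Section 4.4); assumes num <= n choose 3. Algorithm 1's actual
    rule for P additionally looks at how many tasks the involved pairs of workers share."""
    triples = set()
    while len(triples) < num:
        triples.add(tuple(sorted(rng.choice(n, size=3, replace=False))))
    return [tuple(int(i) for i in t) for t in triples]
```

For instance, `one_coin_from_triple(p, (0, 1, 2))` applied to the simulated data from Section 2 approximately recovers `eps[0]`, `eps[1]`, `eps[2]`, up to the sampling noise in the agreement estimates.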