Crowdsourcing via Tensor Augmentation and Completion

Yao Zhou, Jingrui He
Arizona State University, Tempe, Arizona
yzhou174@asu.edu, jingrui.he@asu.edu

Nowadays, the rapid proliferation of data makes it possible to build complex models for many real applications. Such models, however, usually require a large amount of labeled data, and the labeling process can be both expensive and tedious for domain experts. To address this problem, researchers have resorted to crowdsourcing to collect labels from non-experts at much lower cost. The key challenge here is how to infer the true labels from the large number of noisy labels provided by non-experts. Different from most existing work on crowdsourcing, which ignores the structural information in the labeling data provided by non-experts, in this paper we propose a novel structured approach based on tensor augmentation and completion. It uses a tensor representation for the labeled data, augments it with a ground truth layer, and explores two methods to estimate the ground truth layer via low rank tensor completion. Experimental results on 6 real data sets demonstrate the superior performance of the proposed approach over state-of-the-art techniques.

1 Introduction

Recent years have seen explosive growth of data collected from a variety of domains. Such an unprecedented amount of data makes it possible to build complex models for prediction and inference. On the other hand, building such models requires accurate label information, the collection of which from domain experts is typically both expensive and tedious. Alternatively, crowdsourcing has been proposed to collect large amounts of label information from non-experts, which is much less expensive [Kittur et al., 2008; Huberman et al., 2009]. However, due to the noisy nature of the labels provided by non-experts, a key challenge in crowdsourcing is how to infer the true labels from the large number of noisy labels.

To address this problem, a variety of techniques have been proposed in the past decades. Among others, the most straightforward method is majority voting, which is based on the assumption that all labels are equally reliable. However, this assumption may not hold in practice, and majority voting has been proven sub-optimal [Karger et al., 2011]. More recently, [Dawid et al., 1979] proposed an iterative algorithm based on Expectation Maximization (EM) to estimate worker quality and infer the item true labels at the same time. Its performance has been further improved by a variety of recent algorithms [Zhou et al., 2012; Liu et al., 2012b; Zhang et al., 2014; Raykar et al., 2010]. Section 2 provides a brief review of these algorithms.

In this paper, for the first time, we approach the crowdsourcing problem using tools and concepts from tensor augmentation and completion (TAC). Compared with existing techniques, we are able to effectively leverage the structured information in the labeled data. First of all, we represent the set of labels provided by non-experts (workers) as a three-way tensor, and then augment it with an extra tensor slice named the ground truth layer. Second, to infer the true labels in the ground truth layer, we leverage the low rank property of the augmented tensor, and introduce two optimization problems named PG-TAC (prior guided) and RS-TAC (relaxed simplex). Finally, we propose algorithms for solving these problems using block coordinate descent.
Empirical results on 6 real data sets demonstrate the effectiveness of the proposed methods in both binary and multi-class labeling tasks, outperforming several state-of-the-art methods.

The rest of the paper is organized as follows. In Section 2, we briefly review existing work on crowdsourcing and tensor completion. Then in Sections 3 and 4, we present our proposed model and optimization algorithms, followed by experimental results on both synthetic and real data sets in Section 5. Finally, we conclude the paper in Section 6.

2 Related work

In this section, we briefly review the related work on crowdsourcing and missing value completion.

One of the earliest works on crowdsourcing is [Dawid et al., 1979], which proposes an iterative algorithm based on Expectation Maximization (EM) to estimate worker quality and infer the item true labels at the same time. They assume each worker is associated with a probabilistic confusion matrix for item labeling. Each diagonal entry of the confusion matrix represents the labeling accuracy for one class, and the off-diagonal entries of each row represent the mislabeling probabilities. However, their model implicitly ignores item variations within the same class and assumes that all items with the same true label have the same degree of difficulty. Since that assumption does not hold in many real-world situations, [Zhou et al., 2012] improved upon their work by proposing a minimax entropy principle to jointly infer the true label, the labeling difficulty of each item, and the quality of each worker. Besides the worker quality, their method assumes that each item has its own intrinsic difficulty of being mislabeled. When the item difficulty is ignored, their model reduces to the EM method proposed by [Dawid et al., 1979]. Another flaw of the EM method of [Dawid et al., 1979] is that its likelihood function is non-convex; therefore its performance is sensitive to initialization, because the EM iterations can converge to a local optimum. To address this issue, [Zhang et al., 2014] proposed a two-stage algorithm in which the initial worker confusion matrices are estimated using a spectral method, and the algorithm then switches to EM iterations. Their model has been proved to achieve the minimax rates of convergence up to a logarithmic factor. [Liu et al., 2012b] also proposed a graphical model that performs variational inference using belief propagation and mean field (MF) algorithms. Another probabilistic model named GLAD, which can simultaneously estimate the ground truth, item difficulty and worker ability, has been proposed by [Whitehill et al., 2009]. However, the GLAD model only works on binary tasks and does not model worker bias; its performance can degrade when the bias variation across workers is high [Welinder et al., 2010]. Later on, the GLAD model was generalized to multi-class labeling tasks by [Mineiro, 2011].

Missing values are commonly seen in many real-world applications, such as recommendation systems, which motivates the study of missing value completion. This problem was initially posed by [Candès and Recht, 2009] in order to recover the missing entries of matrices. Theoretically, it has been proved that most low rank matrices can be recovered from a small fraction of their entries by formulating a rank minimization problem.
However, this rank minimization problem is NP-hard and non-convex, which leads to an optimization problem that instead uses the trace norm as the objective, as discussed in a variety of works [Candès and Tao, 2010; Recht, 2011]. The advantage is that the trace norm is the tightest convex envelope of the matrix rank. In many practical situations, higher dimensional data is desired, which requires generalizing the completion methods to tensors. Similar to matrix completion, it is straightforward to formulate tensor completion as a rank minimization problem. However, unlike the matrix rank, there is no direct algorithm that can determine the rank of a tensor [Kolda and Bader, 2009]. To overcome this issue, [Liu et al., 2012a] similarly proposed to approximate the rank minimization problem by a trace norm minimization problem. They introduce one definition of the tensor trace norm, while many other definitions exist [Gandy et al., 2011]. [Liu et al., 2012a] also relax the objective function so that the optimization problem becomes convex. Their low rank tensor completion method (LRTC) shows a broad capability to recover data in various formats. Meanwhile, there are many other heuristic methods [Xu et al., 2013] that can be applied to tensor completion by employing tensor decomposition and unfolded matrix factorization. However, the theoretical guarantees of these heuristic methods remain an open question, and the comparison experiments of [Liu et al., 2012a] show that LRTC has a more stable performance on both synthetic and real-world data.

3 Problem formulation

3.1 Notation

In this article, we use calligraphic letters, such as $\mathcal{X}$, to denote tensors. We use upper case letters, such as $M$, to denote matrices. Vectors and scalars are denoted by bold lower case letters and lower case letters, such as $\mathbf{x}$ and $x$. An n-way tensor is denoted as $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_n}$. The $(i, j, k)$th element of a three-way tensor $\mathcal{X}$ is represented by $\mathcal{X}_{ijk}$. A slice of a three-way tensor $\mathcal{X}$ is denoted as $\mathcal{X}_{i::}$, $\mathcal{X}_{:j:}$ or $\mathcal{X}_{::k}$. A fiber of a three-way tensor is denoted as $\mathcal{X}_{:jk}$, $\mathcal{X}_{i:k}$ or $\mathcal{X}_{ij:}$. The norm of a tensor is analogous to the matrix Frobenius norm: $\|\mathcal{X}\|_F = (\sum_{i,j,k} |\mathcal{X}_{ijk}|^2)^{1/2}$. The trace norm of a matrix $M$ is defined as $\|M\|_* = \sum_i \sigma_i(M)$, where $\sigma_i(M)$ denotes the $i$th singular value in descending order. Let $\Omega$ denote the index set of the observed entries of a tensor, and $|\Omega|$ denote the cardinality of $\Omega$. One important operation on a tensor $\mathcal{X}$ is called matricization or unfolding, which reorders an n-way tensor into a matrix. We denote $X_{(k)}$ as the output of the unfold operation along the $k$th dimension of a tensor $\mathcal{X}$, i.e., $X_{(k)} = \mathrm{unfold}_k(\mathcal{X})$. Similarly, $\mathrm{fold}_k(X_{(k)})$ is the inverse of the unfold operation and returns the tensor $\mathcal{X}$. The details of the fold and unfold operations can be found in [Kolda and Bader, 2009].

3.2 Tensor augmentation and completion

We propose to reorganize the worker labels from crowdsourcing as a three-way label tensor $\mathcal{T}^0 \in \mathbb{R}^{N_w \times N_i \times N_c}$ and an index set $\Omega$. Here $N_w$, $N_i$ and $N_c$ denote the number of workers, the number of items and the number of classes, respectively. Each worker gives each item either exactly one label or no label; the label tensor $\mathcal{T}^0$ and the index set $\Omega$ are then built as follows. If worker $i$ has labeled item $j$ with label $k$, the corresponding fiber $\mathcal{T}^0_{ij:}$ is initialized with a unit vector that has value 1 in its $k$th entry and 0s elsewhere. Meanwhile, the corresponding index triplets of fiber $\mathcal{T}^0_{ij:}$ are added to the index set $\Omega$, as sketched in the code below.
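To make this construction concrete, the following is a minimal sketch (not the authors' code) of building the label tensor and index set from (worker, item, label) triples, together with the unfold/fold operations of Section 3.1. All helper names are illustrative, and the matricization uses one common column ordering rather than necessarily the exact ordering of [Kolda and Bader, 2009].

```python
import numpy as np

def build_label_tensor(triples, n_workers, n_items, n_classes):
    """Build T0 and the index set Omega from (worker, item, label) triples."""
    T = np.zeros((n_workers, n_items, n_classes))
    omega = set()
    for i, j, k in triples:
        T[i, j, k] = 1.0                    # unit vector along the labeled fiber
        for c in range(n_classes):          # every entry of a labeled fiber is observed
            omega.add((i, j, c))
    return T, omega

def unfold(X, mode):
    """Mode-l matricization: reorder a tensor into a matrix."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold: refold a matrix into a tensor of the given shape."""
    rest = [s for a, s in enumerate(shape) if a != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)
```

For example, `build_label_tensor([(0, 2, 1)], 3, 4, 2)` records worker 0's class-1 label on item 2 and adds the whole fiber to Omega; `fold(unfold(X, l), l, X.shape)` recovers `X` for any mode `l`.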
However, a worker does not necessarily label all items. If worker $i$ does not label item $j$, fiber $\mathcal{T}^0_{ij:}$ is initialized with a zero vector and remains unchanged. In that case, the label tensor $\mathcal{T}^0$ has missing entries. In our approach, we propose to augment the label tensor with an extra tensor slice of size $N_i \times N_c$, called the ground truth layer, along the worker dimension. All entries of the ground truth layer are assumed to be missing, and our objective is to infer the true labels of the items.

Recall that the common approach to matrix completion is to minimize the matrix trace norm by solving the following convex optimization problem [Candès and Recht, 2009]:

$$\min_X \|X\|_* \quad \text{s.t.} \quad X_\Omega = M_\Omega \quad (1)$$

where $X$ and $M$ are matrices of the same size, $\Omega$ is the index set of the observed entries of $M$, and $X$ is the matrix whose rank should be minimized during the completion procedure. The entries that do not belong to $\Omega$ are missing. For tensor completion, [Liu et al., 2012a] followed the same formulation with the following trace norm definition for an n-way tensor $\mathcal{X}$:

$$\|\mathcal{X}\|_* = \sum_{l=1}^n \alpha_l \|X_{(l)}\|_*, \quad \sum_{l=1}^n \alpha_l = 1, \; \alpha_l \geq 0, \; l = 1, \ldots, n \quad (2)$$

where $\alpha_l$, $l = 1, \ldots, n$, are pre-defined weights of the tensor trace norm. Analogous to the matrix completion formulation, the tensor completion problem can be written as:

$$\min_{\mathcal{X}} \sum_{l=1}^n \alpha_l \|X_{(l)}\|_* \quad \text{s.t.} \quad \mathcal{X}_\Omega = \mathcal{T}_\Omega \quad (3)$$

In this formulation, $\mathcal{X}$ is the target tensor to be completed. However, the unfolded matrices $X_{(l)}$, $l = 1, \ldots, n$, are not independent of each other. In order to split them and solve for them independently, the same number of intermediate matrices $M_l$, $l = 1, \ldots, n$, are introduced. The optimization problem can then be relaxed and formulated as:

$$\min_{\mathcal{X}, M_1, \ldots, M_n} \sum_{l=1}^n \alpha_l \|M_l\|_* + \frac{\beta_l}{2} \|X_{(l)} - M_l\|_F^2 \quad \text{s.t.} \quad \mathcal{X}_\Omega = \mathcal{T}_\Omega \quad (4)$$

We propose to formulate the crowdsourcing problem as an augmented tensor completion problem with a certain regularization on the ground truth layer. Since our task only requires a three-way tensor, from now on, unless otherwise specified, all tensors in our equations have order three, namely $n = 3$. Given the augmented label tensor $\mathcal{T} \in \mathbb{R}^{(N_w+1) \times N_i \times N_c}$ and index set $\Omega$, our optimization problem becomes:

$$\min_{\mathcal{X}, M_1, \ldots, M_n} \sum_{l=1}^n \alpha_l \|M_l\|_* + \frac{\beta_l}{2} \|X_{(l)} - M_l\|_F^2 + R(\mathcal{X}_{i_g::}) \quad \text{s.t.} \quad \mathcal{X}_\Omega = \mathcal{T}_\Omega \quad (5)$$

Here $i_g$ denotes the index of the ground truth layer along the worker dimension of the tensor.

3.3 Two formulations for inferring the ground truth layer

We propose to regularize the ground truth layer in two different ways: one is to regularize the discrepancy between the ground truth layer of the tensor and given prior statistics of the items; the other is to constrain each tensor fiber of the ground truth layer to lie in a simplex. Under these two regularizations, the inferred ground truth layer has distinct interpretations.

Prior guided ground truth inference

The objective function of the first formulation has a regularization term w.r.t. the discrepancy between the ground truth layer and a prior statistics matrix, and the regularization is parameterized by a positive value $\gamma$. Our key motivation for this regularization is to update the item labels by combining the prior statistics with the tensor structure information. The formulation becomes:

$$\min_{\mathcal{X}, M_1, \ldots, M_n} \sum_{l=1}^n \alpha_l \|M_l\|_* + \frac{\beta_l}{2} \|X_{(l)} - M_l\|_F^2 + \frac{\gamma}{2} \|\mathcal{X}_{i_g::} - S\|_F^2 \quad \text{s.t.} \quad \mathcal{X}_\Omega = \mathcal{T}_\Omega \quad (6)$$

Here $S \in \mathbb{R}^{N_i \times N_c}$ represents the item prior statistics matrix of the tensor $\mathcal{T}$.

Relaxed simplex ground truth inference

The second formulation has regularization terms w.r.t. the tensor fibers of the ground truth layer. Originally, each item fiber $\mathcal{X}_{i_g j:}$ is constrained to lie in a simplex.
However, the number of labels collected for each item is usually small in empirical experiments, so the labels of these items are likely to fluctuate around their expected values. In order to prevent overfitting, we formulate our objective with a relaxed simplex constraint and penalize large fluctuations according to the value of the parameter $\gamma$:

$$\min_{\mathcal{X}, M_1, \ldots, M_n, \xi} \sum_{l=1}^n \alpha_l \|M_l\|_* + \frac{\beta_l}{2} \|X_{(l)} - M_l\|_F^2 + \frac{\gamma}{2} \sum_{j=1}^{N_i} \xi_j^2$$
$$\text{s.t.} \quad \mathcal{X}_\Omega = \mathcal{T}_\Omega, \quad \sum_{k=1}^{N_c} \mathcal{X}_{ijk} - 1 = \xi_j, \; i = i_g, \; \forall j = 1, \ldots, N_i \quad (7)$$

4 Algorithm

All the terms in the objective function are convex; therefore we can employ block coordinate descent (BCD) for the optimization problems (6) and (7). BCD is guaranteed to converge [Tseng, 2001] and is computationally easier and cheaper than a batch update. We apply coordinate descent to optimize one block of variables while fixing the others. In our case, we have four blocks: $\mathcal{X}$, $M_1$, $M_2$ and $M_3$, because the observed tensor has only three dimensions: worker, item and label. There are two major iteration steps in BCD: the first step updates one intermediate matrix $M_l$ while fixing the other intermediate matrices and the tensor $\mathcal{X}$; the second step updates the tensor while fixing all intermediate matrices.

4.1 Updating $M_l$

After some simplification, the optimization problem of the first BCD step becomes:

$$\min_{M_l} \alpha_l \|M_l\|_* + \frac{\beta_l}{2} \|X_{(l)} - M_l\|_F^2 \quad (8)$$

The closed-form solution of this problem has been given by [Cai et al., 2010] as $D_\tau(X_{(l)}) = U \Sigma_\tau V^T$. We first compute the singular value decomposition $X_{(l)} = U \Sigma V^T$, then replace $\Sigma$ with its shrinkage version $\Sigma_\tau = \mathrm{diag}(\{\sigma_i - \tau\}_+)$. Here $a_+ = \max(a, 0)$ and $\tau$ is the threshold of the shrinkage SVD. Under both the prior guided formulation and the relaxed simplex formulation, problem (8) appears as a sub-problem in the BCD iterations.

4.2 Updating $\mathcal{X}$

Prior guided formulation: With the intermediate matrices $M_1$, $M_2$ and $M_3$ fixed in this step, the optimization problem becomes:

$$\min_{\mathcal{X}} \sum_{l=1}^n \frac{\beta_l}{2} \|X_{(l)} - M_l\|_F^2 + \frac{\gamma}{2} \|\mathcal{X}_{i_g::} - S\|_F^2 \quad \text{s.t.} \quad \mathcal{X}_\Omega = \mathcal{T}_\Omega \quad (9)$$

This problem is convex, and the objective can be rewritten element-wise; the Lagrangian of the optimization problem is then:

$$L = \sum_{l=1}^n \frac{\beta_l}{2} \sum_{i,j,k} \left(\mathcal{X}_{ijk} - (\mathrm{fold}_l(M_l))_{ijk}\right)^2 + \frac{\gamma}{2} \|\mathcal{X}_{i_g::} - S\|_F^2 + \sum_{(i,j,k) \in \Omega} \lambda_{ijk} (\mathcal{X}_{ijk} - \mathcal{T}_{ijk}) \quad (10)$$

The elements of tensor $\mathcal{X}$ can be divided into three sets. The first set $C_1$ contains the elements that belong to the index set: $(i, j, k) \in \Omega$. The elements of the second set $C_2$ belong neither to the index set nor to the ground truth layer: $(i, j, k) \notin \Omega$ and $i \neq i_g$. The elements of the third set $C_3$ do not belong to the index set but belong to the ground truth layer: $(i, j, k) \notin \Omega$ and $i = i_g$.

The elements in set $C_1$ do not appear in the second term of the Lagrangian, and it is easy to see that their solution is:

$$\mathcal{X}_{ijk} = \mathcal{T}_{ijk} \quad (11)$$

The elements in set $C_2$ do not appear in the second and third terms of the Lagrangian. Taking the derivative of the Lagrangian w.r.t. $\mathcal{X}_{ijk}$ and setting it to 0, we get:

$$\mathcal{X}_{ijk} = \frac{\left(\sum_{l=1}^n \beta_l \, \mathrm{fold}_l(M_l)\right)_{ijk}}{\sum_{l=1}^n \beta_l} \quad (12)$$

The elements in set $C_3$ do not appear in the third term of the Lagrangian. Taking the derivative of the Lagrangian w.r.t. $\mathcal{X}_{ijk}$ and setting it to 0, we get:

$$\mathcal{X}_{ijk} = \frac{\left(\sum_{l=1}^n \beta_l \, \mathrm{fold}_l(M_l)\right)_{ijk} + \gamma S_{jk}}{\sum_{l=1}^n \beta_l + \gamma} \quad (13)$$

Relaxed simplex formulation: With the intermediate matrices $M_1$, $M_2$ and $M_3$ fixed, the optimization problem becomes:

$$\min_{\mathcal{X}, \xi} \sum_{l=1}^n \frac{\beta_l}{2} \|X_{(l)} - M_l\|_F^2 + \frac{\gamma}{2} \sum_{j=1}^{N_i} \xi_j^2 \quad \text{s.t.} \quad \mathcal{X}_\Omega = \mathcal{T}_\Omega, \quad \sum_{k=1}^{N_c} \mathcal{X}_{ijk} - 1 = \xi_j, \; i = i_g, \; \forall j = 1, \ldots, N_i \quad (14)$$

Similarly, we rewrite the objective element-wise, and the Lagrangian of the optimization problem becomes:

$$L = \sum_{l=1}^n \frac{\beta_l}{2} \sum_{i,j,k} \left(\mathcal{X}_{ijk} - (\mathrm{fold}_l(M_l))_{ijk}\right)^2 + \frac{\gamma}{2} \sum_{j=1}^{N_i} \xi_j^2 + \sum_{j=1}^{N_i} \mu_j \left(\sum_{k=1}^{N_c} \mathcal{X}_{i_g jk} - 1 - \xi_j\right) + \sum_{(i,j,k) \in \Omega} \lambda_{ijk} (\mathcal{X}_{ijk} - \mathcal{T}_{ijk}) \quad (15)$$

As in the prior guided formulation, the elements of tensor $\mathcal{X}$ are again divided into the three sets $C_1$, $C_2$ and $C_3$. The solutions for the elements in sets $C_1$ and $C_2$ stay the same as shown in equations (11) and (12). A code sketch of the prior guided updates derived so far is given below, before we derive the remaining case.
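The following is a compact sketch, under the reconstructions above, of one BCD pass of the prior guided method: the shrinkage SVD update for each $M_l$ (equation (8), with threshold $\tau = \alpha_l/\beta_l$) and the closed-form $\mathcal{X}$-updates for the sets $C_1$ through $C_3$ (equations (11) to (13)). It reuses the `unfold`/`fold` helpers sketched earlier; variable names are illustrative and this is not the authors' implementation.

```python
import numpy as np

def svt(M, tau):
    """Shrinkage SVD D_tau(M) = U diag({sigma_i - tau}_+) V^T [Cai et al., 2010]."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt   # scale columns of U by thresholded values

def pg_tac_pass(X, T, omega_mask, S, ig, alpha, beta, gamma):
    """One BCD pass of PG-TAC; omega_mask is a boolean array marking Omega."""
    n = X.ndim
    # Step 1: update each intermediate matrix M_l by SVT with threshold alpha_l/beta_l (Eq. 8).
    Ms = [svt(unfold(X, l), alpha[l] / beta[l]) for l in range(n)]
    # Step 2: update X in closed form; `num`/`den` are the shared terms of Eqs. 12-13.
    num = sum(beta[l] * fold(Ms[l], l, X.shape) for l in range(n))
    den = sum(beta)
    X_new = num / den                                   # set C2 (Eq. 12)
    X_new[ig] = (num[ig] + gamma * S) / (den + gamma)   # set C3 (Eq. 13)
    X_new[omega_mask] = T[omega_mask]                   # set C1 (Eq. 11)
    return X_new
```

An RS-TAC variant would replace the $C_3$ line with the update in equation (19), derived next.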
The elements in set $C_3$ do not appear in the third term of the Lagrangian. Taking the derivatives of the Lagrangian w.r.t. $\mathcal{X}_{ijk}$ and $\xi_j$ and setting them to 0, we get:

$$\mathcal{X}_{ijk} = \frac{\left(\sum_{l=1}^n \beta_l \, \mathrm{fold}_l(M_l)\right)_{ijk} - \mu_j}{\sum_{l=1}^n \beta_l} \quad (16)$$

$$\xi_j = \frac{\mu_j}{\gamma} \quad (17)$$

Substituting Equations (16) and (17) into the relaxed constraint $\sum_{k=1}^{N_c} \mathcal{X}_{ijk} - 1 = \xi_j$, we get:

$$\mu_j = \frac{\gamma \sum_{l=1}^n \beta_l \left(\sum_{k=1}^{N_c} (\mathrm{fold}_l(M_l))_{ijk} - 1\right)}{\gamma N_c + \sum_{l=1}^n \beta_l} \quad (18)$$

Substituting Equation (18) into Equation (16), we get:

$$\mathcal{X}_{ijk} = \frac{\left(\sum_{l=1}^n \beta_l \, \mathrm{fold}_l(M_l)\right)_{ijk} + \dfrac{\gamma \sum_{l=1}^n \beta_l \left(1 - \sum_{k=1}^{N_c} (\mathrm{fold}_l(M_l))_{ijk}\right)}{\gamma N_c + \sum_{l=1}^n \beta_l}}{\sum_{l=1}^n \beta_l} \quad (19)$$

Our proposed PG-TAC method is described in Algorithm 1. The algorithm of RS-TAC is omitted due to the space limit.

Algorithm 1: PG-TAC
Data: augmented tensor $\mathcal{T}$, prior statistics $S$, parameters $\alpha$, $\beta$, $\gamma$, $\epsilon$.
Result: completed tensor $\mathcal{X}$.
while $\|\mathcal{X} - \mathcal{T}\|_F / \|\mathcal{T}\|_F \geq \epsilon$ do
    for $l = 1{:}n$ do
        update $M_l$ based on equation (8);
    end
    update elements in set $C_1$ based on equation (11);
    update elements in set $C_2$ based on equation (12);
    update elements in set $C_3$ based on equation (13);
end

5 Experiments

In this section, we report the results of our proposed methods on four groups of synthetic data sets. The purpose is to study the behavior of our methods under various data set configurations. Moreover, we compare our methods with a variety of state-of-the-art algorithms on six real data sets.

5.1 Synthetic data

Generation of the synthetic data sets is based on four parameters: the number of workers $N_w$, the number of items $N_i$, the number of classes $N_c$, and the probability of no labels $q$. Given $N_c$ and $N_i$, the true labels of the items are sampled from a multinomial distribution with probabilities $p_1, p_2, \ldots, p_{N_c}$. For balanced data, these probabilities would all be the same; instead, we add random noise to the probabilities while keeping their sum equal to 1. The resulting data set is unbalanced and thus more analogous to a real data set. Then for each worker, we generate an $N_c \times N_c$ worker quality confusion matrix as follows: the diagonal entries are independently and uniformly sampled from a certain probability range. Empirically, the labeling difficulty of each item should rise with an increasing number of classes $N_c$, so it would be inappropriate to sample the diagonal probability entries from a fixed range for different $N_c$ values. In order to simulate real-world situations, we assign each diagonal entry a probability that is the product of the random guess probability and a scale factor, drawn empirically from a uniform distribution over the range [1.5, 1.99]. For instance, if the scale factor is drawn as 1.8, then for a 3-class labeling task the diagonal element has an accuracy value of 0.6. The non-diagonal entries are randomly assigned positive probabilities under the constraint that each row of the confusion matrix sums to 1. Since each worker does not have to label all items, we draw a labeling decision for each worker from a Bernoulli distribution with probability $q$.

In addition to prior guided tensor augmentation and completion (PG-TAC) and relaxed simplex tensor augmentation and completion (RS-TAC), we also use no-constraint tensor augmentation and completion (NC-TAC) as a simple baseline method for comparison. The evaluation metric is the error rate of label prediction. All the outcomes of our proposed methods are averaged over 10 independent runs, and the performance is shown in Figure 1. The initial configuration is $N_w = 50$, $N_i = 400$, $N_c = 4$ and $q = 0.7$. Based on this, we report the performance of our proposed methods on synthetic data sets generated with four groups of configurations: (a) $N_w$ varies in the range [20, 90] with a step size of 10; (b)
$N_c$ varies in the range [2, 8] with a step size of 1; (c) $N_i$ varies in the range [50, 1000] with a step size of 50; (d) $q$ varies in the range [0, 0.95] with a step size of 0.05. Under each configuration, the other data set parameters remain consistent with the initial configuration.

On all the synthetic data sets we generated, the PG-TAC method achieves the lowest error rate under all configurations. The RS-TAC method does not necessarily improve the performance; in many configurations, RS-TAC and NC-TAC perform almost identically. We also observe that when the data sets are not sufficiently labeled, for instance when $N_w$ and $N_i$ are very small or $q$ is relatively large, our proposed methods perform almost the same. When more data is available, PG-TAC and RS-TAC perform better.

[Figure 1: Comparison results of our proposed methods (NC-TAC, PG-TAC, RS-TAC) on the synthetic data sets; panels (a)-(d) plot the error rate against $N_w$, $N_c$, $N_i$ and $q$, respectively.]

5.2 Real data

We also evaluate our methods against various other methods on six real-world crowdsourcing data sets. Three data sets have binary labels and three have multi-class labels. The binary labeling data sets include the Temp data set [Snow et al., 2008], the RTE data set [Snow et al., 2008] and the Spam data set [Zhou et al., 2015]. The multi-class labeling data sets include the Dog data set [Zhou et al., 2012], the Web data set [Zhou et al., 2012] and the Age data set [Han et al., 2015].

|      | # classes | # items | # workers | # total labels |
|------|-----------|---------|-----------|----------------|
| RTE  | 2         | 800     | 164       | 8000           |
| Temp | 2         | 462     | 76        | 4620           |
| Web  | 5         | 2653    | 177       | 15567          |
| Dog  | 4         | 807     | 109       | 7354           |
| Spam | 2         | 149     | 18        | 1901           |
| Age  | 7         | 1002    | 165       | 10020          |

Table 1: Summary of the six real data sets. For the Dog data set, the unqualified workers, who labeled only a small number of images, are retained. For the Web data set, 12 items have been removed due to the lack of true labels. For the Age data set, the labels have been discretized into 7 bins: [0, 9], [10, 19], [20, 29], [30, 39], [40, 49], [50, 59], [60, 100].

5.3 Methods

In our experiments, we employ eight methods for comparison. Majority Voting (MV) is the most straightforward method to implement, and we use it as one of our baseline methods. Dawid-Skene Expectation Maximization (DS-EM), proposed by [Dawid et al., 1979], is a generative model that jointly infers the item true labels and worker qualities. Dawid-Skene Mean Field (DS-MF) performs variational inference using the mean field method; this model was proposed by [Liu et al., 2012b]. The Generative model of Labels, Abilities and Difficulties (GLAD), proposed by [Whitehill et al., 2009], is a probabilistic framework that can simultaneously infer worker quality, item difficulty and item true labels. Here we use its variant, implemented by [Mineiro, 2011], which can work on multi-class data. Minimax Conditional Entropy (MMCE) uses the minimax entropy principle [Zhou et al., 2012] to infer the items' ground truth from noisy labels.
When the item difficulty is ignored, the MMCE model reduces to the DS-EM method. NC-TAC is a simple baseline version of our proposed method without the constraint on the ground truth layer. PG-TAC employs a tensor slice as its prior statistics. When the regularization parameter $\gamma$ is very small, PG-TAC is approximately equal to NC-TAC; when $\gamma$ is sufficiently large, PG-TAC reduces to its prior statistics. The objective of RS-TAC has a regularization term, parameterized by $\gamma$, that controls the strength of the relaxation on the ground truth layer.

5.4 Parameter selection

PG-TAC and RS-TAC both have three kinds of parameters in their objectives: $\alpha_l$, $\beta_l$ and $\gamma$, where $l = 1, \ldots, 3$ and 3 is the number of modes of the tensor. The values of $\alpha_l$ are all set to 1/3, and we let $\delta_l = \alpha_l / \beta_l$. Given $\delta_l$, the value of $\beta_l$ is determined as well. Therefore it is straightforward to verify that we only need to tune $\delta_l$ in the BCD iterations, whether in the step of computing $M_l$ or in the step of computing $\mathcal{X}$. For simplicity, we let $\delta_l$ be the same for all three modes. Eventually we can apply grid search over the two regularization parameters $\delta_l$ and $\gamma$, with the following procedure: all the data sets we use are publicly available online, and they all come with ground truth labels. We run our proposed algorithms over a 2D grid of the parameter space. For each parameter pair on the search grid, a subset of worker labels is randomly chosen from the current data set without replacement; in practice, we empirically choose 90 percent of the worker labels as the subset, run our methods, and evaluate the performance. We repeat this procedure ten times for each parameter pair on the grid. Eventually, the regularization parameter pair with the lowest average error rate is chosen.

5.5 Implementation details

The results of the MV, DS-EM and MMCE methods are obtained using the open source implementation provided by [Zhou et al., 2015]. Our PG-TAC method uses DS-EM as the prior statistics, and the ground truth layer is initialized using the histogram of worker labels. In our empirical studies, we have tried initializing the ground truth layer with the majority vote of worker labels, the mean value of the tensor, and the normalized histogram of worker labels. The normalized histogram is a linear combination of label tensor slices, so the rank of the augmented tensor does not increase when the ground truth layer is initialized with the histogram. If we have no information at all about the ground truth layer, there is no hope of recovering it with meaningful values; we verified this by initializing the ground truth layer with all 0s, in which case the final completed values on it are meaningless. The ground truth layer of the RS-TAC method is initialized with DS-EM. We use $\|\mathcal{X} - \mathcal{T}\|_F / \|\mathcal{T}\|_F$ as the stopping criterion, with the threshold set to $10^{-5}$. The final label prediction is performed as follows: in each fiber $\mathcal{X}_{i_g j:}$ of the completed ground truth layer, the entry with the largest value determines the predicted label, as sketched below.
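As a small illustrative sketch (again with hypothetical names), the stopping criterion and the final prediction step just described could look as follows, where `X` is the completed augmented tensor, `ig` the ground truth layer index, and `omega_mask` a boolean array marking $\Omega$; we assume the criterion is evaluated before the observed entries are re-imposed in each pass.

```python
import numpy as np

def converged(X, T, omega_mask, eps=1e-5):
    """Stopping criterion ||X_Omega - T_Omega||_F / ||T_Omega||_F <= eps."""
    return (np.linalg.norm(X[omega_mask] - T[omega_mask])
            / np.linalg.norm(T[omega_mask])) <= eps

def predict_labels(X, ig):
    """Predict each item's label as the largest entry of its ground-truth-layer fiber."""
    return np.argmax(X[ig], axis=1)   # X[ig] has shape (N_i, N_c)
```

Note that `np.argmax` breaks ties by taking the first maximal entry; the paper does not specify tie-breaking, so this is one reasonable choice.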
5.6 Results

Table 2 summarizes the error rates of the various methods on the six real data sets. For fair comparison, all methods are fed input data in the same format.

|      | MV    | DS-EM | DS-MF | GLAD  | MMCE  | NC-TAC | PG-TAC | RS-TAC |
|------|-------|-------|-------|-------|-------|--------|--------|--------|
| RTE  | 10.31 | 7.25  | 6.63  | 7.00  | 7.50  | 7.13   | 7.00   | 7.25   |
| Temp | 6.39  | 5.84  | 5.84  | 5.63  | 5.63  | 5.84   | 5.41   | 5.84   |
| Web  | 26.93 | 16.92 | 18.24 | 19.30 | 11.12 | 11.16  | 10.82  | 11.23  |
| Dog  | 17.91 | 15.86 | 15.74 | 16.23 | 15.86 | 15.74  | 15.74  | –      |
| Spam | 19.80 | 13.42 | 12.75 | 18.12 | 12.75 | 14.10  | 12.75  | 13.42  |
| Age  | 34.88 | 39.62 | 36.33 | 35.73 | 31.14 | 31.24  | 32.44  | 31.14  |

Table 2: Comparison results of all methods on the six real data sets in error rate (percentage).

Our proposed methods PG-TAC and RS-TAC achieve consistently lower error rates than the other state-of-the-art methods on most data sets. The RS-TAC method has the best performance on the Age data set, which is the most difficult one we employed. On all the other data sets, RS-TAC performs similarly to NC-TAC. We observe that PG-TAC outperforms all state-of-the-art methods on most of the real data sets. The performance of PG-TAC is within our anticipation, because PG-TAC combines the prior information with the structural information inferred from the tensor. From Equation (13), we know that the inferred layer is in fact a linear combination of the prior statistics and the NC-TAC solution. The only exception is the Age data set, which has severely unbalanced label distributions. Interestingly, DS-EM has the worst performance on the Age data set among all methods. Even though the prior statistics are severely biased, PG-TAC still achieves competitive results.

6 Conclusion

In this paper, we propose two novel methods (PG-TAC and RS-TAC) to infer the true labels of items in both binary and multi-class crowdsourcing settings. These methods capture the structural information in the data by representing the noisy labels provided by workers as tensors. Furthermore, we propose to augment the data tensor with an extra ground truth layer, and explore tensor completion techniques to infer the true labels in the ground truth layer. Our experimental results on 6 real data sets demonstrate that our proposed methods outperform state-of-the-art techniques.

Acknowledgements

The research was partially supported by NSF research grant IIS-1552654 and by an IBM Faculty Award. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agencies or the U.S. Government.

References

[Cai et al., 2010] Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20:1956–1982, 2010.

[Candès and Recht, 2009] Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, pages 717–772, 2009.

[Candès and Tao, 2010] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, pages 2053–2080, 2010.

[Dawid et al., 1979] A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, pages 20–28, 1979.

[Gandy et al., 2011] Silvia Gandy, Benjamin Recht, and Isao Yamada. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems, 2011.

[Han et al., 2015] Hu Han, Charles Otto, Xiaoming Liu, and Anil K. Jain. Demographic estimation from face images: Human vs. machine performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1148–1161, 2015.

[Huberman et al., 2009] Bernardo A. Huberman, Daniel M. Romero, and Fang Wu. Crowdsourcing, attention and productivity. Journal of Information Science, 35:758–765, 2009.

[Karger et al., 2011] David R. Karger, Sewoong Oh, and Devavrat Shah. Budget-optimal task allocation for reliable crowdsourcing systems. CoRR, abs/1110.3564, 2011.

[Kittur et al., 2008] Aniket Kittur, Ed H. Chi, and Bongwon Suh. Crowdsourcing user studies with Mechanical Turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2008.

[Kolda and Bader, 2009] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51:455–500, 2009.

[Liu et al., 2012a] Ji Liu, Przemyslaw Musialski, Peter Wonka, and Jieping Ye. Tensor completion for estimating missing values in visual data.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 35:208–220, 2012.

[Liu et al., 2012b] Qiang Liu, Jian Peng, and Alexander T. Ihler. Variational inference for crowdsourcing. In Advances in Neural Information Processing Systems, 2012.

[Mineiro, 2011] P. Mineiro. http://www.machinedlearnings.com/2011/08/low-rank-confusion-modeling-of.html, 2011.

[Raykar et al., 2010] Vikas C. Raykar, Shipeng Yu, Linda H. Zhao, Gerardo Hermosillo Valadez, Charles Florin, Luca Bogoni, and Linda Moy. Learning from crowds. Journal of Machine Learning Research, 11:1297–1322, 2010.

[Recht, 2011] Benjamin Recht. A simpler approach to matrix completion. Journal of Machine Learning Research, 12:3413–3430, 2011.

[Snow et al., 2008] Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Y. Ng. Cheap and fast - but is it good?: Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 254–263, 2008.

[Tseng, 2001] Paul Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, pages 475–494, 2001.

[Welinder et al., 2010] Peter Welinder, Steve Branson, Serge Belongie, and Pietro Perona. The multidimensional wisdom of crowds. In Advances in Neural Information Processing Systems, 2010.

[Whitehill et al., 2009] Jacob Whitehill, Ting-fan Wu, Jacob Bergsma, Javier R. Movellan, and Paul L. Ruvolo. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Advances in Neural Information Processing Systems, 2009.

[Xu et al., 2013] Yangyang Xu, Ruru Hao, Wotao Yin, and Zhixun Su. Parallel matrix factorization for low-rank tensor completion. CoRR, 2013.

[Zhang et al., 2014] Yuchen Zhang, Xi Chen, Dengyong Zhou, and Michael I. Jordan. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. In Advances in Neural Information Processing Systems, 2014.

[Zhou et al., 2012] Dengyong Zhou, John C. Platt, Sumit Basu, and Yi Mao. Learning from the wisdom of crowds by minimax entropy. In Advances in Neural Information Processing Systems, 2012.

[Zhou et al., 2015] Dengyong Zhou, Qiang Liu, John C. Platt, Christopher Meek, and Nihar B. Shah. Regularized minimax conditional entropy for crowdsourcing. CoRR, abs/1503.07240, 2015.