# Phase Transitions in the Pooled Data Problem

Jonathan Scarlett and Volkan Cevher
Laboratory for Information and Inference Systems (LIONS)
École Polytechnique Fédérale de Lausanne (EPFL)
{jonathan.scarlett,volkan.cevher}@epfl.ch

In this paper, we study the pooled data problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we identify an exact asymptotic threshold on the required number of tests with optimal decoding, and prove a phase transition between complete success and complete failure. In addition, we present a novel noisy variation of the problem, and provide an information-theoretic framework for characterizing the required number of tests for general random noise models. Our results reveal that noise can make the problem considerably more difficult, with strict increases in the scaling laws even at low noise levels. Finally, we demonstrate similar behavior in an approximate recovery setting, where a given number of errors is allowed in the decoded labels.

1 Introduction

Consider the following setting: There exists a large population of items, each of which has an associated label. The labels are initially unknown, and are to be estimated based on pooled tests. Each pool consists of some subset of the population, and the test outcome reveals the total number of items corresponding to each label that are present in the pool (but not the individual labels).

This problem, which we refer to as the pooled data problem, was recently introduced in [1,2], and further studied in [3,4]. It is of interest in applications such as medical testing, genetics, and learning with privacy constraints, and has connections to the group testing problem [5] and its linear variants [6,7]. The best known bounds on the required number of tests under optimal decoding were given in [3]; however, the upper and lower bounds therein do not match, and can exhibit a large gap. In this paper, we completely close these gaps by providing a new lower bound that exactly matches the upper bound of [3]. These results collectively reveal a phase transition between success and failure, with the probability of error vanishing when the number of tests exceeds a given threshold, but tending to one below that threshold. In addition, we explore the novel aspect of random noise in the measurements, and show that this can significantly increase the required number of tests. Before summarizing these contributions in more detail, we formally introduce the problem.

1.1 Problem setup

We consider a large population of items [p] = {1, . . . , p}, each of which has an associated label in [d] = {1, . . . , d}. We let π = (π_1, . . . , π_d) denote a vector containing the proportions of items having each label, and we assume that the vector of labels itself, β = (β_1, . . . , β_p), is uniformly distributed over the sequences consistent with these proportions:

    β ∼ Uniform(B(π)),    (1)

where B(π) is the set of length-p sequences whose empirical distribution is π. The goal is to recover β based on a sequence of pooled tests. The i-th test is represented by a (possibly random) vector X^(i) ∈ {0,1}^p, whose j-th entry X_j^(i) indicates whether the j-th item is included in the i-th test.
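To make the measurement model concrete, here is a minimal sketch (our own illustration, not code from the paper) that simulates a single pooled test; the values of p, d, π, and the inclusion probability q are hypothetical, chosen so that π_t·p is an integer for each t.

```python
# Minimal sketch of one pooled test (illustrative parameters, not from the paper).
import numpy as np

rng = np.random.default_rng(0)

p, d = 1000, 3                       # number of items and number of labels
pi = np.array([0.5, 0.3, 0.2])       # label proportions; pi_t * p assumed integer

# Draw beta uniformly over label sequences with empirical distribution pi.
counts = (pi * p).astype(int)
beta = rng.permutation(np.repeat(np.arange(1, d + 1), counts))

# One pool: each item is included independently with probability q.
q = 0.5
X = rng.random(p) < q

# The test outcome: the number of pooled items with each label, N_t(beta, X),
# for t = 1, ..., d -- the individual labels inside the pool are not revealed.
Y = np.array([np.sum(beta[X] == t) for t in range(1, d + 1)])
print("pool size:", int(X.sum()), "per-label counts:", Y)
```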
We define a measurement matrix X ∈ {0,1}^{n×p} whose i-th row is given by X^(i) for i = 1, . . . , n, where n denotes the total number of tests. We focus on the non-adaptive testing scenario, in which the entire matrix X must be specified prior to performing any tests. In the noiseless setting, the i-th test outcome is a vector Y^(i) = (Y_1^(i), . . . , Y_d^(i)), with t-th entry

    Y_t^(i) = N_t(β, X^(i)),    (2)

where for t = 1, . . . , d, we let N_t(β, X) = Σ_{j∈[p]} 1{β_j = t ∩ X_j = 1} denote the number of items with label t that are included in the test described by X ∈ {0,1}^p. More generally, in the possible presence of noise, the i-th observation is randomly generated according to

    (Y^(i) | X^(i), β) ∼ P_{Y | N_1(β,X^(i)), . . . , N_d(β,X^(i))}    (3)

for some conditional probability mass function P_{Y|N_1,...,N_d} (or density function in the case of continuous observations). We assume that the observations Y^(i) (i = 1, . . . , n) are conditionally independent given X, but otherwise make no assumptions on P_{Y|N_1,...,N_d}. Clearly, the noiseless model (2) falls under this more general setup. Similarly to X, we let Y denote an n × d matrix of observations, with the i-th row being Y^(i).

Given X and Y, a decoder outputs an estimate β̂ of β, and the error probability is given by

    Pe = P[β̂ ≠ β],    (4)

where the probability is with respect to β, X, and Y. We seek to find conditions on the number of tests n under which Pe attains a certain target value in the limit as p → ∞, and our main results provide necessary conditions (i.e., lower bounds on n) for this to occur. As in [3], we focus on the case that d and π are fixed and do not depend on p.¹

¹More precisely, π should be rounded to the nearest empirical distribution (e.g., in ℓ1-norm) for sequences β ∈ [d]^p of length p; we leave such rounding implicit throughout the paper.

1.2 Contributions and comparisons to existing bounds

Our focus in this paper is on information-theoretic bounds on the required number of tests that hold regardless of practical considerations such as computation and storage. Among the existing works in the literature, the one most relevant to this paper is [3], whose bounds strictly improve on the initial bounds in [1]. The same authors also proved a phase transition for a practical algorithm based on approximate message passing [4], but the required number of tests is in fact significantly larger than the information-theoretic threshold (specifically, linear in p instead of sub-linear).

Table 1 gives a summary of the bounds from [3] and our contributions in the noiseless setting.

    Sufficient for Pe → 0 [3]:            (p / log p) · max_{r∈{1,...,d−1}} f(r)
    Necessary for Pe ↛ 1 [3]:             (p / log p) · (1/2) f(1)
    Necessary for Pe ↛ 1 (this paper):    (p / log p) · max_{r∈{1,...,d−1}} f(r)

Table 1: Necessary and sufficient conditions on the number of tests n in the noiseless setting. The function f(r) is defined in (5). Asymptotic multiplicative 1 + o(1) terms are omitted.

    Noiseless testing:                     Θ(p / log p)
    Noisy testing (SNR = p^Θ(1)):          Ω(p / log p)
    Noisy testing (SNR = (log p)^Θ(1)):    Ω(p / log log p)
    Noisy testing (SNR = Θ(1)):            Ω(p log p)

Table 2: Necessary and sufficient conditions on the number of tests n in the noisy setting. SNR denotes the signal-to-noise ratio, and the noise model is given in Section 2.2.

To define the function f(r) therein, we introduce the additional notation that for r ∈ {1, . . . , d−1}, π^(r) = (π_1^(r), . . . , π_r^(r)) is a vector whose first entry sums the largest d − r + 1 entries of π, and whose remaining entries coincide with the remaining r − 1 entries of π.
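The construction of π^(r) is simple to state in code; the following is our own small sketch with a hypothetical choice of π (the function f(r) built from π^(r) is defined next).

```python
# Sketch of the reduced proportion vector pi^(r) (illustrative, not from the paper).
import numpy as np

def pi_r(pi, r):
    """First entry sums the largest d - r + 1 entries of pi;
    the remaining r - 1 entries are the r - 1 smallest entries of pi."""
    s = np.sort(np.asarray(pi, dtype=float))[::-1]   # entries in decreasing order
    d = len(s)
    return np.concatenate(([s[: d - r + 1].sum()], s[d - r + 1:]))

pi = np.array([0.49, 0.49] + [0.0025] * 8)           # the "highly non-uniform" example, d = 10
print(pi_r(pi, 1))   # [1.0]: all entries merged into one
print(pi_r(pi, 9))   # [0.98, 0.0025, ..., 0.0025]: only the two largest entries merged
```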
We have

    f(r) := 2(H(π) − H(π^(r))) / (d − r),    (5)

meaning that the entries in Table 1 corresponding to the results of [3] are given as follows:

(Achievability) When the entries of X are i.i.d. on Bernoulli(q) for some q ∈ (0,1) (not depending on p), there exists a decoder such that Pe → 0 as p → ∞ with

    n ≥ (p / log p) · max_{r∈{1,...,d−1}} [ 2(H(π) − H(π^(r))) / (d − r) ] · (1 + η)    (6)

for arbitrarily small η > 0.

(Converse) In order to achieve Pe ↛ 1 as p → ∞, it is necessary that

    n ≥ (p H(π) / ((d − 1) log p)) · (1 − η)    (7)

for arbitrarily small η > 0.

[Figure 1 shows f(r) plotted against r = 1, . . . , 9 for several choices of π.] Figure 1: The function f(r) in (5), for several choices of π, with d = 10. The random π are drawn uniformly on the probability simplex, and the highly non-uniform choice of π is given by π = (0.49, 0.49, 0.0025, . . . , 0.0025). When the maximum is achieved at r = 1, the bounds of [3] coincide up to a factor of two, whereas if the maximum is achieved for r > 1 then the gap is larger.

Unfortunately, these bounds do not coincide. If the maximum in (6) is achieved by r = 1 (which occurs, for example, when π is uniform [3]), then the gap only amounts to a factor of two. However, as we show in Figure 1, if we compute the bounds for some random choices of π then the gap is typically larger (i.e., r = 1 does not achieve the maximum), and we can construct choices of π where the gap is significantly larger. Closing these gaps was posed as a key open problem in [3]. We can now summarize our contributions as follows:

1. We give a lower bound that exactly matches (6), thus completely closing the above-mentioned gaps in the existing bounds and solving the open problem raised in [3]. More specifically, we show that Pe → 1 whenever

    n ≤ (p / log p) · max_{r∈{1,...,d−1}} [ 2(H(π) − H(π^(r))) / (d − r) ] · (1 − η)

for some η > 0, thus identifying an exact phase transition: a threshold above which the error probability vanishes, but below which the error probability tends to one.

2. We develop a framework for understanding variations of the problem consisting of random noise, and give an example of a noise model where the scaling laws are strictly higher compared to the noiseless case. A summary is given in Table 2; the case SNR = (log p)^Θ(1) reveals a strict increase in the scaling laws even when the signal-to-noise ratio grows unbounded, and the case SNR = Θ(1) reveals that the required number of tests increases from sub-linear to super-linear in the dimension when the signal-to-noise ratio is constant.

3. In the supplementary material, we discuss how our lower bounds extend readily to the approximate recovery criterion, where we only require β to be identified up to a certain Hamming distance. However, for clarity, we focus on exact recovery throughout the paper.

In a recent independent work [8], an adversarial noise setting was introduced. This turns out to be fundamentally different from our noisy setting. In particular, the results of [8] state that exact recovery is impossible, and that even with approximate recovery, a huge number of tests (i.e., higher than polynomial) is needed unless the maximum adversarial noise amplitude is sufficiently small compared to qmax, the maximum allowed reconstruction error measured by the Hamming distance. Of course, both random and adversarial noise are of significant interest, depending on the application.

Notation. For a positive integer d, we write [d] = {1, . . . , d}.
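Before moving to the main results, the comparison in Figure 1 can be reproduced numerically. The sketch below is our own illustration: it evaluates f(r) from (5) for a uniform π and for the highly non-uniform π above, and reports where the maximum is achieved.

```python
# Sketch: evaluating f(r) from (5) and the location of its maximum (illustrative).
import numpy as np

def entropy(pi):
    pi = np.asarray(pi, dtype=float)
    pi = pi[pi > 0]
    return -np.sum(pi * np.log(pi))          # natural log: entropy in nats

def pi_r(pi, r):
    s = np.sort(np.asarray(pi, dtype=float))[::-1]
    d = len(s)
    return np.concatenate(([s[: d - r + 1].sum()], s[d - r + 1:]))

def f(pi, r):
    d = len(pi)
    return 2.0 * (entropy(pi) - entropy(pi_r(pi, r))) / (d - r)

d = 10
choices = {
    "uniform": np.full(d, 1.0 / d),
    "highly non-uniform": np.array([0.49, 0.49] + [0.0025] * 8),
}
for name, pi in choices.items():
    vals = [f(pi, r) for r in range(1, d)]
    r_star = int(np.argmax(vals)) + 1
    print(f"{name}: argmax_r f(r) = {r_star}, max_r f(r) = {max(vals):.3f}, f(1)/2 = {vals[0] / 2:.3f}")
# For the uniform pi the maximum is at r = 1, so the old and new converses differ by a
# factor of 2; for the highly non-uniform pi the maximum is at r = 9, and the new
# converse (p/log p) max_r f(r) exceeds the old one (p/log p) f(1)/2 by a much larger factor.
```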
We use standard information-theoretic notation for the (conditional) entropy and mutual information, e.g., H(X), H(Y|X), I(X; Y|Z) [9]. All logarithms have base e, and accordingly, all of the preceding information measures are in units of nats. The Gaussian distribution with mean µ and variance σ² is denoted by N(µ, σ²). We use the standard asymptotic notations O(·), o(·), Θ(·), ω(·), and Ω(·).

2 Main results

In this section, we present our main results for the noiseless and noisy settings. The proofs are given in Section 3, as well as the supplementary material.

2.1 Phase transition in the noiseless setting

The following theorem proves that the upper bound given in (6) is tight. Recall that for r ∈ {1, . . . , d−1}, π^(r) = (π_1^(r), . . . , π_r^(r)) is a vector whose first entry sums the largest d − r + 1 entries of π, and whose remaining entries coincide with the remaining r − 1 entries of π.

Theorem 1. (Noiseless setting) Consider the pooled data problem described in Section 1.1 with a given number of labels d and label proportion vector π (not depending on the dimension p). For any decoder, in order to achieve Pe ↛ 1 as p → ∞, it is necessary that

    n ≥ (p / log p) · max_{r∈{1,...,d−1}} [ 2(H(π) − H(π^(r))) / (d − r) ] · (1 − η)    (8)

for arbitrarily small η > 0.

Combined with (6), this result reveals an exact phase transition on the required number of measurements: denoting

    n* = (p / log p) · max_{r∈{1,...,d−1}} [ 2(H(π) − H(π^(r))) / (d − r) ],

the error probability vanishes for n ≥ n*(1 + η) and tends to one for n ≤ n*(1 − η), regardless of how small η is chosen to be.

Remark 1. Our model assumes that β is uniformly distributed over the sequences with empirical distribution π, whereas [3] assumes that the entries of β are i.i.d. on π. However, Theorem 1 readily extends to the latter setting: under the i.i.d. model, once we condition on a given empirical distribution, the conditional distribution of β is uniform. As a result, the converse bound for the i.i.d. model follows directly from Theorem 1 by basic concentration and the continuity of the entropy function.

2.2 Information-theoretic framework for the noisy setting

We now turn to general noise models of the form (3), and provide necessary conditions for the noisy pooled data problem in terms of the mutual information. General characterizations of this form were provided previously for group testing [10,11] and other sparse recovery problems [12,13].

Our general result is stated in terms of a maximization over a vector parameter λ = (λ_1, . . . , λ_d) with λ_t ∈ {0, . . . , π_t p} for all t. We will see in the proof that λ_t represents the number of items of type t that are unknown to the decoder after π_t p − λ_t of them are revealed by a genie. We define the following:

- Given λ and β, we let S_λ be a random set of indices in [p] such that for each t ∈ [d], the set contains λ_t indices corresponding to entries where β equals t. Specifically, we define S_λ to be uniformly distributed over all such sets. Moreover, we define S_λ^c = [p] \ S_λ.

- Given the above definitions, we define the length-p vector β_{S_λ^c} with entries

    (β_{S_λ^c})_j = β_j if j ∈ S_λ^c, and (β_{S_λ^c})_j = ? otherwise,    (9)

where ? can be thought of as representing an unknown value. Hence, knowing β_{S_λ^c} amounts to knowing the labels of all items in the set S_λ^c.

- We define |B_λ(π)| to be the number of sequences β ∈ [d]^p that coincide with a given β_{S_λ^c} on the entries not equaling ?, while also having empirical distribution π overall (a small sketch of this count follows the present list). This number does not depend on the specific choice of S_λ^c. As an example, when λ_t = π_t p for all t, we have S_λ = [p], β_{S_λ^c} = (?, . . . , ?), and |B_λ(π)| = |B(π)|, defined following (1).

- We let ||λ||_0 denote the number of values in (λ_1, . . . , λ_d) that are positive.
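By the definition just given, |B_λ(π)| is a multinomial coefficient: the Σ_t λ_t positions marked ? must be filled with exactly λ_t items of each label t. The sketch below is our own illustration with hypothetical values; it computes log|B_λ(π)| and checks two extreme choices of λ that reappear later.

```python
# Sketch: log |B_lambda(pi)| as a log-multinomial coefficient (illustrative values).
from math import lgamma, log

def log_B_lambda(lam):
    """log of (sum_t lambda_t)! / prod_t (lambda_t!), in nats."""
    n_unknown = sum(lam)
    return lgamma(n_unknown + 1) - sum(lgamma(l + 1) for l in lam)

p = 1000
pi = [0.5, 0.3, 0.2]
counts = [int(t * p) for t in pi]                 # pi_t * p items of each label

# lambda_t = pi_t p for all t: nothing is revealed, so |B_lambda(pi)| = |B(pi)|,
# and its log is p*H(pi) up to lower-order terms (cf. (21) in the proof section).
print(log_B_lambda(counts), p * (-sum(t * log(t) for t in pi)))

# lambda = (pi_1 p, 1, 0): all label-1 items plus a single label-2 item unknown;
# then |B_lambda(pi)| = pi_1 p + 1 (the single-item regime used in Example 3).
print(log_B_lambda([counts[0], 1, 0]), log(counts[0] + 1))
```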
With these definitions, we have the following result for general random noise models.

Theorem 2. (Noisy setting) Consider the pooled data problem described in Section 1.1 under a general observation model of the form (3), with a given number of labels d and label proportion vector π. For any decoder, in order to achieve Pe ≤ δ for a given δ ∈ (0,1), it is necessary that

    n ≥ max_{λ : ||λ||_0 ≥ 2}  [ log|B_λ(π)| (1 − δ) − log 2 ] / [ (1/n) Σ_{i=1}^n I(β; Y^(i) | β_{S_λ^c}, X^(i)) ].    (10)

In order to obtain more explicit bounds on n from (10), one needs to characterize the mutual information terms, ideally forming an upper bound that does not depend on the distribution of the measurement matrix X. We do this for some specific models below; however, in general it can be a difficult task. The following corollary reveals that if the entries of X are i.i.d. on Bernoulli(q) for some q ∈ (0,1) (as was assumed in [3]), then we can simplify the bound.

Corollary 1. (Noisy setting with Bernoulli testing) Suppose that the entries of X are i.i.d. on Bernoulli(q) for some q ∈ (0,1). Under the setup of Theorem 2, it is necessary that

    n ≥ max_{λ : ||λ||_0 ≥ 2}  [ log|B_λ(π)| (1 − δ) − log 2 ] / I(X_{0,λ}; Y | X_{1,λ}),    (11)

where (X_{0,λ}, X_{1,λ}, Y) are distributed as follows: (i) X_{0,λ} (respectively, X_{1,λ}) is a concatenation of the vectors X_{0,λ}(1), . . . , X_{0,λ}(d) (respectively, X_{1,λ}(1), . . . , X_{1,λ}(d)), the t-th of which contains λ_t (respectively, π_t p − λ_t) entries independently drawn from Bernoulli(q); (ii) letting N_t (t = 1, . . . , d) be the total number of ones in X_{0,λ}(t) and X_{1,λ}(t) combined, the random variable Y is drawn from P_{Y|N_1,...,N_d} according to (3).

As well as being simpler to evaluate, this corollary may be of interest in scenarios where one does not have complete freedom in designing X, and one instead insists on using Bernoulli testing. For instance, one may not know how to optimize X, and accordingly resort to generating it at random.

Example 1: Application to the noiseless setting. In the supplementary material, we show that in the noiseless setting, Theorem 2 recovers a weakened version of Theorem 1, with 1 − η replaced by 1 − δ − o(1) in (8). Hence, while Theorem 2 does not establish a phase transition, it does recover the exact threshold on the number of measurements required to obtain Pe → 0.

An overview of the proof of this claim is as follows. We restrict the maximum in (10) to choices of λ in which each λ_t equals either its minimum value 0 or its maximum value π_t p. Since we are in the noiseless setting, each mutual information term reduces to the conditional entropy of Y^(i) = (Y_1^(i), . . . , Y_d^(i)) given β_{S_λ^c} and X^(i). For the values of t such that λ_t = 0, the value Y_t^(i) is deterministic (i.e., it has zero entropy), whereas for the values of t such that λ_t = π_t p, the value Y_t^(i) follows a hypergeometric distribution, whose entropy behaves as (1/2)(log p)(1 + o(1)).

In the case that X is i.i.d. on Bernoulli(q), we can use Corollary 1 to obtain the following necessary condition for Pe ≤ δ as p → ∞, proved in the supplementary material:

    n ≥ (p / log(pq(1−q))) · max_{r∈{1,...,d−1}} [ 2(H(π) − H(π^(r))) / (d − r) ] · (1 − δ − o(1))    (12)

for any q = q(p) such that both q and 1 − q behave as ω(1/p). Hence, while q = Θ(1) recovers the threshold in (8), the required number of tests strictly increases when q = o(1), albeit with a mild logarithmic dependence.
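As a quick numerical reading of (12) (our own sketch; π, p, and the values of q are illustrative, and the 1 − δ − o(1) factor is ignored), the following shows how the bound inflates as the test density q shrinks.

```python
# Sketch: evaluating the Bernoulli-design bound (12) for several test densities q.
import numpy as np

def entropy(pi):
    pi = np.asarray(pi, dtype=float)
    pi = pi[pi > 0]
    return -np.sum(pi * np.log(pi))

def pi_r(pi, r):
    s = np.sort(np.asarray(pi, dtype=float))[::-1]
    d = len(s)
    return np.concatenate(([s[: d - r + 1].sum()], s[d - r + 1:]))

def bound_12(pi, p, q):
    d = len(pi)
    return max(2 * p * (entropy(pi) - entropy(pi_r(pi, r)))
               / ((d - r) * np.log(p * q * (1 - q))) for r in range(1, d))

pi = np.array([0.5, 0.3, 0.2])
p = 10**6
for q in [0.5, 1e-2, 1e-4]:                  # q = Theta(1) versus q = o(1)
    print(f"q = {q:g}: n >= {bound_12(pi, p, q):.3g}")
# q = Theta(1) asymptotically recovers the noiseless threshold (8), while smaller q
# inflates the bound through the log(p q (1-q)) term, as discussed above.
```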
Example 2: Group testing. To highlight the versatility of Theorem 2 and Corollary 1, we show that the latter recovers the lower bounds given in the group testing framework of [11]. Set d = 2, and let label 1 represent defective items and label 2 represent non-defective items. Let P_{Y|N_1,N_2} be of the form P_{Y|N_1} with Y ∈ {0,1}, meaning the observations are binary and depend only on the number of defective items in the test. For brevity, let k = π_1 p denote the total number of defective items, so that π_2 p = p − k is the number of non-defective items. Letting λ_2 = p − k in (11), and letting λ_1 remain arbitrary, we obtain the necessary condition

    n ≥ max_{λ_1 ∈ {1,...,k}}  [ log( C(p − k + λ_1, λ_1) ) (1 − δ) − log 2 ] / I(X_{0,λ_1}; Y | X_{1,λ_1}),    (13)

where X_{0,λ_1} is a shorthand for X_{0,λ} with λ = (λ_1, p − k), and similarly for X_{1,λ_1}. This matches the lower bound given in [11] for Bernoulli testing with general noise models, for which several corollaries for specific models were also given.

Example 3: Gaussian noise. To give a concrete example of a noisy setting, consider the case that we observe the values in (2), but with each such value corrupted by independent Gaussian noise:

    Y_t^(i) = N_t(β, X^(i)) + Z_t^(i),   where Z_t^(i) ∼ N(0, pσ²)    (14)

for some σ² > 0. Note that given X^(i), the values N_t themselves have variance at most proportional to p (e.g., see Appendix C), so σ² = Θ(1) can be thought of as the constant signal-to-noise ratio (SNR) regime. In the supplementary material, we prove the following bounds for this model:

- By letting each λ_t in (10) equal its minimum or maximum value analogously to the noiseless case above, we obtain the following necessary condition for Pe ≤ δ as p → ∞:

    n ≥ max_{G ⊆ [d] : |G| ≥ 2}  [ p_G H(π_G) / Σ_{t∈G} (1/2) log(1 + π_t / (4σ²)) ] · (1 − δ − o(1)),    (15)

  where p_G := Σ_{t∈G} π_t p, and π_G has entries π_t / Σ_{t′∈G} π_{t′} for t ∈ G. Hence, we have the following: In the case that σ² = p^{−c} for some c ∈ (0,1), each summand in the denominator simplifies to (c/2)(log p)(1 + o(1)), and we deduce that compared to the noiseless case (cf. (8)), the asymptotic number of tests increases by at least a constant factor of 1/c. In the case that σ² = (log p)^{−c} for some c > 0, each summand in the denominator simplifies to (c/2)(log log p)(1 + o(1)), and we deduce that compared to the noiseless case, the asymptotic number of tests increases by at least a factor of (log p)/(c log log p). Hence, we observe a strict increase in the scaling laws despite the fact that the SNR grows unbounded. While (15) also provides an Ω(p) lower bound for the case σ² = Θ(1), we can in fact do better via a different choice of λ (see below).

- By letting λ_1 = π_1 p, λ_2 = 1, and λ_t = 0 for t = 3, . . . , d, we obtain a necessary condition of the form

    n ≥ c_0 p σ² log p · (1 − δ − o(1))    (16)

  for Pe ≤ δ as p → ∞, where c_0 > 0 is a constant. Hence, if σ² = Θ(1), we require n = Ω(p log p); this is super-linear in the dimension, in contrast with the sub-linear behavior observed in the noiseless case. Note that this choice of λ essentially captures the difficulty in identifying a single item, namely, the one corresponding to λ_2 = 1.

These findings are summarized in Table 2; see also the supplementary material for extensions to the approximate recovery setting.

Remark 2. While it may seem unusual to add continuous noise to discrete observations, this still captures the essence of the noisy pooled data problem, and simplifies the evaluation of the mutual information terms in (10). Moreover, this converse bound immediately implies the same bound for the discrete model in which the noise consists of adding a Gaussian term, rounding, and clipping to {0, . . . , p}, since the decoder could always choose to perform these operations as pre-processing.
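To get a feel for Example 3 and Table 2, the sketch below (our own; the parameters and the brute-force enumeration over all G are purely illustrative, and the 1 − δ − o(1) factor is ignored) evaluates the lower bound (15) at several noise levels. It does not compute the stronger single-item bound (16) that takes over when σ² = Θ(1).

```python
# Sketch: numerical reading of the Gaussian-noise lower bound (15) at several sigma^2.
import numpy as np
from itertools import combinations

def entropy(pi):
    pi = np.asarray(pi, dtype=float)
    pi = pi[pi > 0]
    return -np.sum(pi * np.log(pi))

def bound_15(pi, p, sigma2):
    d = len(pi)
    best = 0.0
    for size in range(2, d + 1):
        for G in combinations(range(d), size):
            G = list(G)
            p_G = pi[G].sum() * p                      # items with unknown labels
            pi_G = pi[G] / pi[G].sum()                 # reduced proportion vector
            denom = sum(0.5 * np.log(1 + pi[t] / (4 * sigma2)) for t in G)
            best = max(best, p_G * entropy(pi_G) / denom)
    return best

pi = np.array([0.5, 0.3, 0.2])
p = 10**6
for sigma2 in [p ** -0.5, 1 / np.log(p), 1.0]:         # SNR roughly sqrt(p), log p, and 1
    print(f"sigma^2 = {sigma2:.3g}: n >= {bound_15(pi, p, sigma2):.3g}")
print("reference scale p/log p:", f"{p / np.log(p):.3g}")
```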
3 Proofs

Here we provide the proof of Theorem 1, along with an overview of the proof of Theorem 2. The remaining proofs are given in the supplementary material.

3.1 Proof of Theorem 1

Step 1: Counting typical outcomes. We claim that it suffices to consider the case that X is deterministic and β̂ is a deterministic function of Y; to see this, we note that when either of these is random, we have Pe = E_{X,β̂}[P_β[error]], and the average is lower bounded by the minimum.

The following lemma, proved in the supplementary material, shows that for any X^(i), each entry of the corresponding outcome Y^(i) lies in an interval of length O(√(p log p)) with high probability.

Lemma 1. For any deterministic test vector X ∈ {0,1}^p, and for β uniformly distributed on B(π), we have for each t ∈ [d] that

    P[ |N_t(β, X) − E[N_t(β, X)]| ≥ √(p log p) ] ≤ 2/p².    (17)

By Lemma 1 and the union bound, we have with probability at least 1 − 2nd/p² that |N_t(β, X^(i)) − E[N_t(β, X^(i))]| ≤ √(p log p) for all i ∈ [n] and t ∈ [d]. Letting this event be denoted by A, we have

    Pe ≥ P[A] − P[A ∩ no error] ≥ 1 − 2nd/p² − P[A ∩ no error].    (18)

Next, letting Y(β) ∈ [p]^{n×d} denote Y explicitly as a function of β (and similarly β̂(Y) ∈ [d]^p), and letting Y_A denote the set of matrices Y under which the event A occurs, we have

    P[A ∩ no error] = (1/|B(π)|) Σ_{b∈B(π)} 1{Y(b) ∈ Y_A ∩ β̂(Y(b)) = b}    (19)
                    ≤ |Y_A| / |B(π)|,    (20)

where (20) follows since each Y ∈ Y_A can only be counted once in the summation of (19), due to the condition β̂(Y(b)) = b.

Step 2: Bounding the set cardinalities. By a standard combinatorial argument (e.g., [14, Ch. 2]) and the fact that π is fixed as p → ∞, we have

    |B(π)| = e^{p(H(π) + o(1))}.    (21)

To bound |Y_A|, first note that the entries of each Y^(i) ∈ [p]^d sum to a deterministic value, namely, the number of ones in X^(i). Hence, each Y ∈ Y_A is uniquely described by a sub-matrix of Y ∈ [p]^{n×d} of size n × (d − 1). Moreover, since Y_A only includes matrices under which A occurs, each value in this sub-matrix only takes one of at most 2√(p log p) + 1 values. As a result, we have

    |Y_A| ≤ (2√(p log p) + 1)^{n(d−1)},    (22)

and combining (18)–(22) gives

    Pe ≥ 1 − (2√(p log p) + 1)^{n(d−1)} e^{−p(H(π) + o(1))} − 2nd/p².    (23)

Since d is constant, it immediately follows that Pe → 1 whenever n ≤ [p H(π) / ((d − 1) log(2√(p log p) + 1))](1 − η) for some η > 0. Applying log(2√(p log p) + 1) = (1/2)(log p)(1 + o(1)), we obtain the following necessary condition for Pe ↛ 1:

    n ≥ (2p H(π) / ((d − 1) log p)) (1 − η).    (24)

This yields the term in (8) corresponding to r = 1.

Step 3: Genie argument. Let G be a subset of [d] of cardinality at least two, and define G^c = [d] \ G. Moreover, define β_{G^c} to be a length-p vector with entries

    (β_{G^c})_j = β_j if β_j ∈ G^c, and (β_{G^c})_j = ? if β_j ∈ G,    (25)

where the symbol ? can be thought of as representing an unknown value. We consider a modified setting in which a genie reveals β_{G^c} to the decoder, i.e., the decoder knows the labels of all items for which the label lies in G^c, and is only left to estimate those in G. This additional knowledge can only make the pooled data problem easier, and hence, any lower bound in this modified setting remains valid in the original setting.

In the genie-aided setting, instead of receiving the full observation vector Y^(i) = (Y_1^(i), . . . , Y_d^(i)), it is equivalent to only be given {Y_t^(i) : t ∈ G}, since the values in G^c are uniquely determined from β_{G^c} and X^(i). This means that the genie-aided setting can be cast in the original setting with modified parameters (a small numerical sketch of this reduction is given below): (i) p is replaced by p_G := Σ_{t∈G} π_t p, the number of items with unknown labels; (ii) d is replaced by |G|, the number of distinct remaining labels; (iii) π is replaced by π_G, defined to be a |G|-dimensional probability vector with entries equaling π_t / Σ_{t′∈G} π_{t′} (t ∈ G).
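The sketch referred to above (our own, with a hypothetical π and G) computes the reduced parameters p_G and π_G and plugs them into the form of (24), anticipating the condition stated next; the 1 − η factor is ignored.

```python
# Sketch: the genie reduction (i)-(iii) and the bound it yields via (24) (illustrative).
import numpy as np

def entropy(pi):
    pi = np.asarray(pi, dtype=float)
    pi = pi[pi > 0]
    return -np.sum(pi * np.log(pi))

p = 10**6
pi = np.array([0.49, 0.49] + [0.0025] * 8)          # d = 10, highly non-uniform
G = [0, 1]                                           # keep only the two dominant labels unknown

p_G = pi[G].sum() * p                                # (i)   reduced number of items
d_G = len(G)                                         # (ii)  reduced number of labels
pi_G = pi[G] / pi[G].sum()                           # (iii) reduced proportion vector

genie_bound = 2 * p_G * entropy(pi_G) / ((d_G - 1) * np.log(p))
r1_bound = 2 * p * entropy(pi) / ((len(pi) - 1) * np.log(p))   # the r = 1 term from (24)
print(f"genie bound (G = two large labels): {genie_bound:.3g}")
print(f"r = 1 bound from (24):              {r1_bound:.3g}")
# For this pi, the genie choice gives a much larger (stronger) lower bound,
# corresponding to the r = 9 term of the maximum in (8).
```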
Due to this equivalence, the condition (24) yields the necessary condition n ≥ (2 p_G H(π_G) / ((|G| − 1) log p))(1 − η), and maximizing over all G with |G| ≥ 2 gives

    n ≥ max_{G ⊆ [d] : |G| ≥ 2}  (2 p_G H(π_G) / ((|G| − 1) log p)) (1 − η).    (26)

Step 4: Simplification. Define r = d − |G| + 1. We restrict the maximum in (26) to sets G indexing the highest |G| = d − r + 1 values of π, and consider the following process for sampling from π: Draw a sample v from π^(r) (defined above Theorem 1). If v corresponds to the first entry of π^(r), then draw a random sample from π_G and output it as a label (i.e., the labels have conditional probability proportional to the top |G| entries of π). Otherwise, if v corresponds to one of the other entries of π^(r), then output v as a label. By Shannon's property of entropy for sequentially-generated random variables [15, p. 10], we find that H(π) = H(π^(r)) + (p_G/p) H(π_G). Moreover, since p_G = p Σ_{t∈G} π_t, this can be written as p_G H(π_G) = p(H(π) − H(π^(r))). Substituting into (26), noting that |G| − 1 = d − r by the definition of r, and maximizing over r = 1, . . . , d − 1, we obtain the desired result (8).

3.2 Overview of proof of Theorem 2

We can interpret the pooled data problem as a communication problem in which a message β is sent over a channel P_{Y|N_1,...,N_d} via codewords of the form {(N_1^(i), . . . , N_d^(i))}_{i=1}^n that are constructed by summing various columns of X. As a result, it is natural to use Fano's inequality [9, Ch. 7] to lower bound the error probability in terms of the information content (entropy) of β and the amount of information that Y reveals about β (mutual information). However, a naive application of Fano's inequality only recovers the bound in (10) with λ_t = π_t p for all t. To handle the other possible choices of λ, we again consider a genie-aided setting in which, for each t ∈ [d], the decoder is informed of π_t p − λ_t of the items whose label equals t. Hence, it only remains to identify the remaining λ_t items of each type. This genie argument is a generalization of that used in the proof of Theorem 1, in which each λ_t was either equal to its minimum value zero or its maximum value π_t p. In Example 3 of Section 2, we saw that this generalization can lead to a strictly better lower bound in certain noisy scenarios. The complete proof of Theorem 2 is given in the supplementary material.

4 Conclusion

We have provided novel information-theoretic lower bounds for the pooled data problem. In the noiseless setting, we provided a matching lower bound to the upper bound of [3], establishing an exact threshold indicating a phase transition between success and failure. In the noisy setting, we provided a characterization of general noise models in terms of the mutual information. In the special case of Gaussian noise, we proved an inherent added difficulty compared to the noiseless setting, with strict increases in the scaling laws even when the signal-to-noise ratio grows unbounded.

An interesting direction for future research is to provide upper bounds for the noisy setting, potentially establishing the tightness of Theorem 2 for general noise models. This appears to be challenging using existing techniques; for instance, the pooled data problem bears similarity to group testing with linear sparsity, whereas existing mutual-information-based upper bounds for group testing are limited to the sub-linear regime [10,11,16]. In particular, the proofs of such bounds are based on concentration inequalities which, when applied to the linear regime, lead to additional requirements on the number of tests that prevent tight performance characterizations.
Acknowledgment: This work was supported in part by the European Commission under Grant ERC Future Proof, SNF Sinergia project CRSII2-147633, SNF 200021-146750, and EPFL Fellows Horizon2020 grant 665667.

References

[1] I.-H. Wang, S. L. Huang, K. Y. Lee, and K. C. Chen, "Data extraction via histogram and arithmetic mean queries: Fundamental limits and algorithms," in IEEE Int. Symp. Inf. Theory, July 2016, pp. 1386–1390.
[2] I.-H. Wang, S. L. Huang, and K. Y. Lee, "Extracting sparse data via histogram queries," in Allerton Conf. Comm., Control, and Comp., 2016.
[3] A. El Alaoui, A. Ramdas, F. Krzakala, L. Zdeborová, and M. I. Jordan, "Decoding from pooled data: Sharp information-theoretic bounds," 2016, http://arxiv.org/abs/1611.09981.
[4] ——, "Decoding from pooled data: Phase transitions of message passing," 2017, http://arxiv.org/abs/1702.02279.
[5] D.-Z. Du and F. K. Hwang, Combinatorial Group Testing and Its Applications, ser. Series on Applied Mathematics. World Scientific, 1993.
[6] A. Sebő, "On two random search problems," J. Stat. Plan. Inf., vol. 11, no. 1, pp. 23–31, 1985.
[7] M. Malyutov and H. Sadaka, "Maximization of ESI. Jaynes principle for testing significant inputs of linear model," Rand. Oper. Stoch. Eq., vol. 6, no. 4, pp. 339–358, 1998.
[8] W.-N. Chen and I.-H. Wang, "Partial data extraction via noisy histogram queries: Information theoretic bounds," in IEEE Int. Symp. Inf. Theory (ISIT), 2017.
[9] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, Inc., 2006.
[10] M. Malyutov, "The separating property of random matrices," Math. Notes Acad. Sci. USSR, vol. 23, no. 1, pp. 84–91, 1978.
[11] G. Atia and V. Saligrama, "Boolean compressed sensing and noisy group testing," IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1880–1901, March 2012.
[12] C. Aksoylar, G. K. Atia, and V. Saligrama, "Sparse signal processing with linear and nonlinear observations: A unified Shannon-theoretic approach," IEEE Trans. Inf. Theory, vol. 63, no. 2, pp. 749–776, Feb. 2017.
[13] J. Scarlett and V. Cevher, "Limits on support recovery with probabilistic models: An information-theoretic framework," IEEE Trans. Inf. Theory, vol. 63, no. 1, pp. 593–620, 2017.
[14] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed. Cambridge University Press, 2011.
[15] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. Journal, vol. 27, pp. 379–423, July and Oct. 1948.
[16] J. Scarlett and V. Cevher, "Phase transitions in group testing," in Proc. ACM-SIAM Symp. Disc. Alg. (SODA), 2016.
[17] W. Hoeffding, "Probability inequalities for sums of bounded random variables," J. Amer. Stat. Assoc., vol. 58, no. 301, pp. 13–30, 1963.
[18] J. Massey, "On the entropy of integer-valued random variables," in Int. Workshop on Inf. Theory, 1988.
[19] G. Reeves and M. Gastpar, "The sampling rate-distortion tradeoff for sparsity pattern recovery in compressed sensing," IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 3065–3092, May 2012.
[20] ——, "Approximate sparsity pattern recovery: Information-theoretic lower bounds," IEEE Trans. Inf. Theory, vol. 59, no. 6, pp. 3451–3465, June 2013.
[21] J. Scarlett and V. Cevher, "How little does non-exact recovery help in group testing?" in IEEE Int. Conf. Acoust. Sp. Sig. Proc. (ICASSP), New Orleans, 2017.
[22] ——, "On the difficulty of selecting Ising models with approximate recovery," IEEE Trans. Sig. Inf. Proc. over Networks, vol. 2, no. 4, pp. 625–638, 2016.
[23] J. C. Duchi and M. J. Wainwright, "Distance-based and continuum Fano inequalities with applications to statistical estimation," 2013, http://arxiv.org/abs/1311.2669.