# Scalable and Efficient Non-adaptive Deterministic Group Testing

Dariusz R. Kowalski, School of Computer and Cyber Sciences, Augusta University, USA, dkowalski@augusta.edu
Dominik Pajak, Department of Pure Mathematics, Wroclaw University of Science and Technology, and Infermedica, Poland, dominik.pajak@pwr.edu.pl

Abstract

Group Testing (GT) is about learning a (hidden) subset K, of size k, of some large domain N, of size n ≫ k, using a sequence of queries. A result of a query provides some information about the intersection of the query with the unknown set K. The goal is to design efficient (polynomial-time) and scalable (polylogarithmic number of queries per element in K) algorithms for constructing queries that allow decoding of every hidden set K based on the results of the queries. A vast majority of the previous work focused on randomized algorithms minimizing the number of queries; however, in the case of large domains N, randomization may result in a significant deviation from the expected precision of learning the set K. Other works assumed unlimited computational power (existential results) or adaptiveness of queries (the next query could be constructed taking into account the results of the previous queries); the former approach is less practical due to non-efficiency, and the latter has several drawbacks, including non-parallelization. To avoid all the abovementioned drawbacks, for Quantitative Group Testing (QGT), where a query result is the size of its intersection with the hidden set, we present the first efficient and scalable non-adaptive deterministic algorithms for constructing queries and decoding a hidden set K from the results of the queries; these solutions do not use any randomization, adaptiveness or unlimited computational power.

1 Introduction

In the Group Testing (GT) field, the goal is to identify all elements of an unknown set K by asking queries. Originally, GT was applied to identifying infected individuals in large populations using pooled tests [26], and it regained interest during the COVID-19 pandemic [2, 52, 56]. Recently, GT found applications in various areas of Machine Learning, including approximating the nearest neighbor [28], simplifying multi-label classifiers [58] and accelerating the forward pass of a deep neural network [51]. All we initially know about set K is that |K| ≤ k, for some known parameter k ≤ n, and that it is a subset of some much larger set N with |N| = n. The answer to a query Q depends on the intersection between K and Q and equals Result(K ∩ Q), where Result is some result function (also called a feedback function in this paper). The sequence of queries is a correct solution to Group Testing if and only if for any two different sets K1, K2 (satisfying some cardinality restriction), the sequences of answers for K1 and K2 are different. Note that although this allows one to uniquely identify the hidden set K based on the results of the queries, in some cases decoding of set K could be a hard computational problem. The objective is: for a given deterministic feedback function Result(·), to find a fixed sequence of queries that will identify any set K, such that the length of this sequence, called the query complexity, is as short as possible. In particular, we are interested in solutions that have query complexity logarithmic in n and polynomial in k.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
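To make the decoding requirement concrete, the following minimal Python sketch (illustrative only, not from the paper; the toy queries and helper names are ours) brute-force checks whether a fixed query sequence solves Group Testing for a given result function, i.e., whether any two admissible hidden sets produce different feedback vectors.

```python
from itertools import combinations

def feedback_vector(queries, K, result):
    """Answers to all queries for hidden set K under the given result function."""
    return tuple(result(set(Q) & set(K)) for Q in queries)

def solves_group_testing(queries, n, k, result):
    """Brute-force check that all hidden sets of size at most k yield distinct feedback vectors."""
    seen = set()
    for size in range(k + 1):
        for K in combinations(range(n), size):
            fv = feedback_vector(queries, K, result)
            if fv in seen:          # two different hidden sets are indistinguishable
                return False
            seen.add(fv)
    return True

# Toy check: 8 elements, k = 2, full quantitative feedback (result = intersection size).
queries = [{0, 1, 2, 3}, {0, 1, 4, 5}, {0, 2, 4, 6}]
print(solves_group_testing(queries, n=8, k=2, result=len))                   # False: too few queries
print(solves_group_testing([{i} for i in range(8)], n=8, k=2, result=len))   # True: Round-Robin
```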
Table 1: Asymptotic bounds on the query complexity of solutions to non-adaptive deterministic QGT with F_α feedback, comparing constructive upper bounds with the corresponding lower bounds: the best previously known constructive algorithms use O(k² log n) queries [54] (for α = 1, and, by folklore, also for α = k), while Theorem 1 gives Õ(k) (for α = √k). By a constructive upper bound we mean query construction and decoding of the hidden set in time poly(n). The symbol α stands for any valid value of the parameter α ∈ {1, . . . , k}; the notations Õ and Ω̃ disregard polylogarithmic factors.

The most popular classical variant, present in the literature, considers a function Result(·) that answers whether the intersection between K and Q is empty or not, cf. [27]; it is also known under the name of beeping. Another popular result function returns the intersection size; this variant has also been studied under the names of coin weighing [4, 24] and Quantitative Group Testing (QGT) [32, 29]. In this paper we study the problem of Group Testing under a more general capped quantitative result, where the result (feedback) is the size of the intersection up to some parameter α ∈ {1, . . . , k}, and α for larger intersections. It subsumes and generalizes the two previously described classical result functions: the smallest case of α = 1 corresponds to the classical empty/non-empty feedback (beeping), while the case α = k captures the (full) quantitative feedback (classical QGT). The motivation for our research is twofold: first, to better understand the intermediate settings between the two classical GT models; second, to find efficient solutions for the capped settings that could be applied to other problems (see the Appendix). Our focus is on non-adaptive deterministic solutions, in which the queries are fixed (i.e., the same sequence applies to all hidden sets) and allow discovering (decoding) any hidden set based on the α-capped results. The primary goal is to minimize the number of queries, called the query complexity, and the second objective is to compute the queries and decode the hidden set in polynomial time. All existing polynomial construction/decoding algorithms for the beeping or quantitative feedbacks use Θ(min{n, k² log n}) queries, cf. [54]. The general constructions developed in this work employ the concept of dispersion, cf. [57], and combine it appropriately with other techniques. When instantiated for a specific feedback parameter α ∈ {1, . . . , k} (e.g., α = √k), they shrink the gap between the upper and lower bounds on the number of efficiently constructed queries nearly exponentially: from a linear to a polylogarithmic gap per hidden element. Together with our lower bound, they explain why sometimes a much smaller feedback is sufficient for decoding sets with similar efficiency. We show that the parameter α also has other interpretations: it inversely relates to the number of occurrences of an element in the queries and to fault-tolerance (cf. Section 6).

1.1 Our Results (see also Table 1)

We generalize the classical beeping and QGT models of Group Testing by considering α-capped results, also called an α-capped feedback function, for an α ∈ {1, . . . , k}: for a hidden set K and query Q it returns F_α(Q ∩ K) = min{|Q ∩ K|, α}. We study the query complexity of non-adaptive deterministic QGT in this general model. We focus on scalable (polylogarithmic number of queries per element) and efficient (polynomial-time) constructing/decoding algorithms, and we show that this could be achieved if and only if the information cap α = Ω̃(√k). In particular, it holds for classical QGT.
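For concreteness, the α-capped feedback can be written as a two-line function (an illustrative Python sketch, not code from the paper; the sample sets are ours); the two extreme caps recover beeping and classical QGT.

```python
def capped_feedback(query, hidden, alpha):
    """F_alpha: size of the intersection, capped at alpha."""
    return min(len(set(query) & set(hidden)), alpha)

K = {3, 7, 11, 15}          # hidden set, k = 4
Q = {1, 3, 7, 11}           # one query
print(capped_feedback(Q, K, alpha=1))    # 1 -- beeping: only "non-empty" is revealed
print(capped_feedback(Q, K, alpha=4))    # 3 -- full quantitative feedback (classical QGT)
print(capped_feedback(Q, K, alpha=2))    # 2 -- an intermediate cap hides the exact size
```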
Main result. Polynomial-time construction/decoding algorithms using an almost optimal number of queries (Algorithms 1 and 2 in Section 4). Here almost-optimality means that the length of the constructed query sequence is only polylogarithmically longer than the shortest possible sequence. The previous best polynomial-time solution used Θ(min{n, k² log n}) queries for all α ≥ 1 [54], and we shrink it by a factor of Θ(min{α², k}/polylog n). To achieve this goal, we define and build new types of selectors, called (Strong) Selectors under Interference (SSuI and SuI, for short), based on the concept of (superimposed) dispersion. Although dispersion has been used before in GT, cf. [10, 44], as well as the classical superimposed codes, our technical novelty is twofold: first, we identified additional properties of selectors (e.g., avoiding an interfering set with capped intersections) that could not be formally deduced from the definitions of previous types of selectors, but allow improving the solutions to QGT; second, we applied careful mathematical analysis to choose specific parameters of dispersers and superimposed codes and to prove that the new selectors' properties hold. We also generalize the concept of Round-Robin query systems, where each query is a singleton, to α-Round-Robin query systems, containing sets of size at most α. Such sequences are shorter than the simple Round-Robin, i.e., have length O((n/α) polylog n), and, unlike the simple Round-Robin's singleton structure, are challenging to construct in a way that allows correct decoding based on the α-capped feedback.

Theorem 1. There is an explicit polynomial-time algorithm constructing non-adaptive queries Q1, . . . , Qm, for m = O(min{(k/α)², n/α} polylog n + k polylog n), that solve Group Testing under feedback F_α with polynomial-time decoding. Moreover, every element occurs in O((k/α) polylog n) queries, and the decoding time is O(m + (k²/α) polylog n + k polylog n).

One of the consequences is that, having the result function capped at α = √k, we obtain a similar number of queries as with the actual result (i.e., returning the size of the whole intersection, up to k), which together with our next lower bound establishes an interesting inherited property of GT. The value α = √k seems to be a sweet spot balancing two different phenomena: superimposed coding and dispersion. Enriching the beeping model (i.e., α increases starting from 1) allows improving the superquadratic query complexity of classical superimposed coding (used for the beeping feedback), and the improvement is roughly quadratic in terms of α, due to the nature of superimposed codes. On the other hand, limiting the full quantitative model (i.e., α decreases starting from k) increases the number of queries, starting from slightly superlinear, roughly linearly in k/α, as the nature of dispersion is linear. Both these phenomena equalize the query complexities at α = √k. This is illustrated in our construction: in the beginning, the progress in learning relies on dispersion-based SuIs for small α; then, with the growth of α, the length of the used SuIs becomes similar to the length of the code-based SSuI; this happens for α = √k. The SSuI allows discovering all the remaining elements and finishes the learning process. The last fact explains why a further increase of α is not needed.

Lower bound (Section 5). The almost-optimality of our algorithms from Theorem 1 is justified by proving an absolute lower bound on the length of sequences allowing to decode a hidden set based on feedback F_α to the queries.
Here by absolute we mean that it holds for all query systems that allow for decoding of the hidden sets based on feedback F_α, not restricted to polynomially constructed queries with a polynomial decoding algorithm. Even more, some components of the lower bound are general: they hold for any α-capped feedback function, which will be formally defined later in Section 3. The lower bound has three components: (k/α)², n/α, and k log(n/k)/log α. For different ranges of k, different components determine the value of the lower bound. Note that these components match the corresponding components in our constructive upper bound (Main result in Theorem 1), up to a polylogarithmic factor.

Theorem 2. Any non-adaptive algorithm solving Group Testing needs:
Ω(min{n/α, k²/α²} + k log(n/k)/log α) queries under feedback F_α;
Ω(min{n/α, k²/α²}) queries under any feedback capped at α.

Document structure. We discuss related work on various variants of GT in Section 2. In Section 3 we formally define the QGT problem and the generalized α-capped-results model. In Section 4 we present our polynomial-time query construction and decoding algorithms. Section 5 proves the lower bound. A discussion of extensions, applications and future directions is given in Section 6. The additional Appendix contains more details: Appendix A discusses other related work; Appendix B shows details of extensions and applications; our new selector tools are constructed and analyzed in Appendix C.

2 Previous and Related Work (see also Tables 1 and 2)

Group Testing with general feedback functions has been studied in [47], where two parameters of feedback functions (cap and expressiveness) are shown to influence the necessary number of queries. In the standard feedback model, considered in most of the Group Testing literature [27], the feedback tells whether the intersection between query Q and set K is empty or not (sometimes it is also called a beeping model). It is a special case (F_1) of our feedback function. In this feedback model, Group Testing is known to be solvable using O(k² log(n/k)) queries [21], and an explicit polynomial-time construction of length O(k² log n) exists [54]. The best known lower bound (for k < √n) is Ω(k² log n/log k) [12]. This model is also similar to Angluin's notion of concept learning with disjointness queries [1]. The main differences are that in our model the learner is non-adaptive and that in the disjointness queries the feedback might include an element from the intersection.

Table 2: Other related work on adaptive, randomized or existential solutions to QGT for α = k, listing for each the first work achieving its asymptotic upper bound and the difference from this paper: [37] is existential (super-exponential construction time), [4] is adaptive, and [55] is randomized (i.e., correct only with some probability).

The setting considered in this paper is also a generalization of the existing problem of coin weighing. In this problem, we have a set of n coins of two distinct weights, w0 (true coin) and w1 (counterfeit coin), out of which up to k are counterfeit ones. We are allowed to weigh any subset of coins on a spring scale, hence we can deduce the number of counterfeit coins in each weighing. The task is to identify all the counterfeit coins. Such a feedback is a special case (F_k) of our feedback function. The problem is solvable with O(k log(n/k)/log k) non-adaptive (i.e., fixed in advance) queries [37], matching the standard information-theoretic lower bound of Ω(k log(n/k)/log k), as well as its stronger version proved for randomized strategies [24].
[4] considers the problem of an explicit polynomial-time construction of O(k log(n/k)/log k) queries that allows for polynomial-time identification of the counterfeit coins. However, the algorithm presented in [4] is adaptive, which means that the subsequent queries can depend on the feedback from the previous ones. The only existing, constructive, non-adaptive solution would be using the explicit construction of superimposed codes [46], but the resulting query complexity would be O(k² polylog n). Thus, the solution in our paper is the first explicit polynomial-time algorithm constructing non-adaptive queries allowing for fast decoding of set K, with O(k polylog n) fixed queries. An important line of work on non-adaptive randomized solutions to Quantitative Group Testing, started by [55] and being continued until recently [32, 14, 29, 3], resulted in a number of algorithms nearing the lower bound of 2k ln(n/k)/ln k from [25]. However, these results always assume some restriction on k (typically k ≤ n^c for some constant 0 < c < 1), and, similarly as above, they may result in a significant deviation from the actual set if n is large. They also require a large number of truly random bits (in practice, algorithms have access only to small truly random seeds, while randomized solutions typically use Ω(n log n log k) random bits to construct the queries) and typically do not offer any provable guarantees in the case when an adaptive adversary could create/change the hidden set or introduce errors online during the learning process. The bounds obtained in this paper match (up to polylogarithmic factors) the best existing results for the extreme cases of α = 1 and α = k. The paper also bounds how the query complexity depends on the value of α between these extremes. Interestingly, we show that the shortest-possible query complexity of k polylog n is already achievable for α = √k, and increasing α from √k to k does not result in a further decrease of the query complexity. Another interesting feedback function comes from Threshold Group Testing [20, 23], where the feedback model includes a set of thresholds and the feedback function returns whether or not the size of the intersection is larger or smaller than each threshold. In [22] the authors show that it is possible to define an interval of √k log k thresholds resulting in an algorithm with O(k log(n/k)/log k) queries. Note that both those feedbacks are inefficient in view of our setting of α-capped feedbacks, because their feedback functions are not capped at any α < k, but they achieve a similar query complexity as our capped F_√k feedback. Superimposed codes and dispersers were already used in Group Testing. However, they either led to a super-quadratic (in k) number of queries, cf. [46, 9], or decoded only a fraction of the elements of the hidden set, see [44, 41]. Recall that in solutions where the query complexity depends on the number of identified elements, decoding of all the elements requires over k² queries, as proved in [21, 10]. [45] presents the first Group Testing solution with poly(k, log n) decoding time, but super-quadratic query complexity. It is worth noting that our generalized solution achieves an almost-linear number of queries (for certain values of α) and our decoding algorithm identifies all the elements in time polynomial in k and logarithmic in n.
Our use of the known tools is different than in previous approaches: we define new properties (SuI, SSuI), which we prove to be satisfied by some combinations of those tools, and which lead to efficient solutions in both query complexity and construction/decoding time.

3 The Model and the Problem

We assume that the universe of all elements N, with |N| = n, is enumerated with integers 1, 2, . . . , n. Throughout the paper, we will associate an element with its identifier. Let K, with |K| ≤ k, denote a hidden subset of N chosen arbitrarily by an adversary. Let Q = ⟨Q1, . . . , Qm⟩ be a non-adaptive algorithm, represented by a sequence of m queries fixed prior to an execution. A general feedback function F is a function from subsets of N into an arbitrary domain. The function is applied to K ∩ Q. A meaningful feedback function can range from a simple empty/non-empty feedback up to returning all elements of the intersection. Consider a feedback function F_α that returns the size of the intersection if it is at most α, and α for larger intersections, i.e., F_α(Q ∩ K) = min{|Q ∩ K|, α}. The parameter α in feedback F_α is called a feedback cap. A general class of feedback functions (used in our lower bound) with feedback cap α includes all deterministic functions that take subsets of N as input and, for sets with more than α elements, output some arbitrary fixed value. We say that Q solves Quantitative Group Testing (QGT) if the feedback vector allows for unique identification of set K. The feedback vector is defined as ⟨F_α(Q1 ∩ K), F_α(Q2 ∩ K), . . . , F_α(Qm ∩ K)⟩. Thus, in order to solve QGT, the feedback vectors for any two sets K1 and K2 have to be different. We say that Q solves QGT with polynomial-time reconstruction if there exists a polynomial-time algorithm that, given the feedback vector, outputs all the identifiers of the elements from K. Finally, we say that Q is constructible in polynomial time if there exists a polynomial-time algorithm that, given the parameters n, k, outputs an appropriate sequence of queries. We assume that both coupled algorithms, construction and decoding, know n, k, F_α. W.l.o.g., in order to avoid rounding in the presentation, we assume that n and other crucial parameters are powers of 2.

4 Polynomial-time Constructions and Decoding

4.1 Combinatorial Tools

In this section we introduce two new combinatorial tools (Selectors-under-Interference and Strong Selectors-under-Interference) for our QGT solutions, and recall Dispersers and Balanced IDs. For given sets K1, K2 ⊆ N and an element v ∈ K1, we say that S ⊆ N selects v from K1 under α-interference from K2 if S ∩ K1 = {v} and |S ∩ K2| < α. Intuitively, v is a unique representative of K1 in S and the number of representatives of K2 in S is smaller than α.

Definition 1 (Selector under Interference (SuI)). An (n, ℓ, ε, k, α)-Selector-under-Interference, (n, ℓ, ε, k, α)-SuI for short, is a sequence of queries S = (S1, . . . , Sx) satisfying: for every set K1 ⊆ N of at most ℓ elements and set K2 ⊆ N of at most k elements, there are at most εℓ elements v ∈ K1 that are not selected from K1 under α-interference from K2 by any query Si ∈ S; i.e., the set {v ∈ K1 : ∀ i ≤ x, Si ∩ K1 ≠ {v} or |Si ∩ K2| ≥ α} has at most εℓ elements.

Disperser. Consider a bipartite graph G = (V, W, E), where |V| = n, which is an (ℓ, ε)-disperser with entropy loss δ, i.e., it has left-degree d, |W| = ℓd/δ, and satisfies the following dispersion condition: for each L ⊆ V such that |L| ≥ ℓ, the set N_G(L) of neighbors of L in graph G is of size at least (1 − ε)|W|.
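The SuI property of Definition 1 can be verified by brute force on toy instances; the sketch below (illustrative Python with our own helper names, under the reading above in which at most εℓ elements of K1 may remain unselected) tests it for a given family of queries.

```python
from itertools import combinations

def selects(S, K1, K2, v, alpha):
    """Query S selects v from K1 under alpha-interference from K2."""
    return S & K1 == {v} and len(S & K2) < alpha

def is_sui(queries, n, ell, eps, k, alpha):
    """Brute-force test of the (n, ell, eps, k, alpha)-SuI property on a toy universe.
    Testing K1 and K2 of maximum sizes suffices: selection from a subset of K1,
    or under interference from a subset of K2, can only be easier."""
    universe = range(n)
    for K1 in map(set, combinations(universe, ell)):
        for K2 in map(set, combinations(universe, k)):
            missed = [v for v in K1
                      if not any(selects(S, K1, K2, v, alpha) for S in queries)]
            if len(missed) > eps * ell:
                return False
    return True

# Singleton queries form an (n, ell, 0, k, alpha)-SuI whenever alpha >= 2.
queries = [{i} for i in range(6)]
print(is_sui(queries, n=6, ell=2, eps=0, k=3, alpha=2))   # True
print(is_sui(queries, n=6, ell=2, eps=0, k=3, alpha=1))   # False: an element of K1 ∩ K2 is never selected
```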
Note that it is enough for us to take ε in the dispersion property to be the same value as in the constructed (n, ℓ, ε, k, α)-SuI S.¹ An explicit construction (i.e., in time polynomial in n) of dispersers was given by [57], for any ℓ ≤ n and ε, and some δ = O(log³ n), where d = O(polylog n).² In the following, the notations d and δ will always denote these parameters of the disperser. Note that any improved construction of dispersers will immediately reduce the complexities of our algorithms. In Appendix C.1 we describe two disperser-based polynomial-time constructions of SuI and prove:

¹ If someone considers ε > 1/3, we could use the construction for ε = 1/3 to get a solution with better guarantees.
² Optimal dispersers have δ = O(1) and d = O(log n), but their polynomial-time construction is an open problem.

Theorem 3. There is an explicit polynomial-time construction of an (n, ℓ, ε, k, α)-SuI, for any α, any ℓ ≤ k such that ℓα/(3δ) > k, and any constant ε ∈ (0, 1/3), of size O(min{n, ℓdδ log² n}). Moreover, every element occurs in O(dδ log n) queries.

Theorem 4. There is an explicit polynomial-time construction of an (n, ℓ, ε, k, α)-SuI, for any α, any ℓ ≤ k such that ℓα ≤ 3δk, and any constant ε ∈ (0, 1/3), of size O((ndδ/α) log n). Moreover, every element occurs in O(dδ log n) queries.

Definition 2 (Strong Selector under Interference (SSuI)). An (n, ℓ, k, α)-Strong-Selector-under-Interference, (n, ℓ, k, α)-SSuI for short, is a sequence of queries T = (T1, . . . , Tx) satisfying: for every set K1 ⊆ N of at most ℓ elements and set K2 ⊆ N of at most k elements, every element v ∈ K1 is selected from K1 under α-interference from K2 by some query Ti ∈ T; i.e., the set {v ∈ K1 : ∀ i ≤ x, Ti ∩ K1 ≠ {v} or |Ti ∩ K2| ≥ α} is empty. An (n, ℓ, k, α)-Strong-Selector-under-Interference could also be viewed as an (n, ℓ, 0, k, α)-Selector-under-Interference. In Appendix C.2 we describe a polynomial-time construction of an SSuI, which essentially is the construction of [46] for adjusted parameters, and prove that it satisfies the SSuI property.

Theorem 5. There is an explicit polynomial-time construction of an (n, ℓ, k, α)-SSuI of length O(ℓ² log² n), provided ℓα ≥ 3δk. Moreover, every element occurs in O(ℓ log n) queries.

Balanced IDs. Each element i in N has a unique ID represented by 2 log₂ n bits, in which the number of 1's is the same as the number of 0's; e.g., take the binary representations of i − 1 and n − i (which are bitwise complements of each other), each in log₂ n bits, and concatenate them (recall n = |N|). Balanced IDs have previously been used in algorithms for decoding elements in Group Testing (see, e.g., [50]).

4.2 Construction of Queries, Decoding and Analysis (Proof of Theorem 1)

Algorithm constructing queries. Assume α ≥ 2 and k > 3δk/(α − 1) (the opposite case will be considered later), and let us take an (n, ℓ, 1/2, k, α − 1)-SuI S(ℓ), for ℓ being a power of 2 ranging down from k to 3δk/(α − 1) (w.l.o.g. we could also assume that 3δk/(α − 1) is a power of 2). Next, for each set S in these selectors we add the following family R(S) = {R_i(S)}_{i=1}^{2 log₂ n} of sets R_i(S) = {v ∈ S : ⌊v/2^{i−1}⌋ ≡ 1 (mod 2)}. Intuitively, R_i(S) is the set of elements from S that have 1 on the i-th least significant bit of their Balanced ID. Let us call the obtained enhanced selectors (i.e., with the additional families R(S), for every set S in the original selector) R(S(ℓ)). Then we concatenate the selectors R(S(ℓ)), starting from the largest ℓ = k down to the smallest value ℓ = 3δk/(α − 1). An (n, 3δk/(α − 1), k, α − 1)-SSuI T is concatenated at the end, with the same replacement of bits 1 and 0 that gives R(T) as in the above (n, ℓ, 1/2, k, α − 1)-SuIs.
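The Balanced IDs and the enhancement sets R_i(S) used in this construction can be made concrete with a short sketch (illustrative Python with our own helper names; the specific balanced encoding below, concatenating i − 1 with its bitwise complement n − i, is one concrete choice consistent with the description above, assuming elements are numbered 1, . . . , n and n is a power of 2).

```python
from math import log2

def balanced_id(v, n):
    """One concrete Balanced ID for element v in {1,...,n} (n a power of 2):
    concatenate the log2(n)-bit representations of v - 1 and n - v, which are
    bitwise complements, so the 2*log2(n)-bit ID has exactly log2(n) ones."""
    b = int(log2(n))
    return ((v - 1) << b) | (n - v)

def enhancement_sets(S, n):
    """R_1(S), ..., R_{2 log2 n}(S): elements of S whose Balanced ID has a 1
    on the i-th least significant bit."""
    bits = 2 * int(log2(n))
    return [{v for v in S if (balanced_id(v, n) >> (i - 1)) & 1}
            for i in range(1, bits + 1)]

n, S = 16, {3, 7, 12}
assert bin(balanced_id(3, n)).count("1") == int(log2(n))   # IDs are balanced
for i, R in enumerate(enhancement_sets(S, n), start=1):
    print(i, sorted(R))
```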
In the case where α = 1 or k ≤ 3δk/(α − 1), the sequence equals the (n, k, k, α)-SSuI T concatenated, as above, with R(T). Algorithm 1 presents a pseudocode of the construction algorithm for α > 1.

Algorithm 1: Construction of a sequence of queries solving QGT with α-capped feedback.
Input: Disperser G of degree d and entropy loss δ
1  Q ← ⟨⟩; ℓ ← k
2  while ℓ > 3δk/(α − 1) do
3      S ← (n, ℓ, 1/2, k, α − 1)-SuI;
4      foreach S ∈ S do
5          Q.append(S);
6          for i ← 1 to 2 log₂ n do
               /* Add the set of elements from S that have 1 on the i-th least significant bit of their Balanced ID. */
7              R_i(S) ← {v ∈ S : ⌊v/2^{i−1}⌋ ≡ 1 (mod 2)};
8              Q.append(R_i(S));
9      ℓ ← ℓ/2
10 T ← (n, 3δk/(α − 1), k, α − 1)-SSuI;
11 foreach T ∈ T do
12     Q.append(T);
13     for i ← 1 to 2 log₂ n do
14         R_i(T) ← {v ∈ T : ⌊v/2^{i−1}⌋ ≡ 1 (mod 2)};
15         Q.append(R_i(T));
16 return Q

Figure 1: An example of a sequence of queries produced by Algorithm 1 for n = 64, k = 32, α = 17: the concatenation of S(32) = (64, 32, 1/2, 32, 16)-SuI (whose enhanced queries decode 16 out of the 32 hidden elements), S(16) = (64, 16, 1/2, 32, 16)-SuI (decoding 8 out of the remaining 16 hidden elements), and T = ⟨T1, T2, . . .⟩ = (64, 8, 32, 16)-SSuI (decoding the remaining 8 hidden elements), each query followed by its sets R(·). For simplicity we assume that 16 > 3δk/(α − 1).

Decoding algorithm. During the decoding algorithm we process, in subsequent iterations, the feedbacks from the enhanced selectors S(ℓ) for ℓ = k, k/2, k/4, . . . , 3δk/(α − 1). We will later prove, by induction, that during processing S(ℓ), at least ℓ/2 new elements from K are decoded. To show this, we consider any iteration and let set K1 be the set of the elements that have been decoded in the previous iterations, while set K2 = K \ K1 is the set of unknown elements. We treat K1 as the interfering set and, by the properties of S(ℓ), we know that for at least ℓ/2 elements v ∈ K2, there exists a query S ∈ S(ℓ) such that v ∈ S, |K1 ∩ S| < α − 1, and |K2 ∩ S| = 1. We observe that since we already know the identifiers of all the elements from the interfering set K1, then using the feedbacks from the additional queries R(S) (corresponding to Balanced IDs) we can exactly decode the identifier of v. We do this for all ℓ/2 elements that are possible to decode in this iteration and we proceed to the next iteration. After considering all Selectors-under-Interference, we have only at most 3δk/(α − 1) unknown elements. To complete the decoding we use a Strong-Selector-under-Interference, where the decoding procedure is exactly the same as in the case of an SuI (the interfering set is also the set of already decoded elements). The properties of the SSuI guarantee that we decode the identifiers of all the remaining elements from K. See the pseudocode of Algorithm 2 for the details of decoding for α > 1. In the case where α = 1 the decoding is straightforward, as the SSuI guarantees in this case no interference, hence the results (due to our enhancement R(T)) contain exactly the identifiers of the elements from set K.

Lemma 1. There is an explicit polynomial-time algorithm constructing non-adaptive queries Q1, . . . , Qm and decoding any hidden set K of size at most k ≤ n from the feedback vector in polynomial time, under feedback F_α and for m = O((k/α)²δ² log³ n + kdδ log³ n) queries. Moreover, every element occurs in O((k/α)δ log² n + dδ log³ n) queries.

Proof. We start from describing a procedure of revealing the elements of any given set K of at most k elements, together with a formal (inductive) argument of its correctness.
Our first goal is to show that, by the beginning of S(ℓ), for ℓ stepping down from k to 3δk/(α − 1), we have not learned the identity of at most ℓ elements from the hidden set K. The proof is by induction: it clearly holds in the beginning of the computation, as the set K has at most ℓ = k elements. We prove the inductive step: by the end of S(ℓ), at most ℓ/2 elements are not learned. We set K2 to be the set of learned elements and K1 = K \ K2. Clearly, |K2| ≤ k, and by the inductive assumption |K1| ≤ ℓ. For such K1 and K2, by the definition of SuI, there are at most ℓ/2 elements from K1 that do not occur in any query containing no other such element and fewer than α − 1 of the already learned elements from K2. Consider a previously not learned element v ∈ K1 for which there exists a good query in S(ℓ), i.e., a query S ∈ S(ℓ) such that S ∩ K1 = {v} and |S ∩ K2| < α − 1. At this point of the decoding of set K we know the Balanced IDs of all elements from set K2. Hence we can calculate the 2 log₂ n-bit feedback vector for the sets K2 ∩ R1(S), K2 ∩ R2(S), . . . , K2 ∩ R_{2 log₂ n}(S). We compare this feedback vector with the output of the enhanced selector, which is the feedback vector for the sets K ∩ R1(S), K ∩ R2(S), . . . , K ∩ R_{2 log₂ n}(S). The difference between the latter and the former is exactly the Balanced ID of v. In case this difference does not form a Balanced ID of any element, i.e., it has some value bigger than 1 or the number of 1's is different from log₂ n, or in case F_α(K ∩ S) = α (recall that S is also in the constructed selector), the feedback from this R(S) is ignored. This is done to avoid misinterpreting the feedback and a false discovery of an element which is not in K. Indeed, first note that the fact |K ∩ S| ≥ α will automatically discard the part of the feedback from K ∩ R1(S), K ∩ R2(S), . . . , K ∩ R_{2 log₂ n}(S), as it indicates that the intersection is too large to provide correct decoding of an element. Second, assuming |K ∩ S| < α, if there are no elements in K1 ∩ S then the difference between the feedbacks gives a vector of zeros, and if there are at least two elements in K1 ∩ S, the difference between the feedbacks will contain a value of at least 2 or all 1's, as it will be a bitwise sum of at least two Balanced IDs of log₂ n ones each. By the definition of SuI we can find ℓ/2 such elements v. This shows that during decoding of the enhanced S(ℓ) we learn the identities of ℓ/2 new elements. This completes the inductive proof.

Algorithm 2: Decoding of the elements for QGT with α-capped feedback.
Input: Result(Q) for queries Q ∈ Q constructed by Algorithm 1, based on SuIs S(ℓ) for ℓ being powers of 2 stepping down from k to 3δk/(α − 1), and SSuI T.
1  ℓ ← k; Kacc ← ∅;                      /* Set Kacc accumulates the decoded elements. */
2  while ℓ > 3δk/(α − 1) do              /* We decode ℓ/2 elements in each iteration. */
3      for i ← 1 to ℓ/2 do               /* Look for a query for which we can decode a new element. */
4          foreach S ∈ S(ℓ) do
5              if Result(S) ≤ α − 1 and |S ∩ Kacc| = Result(S) − 1 then
6                  Kacc ← Kacc ∪ {DecodeElement(S, Kacc)};
7      ℓ ← ℓ/2
8  while ℓ > 0 do                        /* Iteratively decode all the remaining elements. */
9      foreach T ∈ T do
10         if Result(T) ≤ α − 1 and |T ∩ Kacc| = Result(T) − 1 then
11             Kacc ← Kacc ∪ {DecodeElement(T, Kacc)};
12     ℓ ← ℓ − 1
13 return Kacc

1 Procedure DecodeElement(Q, Kacc)
2     v ← 0
3     for j ← 2 log₂ n downto 1 do
          /* Take the feedback from set R_j(Q) and calculate the feedback from R_j(Q) if the hidden set were exactly Kacc; the difference is the j-th least significant bit of the Balanced ID of the new element. */
4         v ← 2·v + Result(R_j(Q)) − |R_j(Q) ∩ Kacc|
5     return the element with Balanced ID v
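The element-recovery step can be mirrored in Python (an illustrative sketch of the DecodeElement idea, not the paper's code, reusing the balanced_id and enhancement_sets helpers from the earlier sketch): under the good-query conditions above, i.e., exact (uncapped) feedback on Q and exactly one not-yet-decoded element of K in Q, the differences of feedbacks spell out the Balanced ID of that element.

```python
from math import log2

def decode_element(Q, results_R, K_acc, n):
    """Recover the single not-yet-decoded element of a good query Q.
    results_R[j] is the feedback for R_{j+1}(Q); it is assumed to be exact
    (|Q ∩ K| < alpha), and Q is assumed to contain exactly one element of K
    outside K_acc. Uses balanced_id/enhancement_sets from the earlier sketch."""
    R = enhancement_sets(Q, n)
    bits = 2 * int(log2(n))
    bid = 0
    for j in range(bits - 1, -1, -1):                # most significant bit first
        bit = results_R[j] - len(R[j] & K_acc)       # difference of the two feedbacks
        bid = 2 * bid + bit
    return (bid >> int(log2(n))) + 1                 # high half of the Balanced ID stores v - 1

# Tiny demo: n = 16, hidden K = {3, 7}, already decoded K_acc = {7}, good query Q = {3, 7, 12}.
n, K, K_acc, Q = 16, {3, 7}, {7}, {3, 7, 12}
results_R = [len(R & K) for R in enhancement_sets(Q, n)]   # exact feedbacks (|Q ∩ K| = 2 below the cap)
print(decode_element(Q, results_R, K_acc, n))              # 3
```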
Note here that the inductive step, being one of O(log n) steps, defines a polynomial-time algorithm decoding some elements one-by-one: indeed, it computes two feedback vectors for a polynomial number of queries, computes the difference, and deduces the new element based on the structure of subsequent blocks of O(log n) size. The above analysis implies that before applying the (n, 3δk/(α − 1), k, α − 1)-SSuI we have not discovered at most 3δk/(α − 1) elements. Thus, by definition, the (n, 3δk/(α − 1), k, α − 1)-SSuI combined with Balanced IDs reveals all the remaining elements in the same way as the SuIs above; the only difference in the argument is that, instead of leaving at most ℓ/2 undiscovered elements in the ℓ-th inductive step, due to the nature of SuIs, the SSuI guarantees that every undiscovered element will occur in a good query. The same argument as for SuIs proves that the decoding algorithm defined this way works in polynomial time. By Theorem 3, the length of an (n, ℓ, 1/2, k, α − 1)-SuI is O(min{n, ℓdδ log² n}), which sums up to O(kdδ log² n) over all used ℓ, and is multiplied by Θ(log n) due to the amplification by Balanced IDs. By Theorem 5, the length of the (n, 3δk/(α − 1), k, α − 1)-SSuI is O((k/α)²δ² log² n), and it is also increased by a factor of Θ(log n) due to Balanced IDs. If we apply the above reasoning with respect to the number of queries containing an element, we get that every element occurs in O((k/α)δ log² n + dδ log³ n) queries.

Implementing α-Round-Robin for large values of k/α. A question arises from the previous result: could one efficiently construct a shorter sequence of queries if (k/α)² > n/α? In the case of full feedback (i.e., α = k), the common way to deal with large values of k is via Round-Robin, which means that the queries are singletons and, consequently, the length of such a query sequence is n. This also works for an arbitrary value of k; however, the lower bound in Theorem 2 suggests that in such a case there could exist a shorter query system, of length O((k + n/α) polylog n). Indeed, if we modify our construction in such a case, we could reach this goal. Namely, we concatenate: selectors S(ℓ), for ℓ being decreasing powers of 2 from ℓ = k to ℓ = 6δk/(α − 1); selectors S|(ℓ), for ℓ being decreasing powers of 2 from ℓ = 3δk/(α − 1) to ℓ = 1. In the case where k ≤ 6δk/(α − 1), the construction contains only the selectors S|(ℓ). Then we enhance them based on Balanced IDs, as in the previous construction. Then, applying Theorem 3 for the concatenated S(ℓ) and Theorem 4 for the concatenated S|(ℓ), instead of the combination of Theorems 3 and 5 as it was in the proof of Lemma 1 with respect to S(ℓ), we get the following result.

Lemma 2. There is an explicit polynomial-time algorithm constructing non-adaptive queries Q1, . . . , Qm and decoding any hidden set K of size at most k ≤ n from the feedback vector in polynomial time, under feedback F_α and for m = O((k + n/α)dδ log³ n) queries. Moreover, each element occurs in O(dδ log³ n) queries.

Proof. The proof is analogous to the proof of Lemma 1, except that we continue proving the invariant until ℓ = 6δk/(α − 1), using the same properties guaranteed by Theorem 3, and then continue the invariant until ℓ = 1, using the SuIs with a slightly different length formula from Theorem 4. The correctness argument, as well as the polynomial-time query construction and decoding of the elements, are the same as in the invariant proof in Lemma 1. Then, by Theorem 3 for the concatenated S(ℓ) and by Theorem 4 for the concatenated S|(ℓ), we argue that the total length of the obtained sequence is m = O((k + n/α)dδ log³ n).
Indeed, the first part results from the telescoping sum over the different ℓ, and the second component is a logarithmic amplification of the original O((ndδ/α) log n) length of the SuIs; everything is amplified by O(log n) due to the enhancement of the used SuIs by Balanced IDs. In all the components, every element belongs to O(dδ log n) queries by Theorems 3 and 4; in the final sequence it occurs in O(dδ log³ n) queries.

Combining Lemma 1 with Lemma 2 gives Theorem 1. Note that in both lemmas the decoding algorithm proceeds query-by-query, each time spending polylogarithmic time on each of them; additionally, for each decoded element, an update of the feedback of the next queries needs to be done, which takes time proportional to the number of occurrences of the discovered element in the queries. Thus, the decoding time is asymptotically upper bounded by the length of the sequence plus k times the upper bound on the number of occurrences of an element in the queries (polylogarithmic).

5 Lower Bound

Proof of Theorem 2. We will first show the min{n/α, k²/α²} component. Assume that a sequence of queries Q1, Q2, . . . , Qt of length t solves Group Testing. We want to show a lower bound that holds for any feedback function capped at α, hence we assume that the feedback function F returns the whole set (i.e., the identifiers of all the elements). Recall that F works only for sets with at most α elements. We begin by proving the following:

Claim A: For any set K with |K| ≤ k and any x ∈ K, there must exist τ ∈ {1, 2, . . . , t} such that x ∈ Q_τ and |K ∩ Q_τ| ≤ α + 1.

The proof is by contradiction. Assume that such a set K∗ and element x∗ exist for which there is no such query. Consider the feedback vectors for the sets K∗ and K∗ \ {x∗}. For any query that does not contain x∗, the feedback is clearly identical. For any query Q_τ such that x∗ ∈ Q_τ, we have |Q_τ ∩ K∗| ≥ α + 2 and |Q_τ ∩ (K∗ \ {x∗})| ≥ α + 1, so the sets K∗ and K∗ \ {x∗} are indistinguishable under any feedback capped at α; hence the sequence of queries does not solve the problem. This completes the proof of Claim A.

Take all queries that have at most α + 1 elements and all elements that belong to such queries. We have: Ns = ⋃_{τ ∈ {1,2,...,t} : |Q_τ| ≤ α+1} Q_τ, and the remaining elements Nl = N \ Ns. Consider two cases:

Case 1: |Ns| ≥ n/2. Observe that: t ≥ |{τ ∈ {1, 2, . . . , t} : |Q_τ| ≤ α + 1}| ≥ |Ns|/(α + 1) ≥ n/(2(α + 1)).

Case 2: |Nl| ≥ n/2. In this case, we take an arbitrary subset K1 of k/2 elements from Nl. For every element x ∈ K1, we consider the set of queries Q(x) = {Q_τ ∈ {Q1, Q2, . . . , Qt} : x ∈ Q_τ, |Q_τ ∩ K1| ≤ α + 1}. We show:

Claim B: For every x ∈ K1, we have |Q(x)| ≥ k/(2(α + 2)).

The proof is by contradiction. Assume that for some x∗ ∈ K1 we have |Q(x∗)| < k/(2(α + 2)). Then, for every query Q ∈ Q(x∗), we take α + 2 − |Q ∩ K1| elements from Q \ K1. Such elements exist since |Q| ≥ α + 2 (as x∗ ∈ Nl, so x∗ belongs only to queries with more than α + 1 elements). Choose such elements for each query in Q(x∗) and gather them in a set K2. Note that since |Q(x∗)| < k/(2(α + 2)), then |K2| ≤ k/2. Now observe that the set K1 ∪ K2 and the element x∗ violate Claim A. The obtained contradiction completes the proof of Claim B.

Now observe that each query belongs to at most α + 1 sets Q(x) for different values of x ∈ K1. Hence t ≥ (∑_{x ∈ K1} |Q(x)|)/(α + 1) ≥ k²/(4(α + 1)(α + 2)). To complete the proof, observe that any algorithm must fall either into Case 1 or Case 2, hence any algorithm needs to use Ω(min{n/α, k²/α²}) queries.

To see that any algorithm in the F_α feedback model needs at least Ω(k log(n/k)/log α) queries, observe that the feedback vector must be unique for each set K with at most k elements. Hence we need at least (n/k)^k different feedback vectors for different sets.
Feedback has at most α + 1 values, hence we get (α + 1)^t ≥ (n/k)^k and t = Ω(k log(n/k)/log α). This completes the proof of Theorem 2.

6 Extensions, Applications and Open Directions

Observations. The size of a response to a capped quantitative query is not much bigger than in beeping (log α bits vs one bit of beeping), whereas the number of queries is smaller by a factor of α² compared to beeping. Thus, even after normalization of the query complexity by the size of the feedback, the α-capped model is nearly α² times better than the classic beeping, up to α = √k. Although optimizing the construction time is not our main focus, it is actually small. E.g., for α = √k, the construction of all queries takes time Õ(n√k). Most of the complexity comes from the construction of the SSuI (Alg. 4): for each element of the universe, we consider Õ(√k) zeros of some polynomial. We use a logarithmic number of SuIs in the construction, but each takes only Õ(n) time (Alg. 3): in an SuI, for every element we list its neighborhood in the disperser in polylogarithmic time [57] and take the Cartesian product with a polylogarithmic-length Reed-Solomon type of codeword.

Fault-tolerance. Algorithms 1 and 2 are fault-tolerant with respect to f ≤ ck/α adversarially jammed results, for some sufficiently small constant c > 0; here by a jammed result we understand receiving a null as the result of a query. They also tolerate stochastic failures, where a result is jammed with probability p ≤ c. Indeed, for adversarial failures we observe that, by dispersion, ck/α jammed results could be made negligible (by the right selection of the constant c) with respect to the number of selected elements in an SuI, and this number is also smaller than the number of single occurrences (i.e., without interference from the other k − 1 elements) of an element in the SSuI. Thus the analysis of the algorithms holds (the original analysis has margins to accommodate this additional negligible jamming, as it was done with only asymptotic upper bounds). The tolerance of stochastic failures follows from the fact that a random p-fraction of the results of queries coming from the SuIs and the SSuI is again negligible from the perspective of the fraction of correctly encoded elements and can be accommodated by our asymptotic analysis.

Theorem 6. Algorithms 1 and 2 tolerate f ≤ ck/α adversarial jammings, as well as stochastic jammings occurring with probability p ≤ c, for some sufficiently small constant c > 0.

Other applications. Applications to efficient GT and dynamic maintenance of multi-sets and graphs, as well as to Private Parallel Information Retrieval and coding, are detailed in Appendix B.

Open directions. Considering only polynomially-constructible query systems leaves some interesting open directions. One such direction is whether an optimal-length query sequence can be constructed in polynomial time, or perhaps it is possible to show by some reduction that constructing a nearly-shortest query sequence is computationally hard (even if we know that it exists). Shrinking the polylogarithmic gaps between the lower and (existential) upper bounds and improving constants is another challenging direction, as well as considering other interesting classes of GT models with α-capped results, e.g., parity. We also believe that, with some adjustments, capped GT codes could be applied to efficiently solve many open problems in the online streaming, communication and graph learning fields.

Acknowledgments and Disclosure of Funding

Dominik Pajak was supported by the National Science Centre, Poland (NCN), grant UMO-2019/33/B/ST6/02988. Dariusz R.
Kowalski was supported, in part, by the National Science Center, Poland (NCN), grant UMO-2017/25/B/ST6/02010. [1] D. Angluin. Queries and concept learning. Machine learning, 2(4):319 342, 1988. [2] N. Augenblick, J. T. Kolstad, Z. Obermeyer, and A. Wang. Group testing in a pandemic: The role of frequent testing, correlated risk, and machine learning. Technical report, National Bureau of Economic Research, 2020. [3] W. H. Bay, E. Price, and J. Scarlett. Optimal non-adaptive probabilistic group testing in general sparsity regimes. ar Xiv preprint ar Xiv:2006.01325, 2020. [4] N. H. Bshouty. Optimal algorithms for the coin weighing problem with a spring scale. In COLT 2009 - The 22nd Conference on Learning Theory, Montreal, Quebec, Canada, June 18-21, 2009, 2009. [5] C.-C. Cao, C. Li, and X. Sun. Quantitative group testing-based overlapping pool sequencing to identify rare variant carriers. BMC bioinformatics, 15(1):1 14, 2014. [6] J. Capetanakis. Generalized tdma: The multi-accessing tree protocol. IEEE Transactions on Communications, 27(10):1476 1484, 1979. [7] J. Capetanakis. Tree algorithms for packet broadcast channels. IEEE transactions on information theory, 25(5):505 515, 1979. [8] K. Censor-Hillel, B. Haeupler, N. A. Lynch, and M. Médard. Bounded-contention coding for the additive network model. Distributed Computing, 28(5):297 308, 2015. [9] M. Cheraghchi and J. Ribeiro. Simple codes and sparse recovery with fast decoding. In 2019 IEEE International Symposium on Information Theory (ISIT), pages 156 160. IEEE, 2019. [10] B. S. Chlebus and D. R. Kowalski. Almost optimal explicit selectors. In M. Liskiewicz and R. Reischuk, editors, Fundamentals of Computation Theory, 15th International Symposium, FCT 2005, Lübeck, Germany, August 17-20, 2005, Proceedings, volume 3623 of Lecture Notes in Computer Science, pages 270 280. Springer, 2005. [11] S.-S. Choi and J. H. Kim. Optimal query complexity bounds for finding graphs. Artificial Intelligence, 174(9-10):551 569, 2010. [12] A. E. F. Clementi, A. Monti, and R. Silvestri. Selective families, superimposed codes, and broadcasting on unknown radio networks. In Proceedings of the Twelfth Annual Symposium on Discrete Algorithms, January 7-9, 2001, Washington, DC, USA, pages 709 718. ACM/SIAM, 2001. [13] R. Clifford, K. Efremenko, E. Porat, and A. Rothschild. Pattern matching with don t cares and few errors. Journal of Computer and System Sciences, 76(2):115 124, 2010. [14] A. Coja-Oghlan, O. Gebhard, M. Hahn-Klimroth, and P. Loick. Optimal group testing. In J. D. Abernethy and S. Agarwal, editors, Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pages 1374 1388. PMLR, 2020. [15] G. Cormode, M. Garofalakis, S. Muthukrishnan, and R. Rastogi. Holistic aggregates in a networked world: Distributed tracking of approximate quantiles. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 25 36, 2005. [16] G. Cormode and M. Hadjieleftheriou. Finding frequent items in data streams. Proceedings of the VLDB Endowment, 1(2):1530 1541, 2008. [17] G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava. Finding hierarchical heavy hitters in data streams. In Proceedings 2003 VLDB Conference, pages 464 475. Elsevier, 2003. [18] G. Cormode and S. Muthukrishnan. What s hot and what s not: tracking most frequent items dynamically. ACM Transactions on Database Systems (TODS), 30(1):249 278, 2005. [19] G. Cormode and S. Muthukrishnan. 
Combinatorial algorithms for compressed sensing. In International colloquium on structural information and communication complexity, pages 280 294. Springer, 2006. [20] P. Damaschke. Threshold group testing. Electronic Notes in Discrete Mathematics, 21:265 271, 2005. General Theory of Information Transfer and Combinatorics. [21] A. De Bonis, L. Gasieniec, and U. Vaccaro. Generalized framework for selectors with applica- tions in optimal group testing. In Automata, Languages and Programming, 30th International Colloquium, ICALP 2003, Eindhoven, The Netherlands, June 30 - July 4, 2003. Proceedings, volume 2719 of Lecture Notes in Computer Science, pages 81 96. Springer, 2003. [22] G. De Marco, T. Jurdzinski, and D. R. Kowalski. Optimal channel utilization with limited feedback. J. Comput. Syst. Sci., 119:21 33, 2021. [23] G. De Marco, T. Jurdzinski, D. R. Kowalski, M. Rózanski, and G. Stachowiak. Subquadratic non-adaptive threshold group testing. J. Comput. Syst. Sci., 111:42 56, 2020. [24] G. De Marco and D. R. Kowalski. Searching for a subset of counterfeit coins: Randomization vs determinism and adaptiveness vs non-adaptiveness. Random Structures & Algorithms, 42(1):97 109, 2013. [25] A. Djackov. On a search model of false coins. In Topics in Information Theory (Colloquia Mathematica Societatis Janos Bolyai 16). Budapest, Hungary: Hungarian Acad. Sci, pages 163 170, 1975. [26] R. Dorfman. The detection of defective members of large populations. The Annals of Mathe- matical Statistics, 14(4):436 440, 1943. [27] D. Du, F. K. Hwang, and F. Hwang. Combinatorial group testing and its applications, volume 12. World Scientific, 2000. [28] J. Engels, B. Coleman, and A. Shrivastava. Practical near neighbor search via group testing. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 9950 9962. Curran Associates, Inc., 2021. [29] U. Feige and A. Lellouche. Quantitative group testing and the rank of random matrices. Co RR, abs/2006.09074, 2020. [30] L. Galanis, Y. Wang, S. R. Jeffery, and D. J. De Witt. Locating data sources in large distributed systems. In J. C. Freytag, P. C. Lockemann, S. Abiteboul, M. J. Carey, P. G. Selinger, and A. Heuer, editors, Proceedings of 29th International Conference on Very Large Data Bases, VLDB 2003, Berlin, Germany, September 9-12, 2003, pages 874 885. Morgan Kaufmann, 2003. [31] R. Gallager. A perspective on multiaccess channels. IEEE Transactions on information Theory, 31(2):124 142, 1985. [32] O. Gebhard, M. Hahn-Klimroth, D. Kaaser, and P. Loick. Quantitative group testing in the sublinear regime. Co RR, abs/1905.01458, 2019. [33] P. B. Gibbons, Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. ACM Transactions on Database Systems (TODS), 27(3):261 298, 2002. [34] A. C. Gilbert, S. Guha, P. Indyk, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. Fast, small- space algorithms for approximate histogram maintenance. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 389 398, 2002. [35] A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. How to summarize the universe: Dynamic maintenance of quantiles. In VLDB 02: Proceedings of the 28th International Conference on Very Large Databases, pages 454 465. Elsevier, 2002. [36] S. Gollakota and D. Katabi. Zigzag decoding: combating hidden terminals in wireless networks. In V. Bahl, D. Wetherall, S. Savage, and I. 
Stoica, editors, Proceedings of the ACM SIGCOMM 2008 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, Seattle, WA, USA, August 17-22, 2008, pages 159 170. ACM, 2008. [37] V. Grebinski and G. Kucherov. Optimal reconstruction of graphs under the additive model. Algorithmica, 28(1):104 124, 2000. [38] A. G. Greenberg, P. Flajolet, and R. E. Ladner. Estimating the multiplicities of conflicts to speed their resolution in multiple access channels. Journal of the ACM (JACM), 34(2):289 325, 1987. [39] A. G. Greenberg and S. Winograd. A lower bound on the time needed in the worst case to resolve conflicts deterministically in multiple access channels. Journal of the ACM (JACM), 32(3):589 596, 1985. [40] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. ACM SIGMOD Record, 30(2):58 66, 2001. [41] E. Hradovich, M. Klonowski, and D. R. Kowalski. Contention resolution on a restrained channel. In 26th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2020, Hong Kong, December 2-4, 2020, pages 89 98. IEEE, 2020. [42] I. F. Ilyas, W. G. Aref, and A. K. Elmagarmid. Supporting top-k join queries in relational databases. VLDB J., 13(3):207 221, 2004. [43] P. Indyk. Deterministic superimposed coding with applications to pattern matching. In 38th Annual Symposium on Foundations of Computer Science, FOCS 97, Miami Beach, Florida, USA, October 19-22, 1997, pages 127 136. IEEE Computer Society, 1997. [44] P. Indyk. Explicit constructions of selectors and related combinatorial structures, with applica- tions. In D. Eppstein, editor, Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6-8, 2002, San Francisco, CA, USA, pages 697 704. ACM/SIAM, 2002. [45] P. Indyk, H. Q. Ngo, and A. Rudra. Efficiently decodable non-adaptive group testing. In M. Charikar, editor, Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, pages 1126 1142. SIAM, [46] W. Kautz and R. Singleton. Nonrandom binary superimposed codes. IEEE Transactions on Information Theory, 10(4):363 377, 1964. [47] M. Klonowski, D. R. Kowalski, and D. Pajak. Generalized framework for group testing: Queries, feedbacks and adversaries. Theor. Comput. Sci., 919:18 35, 2022. [48] J. Komlós and A. G. Greenberg. An asymptotically fast nonadaptive algorithm for conflict resolution in multiple-access channels. IEEE Trans. Inf. Theory, 31(2):302 306, 1985. [49] D. R. Kowalski and D. Pajak. Light agents searching for hot information. In L. D. Raedt, editor, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022, pages 363 369. ijcai.org, 2022. [50] K. Lee, K. Chandrasekher, R. Pedarsani, and K. Ramchandran. Saffron: A fast, efficient, and robust framework for group testing based on sparse-graph codes. IEEE Transactions on Signal Processing, 67(17):4649 4664, 2019. [51] W. Liang and J. Zou. Neural group testing to accelerate deep learning. In 2021 IEEE Interna- tional Symposium on Information Theory (ISIT), pages 958 963. IEEE, 2021. [52] S. Mallapaty et al. The mathematical strategy that could transform coronavirus testing. Nature, 583(7817):504 505, 2020. [53] J. L. Massey. Collision-resolution algorithms and random-access communications. In Multi-user communication systems, pages 73 137. Springer, 1981. [54] E. Porat and A. Rothschild. 
Explicit nonadaptive combinatorial group testing schemes. IEEE Trans. Inf. Theory, 57(12):7982–7989, 2011. [55] A. Sebő. On two random search problems. Journal of Statistical Planning and Inference, 11(1):23–31, 1985. [56] N. Sinnott-Armstrong, D. L. Klein, and B. Hickey. Evaluation of group testing for SARS-CoV-2 RNA. medRxiv, 2020. [57] A. Ta-Shma, C. Umans, and D. Zuckerman. Loss-less condensers, unbalanced expanders, and extractors. In J. S. Vitter, P. G. Spirakis, and M. Yannakakis, editors, Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, July 6-8, 2001, Heraklion, Crete, Greece, pages 143–152. ACM, 2001. [58] S. Ubaru, S. Dash, A. Mazumdar, and O. Gunluk. Multilabel classification by hierarchical partitioning and data-dependent grouping. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 22542–22553. Curran Associates, Inc., 2020. [59] J. X. Yu, Z. Chong, H. Lu, and A. Zhou. False positive or false negative: Mining frequent itemsets from high speed transactional data streams. In VLDB, volume 4, pages 204–215, 2004.

Checklist

1. For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes] Our abstract and introduction state the theoretical results that are later proven in our paper.
(b) Did you describe the limitations of your work? [Yes] Our assumed model (including the result function) is described in Section 3.
(c) Did you discuss any potential negative societal impacts of your work? [Yes] Our result is theoretical, hence it does not have any direct negative impact. Also, none of the existing solutions to Group Testing (which is a field with almost 80 years of history) were harmful to society.
(d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
(a) Did you state the full set of assumptions of all theoretical results? [Yes] The assumptions are given in the statement of each theoretical result.
(b) Did you include complete proofs of all theoretical results? [Yes] Some combinatorial constructions (used in the proofs of Theorems 3, 4 and 5) are included in Appendix C.
3. If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)?
[N/A]
(c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [N/A]
(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
(a) If your work uses existing assets, did you cite the creators? [N/A]
(b) Did you mention the license of the assets? [N/A]
(c) Did you include any new assets either in the supplemental material or as a URL? [N/A]
(d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [N/A]
(e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]
5. If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]