# Bayesian Multiple Target Localization

Purnima Rajan (PURNIMA@CS.JHU.EDU), Department of Computer Science, Johns Hopkins University
Weidong Han (WHAN@PRINCETON.EDU), Department of Operations Research and Financial Engineering, Princeton University
Raphael Sznitman (RAPHAEL.SZNITMAN@ARTORG.UNIBE.CH), ARTORG Center, University of Bern
Peter I. Frazier (PF98@CORNELL.EDU), School of Operations Research and Information Engineering, Cornell University
Bruno M. Jedynak (BRUNO.JEDYNAK@JHU.EDU), Department of Applied Mathematics & Statistics, Johns Hopkins University

Abstract

We consider the problem of quickly localizing multiple targets by asking questions of the form "How many targets are within this set?" while obtaining noisy answers. This setting generalizes the game of 20 questions, in which only a single target is queried, to multiple targets. We assume that the targets are points on the real line, or in a two-dimensional plane for the experiments, drawn independently from a known distribution. We evaluate the performance of a policy using the expected entropy of the posterior distribution after a fixed number of questions with noisy answers. We derive a lower bound for the value of this problem and study a specific policy, named the dyadic policy. We show that this policy achieves a value which is no more than twice this lower bound when answers are noise-free, and show a more general constant-factor approximation guarantee for the noisy setting. We present an empirical evaluation of this policy on simulated data for the problem of detecting multiple instances of the same object in an image. Finally, we present experiments on localizing multiple faces simultaneously in real images.

Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).

1. Introduction

The task of localizing structures of interest, or targets, appears in numerous applications, such as finding quasars in astronomical data (Mortlock, 2009), localizing faces in images (Ali et al., 2012), or counting synapses in microscopy volumes (Merchan-Perez et al., 2009). Once a competent detection scheme is available, the localization task often reduces to evaluating each possible location in an exhaustive fashion. Such strategies are highly effective and easy to implement, which contributes to their widespread use. Yet such localization strategies do not scale with data size requirements, since their computational complexity depends directly on the size of the searchable domain. This problem is critical in the analysis of enormous microscopy volumes, where localizing and counting intra-cellular structures such as mitochondria or synapses is critical to understanding brain processes (Lee et al., 2007). Similarly, efficient localization of object instances in images remains challenging given the growing amount of image data to evaluate. To address this issue, theoretical works have established strategies to reduce the computation needed to find targets. Generally, the literature considering such problems falls into two categories: works that consider a single target (k = 1) and works that consider multiple targets (k > 1). For single-target localization, (Jedynak et al., 2012) considered a Bayesian setting and used the entropy of the posterior distribution to measure accuracy, as we do here.
Within this context, a number of policies have been proposed, such as the dyadic policy and the greedy probabilistic bisection policy (Horstein, 1963), which was further studied in (Castro & Nowak, 2007; Waeber et al., 2013). (Tsiligkaridis et al., 2013) more recently generalized this probabilistic bisection policy to multiple questioners. A discretized version of probabilistic bisection was studied by (Burnashev & Zigangirov, 1975). For multiple-target localization, three variations appear frequently: the Group Testing problem (Du & Hwang, 2000; Stinson et al., 2000; Eppstein et al., 2007; Harvey et al., 2007; Porat & Rothschild, 2008), the subset-guessing game associated with the Random Chemistry algorithm (Kauffman, 1996; Buzas & Warrington, 2013), and the Guessing Secret game (Chung et al., 2001). In each case, the goal is to query subsets, $A$, of the search space to determine an unknown set $S$. In the Group Testing problem, questions are of the form: Is $A \cap S \neq \emptyset$? In the subset-guessing game associated with the Random Chemistry algorithm, questions are of the form: Is $S \subseteq A$? In the Guessing Secret game, when queried with a set $A$, the responder chooses an element from $S$ according to any self-selected rule and specifies whether this chosen element is in $A$. The chosen element itself is not revealed and may change after each question. Thus, the answer is 1 when $S \subseteq A$, 0 when $A \cap S = \emptyset$, and can be 0 or 1 otherwise (a small sketch of these three answer models follows at the end of this section).

Four major practical considerations, however, severely limit the usability of existing theoretical results in real applications. First, when multiple targets need to be located, significant noise in the query answers is typically observed in real applications but ignored in most theoretical analyses. Second, existing theoretical analyses lack generality, considering very specific models for what is observed, rather than a general observational model that could be adapted to the application at hand. Third, many methods with theoretical guarantees on query complexity require a great deal of computation, and cannot be used in computation-constrained applications. Fourth, existing theoretical analyses often do not make clear the computational gain possible over repeatedly applying optimal strategies for single-target localization, making simpler strategies based on localizing single targets more attractive.

This work addresses these concerns by proposing and then analyzing the dyadic policy for simultaneous localization of multiple targets. This policy and our analysis use an observational model that allows noise, and is general enough to subsume Group Testing, Random Chemistry, and a wide variety of other problems. We provide an explicit expression for the expected entropy of the posterior after $N$ queries from this policy, and together with a simple information-theoretic lower bound on the expected entropy under the optimal policy, we show an approximation guarantee for the expected entropy reduction under the dyadic policy. Using this result, we can then demonstrate significant computational gains over repeated single-target optimal localization. The dyadic policy can be computed quickly and is non-adaptive, making it easy to parallelize and far simpler to implement than dynamic strategies. Moreover, it allows easy and exact computation of the expected number of targets at each location of our space.
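To make the three question models above concrete, the following minimal Python sketch implements each answer as a function of a queried set `A` and the unknown set `S`. The function names are illustrative, and the specific tie-breaking rule in the Guessing Secret game is an assumption made here only to produce a runnable example.

```python
def group_testing_answer(A, S):
    """Group Testing: is the intersection of A and S non-empty?"""
    return int(len(A & S) > 0)

def random_chemistry_answer(A, S):
    """Random Chemistry subset-guessing: is S contained in A?"""
    return int(S <= A)

def guessing_secret_answer(A, S):
    """Guessing Secret: the responder picks any element of S and reports
    whether it lies in A.  The answer is forced to 1 when S is a subset
    of A, forced to 0 when A and S are disjoint, and otherwise depends
    on the responder's self-selected rule (here, arbitrarily, 0)."""
    if S <= A:
        return 1
    if not (A & S):
        return 0
    return 0  # one possible self-selected rule among many

# Example: two targets hidden among 8 cells.
S = {2, 5}
A = {0, 1, 2, 3}
print(group_testing_answer(A, S))     # 1: A contains target 2
print(random_chemistry_answer(A, S))  # 0: target 5 lies outside A
```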
2. PROBLEM FORMULATION

Let $\theta = (\theta_1, \ldots, \theta_k)$ be a random vector taking values in $\mathbb{R}^k$, where $\theta_i$ represents the location of the $i$th target of interest, $i = 1, \ldots, k$. We assume that $\theta_1, \ldots, \theta_k$ are i.i.d. with density $f_0$, and joint density $p_0$. We refer to $p_0$ as the Bayesian prior probability distribution on $\theta$. Note that even if the targets are indistinguishable, they are modeled as a vector, and not a set. This is a key requirement for simplifying the combinatorics of the probabilistic analysis.

We will ask a series of $N > 0$ questions to locate $\theta_1, \ldots, \theta_k$, where each question takes the form of a subset of $\mathbb{R}$. The answer to this question is the number of targets in this subset. However, this answer is not available to the questioner. Instead, a noisy version of this answer is available. More precisely, for each $n \in \{1, 2, \ldots, N\}$, the $n$th question is $A_n \subseteq \mathbb{R}$ and its noiseless answer is

$Z_n = \sum_{i=1}^{k} \mathbb{1}_{A_n}(\theta_i), \quad (1)$

where $\mathbb{1}_A$ is the indicator function of the set $A$. The noisy observable answer is a random function of $Z_n$, namely

$X_n = h(Z_n, W_n), \quad (2)$

where $h$ is a known function and $(W_n)$ is a collection of independent random variables, which are also independent of $\theta$. (A simulation sketch of this observation model is given at the end of this section.)

Note that our choice of the set $A_n$ may depend upon the answers to all previous questions. Thus, the set $A_n$ is random. We call a rule for choosing the questions $A_n$ a policy, and indicate it with the notation $\pi$. The distribution of $A_n$ thus implicitly depends on $\pi$. When we wish to highlight this dependence, we use the notation $P^\pi$ and $E^\pi$ to indicate probability and expectation respectively. However, when the policy being studied is clear, we simply use $P$ and $E$. We let $\Pi$ be the space of all policies.

Throughout the paper, we use the notation $X_{a:b}$ for any $a, b \in \mathbb{N}$ to indicate the sequence $(X_a, \ldots, X_b)$ if $a \le b$, and the empty sequence if $a > b$. We define $\theta_{a:b}$ and $A_{a:b}$ similarly. We refer to the posterior probability distribution on $\theta$ after $n$ questions and answers as $p_n$, so $p_n$ is the conditional distribution of $\theta$ given $X_{1:n}$ and $A_{1:n}$.

After we exhaust our budget of $N$ questions, we measure the quality of what we have learned via the differential entropy $H(p_N)$ of the posterior distribution $p_N$ on the targets at this final time,

$H(p_N) = -\int_{\mathbb{R}^k} p_N(u_{1:k}) \log(p_N(u_{1:k})) \, du_{1:k}. \quad (3)$

Throughout this paper, we use $\log$ to denote the logarithm to base 2. We let $H_0 = H(p_0)$, and we assume $-\infty < H(p_0) < +\infty$. The posterior distribution $p_N$, as well as its entropy $H(p_N)$, are random for $N > 0$, as they depend on $X_{1:N}$. Thus, we measure the quality of a policy $\pi \in \Pi$ as

$R(\pi, N) = E^\pi[H(p_N)]. \quad (4)$

Our goal is to characterize the solution to the optimization problem

$\inf_{\pi \in \Pi} R(\pi, N). \quad (5)$

Any policy that attains this infimum is called optimal. Beyond theoretical interest, a policy for which $H(p_N)$ is small is of practical interest. It was shown in (Jedynak et al., 2012), section 4.3, that an optimal policy allows for localizing $\theta$ efficiently in the case of $k = 1$. We conjecture that the same occurs for arbitrary values of $k$. While (5) can be formulated as a partially observable Markov decision process (Frazier, 2010), and can be solved, in principle, via dynamic programming, the state space of this dynamic program (which is the space of posterior distributions over $\theta$) is too large to allow solving it through brute-force computation. Thus, in this paper, rather than attempting to compute the optimal policy, we provide an easily computed lower bound on (5), and then study a particular policy, called the dyadic policy and defined below, whose performance is close to this lower bound.
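As a concrete instance of the observation model (1)-(2), the following sketch simulates noiseless counts and noisy answers, assuming the additive Gaussian noise model used later in the experiments (equation (19)). The uniform prior and noise level are illustrative choices only, not prescribed by the formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 3          # number of targets
sigma = 0.5    # noise level (illustrative)

# Targets drawn i.i.d. from the prior f0, here uniform on (0, 1].
theta = rng.uniform(0.0, 1.0, size=k)

def noiseless_answer(A, theta):
    """Z_n of (1): the number of targets falling in the queried set A,
    given as a list of disjoint intervals (lo, hi]."""
    return sum(any(lo < t <= hi for lo, hi in A) for t in theta)

def noisy_answer(A, theta):
    """X_n = h(Z_n, W_n) of (2), with additive Gaussian noise as in (19)."""
    return noiseless_answer(A, theta) + rng.normal(0.0, sigma)

A1 = [(0.5, 1.0)]                   # "how many targets lie in (1/2, 1]?"
print(noiseless_answer(A1, theta))  # exact count Z_1
print(noisy_answer(A1, theta))      # noisy observation X_1
```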
3. THEORETICAL RESULTS

We first present an information-theoretic lower bound on the best expected entropy achievable, and a proof sketch. Proofs of all results may be found in the supplement.

Theorem 1.

$H_0 - \log(k+1)\,N \;\le\; H_0 - C_k N \;\le\; \inf_{\pi \in \Pi} R(\pi, N). \quad (6)$

The main arguments of the proof are as follows. First, at step $n$, the largest reduction in entropy that can be obtained in one question, on average, occurs when the answer $X_{n+1}$ and the targets $\theta$ have the largest mutual information given the history, $I(\theta, X_{n+1} \mid X_{1:n})$; see Geman and Jedynak (1996). Second, since $X_{n+1}$ depends on $\theta$ only through $Z_{n+1}$ given $X_{1:n}$,

$I(\theta, X_{n+1} \mid X_{1:n}) = I(Z_{n+1}, X_{n+1} \mid X_{1:n}). \quad (7)$

Third, (7) is upper bounded by the channel capacity

$C_k = \sup_{q} \left[ H\!\left( \sum_{z=0}^{k} q(z) f(\cdot \mid z) \right) - \sum_{z=0}^{k} q(z) H\!\left(f(\cdot \mid z)\right) \right], \quad (8)$

where $q$ is a point mass function over the set $\{0, \ldots, k\}$ and $f(\cdot \mid z)$ is the density, or point mass function, of the noisy answer $X_{n+1}$ given the noiseless answer $Z_{n+1} = z$. Since the noiseless answer $Z_{n+1}$ is discrete and takes values in a set of size $k+1$, $C_k$ is bounded above by $\log(k+1)$, providing the first inequality in (6).

In the noiseless case, both lower bounds in (6) are identical. Moreover, they are not achievable. Indeed, at $N = 1$, the target locations are independent, and so the answer to the first question is Binomial, and must have an entropy strictly smaller than $\log(k+1)$. Moreover, as the expected entropy reduction is the sum of the expected entropy reductions at each question, the lower bound is not achievable for any $N$.

We now define an easy-to-compute policy, called the dyadic policy, and indicate it with the notation $\pi_D$. We first recall that the quantile function of $\theta_1$ is

$Q(p) = \inf\{u \in \mathbb{R} : p \le F_0(u)\}, \quad (9)$

where $F_0$ is the cumulative distribution function of $\theta_1$, corresponding to its density $f_0$. Then, the dyadic policy consists in choosing at step $n \ge 1$ the set

$A_n = \bigcup_{j=1}^{2^{n-1}} \left( Q\!\left(\frac{2j-1}{2^n}\right),\; Q\!\left(\frac{2j}{2^n}\right) \right] \cap \mathrm{supp}(f_0), \quad (10)$

where $\mathrm{supp}(f_0)$ is the support of $f_0$, i.e., the set of values $u \in \mathbb{R}$ for which $f_0(u) > 0$. This definition of the dyadic policy generalizes a definition provided in (Jedynak et al., 2012) for single targets. The dyadic policy is easy to implement, and is non-adaptive, allowing its use in parallel computing environments. Figure 1 illustrates this policy, and a small implementation sketch follows.

Figure 1. Illustration of the dyadic policy. The prior density f0, displayed at top, is uniform over (0,1]. The question set An is the union of the dark subsets for n = 1, 2, 3.
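A minimal sketch of the dyadic construction (9)-(10): under a uniform prior on (0, 1] the quantile function is the identity, and a point $u$ belongs to $A_n$ exactly when $p = F_0(u)$ falls in an even-indexed dyadic interval at resolution $n$. The membership test below is a convenience equivalent to (10) under these assumptions, not code from the paper.

```python
import math

def in_dyadic_set(u, n, F0=lambda u: u):
    """True iff u lies in the dyadic question set A_n of (10).

    u is in A_n exactly when p = F0(u) falls in some interval
    ((2j-1)/2^n, (2j)/2^n], i.e. when ceil(p * 2^n) is even.
    The default F0 is the CDF of the uniform prior on (0, 1]."""
    p = F0(u)
    return math.ceil(p * 2**n) % 2 == 0

# First three question sets for the uniform prior on (0, 1]:
# A_1 = (1/2, 1], A_2 = (1/4, 1/2] U (3/4, 1], ...
for n in (1, 2, 3):
    cells = [j / 16 for j in range(1, 17)]
    print(n, [c for c in cells if in_dyadic_set(c, n)])
```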
The following theorem provides an explicit expression for the expected entropy of the posterior distribution under the dyadic policy.

Theorem 2. Under the dyadic policy $\pi_D$,

$R(\pi_D, N) = H_0 - D_k N, \quad (11)$

where

$D_k = H\!\left( \sum_{z=0}^{k} \binom{k}{z} 2^{-k} f(\cdot \mid z) \right) - \sum_{z=0}^{k} \binom{k}{z} 2^{-k} H\!\left(f(\cdot \mid z)\right). \quad (12)$

In the noiseless case, this simplifies to the entropy of a Binomial distribution:

$D_k = H\!\left(\mathrm{Bin}\!\left(k, \tfrac{1}{2}\right)\right). \quad (13)$

This result is easier to interpret in a discrete setting. Consider, as we will in Section 4, an image of $M \times M$ pixels containing $k$ instances of an object, located at random, uniformly and independently. The instances are our targets. The starting entropy, neglecting the fact that several instances might occupy the same location, is

$H_0 = k \log M^2. \quad (14)$

According to (11), the expected number of questions $N^*$ such that the $k$ targets are located with certainty when using the dyadic policy, i.e., $R(\pi_D, N^*) = 0$, is such that

$N^* D_k = k \log M^2. \quad (15)$

Consider the noiseless case for simplicity. Firstly, $N^*$ is negligible compared to the number of questions asked by a naive algorithm that queries the pixels in a fixed, predetermined order, for example line by line and column by column. This naive algorithm requires on average $M^2/2$ queries for a single target, and more for more targets. Secondly, $N^*$ is also better than querying optimally one target, which requires $\log M^2$ queries, and repeating this $k$ times, for a total of $k \log M^2$ queries. Indeed,

$N^* \approx \frac{2k}{\log\!\left(\frac{\pi e k}{2}\right)} \log M^2, \quad (16)$

using the approximation of the Binomial distribution $\mathrm{Bin}(k, \frac12)$ by the Normal distribution.

We can also compare the expression (11) for the expected entropy under the dyadic policy to the lower bound on the optimal expected entropy from Theorem 1. In both cases the expected entropy decreases linearly in the number of questions, with a reduction per question of $D_k$ under the dyadic policy, and a reduction of $C_k$ in the lower bound. This implies the following approximation guarantee for the entropy reduction under the dyadic policy, relative to optimal.

Corollary 1.

$\frac{H_0 - R(\pi_D, N)}{H_0 - \inf_{\pi \in \Pi} R(\pi, N)} \;\ge\; \frac{D_k}{C_k}. \quad (17)$

The approximation ratio $D_k/C_k$ depends upon the noise model and the number of targets $k$. In the noiseless case, this approximation ratio is $H(\mathrm{Bin}(k, \frac12))/\log(k+1) \ge \frac12$ (this inequality is shown in the supplement), showing that the dyadic policy is a 2-approximation in the noiseless case. Moreover, this approximation ratio approaches $1/2$ as $k \to \infty$, showing that the dyadic policy does not achieve the lower bound from Theorem 1 for large values of $k$. However, as previously noted, this lower bound is not achievable. The precise value of an optimal policy remains unknown.

To support the algorithms described in Section 4, which use the dyadic policy as a first-phase procedure for deciding the order over pixels in which to call the oracle in computer vision applications, we introduce here some additional notation and derive an explicit formula for the posterior distribution over the targets given the history of answers. Consider a fixed $n$, where $1 \le n \le N$. For each binary sequence $s = (s_1, \ldots, s_n)$, define

$C_s = \bigcap_{j=1}^{n} A_j^{(s_j)}, \quad \text{where } A_j^{(1)} = A_j \text{ and } A_j^{(0)} = \mathrm{supp}(f_0) \setminus A_j.$

The collection $\mathcal{C} = \{C_s : C_s \neq \emptyset,\, s \in \{0,1\}^n\}$ provides a partition of the support of $f_0$. A history of $n$ questions provides information on which sets $C_s$ contain which targets among $\theta_{1:k}$. For each $C \in \mathcal{C}$, let $N(C) = \sum_{i=1}^{k} \mathbb{1}\{\theta_i \in C\}$ be the number of targets in $C$. We now provide a result that shows how to compute $E[N(C) \mid X_{1:N}]$, the expected number of targets within one of these sets $C$ under the posterior distribution. This is used by the algorithms presented in Section 4, at the start of a second stage following the use of the dyadic policy, in which an expensive noise-free oracle is called on some of the small sets $C$ to establish definitively the number of targets in each. The algorithms use the value $E[N(C) \mid X_{1:N}]$ to determine the order over sets in which to call the oracle.

Theorem 3. For each instance $i$ and each $C \in \mathcal{C}$, the posterior likelihood $P(\theta_i \in C \mid X_{1:N} = x_{1:N})$ satisfies

$P(\theta_i \in C \mid X_{1:N} = x_{1:N}) = \prod_{n=1}^{N} \left( \frac{e_n}{k} \right)^{s_n} \left( 1 - \frac{e_n}{k} \right)^{1 - s_n}, \quad (18)$

where $e_n = E[Z_n \mid X_n = x_n]$ and $s_n = \mathbb{1}\{C \subseteq A_n\}$. Moreover,

$E[N(C) \mid X_{1:N} = x_{1:N}] = k\, P(\theta_i \in C \mid X_{1:N} = x_{1:N}).$

The quantity $e_n$ can be computed according to Bayes' rule:

$e_n = E[Z_n \mid X_n = x_n] = \sum_{j=0}^{k} j\, P(Z_n = j \mid X_n = x_n) = \sum_{j=0}^{k} j\, \frac{P(X_n = x_n \mid Z_n = j)\, P(Z_n = j)}{P(X_n = x_n)},$

where $Z_n \sim \mathrm{Bin}(k, \frac12)$, $P(X_n = x_n \mid Z_n = j)$ can be computed directly from the noise model (2), and $P(X_n = x_n) = \sum_{j=0}^{k} P(X_n = x_n \mid Z_n = j)\, P(Z_n = j)$.

4. ALGORITHMS AND EXPERIMENTS

We now show how the dyadic policy, analyzed above in the continuous setting using the entropy, can be used in an idealized computer vision setting to reduce the number of oracle calls required to locate $k$ instances of a given object within an $M \times M$ digital image. We consider algorithms that proceed in two phases, possibly iterated.
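Before detailing the two phases, here is a minimal sketch of the posterior computation of Theorem 3 under the additive Gaussian noise model (19). The function names and the use of SciPy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.stats import binom, norm

def posterior_cell_prob(x, s, k, sigma):
    """P(theta_i in C | X_{1:N} = x) for a cell C with membership
    pattern s (s[n] = 1 iff C lies inside A_n), following (18).

    x     : array of N noisy answers
    s     : array of N zeros/ones describing the cell
    k     : number of targets
    sigma : std. dev. of the additive Gaussian noise in (19)
    """
    j = np.arange(k + 1)
    prior_z = binom.pmf(j, k, 0.5)   # Z_n ~ Bin(k, 1/2) under the dyadic policy
    p = 1.0
    for n in range(len(x)):
        like = norm.pdf(x[n], loc=j, scale=sigma)  # P(X_n = x_n | Z_n = j)
        post_z = like * prior_z
        post_z /= post_z.sum()
        e_n = (j * post_z).sum()     # e_n = E[Z_n | X_n = x_n]
        p *= (e_n / k) if s[n] else (1.0 - e_n / k)
    return p

def expected_count(x, s, k, sigma):
    """E[N(C) | X_{1:N} = x] = k * P(theta_i in C | X_{1:N} = x)."""
    return k * posterior_cell_prob(x, s, k, sigma)
```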
The first phase consists in querying the dyadic sets. As opposed to the continuous domain, there is here a limited supply of dyadic sets. Choosing for $M$ a power of 2, there are $\log M$ dyadic horizontal queries and $\log M$ dyadic vertical queries. Figure 2 presents the dyadic questions for $M = 16$. The second phase consists in ordering the pixels and querying the oracle according to this ordering. We compare three algorithms: Posterior Rank (PR), Iterated Posterior Rank (IPR) and Entropy Pursuit (EP). We will see that all three algorithms significantly outperform the baseline Index Rank (IR) algorithm in terms of the expected number of calls to the oracle (see Figure 4).

The PR algorithm computes the expected number of instances $E[N(C) \mid X_{1:N}]$ in each pixel $C$ using Theorem 3, orders the pixels in decreasing order of this quantity, and runs oracle calls according to this order until all the instances are found. This algorithm is summarized below, and a detailed implementation is provided in the supplementary file.

Algorithm 1 Posterior Rank (PR) Algorithm
1: Compute the answers to the screening questions.
2: Compute the posterior rank r according to (18).
3: Run the oracle on the pixels according to r until all the instances are found.

The IPR algorithm is a variation of the PR algorithm. As before, the pixels are searched in decreasing order of the expected number of instances. When the oracle locates an instance (or instances) at a pixel, the expected number of unlocalized instances at each pixel is recomputed using (20), as described below, and provides an updated ranking for the remainder of the search. Computing the expected number of unlocalized instances at a pixel, given the locations of $0 \le i < k$ previously localized instances, is most straightforward in the case of additive noise, i.e.,

$h(Z_n, W_n) = Z_n + W_n. \quad (19)$

In this case, this computation is accomplished by masking the instances already found, i.e., by subtracting those instances localized in $A_n$ from $X_n$, subtracting the overall number of localized instances from $k$, and recomputing using (18). More generally, we re-use the expression (18), but replace the number of unlocalized instances $k$ by $k' = k - i$, and alter $e_n$ to account for those previously localized instances residing in the queried set $A_n$. Letting $N'(C) = N(C) - \sum_{j=1}^{i} \mathbb{1}\{\theta_j \in C\}$ be the number of unlocalized instances in $C$, we have

$E[N'(C) \mid X_{1:N}, \theta_{1:i}] = k' \prod_{n=1}^{N} \left( \frac{e'_n}{k'} \right)^{s_n} \left( 1 - \frac{e'_n}{k'} \right)^{1 - s_n}. \quad (20)$

Here, $e'_n = E[Z'_n \mid X_n = x_n, \theta_{1:i}]$, with $Z'_n = Z_n - \sum_{j=1}^{i} \mathbb{1}\{\theta_j \in A_n\}$ being the number of unlocalized instances in the queried set $A_n$. The quantity $e'_n$ can be computed as

$e'_n = \sum_{j=0}^{k'} j\, \frac{P(X_n = x_n \mid Z'_n = j, \theta_{1:i})\, P(Z'_n = j)}{P(X_n = x_n \mid \theta_{1:i})},$

where $Z'_n \sim \mathrm{Bin}(k', \frac12)$, $P(X_n = x_n \mid Z'_n = j, \theta_{1:i})$ can be computed from the noise model (2), and $P(X_n = x_n \mid \theta_{1:i}) = \sum_{j=0}^{k'} P(X_n = x_n \mid Z'_n = j, \theta_{1:i})\, P(Z'_n = j)$. In the special case when no instances have been localized, so $i = 0$, (20) recovers (18). A sketch of this masking step is given below.

We now summarize the IPR algorithm:

Algorithm 2 Iterated Posterior Rank (IPR) Algorithm
1: Compute the answers to the screening questions.
2: repeat
3: Compute the posterior rank r according to (20).
4: Run the oracle on the pixels according to r until one or more instances are found at a pixel.
5: until all the instances are found.

IPR's Step 4 may request the oracle's feedback on a pixel that was already queried in a previous stage. This is because we condition on previously localized instances, but not on previous negative reports from the oracle that there were no instances at a particular pixel. When this occurs, we simply report the oracle's previous value, rather than rerunning the oracle. We do not condition on all previous oracle results because computing the posterior expected number of instances is then much more challenging computationally.
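The masking step behind (19)-(20) for additive noise can be sketched as follows: once $i$ instances are localized, their contribution is subtracted from each answer, the target count is reduced to $k' = k - i$, and the Theorem 3 computation is rerun. This reuses the hypothetical `posterior_cell_prob` helper sketched above.

```python
import numpy as np

def masked_posterior_cell_prob(x, s, k, sigma, i, found_in_An):
    """Re-ranking step of IPR, as in (20), for the additive noise model (19).

    i            : number of instances localized so far
    found_in_An  : found_in_An[n] = how many of those lie inside A_n

    Masking: subtract the localized instances' contribution from each
    noisy answer, reduce the target count to k' = k - i, and reuse the
    Theorem 3 computation (posterior_cell_prob, sketched earlier)."""
    x_masked = np.asarray(x, dtype=float) - np.asarray(found_in_An)
    k_prime = k - i
    return posterior_cell_prob(x_masked, s, k_prime, sigma)

def expected_unlocalized_count(x, s, k, sigma, i, found_in_An):
    """E[N'(C) | X_{1:N}, theta_{1:i}] = k' * P(theta in C | ...)."""
    return (k - i) * masked_posterior_cell_prob(x, s, k, sigma, i, found_in_An)
```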
Figures 2 and 3 illustrate the procedure of the IPR algorithm for a $16 \times 16$ image with $k = 4$ instances. Figure 2 illustrates the screening questions under the dyadic policy, with light regions marking the question sets. The first line of Figure 3 shows the true but unknown locations of the instances in each iteration of the IPR algorithm. The second line shows the expected number of instances within each pixel, computed after the screening questions in each iteration, respectively, with lighter regions having a higher expected number of instances.

Figure 2. The queried regions under the dyadic policy for a 16 × 16 image, shown in white.

Figure 3. (row 1) Example image with 4 instances of the object initially; one instance is found after each iteration of the IPR algorithm. (row 2) The corresponding posterior distribution after each iteration. Light regions indicate pixels more likely to contain an object instance, while dark regions are less likely.

The Entropy Pursuit (EP) algorithm is a greedy algorithm aimed at reducing the expected entropy of the joint location of the instances. It has been studied and used for locating and tracking instances in (Sznitman & Jedynak, 2010; Jedynak et al., 2012; Geman & Geman, 1984; Sznitman et al., 2013b;a). This algorithm can be related to the IPR algorithm. The differences between EP and IPR are: i) EP uses a different ordering criterion; ii) EP updates the ordering each time after running the oracle at a pixel, instead of after an instance is found. Specifically, EP computes, for each pixel, the expected entropy reduction in the distribution of the location of the instances which would be achieved by running the oracle at this pixel. It then selects the pixel for which this quantity is maximal (a sketch of this criterion appears at the end of Section 4.1). A detailed implementation is given in the supplementary file.

We use simulations to compare the performance of the three algorithms described above with a baseline algorithm, called Index Rank (IR). IR sweeps the image from left to right, top to bottom, until all the instances of the object are found. For the sake of simplicity, the object to be found in our simulation is a dot of size 1 pixel. We use 100 random assignments of the locations of the object instances for each $k$ and each image size in the simulation, and measure the number of calls to the oracle required in each case.

4.1. Noiseless answers to the queries

We consider first the situation where the answers to the screening questions are noiseless, which is consistent with the theoretical analysis presented in the previous sections. Figure 4, top row, compares the algorithms for $k = 2$, $k = 3$ and $k = 10$ object instances, for image sizes $\{8 \times 8, 16 \times 16, \ldots, 1024 \times 1024\}$. Algorithms PR, IPR and EP require a smaller average number of calls to the oracle compared to the baseline IR. An example shows how dramatic this is for large images: in the case of $1024 \times 1024$ pixel images and $k = 2$ instances, IR requires $2^{20}$ evaluations of the oracle, while IPR requires fewer than $2^8$ on average. IPR is also much more efficient than PR. IPR and EP show similar performance; however, IPR is superior to EP in terms of computational complexity. Due to the EP algorithm's large computational and memory requirements, we have only plotted EP for $k = 2$ and $k = 3$, and have only gone up to $512 \times 512$ images for $k = 3$.
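The paper does not spell out EP's selection criterion in closed form, so the following is a hedged sketch under two stated assumptions: the pixel oracle is noise-free and reports the exact count at the queried pixel, and targets are treated as independent given the history (an approximation). Under these assumptions, a noise-free answer reduces the joint entropy, in expectation, by exactly the entropy of the predictive count distribution at that pixel, so EP queries the pixel where that entropy is largest. The function names are illustrative.

```python
import numpy as np
from scipy.stats import binom

def binomial_entropy(k, p):
    """Entropy (bits) of Bin(k, p)."""
    pmf = binom.pmf(np.arange(k + 1), k, p)
    pmf = pmf[pmf > 0]
    return float(-(pmf * np.log2(pmf)).sum())

def ep_select(cell_probs, k):
    """One greedy EP step under the stated assumptions.

    cell_probs[c] = P(theta_i in c | history), e.g. from Theorem 3.
    Treating targets as independent given the history, the count at
    pixel c is approximately Bin(k, cell_probs[c]); a noise-free oracle
    answer then reduces the joint entropy by the entropy of this count.
    Returns the index of the pixel maximizing that expected reduction."""
    scores = [binomial_entropy(k, p) for p in cell_probs]
    return int(np.argmax(scores))
```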
4.2. Noisy answers to the queries

In an actual computer vision setting, the screening questions would be answered by an image processing algorithm, trained using labeled data. These answers would then be noisy. We performed experiments to measure the effectiveness of the Posterior Rank (PR) and Iterated Posterior Rank (IPR) algorithms in this case, while the oracle is still considered perfect. We use the additive model presented in (19), and we choose the $W_n$ to be independent, Normally distributed random variables with standard deviation $\sigma$. Figure 4, second and third rows, compares the Posterior Rank (PR), Iterated Posterior Rank (IPR) and Index Rank (IR) algorithms for $k = 2$, $k = 3$ and $k = 10$ object instances, for two levels of noise, $\sigma = 0.5$ and $\sigma = 1.0$. We note that both algorithms outperform the default IR algorithm in all cases. The IPR algorithm is more robust to noise than the PR algorithm. As expected, the performance of both algorithms decreases as the amount of noise increases.

Figure 4. The mean number of calls to the oracle over 100 samples, plotted against the image size for k = 2, k = 3 and k = 10 object instances, using the algorithms described in Section 4. (row 1): No noise in the screening answers. (row 2): Gaussian noise with σ = 0.5 in the screening answers. (row 3): Gaussian noise with σ = 1.0 in the screening answers.

4.3. Face detection

To illustrate the potential benefits of PR and IPR, we evaluate their performance in the context of finding faces in images. In particular, we consider how these strategies can be used as cost-effective ways to determine regions where faces may be located, such that high-performing (and computationally expensive) classifiers may then be used more effectively. We begin by training an extremely efficient but poorly-performing face classifier, with the intention of evaluating it at each location of an image. To this end, we train a Boosted classifier, as described in (Ali et al., 2012), but with only 50 stumps (i.e., 5% compared to state-of-the-art classifiers), using 4000 faces and 5 million background samples of size 30 × 30. Once trained, we evaluated our poor classifier at multiple scales on 35 images from the MIT+CMU face dataset. For each image location, a pixel was scored as the sum of weighted stump outputs of our trained classifier (e.g., Fig. 5, 2nd row, illustrates these scores: higher values in white and lower values in black). From these response images, we calculated the answers to the screening questions using an integral image representation, followed by the PR algorithm (Alg. 1).
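A minimal sketch of how screening answers can be computed from a response map with an integral image: the answer to a dyadic question is the sum of scores over the queried stripes, and each axis-aligned rectangle sum costs O(1) after one cumulative-sum pass. Function names are illustrative assumptions; this is not the paper's code.

```python
import numpy as np

def integral_image(resp):
    """Summed-area table with a zero top row and left column."""
    ii = np.zeros((resp.shape[0] + 1, resp.shape[1] + 1))
    ii[1:, 1:] = resp.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, r0, r1, c0, c1):
    """Sum of resp[r0:r1, c0:c1] in O(1) from the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def dyadic_answer(ii, n, M, axis=0):
    """Screening answer for the n-th horizontal (axis=0) or vertical
    (axis=1) dyadic question: total response inside the queried stripes,
    which follow the even-indexed pattern of (10)."""
    w = M // 2**n                        # stripe width at level n
    total = 0.0
    for j in range(1, 2**(n - 1) + 1):
        lo, hi = (2 * j - 1) * w, 2 * j * w
        total += (rect_sum(ii, lo, hi, 0, M) if axis == 0
                  else rect_sum(ii, 0, M, lo, hi))
    return total
```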
Fig. 5 depicts, for 2 images with 4 faces each, the corresponding response maps for the poor classifier and the posterior distribution after a new face has been located using the PR algorithm (i.e., higher values in white and lower values in black).

Figure 5. Face detection result. (row 1): 2 example images with 4 faces each. (row 2): Result from the poor classifier. (row 3): The posterior found by the PR algorithm. Light regions indicate pixels more likely to contain a face, while dark regions are less likely. The locations of the detected faces are indicated as blue rectangles.

Table 1. Results for the PR algorithm on a dataset of 35 images containing 130 faces in total.

| No. of Pixels | Oracle Calls | Oracle Calls per Face |
|---------------|--------------|-----------------------|
| 12,713,984    | 347,238      | 2,671                 |

5. CONCLUSION

In this work, we have considered the problem of localizing several targets simultaneously. We have derived a close-to-optimal policy within a Bayesian framework, using the expected entropy of the posterior as a value function. We have then empirically evaluated this policy on a toy problem related to the localization of several instances of the same object in computer vision. We have shown dramatic performance increases compared to a baseline method. We have also shown that the method is robust to a reasonable level of noise.

Acknowledgments

Peter Frazier was partially supported by NSF CAREER CMMI-1254298, NSF IIS-1247696, AFOSR FA9550-12-1-0200, AFOSR FA9550-15-1-0038, and the ACSF AVF. Bruno Jedynak was partially funded by NASA Early Stage Innovations Grant NNX14AB04G and by the Science of Learning Institute at the Johns Hopkins University for the award "Spatial Localization Through Learning: An Information Theoretic Approach". Purnima Rajan was supported by NASA Early Stage Innovations Grant NNX14AB04G.

References

Ali, K., Fleuret, F., Hasler, D., and Fua, P. A real-time deformable detector. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2), 2012.

Burnashev, M. V. and Zigangirov, K. S. On one problem of observation control. Problemy Peredachi Informatsii, 11(3):44-52, 1975. ISSN 0555-2923.

Buzas, J. and Warrington, G. Optimized random chemistry. Technical Report 1302.2895, arXiv e-prints, 2013.

Castro, R. and Nowak, R. Active sensing and learning. Foundations and Applications of Sensor Management, 2007.

Chung, F., Graham, R., and Leighton, T. Guessing secrets. The Electronic Journal of Combinatorics, 8(13), 2001.

Du, D. and Hwang, F. Combinatorial Group Testing and its Applications. World Scientific Pub Co, 2000.

Eppstein, D., Goodrich, M., and Hirschberg, D. Improved combinatorial group testing algorithms for real-world problem sizes. SIAM Journal on Computing, 36:1360-1375, 2007.

Geman, S. and Geman, D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.

Harvey, N., Patrascu, M., Wen, Y., Yekhanin, S., and Chan, V. Non-adaptive fault diagnosis for all-optical networks via combinatorial group testing on graphs. In IEEE International Conference on Computer Communications, pp. 697-705, 2007.
Horstein, M. Sequential transmission using noiseless feedback. IEEE Transactions on Information Theory, 9(3):136-143, Jul 1963. ISSN 0018-9448. doi: 10.1109/TIT.1963.1057832.

Jedynak, B., Frazier, P., and Sznitman, R. Twenty questions with noise: Bayes optimal policies for entropy loss. Journal of Applied Probability, 49(1):114-136, 2012.

Kauffman, S. At Home in the Universe: The Search for the Laws of Self-Organization and Complexity. Oxford University Press, 1996.

Lee, D., Lee, K., Ho, W., and Lee, S. Target cell-specific involvement of presynaptic mitochondria in post-tetanic potentiation at hippocampal mossy fiber synapses. Journal of Neuroscience, 27(50):13603-13613, 2007.

Merchan-Perez, A., Rodriguez, J., Alonso-Nanclares, L., Schertel, A., and DeFelipe, J. Counting synapses using FIB/SEM microscopy: A true revolution for ultrastructural volume reconstruction. Frontiers in Neuroanatomy, 3(18), 2009.

Mortlock, D. Astronomy: The age of the quasars. Nature, 514:43-44, 2009.

Porat, E. and Rothschild, A. Explicit non-adaptive combinatorial group testing schemes. In Automata, Languages and Programming, volume 5125, pp. 748-759. Springer, Berlin Heidelberg, 2008.

Stinson, D., Trung, T., and Wei, R. Secure frameproof codes, key distribution patterns, group testing algorithms and related structures. Journal of Statistical Planning and Inference, 86:595-617, 2000.

Sznitman, R. and Jedynak, B. Active testing for face detection and localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(10):1914-1920, 2010.

Sznitman, R., Lucchi, A., Frazier, P., Jedynak, B., and Fua, P. An optimal policy for target localization with application to electron microscopy. In Proceedings of the 30th International Conference on Machine Learning, pp. 1-9, 2013a.

Sznitman, R., Richa, R., Taylor, R., Jedynak, B., and Hager, G. Unified detection and tracking of instruments during retinal microsurgery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(5):1263-1273, 2013b.

Tsiligkaridis, T., Sadler, B. M., and Hero, A. O. Collaborative 20 questions for target localization. CoRR, abs/1306.1922, 2013.

Waeber, R., Frazier, P. I., and Henderson, S. G. Bisection search with noisy responses. SIAM Journal on Control and Optimization, 51(3):2261-2279, 2013.