# efficient_minimax_signal_detection_on_graphs__e8c36ff1.pdf Efficient Minimax Signal Detection on Graphs Jing Qian Division of Systems Engineering Boston University Brookline, MA 02446 jingq@bu.edu Venkatesh Saligrama Department of Electrical and Computer Engineering Boston University Boston, MA 02215 srv@bu.edu Several problems such as network intrusion, community detection, and disease outbreak can be described by observations attributed to nodes or edges of a graph. In these applications presence of intrusion, community or disease outbreak is characterized by novel observations on some unknown connected subgraph. These problems can be formulated in terms of optimization of suitable objectives on connected subgraphs, a problem which is generally computationally difficult. We overcome the combinatorics of connectivity by embedding connected subgraphs into linear matrix inequalities (LMI). Computationally efficient tests are then realized by optimizing convex objective functions subject to these LMI constraints. We prove, by means of a novel Euclidean embedding argument, that our tests are minimax optimal for exponential family of distributions on 1-D and 2-D lattices. We show that internal conductance of the connected subgraph family plays a fundamental role in characterizing detectability. 1 Introduction Signals associated with nodes or edges of a graph arise in a number of applications including sensor network intrusion, disease outbreak detection and virus detection in communication networks. Many problems in these applications can be framed from the perspective of hypothesis testing between null and alternative hypothesis. Observations under null and alternative follow different distributions. The alternative is actually composite and identified by sub-collections of connected subgraphs. To motivate the setup consider the disease outbreak problem described in [1]. Nodes there are associated with counties and observations associated with each county correspond to reported cases of a disease. Under the null distribution, observations at each county are assumed to be poisson distributed and independent across different counties. Under the alternative there are a contiguous sub-collection of counties (connected sub-graph) that each experience elevated cases on average from their normal levels but are otherwise assumed to be independent. The eventual shape of the sub-collection of contiguous counties is highly unpredictable due to uncontrollable factors. In this paper we develop a novel approach for signal detection on graphs that is both statistically effective and computationally efficient. Our approach is based on optimizing an objective function subject to subgraph connectivity constraints, which is related to generalized likelihood ratio tests (GLRT). GLRTs maximize likelihood functions over combinatorially many connected subgraphs, which is computationally intractable. On the other hand statistically, GLRTs have been shown to be asymptotically minimax optimal for exponential class of distributions on Lattice graphs & Trees [2] thus motivating our approach.We deal with combinatorial connectivity constraints by obtaining a novel characterization of connected subgraphs in terms of convex Linear Matrix Inequalities (LMIs). In addition we show how our LMI constraints naturally incorporate other features such as shape and size. We show that the resulting tests are essentially minimax optimal for exponential family of distributions on 1-D and 2-D lattices. Conductance of the subgraph, a parameter in our LMI constraint, plays a central role in characterizing detectability. Related Work: The literature on signal detection on graphs can be organized into parametric and non-parametric methods, which can be further sub-divided into computational and statistical analysis themes. Parametric methods originated in the scan statistics literature [3] with more recent work including that of [4, 5, 6, 1, 7, 8] focusing on graphs. Much of this literature develops scanning methods that optimize over rectangles, circles or neighborhood balls [5, 6] across different regions of the graphs. However, the drawbacks of simple shapes and the need for non-parametric methods to improve detection power is well recognized. This has led to new approaches such as simulated annealing [5, 4] but is lacking in statistical analysis. More recent work in ML literature [9] describes semi-definite programming algorithm for non-parametric shape detection, which is similar to our work here. However, unlike us their method requires a heuristic rounding step, which does not lend itself to statistical analysis. In this context a number of recent papers have focused on statistical analysis [10, 2, 11, 12] with non-parametric shapes. They derive fundamental bounds for signal detection for the elevated means testing problem in the Gaussian setting on special graphs such as trees and lattices. In this setting under the null hypothesis the observations are assumed to be independent identically distributed (IID) with standard normal random variables. Under the alternative the Gaussian random variables are assumed to be standard normal except on some connected subgraph where the mean μ is elevated. They show that GLRT achieves near -minimax optimality in a number of interesting scenarios. While this work is interesting the suggested algorithms are computationally intractable. To the best of our knowledge only [13, 14] explores a computationally tractable approach and also provides statistical guarantees. Nevertheless, this line of work does not explicitly deal with connected subgraphs (complex shapes) but deals with more general clusters. These are graph partitions with small out-degree. Although this appears to be a natural relaxation of connected subgraphs/complex-shapes it turns out to be quite loose1 and leads to substantial gap in statistical effectiveness for our problem. In contrast we develop a new method for signal detection of complex shapes that is not only statistically effective but also computationally efficient. 2 Problem Formulation Let G = (V, E) denote an undirected unweighted graph with |V | = n nodes and |E| = m edges. Associated with each node, v V , are observations xv Rp. We assume observations are distributed P0 under the null hypothesis. The alternative is composite and the observed distribution, PS, is parameterized by S V belonging to a class of subsets Λ S, where S is the superset. We denote by SK S the collection of size-K subsets. ES = {(u, v) E : u S, v S} denotes the induced edge set on S. We let x S denote the collection of random variables on the subset S V . Sc denotes nodes V S. Our goal is to design a decision rule, π, that maps observations xn = (xv)v V to {0, 1} with zero denoting null hypothesis and one denoting the alternative. We formulate risk following the lines of [12] and combine Type I and Type II errors: R(π) = P0 (π(xn) = 1) + max S Λ PS (π(xn) = 0) (1) Definition 1 (δ-Separable). We say that the composite hypothesis problem is δ-separable if there exists a test π such that, R(π) δ. We next describe asymptotic notions of detectability and separability. These notions requires us to consider large-graph limits. To this end we index a sequence of graphs Gn = (Vn, En) with n and an associated sequence of tests πn. Definition 2 (Separability). We say that the composite hypothesis problem is asymptotically δseparable if there is some sequence of tests, πn, such that R(πn) δ for sufficiently large n. It is said to be asymptotically separable if R(πn) 0. The composite hypothesis problem is said to be asymptotically inseparable if no such test exists. Sometimes, additional granular measures of performance are often useful to determine asymptotic behavior of Type I and Type II error. This motivates the following definition: 1A connected subgraph on a 2-D lattice of size K has out-degree at least Ω( K) while set of subgraphs with out-degree Ω( K) includes disjoint union of Ω( K/4) nodes. So statistical requirements with out-degree constraints can be no better than those for arbitrary K-sets. Definition 3 (δ-Detectability). We say that the composite hypothesis testing problem is δ-detectable if there is a sequence of tests, πn, such that, sup S Λ PS(πn(xn) = 0) n 0, lim sup n P0(πn(xn) = 1) δ In general δ-detectability does not imply separability. For instance, consider x H0 N(0, σ2) and x H1 N(μ, σ2 n ). It is δ-detectable for μ δ but not separable. Generalized Likelihood Ratio Test (GLRT) is often used as a statistical test for composite hypothesis testing. Suppose φ0(xn) and φS(xn) are probability density functions associated with P0 and PS respectively. The GLRT test thresholds the best-case likelihood ratio, namely, GLRT: ℓmax(xn) = max S Λ ℓS(xn) H1 >< H0 η, ℓS(x) = log φS(xn) Local Behavior: Without additional structure, the likelihood ratio, ℓS(x) for a fixed S Λ is a function of observations across all nodes. Many applications exhibit local behavior, namely, the observations under the two hypothesis behave distinctly only on some small subset of nodes (as in disease outbreaks). This justifies introducing local statistical models in the following section. Combinatorial: The class Λ is combinatorial such as collections of connected subgraphs and GLRT is not generally computationally tractable. On the other hand GLRT is minimax optimal for special classes of distributions and graphs and motivates development of tractable algorithms. 2.1 Statistical Models & Subgraph Classes The foregoing discussion motivates introducing local models, which we present next. Then informed by existing results on separability we categorize subgraph classes by shape, size and connectivity. 2.1.1 Local Statistical Models Signal in Noise Models arise in sensor network (SNET) intrusion [7, 15] and disease outbreak detection [1]. They are modeled with Gaussian (SNET) and Poisson (disease outbreak) distributions. H0 : xv = wv; H1 : xv = μαuv1S(v) + wv, for some, S Λ, u S (3) For Gaussian case we model μ as a constant, wv as IID standard normal variables, αuv as the propagation loss from source node u S to the node v. In disease outbreak detection μ = 1, αuv Pois(λNv) and wv Pois(Nv) are independent Poisson random variables, and Nv is the population of county v. In these cases ℓS(x) takes the following local form where Zv is a normalizing constant. ℓS(x) = ℓS(x S) v V (Ψv(xv) log(Zv))1S(v) (4) We characterize μ0, λ0 as the minimum value that ensures separability for the different models: μ0 = inf{μ R+ | πn, lim n R(πn) = 0}, λ0 = inf{λ R+ | πn, lim n R(πn) = 0} (5) Correlated Models arise in textured object detection [16] and protein subnetwork detection [17]. For instance consider a common random signal z on S, which results in uniform correlation ρ > 0 on S. H0 : xv = wv; H1 : xv = ( ρ(1 ρ) 1)z1S(v) + wv, for some, S Λ, (6) z, wv are standard IID normal random variables. Again we obtain ℓS(x) = ℓS(x S). These examples motivate the following general setup for local behavior: Definition 4. The distributions P0 and PS are said to exhibit local structure if they satisfy: (1) Markovianity: The null distribution P0 satisfies the properties of a Markov Random Field (MRF). Under the distribution PS the observations x S are conditionally independent of x Sc 1 when conditioned on annulus S1 Sc, where S1 = {v V | d(v, w) 1, w S}, is the 1-neighborhood of S. (2) Mask: Marginal distributions of observations under P0 and PS on nodes in Sc are identical: P0(x Sc A) = PS(x Sc A), A A, the σ-algebra of measurable sets. Lemma 1 ([7]). Under conditions (1) and (2) it follows that ℓS(x) = ℓS(x S1). 2.1.2 Structured Subgraphs Existing works [10, 2, 12] point to the important role of size, shape and connectivity in determining detectability. For concreteness we consider the signal in noise model for Gaussian distribution and tabulate upper bounds from existing results for μ0 (Eq. 5). The lower bounds are messier and differ by logarithmic factors but this suffices for our discussion here. The table reveals several important points. Larger sets are easier to detect μ0 decreases with size; connected K-sets are easier to detect relative to arbitrary K-sets; for 2-D lattices thick connected shapes are easier to detect than thin sets (paths); finally detectability on complete graphs is equivalent to arbitrary K-sets, i.e., shape does not matter. Intuitively, these tradeoffs make sense. For a constant μ, signal-to-noise ratio increases with size. Combinatorially, there are fewer K-connected sets than arbitrary K-sets; fewer connected balls than connected paths; and fewer connected sets in 2-D lattices than dense graphs. These results point to the need for characterizing the signal detection problem in terms of Arbitrary K-Set K-Connected Ball K-Connected Path Line Graph ω 2-D Lattice ω connectivity, size, shape and the properties of the ambient graph. We also observe that the table is somewhat incomplete. While balls can be viewed as thick shapes and paths as thin shapes, there are a plethora of intermediate shapes. A similar issue arises for sparse vs. dense graphs. We introduce general definitions to categorize shape and graph structures below. Definition 5 (Internal Conductance). (a.k.a. Cut Ratio) Let H = (S, FS) denote a subgraph of G = (V, E) where S V , FS ES, written as H G. Define the internal conductance of H as: φ(H) = min A S |δS(A)| min{|A|, |S A|}; δS(A) = {(u, v) FS | u A, v S A} (7) Apparently φ(H) = 0 if H is not connected. The internal conductance of a collection of subgraphs, Σ, is defined as the smallest internal conductance: φ(Σ) = min H Σ φ(H) For future reference we denote the collection of connected subgraphs by C and by Ca,Φ the subcollections containing node a V with minimal internal conductance Φ: C = {H G : φ(H) > 0}, Ca,Φ = {H = (S, FS) G : a S, φ(H) Φ} (8) In 2-D lattices, for example, φ(BK) Ω(1/ K) for connected K-balls BK or other thick shapes of size K. φ(C SK) Ω(1/K) due to snake -like thin shapes. Thus internal conductance explicitly accounts for shape of the sets. 3 Convex Programming We develop a convex optimization framework for generating test statistics for local statistical models described in Section 2.1. Our approach relaxes the combinatorial constraints and the functional objectives of the GLRT problem of Eq.(2). In the following section we develop a new characterization based on linear matrix inequalities that accounts for size, shape and connectivity of subgraphs. For future reference we denote A B Δ= [Aij Bij]i,j. Our first step is to embed subgraphs, H of G, into matrices. A binary symmetric incidence matrix, A, is associated with an undirected graph G = (V, E), and encodes edge relationships. Formally, the edge set E is the support of A, namely, E = Supp(A). For subgraph correspondences we consider symmetric matrices, M, with components taking values in the unit interval, [0, 1]. M = {M [0, 1]n n | Muv Muu, M Symmetric} Definition 6. M M is said to correspond to a subgraph H = (S, FS), written as H M, if S = Supp{Diag(M)}, FS = Supp(A M) The role of M M is to ensure that if u S we want the corresponding edges Muv = 0. Note that A M in Defn. 6 removes the spurious edges Muv = 0 for (u, v) / ES. Our second step is to characterize connected subgraphs as convex subsets of M. Now a subgraph H = (S, FS) is a connected subgraph if for every u, v S, there is a path consisting only of edges in FS going from u to v. This implies that for two subgraphs H1, H2 and corresponding matrices M1 and M2, their convex combination Mη = ηM1 + (1 η)M2, η (0, 1) naturally corresponds to H = H1 H2 in the sense of Defn 6. On the other hand if H1 H2 = then H is disconnected and so Mη is as well. This motivates our convex characterization with a common anchor node. To this end we consider the following collection of matrices: M a = {M M | Maa = 1, Mvv Mav} Note that M a includes star graphs induced on subsets S = Supp(Diag(M)) with anchor node a. We now make use of the well known properties [18] of the Laplacian of a graph to characterize connectivity. The unnormalized Laplacian matrix of an undirected graph G with incidence matrix A is described by L(A) = diag(A1n) A where 1n is the all-one vector. Lemma 2. Graph G is connected if and only if the number of zero eigenvalues of L(A) is one. Unfortunately, we cannot directly use this fact on the subgraph A M because there are many zero eigenvalues because the complement of Supp(Diag(M)) is by definition zero. We employ linear matrix inequalities (LMI) to deal with this issue. The condition [19] F(x) = F0 + F1x1 + + Fpxp 0 with symmetric matrices Fj is called a linear matrix inequality in xj R with respect to the positive semi-definite cone represented by . Note that the Laplacian of the subgraph L(A M) is a linear matrix function of M. We denote a collection of subgraphs as follows: CLMI(a, γ) Δ= {H M | M M a, L(A M) γL(M) 0} (9) Theorem 3. The class CLMI(a, γ) is connected for γ > 0. Furthermore, every connected subgraph can be characterized in this way for some a V and γ > 0, namely, C = a V,γ>0 CLMI(a, γ). Proof Sketch. M CLMI(a, γ) implies M is connected. By definition of Ma there must be a star graph that is a subgraph on Supp(Diag(M)). This means that L(M) (hence L(A M)) can only have one zero eigenvalue on Supp(Diag(M)). We can now invoke Lemma 2 on Supp(Diag(M)). The other direction is based on hyperplane separation of convex sets. Note that Ca,γ is convex but C is not. This necessitates the need for an anchor. In practice this means that we have to search for connected sets with different anchors. This is similar to scan statistics the difference being that we can now optimize over arbitrary shapes. We next get a handle on γ. γ encodes Shape: We will relate γ to the internal conductance of the class C. This provides us with a tool to choose γ to reflect the type of connected sets that we expect for our alternative hypothesis. In particular thick sets correspond to relatively large γ and thin sets to small γ. In general for graphs of fixed size the minimum internal conductance over all connected shapes is strictly positive and we can set γ to be this value if we do not a priori know the shape. Theorem 4. In a 2-D lattice, it follows that Ca,Φ CLMI(a, γ), where γ = Θ( Φ2 log(1/Φ)). LMI-Test: We are now ready to present our test statistics. We replace indicator variables with the corresponding matrix components in Eq. 4, i.e., 1S(v) Mvv, 1S(u)1S(v) Muv and obtain: Elevated Mean: ℓM(x) = v V (Ψv(xv) log(Zv))Mvv Correlated Gaussian: ℓM(x) (u,v) E Ψ(xu, xv)Muv v Mvv log(1 ρ) (10) LMITa,γ ℓa,γ(x) = max M CLMI(a,γ) ℓM(x) H1 >< H0 η (11) This test explicitly makes use of the fact that alternative hypothesis is anchored at a and the internal conductance parameter γ is known. We will refine this test to deal with the completely agnostic case in the following section. In this section we analyze LMITa,γ and the agnostic LMI tests for the Elevated Mean problem for exponential family of distributions on 2-D lattices. For concreteness we focus on Gaussian & Poisson models and derive lower and upper bounds for μ0 (see Eq. 5). Our main result states that to guarantee separability, μ0 Ω 1 KΦ , where Φ is the internal conductance of the family Ca,Φ of connected subgraphs, K is the size of the subgraphs in the family, and a is some node that is common to all the subgraphs. The reason for our focus on homogenous Gaussian/Poisson setting is that we can extend current lower bounds in the literature to our more general setting and demonstrate that they match the bounds obtained from our LMIT analysis. We comment on how our LMIT analysis extends to other general structures and models later. The proof for LMIT analysis involves two steps (see Supplementary): 1. Lower Bound: Under H1 we show that the ground truth is a feasible solution. This allows us to lower bound the objective value, ℓa,γ(x), of Eq. 11. 2. Upper Bound: Under H0 we consider the dual problem. By weak duality it follows that any feasible solution of the dual is an upper bound for ℓa,γ(x). A dual feasible solution is then constructed through a novel Euclidean embedding argument. We then compare the upper and lower bounds to obtain the critical value μ0. We analyze both non-agnostic and agnostic LMI tests for the homogenous version of Gaussian and Poisson models of Eq. 3 for both finite and asymptotic 2-D lattice graphs. For the finite case the family of subgraphs in Eq. 3 is assumed to belong to the connected family of sets, Ca,Φ SK, containing a fixed common node a V of size K. For the asymptotic case we let the size of the graph approach infinity (n ). For this case we consider a sequence of connected family of sets Cn a.Φn SKn on graph Gn = (Vn, En) with some fixed anchor node a Vn. We will then describe results for agnostic LMI tests, i.e., lacking knowledge of conductance Φ and anchor node a. Poisson Model: In Eq. 3 we let the population Nv to be identically equal to one across counties. We present LMI tests that are agnostic to shape and anchor nodes: LMITA : ℓ(x) = max a V,γ Φ2 min H0 >< H1 0 (12) where Φmin denotes the minimum possible conductance of a connected subgraph with size K, which is 2/K. Theorem 5. The LMITa,γ test achieves δ-separability for λ = Ω( log(K) KΦ ) and the agnostic test LMITA for λ = Ω(log K log n). Next we consider the asymptotic case and characterize tight bounds for separability. Theorem 6. The two hypothesis H0 and H1 are asymptotically inseparable if λnΦn Kn log(Kn) 0. It is asymptotically separable with LMITa,γ for λn KnΦn/ log(Kn) . The agnostic LMITA achieves asymptotic separability with λn/(log(Kn) log n) . Gaussian Model: We next consider agnostic tests for Gaussian model of Eq. 3 with no propagation loss, i.e., αuv = 1. Theorem 7. The two hypotheses H0 and H1 for the Gaussian model are asymptotically inseparable if μnΦn Kn log(Kn) 0, are separable with LMITa,γ if μn KnΦn/ log(Kn) , and are separable with LMITA if μn/(log(Kn) log n) Our inseparability bound matches existing results on 2-D Lattice & Line Graphs by plugging in appropriate values for Φ for the cases considered in [2, 12]. The lower bound is obtained by specializing to a collection of non-decreasing band subgraphs.Yet LMITa,γ and LMITA is able to achieves the lower bound within a logarithmic factor. Furthermore, our analysis extends beyond Poisson & Gaussian models and applies to general graph structures and models. The main reason is that our LMIT analysis is fairly general and provides an observation-dependent bound through convex duality. We briefly describe it here. Consider functions ℓS(x) that are positive, separable 0 2 4 6 8 10 0 (a) Thick shape 0 2 4 6 8 10 0 (b) Thin shape 0 2 4 6 8 10 0 (c) Snake shape 0 2 4 6 8 10 0 (d) Thin shape(8-neighbors) Figure 1: Various shapes of ground-truth anomalous clusters on a fixed 15 10 lattice. Anomalous cluster size is fixed at 17 nodes. (a) shows a thick cluster with a large internal conductance. (b) shows a relatively thinner shape. (c) shows a snake-like shape which has the smallest internal conductance. (d) shows the same shape of (b), with the background lattice more densely connected. and bounded for simplicity. By establishing primal feasibility that the subgraph S CLMI(a, γ) for a suitably chosen γ, we can obtain a lower bound for the alternative hypothesis H1 and show that EH1 max M CLMI(a,γ) ℓM(x) EH1 v S ℓS(xv) . On the other hand for the null hypothesis we can show that, EH0 max M CLMI(a,γ) ℓM(x) EH0 v B(a,Θ( γ)) ℓS(xv) . Here EH1 and EH0 denote expectations with respect to alternative and null hypothesis and B(a, Θ( γ)) is a ball-like thick shape centered at a V with radius Θ( γ). Our result then follows by invoking standard concentration inequalities. We can extend our analysis to the non-separable case such as correlated models because of the linear objective form in Eq. 10. 5 Experiments We present several experiments to highlight key properties of LMIT and to compare LMIT against other state-of-art parametric and non-parametric tests on synthetic and real-world data. We have shown that agnostic LMIT is near minimax optimal in terms of asymptotic separability. However, separability is an asymptotic notion and only characterizes the special case of zero false alarms (FA) and missed detections (MD), which is often impractical. It is unclear how LMIT behaves with finite size graphs when FAs and MDs are prevalent. In this context incorporating priors could indeed be important. Our goal is to highlight how shape prior (in terms of thick, thin, or arbitrary shapes) can be incorporated in LMIT using the parameter γ to obtain better AUC performance in finite size graphs. Another goal is to demonstrate how LMIT behaves with denser graph structures. From the practical perspective, our main step is to solve the following SDP problem: i yi Mii s.t. M CLMI(a, γ), tr(M) K We use standard SDP solvers which can scale up to n 1500 nodes for sparse graphs like lattice and n 300 nodes for dense graphs with m = Θ(n2) edges. To understand the impact of shape we consider the test LMITa,γ for Gaussian model and manually vary γ. On a 15 10 lattice we fix the size (17 nodes) and the signal strength μ |S| = 3, and consider three different shapes (see Fig. 1) for the alternative hypothesis. For each shape we synthetically simulate 100 null and 100 alternative hypothesis and plot AUC performance of LMIT as a function of γ. We observe that the optimum value of AUC for thick shapes is achieved for large γ and small γ for thin shape confirming our intuition that γ is a good surrogate for shape. In addition we notice that thick shapes have superior AUC performance relative to thin shapes, again confirming intuition of our analysis. To understand the impact of dense graph structures we consider performance of LMIT with neighborhood size. On the lattice of the previous experiment we vary neighborhood by connecting each node to its 1-hop, 2-hop, and 3-hop neighbors to realize denser structures with each node having 4, 8 and 12 neighbors respectively. Note that all the different graphs have the same vertex set. This is convenient because we can hold the shape under the alternative fixed for the different graphs. As before we generate 100 alternative hypothesis using the thin set of the previous experiment with the same mean μ and 100 nulls. The AUC curves for the different graphs highlight the fact that higher density leads to degradation in performance as our intuition with complete graphs suggests. We also 10 3 10 2 10 1 10 0 10 1 0.65 LMIT shape parameter AUC performance Thick shape Thin shape Snake shape = 0.2 AUC=0.952 = 0.05 AUC=0.899 = 0.02 AUC=0.865 (a) AUC with various shapes 10 3 10 2 10 1 10 0 10 1 0.65 LMIT shape parameter AUC performance 4 neighbor lattice 8 neighbor lattice 12 neighbor lattice = 0.2 AUC=0.855 = 0.05 AUC=0.899 = 0.1 AUC=0.874 (b) AUC with different graph structures Figure 2: (a) demonstrates AUC performances with fixed lattice structure, signal strength μ and size (17 nodes), but different shapes of ground-truth clusters, as shown in Fig.1. (b) demonstrates AUC performances with fixed signal strength μ, size (17 nodes) and shape (Fig.1(b)), but different lattice structures. see that as density increases a larger γ achieves better performance confirming our intuition that as density increases the internal conductance of the shape increases. In this part we compare LMIT against existing state-of-art approaches on a 300-node lattice, a 200node random geometric graph (RGG), and a real-world county map graph (129 nodes) (see Fig.3,4). We incorporate shape priors by setting γ (internal conductance) to correspond to thin sets. While this implies some prior knowledge, we note that this is not necessarily the optimal value for γ and we are still agnostic to the actual ground truth shape (see Fig.3,4). For the lattice and RGG we use the elevated-mean Gaussian model. Following [1] we adopt an elevated-rate independent Poisson model for the county map graph. Here Ni is the population of county, i. Under null the number of cases at county i, follows a Poisson distribution with rate Niλ0 and under the alternative a rate Niλ1 within some connected subgraph. We assume λ1 > λ0 and apply a weighted version of LMIT of Eq. 12, which arises on account of differences in population. We compare LMIT against several other tests, including simulated annealing (SA) [4], rectangle test (Rect), nearest-ball test (NB), and two naive tests: maximum test (Max T) and average test (Avg T). SA is a non-parametric test and works by heuristically adding/removing nodes toward a better normalized GLRT objective while maintaining connectivity. Rect and NB are parametric methods with Rect scanning rectangles on lattice and NB scanning nearest-neighbor balls around different nodes for more general graphs (RGG and countymap graph). Max T & Avg T are often used for comparison purposes. Max T is based on thresholding the maximum observed value while Avg T is based on thresholding the average value. We observe that uniformly Max T and Avg T perform poorly. This makes sense; It is well known that Max T works well only for alternative of small size while Avg T works well with relatively large sized alternatives [11]. Parametric methods (Rect/NB) performs poorly because the shape of the ground truth under the alternative cannot be well-approximated by Rectangular or Nearest Neighbor Balls. Performance of SA requires more explanation. One issue could be that SA does not explicitly incorporate shape and directly searches for the best GLRT solution. We have noticed that this has the tendency to amplify the objective value of null hypothesis because SA exhibits poor regularization over the shape. On the other hand LMIT provides some regularization for thin shape and does not admit arbitrary connected sets. Table 1: AUC performance of various algorithms on a 300-node lattice, a 200-node RGG, and the county map graph. On all three graphs LMIT significantly outperforms the other tests consistently for all SNR levels. SNR lattice (μ |S|/σ) RGG (μ |S|/σ) map (λ1/λ0) 1.5 2 3 1.5 2 3 1.1 1.3 1.5 LMIT 0.728 0.780 0.882 0.642 0.723 0.816 0.606 0.842 0.948 SA 0.672 0.741 0.827 0.627 0.677 0.756 0.556 0.744 0.854 Rect(NB) 0.581 0.637 0.748 0.584 0.632 0.701 0.514 0.686 0.791 Max T 0.531 0.547 0.587 0.529 0.562 0.624 0.525 0.559 0.543 Avg T 0.565 0.614 0.705 0.545 0.623 0.690 0.536 0.706 0.747 [1] G. P. Patil and C. Taillie. Geographic and network surveillance via scan statistics for critical area detection. In Statistical Science, volume 18(4), pages 457 465, 2003. [2] E. Arias-Castro, E. J. Candes, H. Helgason, and O. Zeitouni. Searching for a trail of evidence in a maze. In The Annals of Statistics, volume 36(4), pages 1726 1757, 2008. [3] J. Glaz, J. Naus, and S. Wallenstein. Scan Statistics. Springer, New York, 2001. [4] L. Duczmal and R. Assuncao. A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. In Computational Statistics and Data Analysis, volume 45, pages 269 286, 2004. [5] M. Kulldorff, L. Huang, L. Pickle, and L. Duczmal. An elliptic spatial scan statistic. In Statistics in Medicine, volume 25, 2006. [6] C. E. Priebe, J. M. Conroy, D. J. Marchette, and Y. Park. Scan statistics on enron graphs. In Computational and Mathematical Organization Theory, 2006. [7] V. Saligrama and M. Zhao. Local anomaly detection. In Artificial Intelligence and Statistics, volume 22, 2012. [8] V. Saligrama and Z. Chen. Video anomaly detection based on local statistical aggregates. 2013 IEEE Conference on Computer Vision and Pattern Recognition, 0:2112 2119, 2012. [9] J. Qian and V. Saligrama. Connected sub-graph detection. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2014. [10] E. Arias-Castro, D. Donoho, and X. Huo. Near-optimal detection of geometric objects by fast multiscale methods. In IEEE Transactions on Information Theory, volume 51(7), pages 2402 2425, 2005. [11] Addario-Berry, N. Broutin, L. Devroye, and G. Lugosi. On combinatorial testing problems. In The Annals of Statistics, volume 38(5), pages 3063 3092, 2010. [12] E. Arias-Castro, E. J. Candes, and A. Durand. Detection of an anomalous cluster in a network. In The Annals of Statistics, volume 39(1), pages 278 304, 2011. [13] J. Sharpnack, A. Rinaldo, and A. Singh. Changepoint detection over graphs with the spectral scan statistic. In International Conference on Artificial Intelligence and Statistics, 2013. [14] J. Sharpnack, A. Krishnamurthy, and A. Singh. Near-optimal anomaly detection in graphs using lovasz extended scan statistic. In Neural Information Processing Systems, 2013. [15] Erhan Baki Ermis and Venkatesh Saligrama. Distributed detection in sensor networks with limited range multimodal sensors. IEEE Transactions on Signal Processing, 58(2):843 858, 2010. [16] G. R. Cross and A. K. Jain. Markov random field texture models. In IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 5, pages 25 39, 1983. [17] M. Bailly-Bechet, C. Borgs, A. Braunstein, J. T. Chayes, A.Dagkessamanskaia, J. Francois, and R. Zecchina. Finding undetected protein associations in cell signaling by belief propagation. In Proceedings of the National Academy of Sciences (PNAS), volume 108, pages 882 887, 2011. [18] F. Chung. Spectral graph theory. American Mathematical Society, 1996. [19] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.