# uncovering_specificshape_graph_anomalies_in_attributed_graphs__0221316b.pdf The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) Uncovering Specific-Shape Graph Anomalies in Attributed Graphs Nannan Wu, Wenjun Wang, Feng Chen, Jianxin Li, Bo Li, Jinpeng Huai College of Intelligence and Computing, Tianjin University, Tianjin 300072, China Dept. of Computer Science, University at Albany, SUNY, Albany, NY 12203 Dept. of Computer Science & Engineering, Beihang University, Beijing 100191, China {wunannan, lijx, libo}@act.buaa.edu.cn wwj@pku.org.cn fchen5@albany.edu huaijp@buaa.edu.cn As networks are ubiquitous in the modern era, point anomalies have been changed to graph anomalies in terms of anomaly shapes. However, the specific-shape priors about anomalous subgraphs of interest are seldom considered by the traditional approaches when detecting the subgraphs in attributed graphs (e.g., computer networks, Bitcoin networks, and etc.). This paper proposes a nonlinear approach to specific-shape graph anomaly detection. The nonlinear approach focuses on optimizing a broad class of nonlinear cost functions via specific-shape constraints in attributed graphs. Our approach can be used to many different graph anomaly settings. The traditional approaches can only support linear cost functions (e.g., an aggregation function for the summation of node weights). However, our approach can employ more powerful nonlinear cost functions, and enjoys a rigorous theoretical guarantee on the near-optimal solution with the geometrical convergence rate. Introduction In numerous network applications, the computer network data of interest consist of star-shape attacking subgraphs (Wu et al. 2017), and the political blog data of interest consist of core-periphery graph shape anomalies (Zhang, Martin, and Newman 2015). Anomalies appear in the real network applications in the form of specific-shapes rather than point shapes. In the cyber attack detection applications in Figure 1, the deep node (i.e., computer) color represents the higher transfer rate . Given the star-shape anomaly query in Figure 1, we formulate the following specific-shape constrained minimization problem to uncover the specific shape attack subgraph anomalies: min x Rn ϕ(x) s.t. supp(x) M(Q). (1) where ϕ : Rn 7 R is a powerful nonlinear cost function. An attributed graph G = (V, E, W) is comprised of a set of n vertices V = [v] = {1, , n}, an edge set E V V , and an attribute matrix W Rn p (p, the number of attributes). Given the specific-shape anomalous graph prior Q, let VQ denote the vertex set of Q, and M(Q) := {VC | C G, C = Q} be the family set of specific-shape vertex sets, whose Copyright c 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Figure 1: The main idea of uncovering specific-shape graph anomalies by the powerful nonlinear function within the continuous space (Akoglu, Tong, and Koutra 2015). In the attributed graph, abnormal vertices come with red circles. graphs {C} are subgraph of the attributed graph G, and are isomorphic to the specific-shape anomalous graph prior Q. The basic intuition for Problem (1) is that the cost function ϕ is minimized at the constrained x whose nonzero entry index set supp(x) := {v | xv = 0, v V } corresponds to the specific-shape vertex set VC from M(Q). To motivate this problem, in an attributed computer network G, we represent the attribute matrix W as transfer rate attributes. Let Wv := WT v,: be the p-dimensional attribute vector for the vertex v V in the attributed graph G, and W(i) := W:,i be the n-dimensional attribute vector for the attribute i [p]. Let W(0) denote as the vector w. Problem (1) for identifying cyber attacking scenarios can be formulated as the least square problem: minx Rn ||w x||2 2, subject to supp(x) M(Q) (Chen and Zhou 2016). Related work. Point anomalies (i.e., outliers) can be considered as a binary 0/1 classification problem (Akoglu, Tong, and Koutra 2015). These approaches just assign what is called an outlierness score to each node. The specific-shape anomaly is not considered in these approaches. A number of approaches are proposed specifically for the graph anomaly detection problem. They can be directly classified as vector-based and graph-based approaches from the Table 1: Comparison between our work and several representative works in graph anomaly detection. The nonzero entry index set of vector x implies a graph anomaly in the attributed graph, and s = |VQ|. Related Work Target Solution Anomaly Shape Nonlinear Cost Function (Hegde, Indyk, and Schmidt 2015) s-sparse signal ||x||0 s Connected subgraph anomaly (Chen and Zhou 2016) s-sparse signal ||x||0 s Connected subgraph anomaly (Gupta et al. 2014) Subgraph C Specific-shape graph anomaly (Yang et al. 2016) Subgraph C Specific-shape graph anomaly (Wu et al. 2017) s-sparse signal ||x||0 s Tree-shape graph anomaly Our work s-sparse signal ||x||0 s Specific-shape graph anomaly target solution (e.g., in Table 1). In the vector-based approaches, a graph anomaly is considered as a sparse signal, and the true signal is recovered by the nonlinear cost functions (e.g., the norm-2 function). Graph-structured matching pursuit (Chen and Zhou 2016) and Graph-Co Sa MP (Hegde, Indyk, and Schmidt 2015) approaches are proposed to connected subgraph anomaly detection by general nonlinear functions. The graph tree projection pursuit approach (Wu et al. 2017) can employ the nonlinear cost functions, such as Kulldorff (Kulldorff 1997) and Expectation-based Poisson (EBP) graph scan statistics (Neill 2009b), to the tree shape graph anomaly detection. The specific-shape prior can be used to the powerful nonlinear cost function for anomaly detection in attributed graphs with network structures and vertex attributes. In the graph-based approaches, a graph anomaly is considered as a matching subgraph in attributed graphs, and the index is built on vertices and edges. Most traditional approaches (Cordella et al. 2004; Huan, Wang, and Prins 2003; Gupta et al. 2014; Yang et al. 2016; 2014; Zou, Chen, and Lu 2007) can employ just linear cost functions for the index structure. Those approaches can not handle the nonlinear anomalies exhibited in attributed graphs. In Table 1, our work can employ more powerful nonlinear cost functions than the graph-based approaches (Gupta et al. 2014; Yang et al. 2016). The other vector-based approaches (Hegde, Indyk, and Schmidt 2015; Chen and Zhou 2016) consider just the connected subgraph anomaly without the specific shape anomaly prior. Although the work (Wu et al. 2017) considers the tree-shape graph anomaly, the other complex specific shapes can not be handled in this approach. Our work aims to develop a nonlinear approach to specific shape sparsity anomaly structure via attributed graphs. Our approach has three main features: (a) Generality: the approach can encompass several previously studied subgraph matching methods and projection-pursuit oracles (e.g., model-projection algorithms). (b) Theoretical basis: the approach achieves the near-optimal solution at a constant error bound with the geometrically convergence rate. (c) Computational efficiency: we present a near-linear time algorithm for Problem (1). The main contributions of our work are summarized as follows: Develop an efficient nonlinear approach to graph anomalies. Given the specific shape anomaly query graph Q, a new and efficient approach, namely, Query-based matching pursuit (Query-map), is developed to optimize a nonlinear function over the specific shape graphstructured sparsity model M(Q). Theoretical basis. The proposed approach, Query-map, achieves the near-optimal solution at a constant error bound with the geometrically linear convergence rate. The cost function ϕ satisfies a weaker condition than the popular strong condition such as Restricted Strong Convexity/Smoothness (RSC/RSS). Comprehensive experiments to verify the proposed approach on the real datasets. The powerful graph scan statistic nonlinear functions are employed in our approach, Query-map, for the task of specific shape graph anomaly detection. The extensive experiments on the real datasets show that Query-map performs competitively with a variety of representative methods for the graph anomaly detection. This work aims to study the graph anomaly with a specific shape prior. For the tree shape prior, (Wu et al. 2017) work can be employed. For the connected subgraph prior, (Chen and Zhou 2016), and (Hegde, Indyk, and Schmidt 2015) works can be employed. Without any graph anomaly priors, our method can not guarantee the near-optimal solution. Query-based matching pursuit algorithm (Query-map) The specific-shape graph anomaly in attributed graphs can be considered as a sparse signal (Chen and Zhou 2016). In referring to matching pursuit techniques in compressive sensing works, the main idea in this paper is summarized as: a) from the gradient, getting the most specific-shape anomalous vertex set A := arg max A M(Q) Aϕ 1 (e.g., 2||x w||1, i.e., aggregation function) where ( Aϕ)v = ( ϕ)v for v A, and ( Aϕ)v = 0 otherwise; b) from the previous solution x, with Ω:= A supp(x), getting the vector b minimizing ϕ over Ω; c) from M(Q), pruning b into the new better solution x; and the three procedures are repeated until the halting condition holds. The final solution x implies the underlying specific-shape graph anomaly. The proposed approach to Problem (1) is illustrated in Algorithm 1. Query-map is an iterative selection method for approximately optimizing the nonlinear cost function ϕ in Problem (1). The method generates a sequence of intermediate specific shape s-sparse vectors x0, x1, . . . from a start approximation x0 = 0 (i.e., 0 Rn). At the i-th iteration, Step 4, g = ϕ(xi), computes the gradient of ϕ at the current solution xi. Step 5, A = P(g), selects the best specific shape vertex set A of the restricted gradient g A leading to the best approximation to g. The specific shape projection Algorithm 1: Query-map 1 Pick the specific shape anomaly query Q, attributed graph G; 2 Set i = 0, xi = 0; 4 g = ϕ(xi); 5 A = P(g); Specific Shape Projection Oracle 6 Ω= A supp(xi); 7 b = arg minx Rn ϕ(x) s.t. xΩc = 0; 8 B = P(b); Specific Shape Projection Oracle 9 xi+1 = b B; 10 i = i + 1; 11 until halting condition holds; 12 return the specific shape vertex set B; oracle, P(g), projects the gradient vector g onto the nearest point vector g A in the specific shape graph-structured sparsity model M(Q) in Problem (2). P(g) = arg min A M(Q) g g A 1 (2) We define in detail the specific shape graph-structured sparsity model M(Q). A graph C is a subgraph of G, denoted as C G, if VC VG, EC EG and (u, v) EC, u, v VC. A graph C is isomorphic to a query graph Q, denoted as C = Q, if there is a bijection ψ : VC VQ such that, for every pair of vertices u, v VC, (u, v) EC if and only if (ψ(u), ψ(v)) EQ. Therefore, M(Q) := {VC | C G, C = Q} represents the specific shape vertex sets of interest in the attributed graph G, whose corresponding subgraphs are isomorphic to the specific shape anomaly query graph Q. Obviously, Problem (2) is equal to the subgraph matching problem of arg max A M(Q) g A 1, where g implies a node-weighted graph. The target is to construct a maximum weight specific shape subgraph in the node-weighted graph. The subgraph matching problem is well studied in the graph matching field (Yang et al. 2016; 2014). In Steps 5 and 8, our approach can encompass the previously studied subgraph mathcing methods or projection projection-pursuit oracles. The main work of our paper is that we proposed a generic approach to optimize a broad class of nonlinear cost function via specific shape constraints in attributed graphs. In Step 6, the set, Ω= A supp(xi), is chosen as the support. Pursuing the minimization of a nonlinear cost function in Ωwill be most effective. In Step 7, we find a vector b with this support, and b minimizes the nonlinear cost function. In Step 8, the specific shape projection oracle, P(b), projects the vector b onto the desired specific shape vertex set B. In Step 9, prune b into the new better solution xi+1 by the set B. The above steps are repeated until the halting criterion is satisfied. The natural halting criterion is xi+1 = xi. In practice there are two popular options to define the halting criterion: (1) the difference between the current ϕ and the previous one is less than a threshold |ϕ(xi) ϕ(xi+1)| < ϵ; and (2) the difference between the current x and the previous one is less than a threshold ||xi xi+1|| < ϵ (e.g., ϵ = 0.01). Our approach connects to the existing works. In the special case where the specific shape projection oracles are the head and tail approximations for connected subgraph sparsity signals, Query-map reduces to Graph-MP (Chen and Zhou 2016) for the connected subgraph anomaly detection. Specifically, the support set selection in Step 6 is tuned by the gradient descent parameter η for Ω= supp(xi η Aϕ(xi)), and the specific shape projection oracles consider just the tree-shape graph anomaly. Query-map reduces to Graph-TPP (Wu et al. 2017) for the tree-shape subgraph anomaly detection. Theoretical Analysis In this section, for our proposed algorithm Query-map, we analyze its theoretical properties on the two aspects: 1) Studying the convergence rate; and 2) time complexity of Query-map. Before obtaining the theoretical properties, we require the following key technical condition under which the convergence of Query-map is guaranteed. Without loss of generality, we assume the cardinality of the vertex set of specific shape query graph to s = |VQ|. Definition 1. Weak Restricted Strong Convexity (WRSC) condition for the objective function ϕ (Wu et al. 2017; Chen and Zhou 2016; Yuan, Li, and Zhang 2014). As M(Q) is the vertex sets of attributed graphs, if S M(Q) with cardinality |S| 4s and y, z Rn with supp(y) supp(z) S, the following inequality holds for some ξ > 0 and 0 < δ4s < 1. The objective function ϕ has the technical condition (ξ, δ4s, M(Q))-WRSC. ||y z ξ Sϕ(y) + ξ Sϕ(z)|| δ4s||y z|| (3) where Sϕ(y) is a restriction of ϕ(y) in S: we have ( Sϕ(y))v = ( ϕ(y))v for v S, and ( Sϕ(y))v = 0 otherwise. The WRSC condition is derived from the Restricted Strong Convexity/Smoothness (RSC/RSS) conditions (Yuan, Li, and Zhang 2014). The RSC condition is first characterized for the functions have quadratic bounds on the derivative of the objective function. Then a function must have the WRSC condition if the function has the RSC or RSS conditions (Yuan, Li, and Zhang 2014). Now we analyze the convergence property of Query-map. We make a simple observation about Query-map that the sequence solution {xi} defined by Query-map is eventually periodic on a vertex set S, due to the fact that the limit of ϕ in the support vertex set of size 4s is exactly achieved after a finite number of iterations. Theorem 1. Query-map Convergence Rate. Consider a nonlinear cost function ϕ : Rn 7 R that satisfies the condition (ξ, δ4s, M(Q))-WRSC. Given the prior query graph Q, the attributed graph G, let x be the true specific shape vector: x Rn, ϕ(x) ϕ(x ) subject to supp(x), supp(x ) M(Q), the sequence solution {xi} defined by Query-map holds ||xi+1 x ||2 α||xi x ||2 + β|| Iϕ(x )||2 (4) where the parameters are specified as α = 4 q δ4s 1 δ4s , β = ξ 1 δ4s [ 4 1 2δ4s + 2(1 2δ4s) δ4s δ2 4s + 2], and the vertex set I = arg max S M(Q) || Sϕ(x )||2. We ensure α < 1, when selecting δ4s < 1 17. We analyze the constant || Iϕ(x )||2, and the parameters, α β, impact on the convergence to the near-optimal x for the sequence solution {xi}. The following result is an immediate corollary achieved by Theorem 1. Corollary 1.1. Estimation Error Level. For the specific shape anomaly detection (i.e., isomorphism to Q), the vector x is calibrated to x [0, 1]n where v corresponds to the desired anomaly node in G if xv = 0, and v is not anomaly node otherwise. For the i-th solution xi, and the true specific shape vector x , we have xi x 2 β 1 α Iϕ(x ) 2 (5) Iϕ(x ) 2 1 α Before reaching the estimation error level (5), Query-map geometrically converges to the near-optimal x . The estimation error of Query-map is determined by the multipliers of || Iϕ(x )||2. Especially, when || Iϕ(x )||2 = 0, Querymap guarantees that the true x is exactly recovered within finite iterations. The shrinkage rate α relates to the parameter δ, where δs δ2s δ4s (Tropp and Needell 2008). The smaller δ is attained, and the faster convergence rate of Query-map is achieved. In this work, as s(1 α)/β in the upper bound (6) is a small constant, we consider the case where || Iϕ(x )|| is a sufficiently small constant, such that ||xi+1 x || ||xi x ||. Thus the speed of convergence of our algorithm Query-map mainly depends on the shrinkage rate α for || Iϕ(x )|| is a sufficiently small constant. Theorem 2. Query-map Time Complexity. The absolute upper bound to the estimation error ||xi x || is 2s for xi, x [0, 1]n. Within an approximate estimation error, we obtain a near-optimal solution ˆx [0, 1]n, such that ||x ˆx||2 ||x ||2 + β/(1 α) || Iϕ(x )||2, subject to, supp(ˆx) M(Q). The time complexity of Query-map is O T log(1 / || Iϕ(x )||2) (7) where for each iteration, the time T consists of two parts: 1) One execution of the subproblem in Line 5 and 8; and 2) one execution of the subproblem in Line 7. The estimation error, ||x ||2+β/(1 α) || Iϕ(x )||2 , is a tighter bound for the estimation error level (5). The total time complexity relates to which well-studied oracle is employed to the projection pursuit for specific shape anomaly query. Our approach in Algorithm 1 reaches a near-linear time complexity with linear projection pursuit oracles. We observe that M(Q) is not a convex set. The optimization methods, such as Frank-Wolfe, can not be directly applied to the specific shape anomalous subgraph discovery based problem as studied in this paper before. We first proposed an approach to optimize nonlinear functions in specific shape anomaly queries with the better theoretical basis. Application to Well-known Objective Function The nonlinear cost function ϕ in Problem (1) can be applied to least square function, Kulldorff s original Poisson scan statistic (KULL) and Expectation-based Poisson statistic (EBP) (Neill 2009b). Least square is one of the most popular models in regression analysis that finds specific-shape sets of best fit for the attributed graphs (Chen and Zhou 2016). In this model, given the attribute vector w Rn, the least square function is optimized via specific-shape constraints in attributed graphs. min x Rn w x 2 2 s.t. supp(x) M(Q) It is well-known that the least square function is strongly convex. Its Bregman divergence ( ϕ := ϕ(x) ϕ(y) < ϕ(y), x y >) is x y 2 2. We obtain that the least square function satisfies the condition (ξ, δ4s, M(Q))- WRSC that δ4s = 1 2ξ for ξ < 1 (Yuan and Liu 2014). For ensuring that Query-map converges to the near-optimal solution, the parameter ξ ranges between 8/17 and 1. In this work, there are the other two well-known graph scan statistics functions: KULL and EBP that are widely employed in pattern detection in graphs. For the statistics, there are just two attributes, observed count and expected count (Neill 2009a). Let c = W:,0 Rn denote the observed count attribute values. Similarly, let d = W:,1 Rn denote the expected count attribute values. The log forms of EBP and KULL are examined in the following functions. ϕEBP (x) := x T c log(x T c / x T d) x T d + x T c ϕKULL(x) := x T c log(x T c / x T d) (1 x)T c log (1 x)T c / (1 x)T d The forms of EBP and KULL are derived from the statistics F(VC) in (Neill 2009a) that is formulated as the detection problem: min C G F(VC) s.t. VC M(Q). Experiments Our experiments consist of two parts: (i) uncovering high quality specific shape attacking anomalies in the real-world edu.cn network dataset by our method; and (ii) demonstrating the efficiency of our method on different applications. Uncovering High Quality Specific Shape Attacking Anomaly Dataset. Real-World edu.cn Network Dataset. An Internet security company1 provided us with the total 3,978,073 web sites browsing logs from May 31, 2014 to May 13, 2015. The real-world traffic network of 131,107 nodes and 358,386 edges, is built from the browsing logs (i.e., the edge (IP site A, IP site B) denotes that A visited B). For a day t and a node v in this network, we denote the number of logs within v on that day t as the observed value cv, and the average number of logs within v before t as the expected value dv. 1An Internet security company in China with more than 0.6 billion users. Our method. Query-map employs the graph scan statistic ϕEBP as the objective function to detect specific shape attack subgraphs in real networks. Result. In Figure 2, given the query graphs Q1, Q2 and Q3, choose a random day (i.e., March 13, 2015) from the period of Jan 1, 2015 and May 13, 2015. The specific shape attacking anomalies are presented in Figure 2. The right subfigure of Figure 2 denotes the star-chain-shape anomaly queries. The middle subfigure of Figure 2 presents an attacking sub-network. The left subfigure of Figure 2 shows the specific shape cyber attacking cases. All of the detected sites with our method are identified as true attacking sources by the experts from the Internet security company. (1) Specific shape anomaly Q1. The user x.x.223.66 attacking the server xkb.hlu.edu.cn is categorized to Fck Editor , Upload Webshell and Common Vulnerability attacks. This user attacking the servers tw.hlu.edu.cn, zsb2.hlu.edu.cn and kydown.hlu.edu.cn with the same approach is categorized to Upload Webshell attacks. (2)Specific shape anomaly Q2. The server www.hlu.edu.cn is attacked from the users x.x.78. 34 by SQL Inject , x.x.3.69 and x.x.103.149 by Dedecms and Common Vulnerability . The user x.x.103.149 also attacked the servers xkb.hlu.edu.cn and tw.hlu.edu.cn with Dedecms and Common Vulnerability attacks. (3)Specific shape anomaly Q3. The user x.x.223.66 attacked the server www.hlu.edu.cn with Fck Editor , Upload Webshell and Common Vulnerability approaches. These attacks caused this server to be a bot machine. This infected server attacked the server www.cq51edu.cn with Common Vulnerability . The user x.x.32.220 attacked the server www. cq51edu.cn with Upload Webshell and Common Vulnerability approaches. The user x.x.21.210 attacked the server www.cq51edu.cn with the nginx parse approach. From the results, we can observe that the specific shape anomalies are exactly detected from the network. Efficiency Comparison of Query-map and Baseline Datasets: 1) Water Pollution Dataset. By the chemical contaminant plumes are distributed at 4 nodes within different areas (Chen and Zhou 2016), we collected the realworld water pollution network of 12,527 nodes and 14,831 edges. The network is constructed by the K-Nearest Neighbor (KNN) algorithm. The spreads of contaminant plumes were simulated using the water network simulator EPANET for 8 hours (Chen and Zhou 2016). For the observed count attribute, the value at each node v is collected from the corresponding sensor per hour, cv 1 if it is polluted and cv 0 otherwise. For testing the robustness of methods to noises, we randomly flipped K percent sensor binary values, where K {2, 4, 6, 8, 10}. The noise ratio dv K% is considered as the expected count attribute (Chen and Zhou 2016). 2) Respiratory Emergency Department (ED) Dataset. In a grid network of 10,000 nodes and 14,850 edges, for each node v, we collected the T day period of respiratory ED visit data Wv RT (e.g., T = 28, and the time t = 0 denotes the current day). The outbreak linearly increases in cases over the outbreak duration (Neill 2009a). During non-outbreak period, the number of patients visiting ED in v is Wt v Poisson(µ) for t = 0, , T where µ {1, , 34} de- notes the expected number in v on that days. During outbreak period, we randomly select the outbreak duration of U from {1, , 7}, normalizes the weight wv P t Wt v so that the total weight is equal to 1 in infected nodes, and set the outbreak severity (e.g., =800) (Neill 2009a). On each day t {0, , U}, we inject cases into each infected node v i.e., Wt v Wt v + Poisson((T t)wv ) for t = 0, , U (medium-size outbreaks injected for 10 percent nodes (Neill 2009a)). For testing the robustness of methods, we flipped values of K {2, 4, 6, 8, 10} percent nodes randomly, i.e., no inject outbreak cases if the nodes are infected, inject outbreak cases otherwise. Let cv = W0 v denote the observed count of infected cases, and dv = 1 T PT t=1 Wt v denote the expected count. The datasets are summarized in Table 2. Comparison Methods. We compared our method Query-map with the two state-of-the-art baselines: Topk (Gupta et al. 2014) and Fast-k (Yang et al. 2016). The baselines are designed specifically for specific shape anomaly discovery in attributed graphs. The baselines aim to obtain top k subgraphs with the maximal sum of its node scores, which are isomorphic to the query graph. The parameters k and D are tuned completely based on the author recommendation in the original papers for k = 10, D = 2 to Topk (Gupta et al. 2014) and k = 20, d = 2 to Fast-k (Yang et al. 2016). Let w(v) cv for each vertex v VG. Performance Metrics. 1) Precision. We compute the precision of the target subgraph (i.e., the ratio of the number of correct anomalous nodes and the number of nodes). The recall metric is ignored for the fixed size of returned target subgraph. 2) Function Score and Running Time. The optimization power of our method is examined in the scores of graph scan statistics. We compare our method to the baselines on running times. Results: Precision for target subgraphs detection. Figure 3 illustrates the precisions obtained by our methods Query-map (EBP and KULL) corresponding to red bars and cyan bars, and the competitive methods Fast-K and Top-K corresponding to green bars and blue bars. For the 2% noise level, Figure 3 (a) and (c) illustrate that our methods Querymap (KULL) for Water Pollution dataset and Query-map (EBP) for Emergency dataset outperformed all the baselines on the precision. For the Water Pollution dataset in Figure 3 (a), our methods and baselines perform similar results. Our methods recovered at least 99.5% true target subgraphs on the anomaly queries Q2 and Q3. With the query graph expanding in Figure 3(c), the precisions of baselines decrease quickly, however the precisions of our method Query-map (EBP) even increase to 1.0. Especially for the query graph Q3, we can observe that the precisions of our methods are greater than at least 0.67 for the best baseline. For the 10% noise level, the most results in Figure 3(b) and (d) indicate that our methods outperform the baselines. In Figure 3(b), the precisions of baselines decrease with the query graph size. However, even at the 10% noise level, the precisions of our methods still increase to 1.0. For the Emergency dataset in Figure 3(d), although our Query-map (KULL) is little better than the best baselines, our Query-map (EBP) achieves at least 0.27 improvement in the precision larger than the best X.X.103.149 zsb2.hlu.edu.cn xkb.hlu.edu.cn kydown.hlu.edu.cn www.cq51edu.cn tw.hlu.edu.cn www.hlu.edu.cn Query Graphs Cyber Attack Cases Computer Network xkb.hlu.edu.cn kydown.hlu.edu.cn zsb2.hlu.edu.cn tw.hlu.edu.cn www.hlu.edu.cn xkb.hlu.edu.cn X.X.103.149 tw.hlu.edu.cn www.hlu.edu.cn X.X.103.149 tw.hlu.edu.cn zsb2.hlu.edu.cn X.X.21.210 www.cq51edu.cn X.X.223.66 Figure 2: The cyber attack cases on March 13, 2015 are detected by our method Query-map (EBP) in the edu.cn real-world network dataset (red solid nodes denote the attacking or attacked sites, and blue solid nodes denote normal nodes). Table 2: Summary of datasets (WP: Water Pollution; ED: Emergency Department). Attributed Networks Interesting Anomaly Queries Data set Attribute cv (i.e., observed value) for each node v Attribute dv (i.e., expected value) for each node v Q1 query graph Q2 query graph Q3 query graph WP 12,527 14,831 Sensor value (0 or 1) Noise level ED 10,000 14,850 Number of patient visits Average number of visits baseline. Although Fast-K and Top-K performed not bad in Figure 3(a), these heuristic algorithms do not have any theoretical guarantees on identifying specific shape anomaly subgraphs. From the results in Figure 3, we can observe that our methods not only have a significant theoretical property, but also always have a high precision. In Table 2, the graph anomaly shapes Q1, Q2 and Q3 in Water Pollution dataset are different from graph anomaly shapes in Emergency Department dataset for the pollution area showing strip shapes and the outbreak area showing star shapes (Wu et al. 2017). Query-map achieved the precisions for strip anomaly shapes in Water Pollution data and star anomaly shapes in Emergency Department data illustrated in Figure 3. The Top-K is a path index-based method (Gupta et al. 2014), and the Fast-K is a heuristic method for assembling star components. The baselines tend to perform better on Water Pollution data with strip shape graph anomalies, however, perform not better on Emergency Department data with star shape graph anomaly. Evolving curves of graph scan statistics. Figure 4(a-b) report the graph scan statistic (EBP) scores of detected subgraphs for the {0, , 9} iterations. The results in Figure 4(a-b) illustrate that our method Query-map (EBP) converges in less than 10 iterations. Especially in Figure 4(a) for the Water Pollution dataset, Query-map (EBP) converges in less than 3 iterations. The empirical results show the fast convergence trends of our methods Query-map. Scalability analysis of running time. Table 3 reports the comparison between our methods Query-map and the competitive baseline methods on the running time. In Table 3, the running times were collected from the computer with Intel Xeon E3-1220 (e.g., 4 CPU, 3.1 GHz) and 24GB RAM. The results in Table 3 show that our proposed method Querymap ran faster than all the baseline methods in most settings, except for the specific shape anomaly graph Q1. Even though the baseline method Fast-K ran the fastest over the query graph Q1, this method can not detect the target subgraph with high qualities in Figure 3(b-d). In Emergency data, for the Q3, our methods returned results within 97 seconds, however, Top-K did not output any results within 7 hours. The method can performs on the edu.cn data with 131,107 nodes. Results in Table 3 also imply that the running time of our methods is insensitive to the noise level, which is consistent with the time complexity of Query-map as discussed in Theorem 2. This paper presents an efficient algorithm to optimize nonlinear functions subject to specific shape anomaly. Our approach is guaranteed to the near-optimal solution under the estimation error upper bound of ||x ||2 + β/(1 α) || Iϕ(x )||2 . A wide variety of attributes can be employed to the specific shape anomaly discovery in graphs. We will extend Query-map on the best-effort match to the specific shape graph anomaly. A1. Proof of Theorem 1 We first introduce Lemma 3 to bound residues between the solution xi and the optimal x . Lemma 3. Let ri = xi x . Given A M(Q), we have Table 3: Efficiency Measure: Comparison on the running times of our methods and the baselines. Water Pollution Data Emergency Data Run Time (second) 2 % noise 10 % noise 2 % noise 10 % noise Q1 Q2 Q3 Q1 Q2 Q3 Q1 Q2 Q1 Q2 Query-map (EBP) 7.99 13.26 22.57 8.18 14.25 21.92 0.95 0.98 0.73 0.74 Query-map (KULL) 7.90 12.81 22.22 7.10 12.57 19.21 0.81 0.82 0.35 0.36 Fast-K 2.37 34.03 86.13 2.27 46.83 190.54 0.47 73.41 0.47 72.32 Top-K 33.27 389.69 1,343.22 56.62 660.79 2,313.82 315.99 6,433.98 389.73 6,640.92 Q1 Q2 Q3 (a) Water Pollution (2% noise) Fast-K Top-K Query-map (EBP) Query-map (KULL) Q1 Q2 Q3 (b) Water Pollution (10% noise) Q1 Q2 Q3 (c) Emergency Data (2% noise) Q1 Q2 Q3 (d) Emergency Data (10% noise) Figure 3: Effective Validation: Precision comparisons of our methods Query-map (EBP) and Query-map (KULL), and the baseline methods Fast-K and Top-K. 0 1 2 3 4 5 6 7 8 9 iteration (a) Water Pollution Q1 2% noise Q2 2% noise Q3 2% noise Q1 10% noise Q2 10% noise Q3 10% noise 0 1 2 3 4 5 6 7 8 9 iteration 18 (b) Emergency Data Figure 4: Efficiency Measure: Evolving curves of graph scan statistic scores (EBP) for our method (Query-map) in different iterations. These scores are evaluated on these two datasets with 2% and 10% noises and the query graphs. the following inequality. ||ri Ac|| 2 q δ4s δ2 4s||ri||+ 2ξ 1 2δ4s + (1 2δ4s)ξ p || Iϕ(x )|| (8) where I = arg max S M(Q) || Sϕ(x )||2. Now, we present the proof of Theorem 1. Proof. Similarly, we denote ri+1 = xi+1 x . We have ||(xi+1 b) + (b x )||2 ||xi+1 b||2 + ||b x ||2 by the triangular inequality property. In Algorithm 1, by Line 8, we have ||xi+1 b||2 = ||b B b||2. According to Problem (2), b B is restricted to the largest elements of b by the projection oracle P, and thus ||b B b||2 ||x b||2. The upper bound of ||ri+1||2 is 2||x b||2, for the deduction of ||ri+1||2 = ||xi+1 x ||2 ||xi+1 b||2 + ||x b||2 = ||b B b||2 + ||x b||2 2||x b||2. Compute the upper bound of ||(x b)Ω||2 2 =< b x , (b x )Ω> =< b x ξ Ωϕ(b) + ξ Ωϕ(x ), (b x )Ω> < ξ Ωϕ(x ), (b x )Ω> ||b x ξ Ωϕ(b) + ξ Ωϕ(x )||2||(b x )Ω||2+ ξ|| Ωϕ(x )||2||(b x )Ω||2 δ4s||b x ||2||(b x )Ω||2 + ξ|| Ωϕ(x )||2||(b x )Ω||2 where the second equality is derived from Ωϕ(b) = 0 for the function ϕ is minimized at b over the set Ωin Line 7 of Algorithm 1. The last inequality is derived from the condition (ξ, δ4s, M(Q))-WRSC. Thus the upper bound of ||(x b)Ω||2 is δ4s||b x ||2 + ξ|| Ωϕ(x )||2. For the sets Ωand A, we denote the complement sets are Ωc = VG \ Ωand Ac = VG \ A. We have ||x b||2 ||(x b)Ω||2 + ||(x b)Ωc||2. By the upper bound of ||(x b)Ω||2, we have ||(x b)Ω||2 δ4s||x b||2 + ξ|| Ωϕ(x )||2 + ||(x b)Ωc||2. We have ||x b||2 ||(x b)Ωc||2 1 δ4s + ξ|| Ωϕ(x )||2 1 δ4s . By supp(b) Ωat Line 7 and supp(xi) Ωat Line 6 of Algorithm 1, we have ||(x b)Ωc||2 1 δ4s + ξ|| Ωϕ(x )||2 1 δ4s = ||x Ωc||2 1 δ4s + ξ|| Ωϕ(x )||2 ||(x xi)Ωc||2 1 δ4s + ξ|| Ωϕ(x )||2 1 δ4s = ||ri Ωc||2 1 δ4s + ξ|| Ωϕ(x )||2 Ωc Ac, we have ||ri Ωc||2 1 δ4s + ξ|| Ωϕ(x )||2 1 δ4s ||ri Ac||2 1 δ4s + ξ|| Iϕ(x )||2 1 δ4s . At last we obtain the upper bound as follows ||x b||2 ||ri Ac||2 / (1 δ4s)+ξ|| Iϕ(x )||2 / 1 δ4s Combing ||ri+1||2 2||x b||2, the above inequality, and Lemma 3, we finish proving this theorem. A2. Proof of Theorem 2 Proof. As xi, x [0, 1]n and |supp(xi)|, |supp(x )| s, the absolute upper bound to ||xi x || is 2s. For the small constant || Iϕ(x )||2, we present the acceptable upper bound ||x ||2 + β/(1 α) || Iϕ(x )||2. By Inequality (4), at the i-th iterate of Algorithm 1, we obtain the bound ||x xi||2 αi||x ||2 + β 1 α|| Iϕ(x )||2. Let αi||x ||2 + β 1 α|| Iϕ(x )||2 ||x ||2 + β/(1 α) || Iϕ(x )||2. After log( 1 || Iϕ(x )||2 )/ log 1 α iterations, the estimate ˆx satisfies ||x ˆx||2 ||x ||2+β/(1 α) || Iϕ(x )||2. As the time complexity is T for one iteration, the overall time complexity of Query-map follows the result in Theorem 2. Acknowledgments This work was supported by the National Key R&D Program of China (No. 2018YFC0809800), and partly supported by the NSF grant (No. IIS-1815696). The corresponding author is Wenjun Wang. References Akoglu, L.; Tong, H.; and Koutra, D. 2015. Graph based anomaly detection and description: a survey. Data mining and knowledge discovery 29(3):626 688. Chen, F., and Zhou, B. 2016. A generalized matching pursuit approach for graph-structured sparsity. In IJCAI, 1389 1395. Cordella, L. P.; Foggia, P.; Sansone, C.; and Vento, M. 2004. A (sub) graph isomorphism algorithm for matching large graphs. IEEE TPAMI 26(10):1367 1372. Gupta, M.; Gao, J.; Yan, X.; Cam, H.; and Han, J. 2014. Topk interesting subgraph discovery in information networks. In ICDE, 820 831. IEEE. Hegde, C.; Indyk, P.; and Schmidt, L. 2015. A nearly-linear time framework for graph-structured sparsity. In ICML, 928 937. Huan, J.; Wang, W.; and Prins, J. 2003. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM, 549 552. IEEE. Kulldorff, M. 1997. A spatial scan statistic. Communications in Statistics-Theory and methods 26(6):1481 1496. Neill, D. B. 2009a. An empirical comparison of spatial scan statistics for outbreak detection. International journal of health geographics 8(1):1. Neill, D. B. 2009b. Expectation-based scan statistics for monitoring spatial time series data. International Journal of Forecasting 25(3):498 517. Tropp, J. A., and Needell, D. 2008. Cosamp: Iterative signal recovery from incomplete and inaccurate samples. Co RR abs/0803.2392. Wu, N.; Chen, F.; Li, J.; Huai, J.; and Li, B. 2017. Querydriven discovery of anomalous subgraphs in attributed graphs. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, 3105 3111. Yang, S.; Wu, Y.; Sun, H.; and Yan, X. 2014. Schemaless and structureless graph querying. Proceedings of the VLDB Endowment 7(7):565 576. Yang, S.; Han, F.; Wu, Y.; and Yan, X. 2016. Fast top-k search in knowledge graphs. In ICDE, 990 1001. Yuan, X.-T., and Liu, Q. 2014. Newton greedy pursuit: A quadratic approximation method for sparsity-constrained optimization. In CVPR, 4122 4129. Yuan, X.; Li, P.; and Zhang, T. 2014. Gradient hard thresholding pursuit for sparsity-constrained optimization. In ICML, 127 135. Zhang, X.; Martin, T.; and Newman, M. E. 2015. Identification of core-periphery structure in networks. Physical Review E 91(3):032803. Zou, L.; Chen, L.; and Lu, Y. 2007. Top-k subgraph matching query in a large graph. In the ACM first Ph.D. workshop in CIKM, 139 146. ACM.