# uncovering_specificshape_graph_anomalies_in_attributed_graphs__0221316b.pdf

The Thirty-Third AAAI Conference on Artiﬁcial Intelligence (AAAI-19)

Uncovering Speciﬁc-Shape Graph Anomalies in Attributed Graphs

Nannan Wu, Wenjun Wang, Feng Chen, Jianxin Li, Bo Li, Jinpeng Huai

College of Intelligence and Computing, Tianjin University, Tianjin 300072, China Dept. of Computer Science, University at Albany, SUNY, Albany, NY 12203 Dept. of Computer Science & Engineering, Beihang University, Beijing 100191, China {wunannan, lijx, libo}@act.buaa.edu.cn wwj@pku.org.cn fchen5@albany.edu huaijp@buaa.edu.cn

As networks are ubiquitous in the modern era, point anomalies have been changed to graph anomalies in terms of anomaly shapes. However, the speciﬁc-shape priors about anomalous subgraphs of interest are seldom considered by the traditional approaches when detecting the subgraphs in attributed graphs (e.g., computer networks, Bitcoin networks, and etc.). This paper proposes a nonlinear approach to speciﬁc-shape graph anomaly detection. The nonlinear approach focuses on optimizing a broad class of nonlinear cost functions via speciﬁc-shape constraints in attributed graphs. Our approach can be used to many different graph anomaly settings. The traditional approaches can only support linear cost functions (e.g., an aggregation function for the summation of node weights). However, our approach can employ more powerful nonlinear cost functions, and enjoys a rigorous theoretical guarantee on the near-optimal solution with the geometrical convergence rate.

Introduction In numerous network applications, the computer network data of interest consist of star-shape attacking subgraphs (Wu et al. 2017), and the political blog data of interest consist of core-periphery graph shape anomalies (Zhang, Martin, and Newman 2015). Anomalies appear in the real network applications in the form of speciﬁc-shapes rather than point shapes. In the cyber attack detection applications in Figure 1, the deep node (i.e., computer) color represents the higher transfer rate . Given the star-shape anomaly query in Figure 1, we formulate the following speciﬁc-shape constrained minimization problem to uncover the speciﬁc shape attack subgraph anomalies:

min x Rn ϕ(x) s.t. supp(x) M(Q). (1)

where ϕ : Rn 7 R is a powerful nonlinear cost function. An attributed graph G = (V, E, W) is comprised of a set of n vertices V = [v] = {1, , n}, an edge set E V V , and an attribute matrix W Rn p (p, the number of attributes). Given the speciﬁc-shape anomalous graph prior Q, let VQ denote the vertex set of Q, and M(Q) := {VC | C G, C = Q} be the family set of speciﬁc-shape vertex sets, whose

Copyright c 2019, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

Figure 1: The main idea of uncovering speciﬁc-shape graph anomalies by the powerful nonlinear function within the continuous space (Akoglu, Tong, and Koutra 2015). In the attributed graph, abnormal vertices come with red circles.

graphs {C} are subgraph of the attributed graph G, and are isomorphic to the speciﬁc-shape anomalous graph prior Q. The basic intuition for Problem (1) is that the cost function ϕ is minimized at the constrained x whose nonzero entry index set supp(x) := {v | xv = 0, v V } corresponds to the speciﬁc-shape vertex set VC from M(Q). To motivate this problem, in an attributed computer network G, we represent the attribute matrix W as transfer rate attributes. Let Wv := WT v,: be the p-dimensional attribute vector for the vertex v V in the attributed graph G, and W(i) := W:,i be the n-dimensional attribute vector for the attribute i [p]. Let W(0) denote as the vector w. Problem (1) for identifying cyber attacking scenarios can be formulated as the least square problem: minx Rn ||w x||2 2, subject to supp(x) M(Q) (Chen and Zhou 2016). Related work. Point anomalies (i.e., outliers) can be considered as a binary 0/1 classiﬁcation problem (Akoglu, Tong, and Koutra 2015). These approaches just assign what is called an outlierness score to each node. The speciﬁc-shape anomaly is not considered in these approaches. A number of approaches are proposed speciﬁcally for the graph anomaly detection problem. They can be directly classiﬁed as vector-based and graph-based approaches from the

Table 1: Comparison between our work and several representative works in graph anomaly detection. The nonzero entry index set of vector x implies a graph anomaly in the attributed graph, and s = |VQ|.

Related Work Target Solution Anomaly Shape Nonlinear Cost Function (Hegde, Indyk, and Schmidt 2015) s-sparse signal ||x||0 s Connected subgraph anomaly

(Chen and Zhou 2016) s-sparse signal ||x||0 s Connected subgraph anomaly

(Gupta et al. 2014) Subgraph C Speciﬁc-shape graph anomaly (Yang et al. 2016) Subgraph C Speciﬁc-shape graph anomaly (Wu et al. 2017) s-sparse signal ||x||0 s Tree-shape graph anomaly

Our work s-sparse signal ||x||0 s Speciﬁc-shape graph anomaly

target solution (e.g., in Table 1). In the vector-based approaches, a graph anomaly is considered as a sparse signal, and the true signal is recovered by the nonlinear cost functions (e.g., the norm-2 function). Graph-structured matching pursuit (Chen and Zhou 2016) and Graph-Co Sa MP (Hegde, Indyk, and Schmidt 2015) approaches are proposed to connected subgraph anomaly detection by general nonlinear functions. The graph tree projection pursuit approach (Wu et al. 2017) can employ the nonlinear cost functions, such as Kulldorff (Kulldorff 1997) and Expectation-based Poisson (EBP) graph scan statistics (Neill 2009b), to the tree shape graph anomaly detection. The speciﬁc-shape prior can be used to the powerful nonlinear cost function for anomaly detection in attributed graphs with network structures and vertex attributes. In the graph-based approaches, a graph anomaly is considered as a matching subgraph in attributed graphs, and the index is built on vertices and edges. Most traditional approaches (Cordella et al. 2004; Huan, Wang, and Prins 2003; Gupta et al. 2014; Yang et al. 2016; 2014; Zou, Chen, and Lu 2007) can employ just linear cost functions for the index structure. Those approaches can not handle the nonlinear anomalies exhibited in attributed graphs. In Table 1, our work can employ more powerful nonlinear cost functions than the graph-based approaches (Gupta et al. 2014; Yang et al. 2016). The other vector-based approaches (Hegde, Indyk, and Schmidt 2015; Chen and Zhou 2016) consider just the connected subgraph anomaly without the speciﬁc shape anomaly prior. Although the work (Wu et al. 2017) considers the tree-shape graph anomaly, the other complex speciﬁc shapes can not be handled in this approach. Our work aims to develop a nonlinear approach to speciﬁc shape sparsity anomaly structure via attributed graphs. Our approach has three main features: (a) Generality: the approach can encompass several previously studied subgraph matching methods and projection-pursuit oracles (e.g., model-projection algorithms). (b) Theoretical basis: the approach achieves the near-optimal solution at a constant error bound with the geometrically convergence rate. (c) Computational efﬁciency: we present a near-linear time algorithm for Problem (1). The main contributions of our work are summarized as follows:

Develop an efﬁcient nonlinear approach to graph anomalies. Given the speciﬁc shape anomaly query graph Q, a new and efﬁcient approach, namely, Query-based matching pursuit (Query-map), is developed to optimize a nonlinear function over the speciﬁc shape graphstructured sparsity model M(Q).

Theoretical basis. The proposed approach, Query-map, achieves the near-optimal solution at a constant error bound with the geometrically linear convergence rate. The cost function ϕ satisﬁes a weaker condition than the popular strong condition such as Restricted Strong Convexity/Smoothness (RSC/RSS). Comprehensive experiments to verify the proposed approach on the real datasets. The powerful graph scan statistic nonlinear functions are employed in our approach, Query-map, for the task of speciﬁc shape graph anomaly detection. The extensive experiments on the real datasets show that Query-map performs competitively with a variety of representative methods for the graph anomaly detection. This work aims to study the graph anomaly with a speciﬁc shape prior. For the tree shape prior, (Wu et al. 2017) work can be employed. For the connected subgraph prior, (Chen and Zhou 2016), and (Hegde, Indyk, and Schmidt 2015) works can be employed. Without any graph anomaly priors, our method can not guarantee the near-optimal solution.

Query-based matching pursuit algorithm (Query-map) The speciﬁc-shape graph anomaly in attributed graphs can be considered as a sparse signal (Chen and Zhou 2016). In referring to matching pursuit techniques in compressive sensing works, the main idea in this paper is summarized as: a) from the gradient, getting the most speciﬁc-shape anomalous vertex set A := arg max A M(Q) Aϕ 1 (e.g., 2||x w||1, i.e., aggregation function) where ( Aϕ)v = ( ϕ)v for v A, and ( Aϕ)v = 0 otherwise; b) from the previous solution x, with Ω:= A supp(x), getting the vector b minimizing ϕ over Ω; c) from M(Q), pruning b into the new better solution x; and the three procedures are repeated until the halting condition holds. The ﬁnal solution x implies the underlying speciﬁc-shape graph anomaly. The proposed approach to Problem (1) is illustrated in Algorithm 1. Query-map is an iterative selection method for approximately optimizing the nonlinear cost function ϕ in Problem (1). The method generates a sequence of intermediate speciﬁc shape s-sparse vectors x0, x1, . . . from a start approximation x0 = 0 (i.e., 0 Rn). At the i-th iteration, Step 4, g = ϕ(xi), computes the gradient of ϕ at the current solution xi. Step 5, A = P(g), selects the best speciﬁc shape vertex set A of the restricted gradient g A leading to the best approximation to g. The speciﬁc shape projection

Algorithm 1: Query-map

1 Pick the speciﬁc shape anomaly query Q, attributed graph G;

2 Set i = 0, xi = 0;

4 g = ϕ(xi);

5 A = P(g); Speciﬁc Shape Projection Oracle

6 Ω= A supp(xi);

7 b = arg minx Rn ϕ(x) s.t. xΩc = 0;

8 B = P(b); Speciﬁc Shape Projection Oracle

9 xi+1 = b B;

10 i = i + 1;

11 until halting condition holds;

12 return the speciﬁc shape vertex set B;

oracle, P(g), projects the gradient vector g onto the nearest point vector g A in the speciﬁc shape graph-structured sparsity model M(Q) in Problem (2).

P(g) = arg min A M(Q) g g A 1 (2)

We deﬁne in detail the speciﬁc shape graph-structured sparsity model M(Q). A graph C is a subgraph of G, denoted as C G, if VC VG, EC EG and (u, v) EC, u, v VC. A graph C is isomorphic to a query graph Q, denoted as C = Q, if there is a bijection ψ : VC VQ such that, for every pair of vertices u, v VC, (u, v) EC if and only if (ψ(u), ψ(v)) EQ. Therefore, M(Q) := {VC | C G, C = Q} represents the speciﬁc shape vertex sets of interest in the attributed graph G, whose corresponding subgraphs are isomorphic to the speciﬁc shape anomaly query graph Q. Obviously, Problem (2) is equal to the subgraph matching problem of arg max A M(Q) g A 1, where g implies a node-weighted graph. The target is to construct a maximum weight speciﬁc shape subgraph in the node-weighted graph. The subgraph matching problem is well studied in the graph matching ﬁeld (Yang et al. 2016; 2014). In Steps 5 and 8, our approach can encompass the previously studied subgraph mathcing methods or projection projection-pursuit oracles. The main work of our paper is that we proposed a generic approach to optimize a broad class of nonlinear cost function via speciﬁc shape constraints in attributed graphs. In Step 6, the set, Ω= A supp(xi), is chosen as the support. Pursuing the minimization of a nonlinear cost function in Ωwill be most effective. In Step 7, we ﬁnd a vector b with this support, and b minimizes the nonlinear cost function. In Step 8, the speciﬁc shape projection oracle, P(b), projects the vector b onto the desired speciﬁc shape vertex set B. In Step 9, prune b into the new better solution xi+1 by the set B. The above steps are repeated until the halting criterion is satisﬁed. The natural halting criterion is xi+1 = xi. In practice there are two popular options to deﬁne the halting criterion: (1) the difference between the current ϕ and the previous one is less than a threshold |ϕ(xi) ϕ(xi+1)| < ϵ; and (2) the difference between the current x and the previous

one is less than a threshold ||xi xi+1|| < ϵ (e.g., ϵ = 0.01). Our approach connects to the existing works. In the special case where the speciﬁc shape projection oracles are the head and tail approximations for connected subgraph sparsity signals, Query-map reduces to Graph-MP (Chen and Zhou 2016) for the connected subgraph anomaly detection. Speciﬁcally, the support set selection in Step 6 is tuned by the gradient descent parameter η for Ω= supp(xi η Aϕ(xi)), and the speciﬁc shape projection oracles consider just the tree-shape graph anomaly. Query-map reduces to Graph-TPP (Wu et al. 2017) for the tree-shape subgraph anomaly detection.

Theoretical Analysis

In this section, for our proposed algorithm Query-map, we analyze its theoretical properties on the two aspects: 1) Studying the convergence rate; and 2) time complexity of Query-map. Before obtaining the theoretical properties, we require the following key technical condition under which the convergence of Query-map is guaranteed. Without loss of generality, we assume the cardinality of the vertex set of speciﬁc shape query graph to s = |VQ|. Deﬁnition 1. Weak Restricted Strong Convexity (WRSC) condition for the objective function ϕ (Wu et al. 2017; Chen and Zhou 2016; Yuan, Li, and Zhang 2014). As M(Q) is the vertex sets of attributed graphs, if S M(Q) with cardinality |S| 4s and y, z Rn with supp(y) supp(z) S, the following inequality holds for some ξ > 0 and 0 < δ4s < 1. The objective function ϕ has the technical condition (ξ, δ4s, M(Q))-WRSC.

||y z ξ Sϕ(y) + ξ Sϕ(z)|| δ4s||y z|| (3)

where Sϕ(y) is a restriction of ϕ(y) in S: we have ( Sϕ(y))v = ( ϕ(y))v for v S, and ( Sϕ(y))v = 0 otherwise. The WRSC condition is derived from the Restricted Strong Convexity/Smoothness (RSC/RSS) conditions (Yuan, Li, and Zhang 2014). The RSC condition is ﬁrst characterized for the functions have quadratic bounds on the derivative of the objective function. Then a function must have the WRSC condition if the function has the RSC or RSS conditions (Yuan, Li, and Zhang 2014). Now we analyze the convergence property of Query-map. We make a simple observation about Query-map that the sequence solution {xi} deﬁned by Query-map is eventually periodic on a vertex set S, due to the fact that the limit of ϕ in the support vertex set of size 4s is exactly achieved after a ﬁnite number of iterations.

Theorem 1. Query-map Convergence Rate. Consider a nonlinear cost function ϕ : Rn 7 R that satisﬁes the condition (ξ, δ4s, M(Q))-WRSC. Given the prior query graph Q, the attributed graph G, let x be the true speciﬁc shape vector: x Rn, ϕ(x) ϕ(x ) subject to supp(x), supp(x ) M(Q), the sequence solution {xi} deﬁned by Query-map holds

||xi+1 x ||2 α||xi x ||2 + β|| Iϕ(x )||2 (4)

where the parameters are speciﬁed as α = 4 q

δ4s 1 δ4s ,

β = ξ 1 δ4s [ 4 1 2δ4s + 2(1 2δ4s)

δ4s δ2 4s + 2], and the vertex set

I = arg max S M(Q) || Sϕ(x )||2. We ensure α < 1, when selecting δ4s < 1 17. We analyze the constant || Iϕ(x )||2, and the parameters, α β, impact on the convergence to the near-optimal x for the sequence solution {xi}. The following result is an immediate corollary achieved by Theorem 1. Corollary 1.1. Estimation Error Level. For the speciﬁc shape anomaly detection (i.e., isomorphism to Q), the vector x is calibrated to x [0, 1]n where v corresponds to the desired anomaly node in G if xv = 0, and v is not anomaly node otherwise. For the i-th solution xi, and the true speciﬁc shape vector x , we have

xi x 2 β 1 α Iϕ(x ) 2 (5)

Iϕ(x ) 2 1 α

Before reaching the estimation error level (5), Query-map geometrically converges to the near-optimal x . The estimation error of Query-map is determined by the multipliers of || Iϕ(x )||2. Especially, when || Iϕ(x )||2 = 0, Querymap guarantees that the true x is exactly recovered within ﬁnite iterations. The shrinkage rate α relates to the parameter δ, where δs δ2s δ4s (Tropp and Needell 2008). The smaller δ is attained, and the faster convergence rate of Query-map is achieved. In this work, as s(1 α)/β in the upper bound (6) is a small constant, we consider the case where || Iϕ(x )|| is a sufﬁciently small constant, such that ||xi+1 x || ||xi x ||. Thus the speed of convergence of our algorithm Query-map mainly depends on the shrinkage rate α for || Iϕ(x )|| is a sufﬁciently small constant. Theorem 2. Query-map Time Complexity. The absolute upper bound to the estimation error ||xi x || is

2s for xi, x [0, 1]n. Within an approximate estimation error, we obtain a near-optimal solution ˆx [0, 1]n, such that ||x ˆx||2 ||x ||2 + β/(1 α) || Iϕ(x )||2, subject to, supp(ˆx) M(Q). The time complexity of Query-map is

O T log(1 / || Iϕ(x )||2) (7) where for each iteration, the time T consists of two parts: 1) One execution of the subproblem in Line 5 and 8; and 2) one execution of the subproblem in Line 7. The estimation error, ||x ||2+β/(1 α) || Iϕ(x )||2 , is a tighter bound for the estimation error level (5). The total time complexity relates to which well-studied oracle is employed to the projection pursuit for speciﬁc shape anomaly query. Our approach in Algorithm 1 reaches a near-linear time complexity with linear projection pursuit oracles. We observe that M(Q) is not a convex set. The optimization methods, such as Frank-Wolfe, can not be directly applied to the speciﬁc shape anomalous subgraph discovery based problem as studied in this paper before. We ﬁrst proposed an approach to optimize nonlinear functions in speciﬁc shape anomaly queries with the better theoretical basis.

Application to Well-known Objective Function The nonlinear cost function ϕ in Problem (1) can be applied to least square function, Kulldorff s original Poisson scan statistic (KULL) and Expectation-based Poisson statistic (EBP) (Neill 2009b). Least square is one of the most popular models in regression analysis that ﬁnds speciﬁc-shape sets of best ﬁt for the attributed graphs (Chen and Zhou 2016). In this model, given the attribute vector w Rn, the least square function is optimized via speciﬁc-shape constraints in attributed graphs.

min x Rn w x 2 2 s.t. supp(x) M(Q)

It is well-known that the least square function is strongly convex. Its Bregman divergence ( ϕ := ϕ(x) ϕ(y) < ϕ(y), x y >) is x y 2 2. We obtain that the least square function satisﬁes the condition (ξ, δ4s, M(Q))- WRSC that δ4s = 1 2ξ for ξ < 1 (Yuan and Liu 2014). For ensuring that Query-map converges to the near-optimal solution, the parameter ξ ranges between 8/17 and 1. In this work, there are the other two well-known graph scan statistics functions: KULL and EBP that are widely employed in pattern detection in graphs. For the statistics, there are just two attributes, observed count and expected count (Neill 2009a). Let c = W:,0 Rn denote the observed count attribute values. Similarly, let d = W:,1 Rn denote the expected count attribute values. The log forms of EBP and KULL are examined in the following functions.

ϕEBP (x) := x T c log(x T c / x T d) x T d + x T c

ϕKULL(x) := x T c log(x T c / x T d)

(1 x)T c log (1 x)T c / (1 x)T d

The forms of EBP and KULL are derived from the statistics F(VC) in (Neill 2009a) that is formulated as the detection problem: min C G F(VC) s.t. VC M(Q).

Experiments Our experiments consist of two parts: (i) uncovering high quality speciﬁc shape attacking anomalies in the real-world edu.cn network dataset by our method; and (ii) demonstrating the efﬁciency of our method on different applications.

Uncovering High Quality Speciﬁc Shape Attacking Anomaly Dataset. Real-World edu.cn Network Dataset. An Internet security company1 provided us with the total 3,978,073 web sites browsing logs from May 31, 2014 to May 13, 2015. The real-world trafﬁc network of 131,107 nodes and 358,386 edges, is built from the browsing logs (i.e., the edge (IP site A, IP site B) denotes that A visited B). For a day t and a node v in this network, we denote the number of logs within v on that day t as the observed value cv, and the average number of logs within v before t as the expected value dv.

1An Internet security company in China with more than 0.6 billion users.

Our method. Query-map employs the graph scan statistic ϕEBP as the objective function to detect speciﬁc shape attack subgraphs in real networks. Result. In Figure 2, given the query graphs Q1, Q2 and Q3, choose a random day (i.e., March 13, 2015) from the period of Jan 1, 2015 and May 13, 2015. The speciﬁc shape attacking anomalies are presented in Figure 2. The right subﬁgure of Figure 2 denotes the star-chain-shape anomaly queries. The middle subﬁgure of Figure 2 presents an attacking sub-network. The left subﬁgure of Figure 2 shows the speciﬁc shape cyber attacking cases. All of the detected sites with our method are identiﬁed as true attacking sources by the experts from the Internet security company. (1) Speciﬁc shape anomaly Q1. The user x.x.223.66 attacking the server xkb.hlu.edu.cn is categorized to Fck Editor , Upload Webshell and Common Vulnerability attacks. This user attacking the servers tw.hlu.edu.cn, zsb2.hlu.edu.cn and kydown.hlu.edu.cn with the same approach is categorized to Upload Webshell attacks. (2)Speciﬁc shape anomaly Q2. The server www.hlu.edu.cn is attacked from the users x.x.78. 34 by SQL Inject , x.x.3.69 and x.x.103.149 by Dedecms and Common Vulnerability . The user x.x.103.149 also attacked the servers xkb.hlu.edu.cn and tw.hlu.edu.cn with Dedecms and Common Vulnerability attacks. (3)Speciﬁc shape anomaly Q3. The user x.x.223.66 attacked the server www.hlu.edu.cn with Fck Editor , Upload Webshell and Common Vulnerability approaches. These attacks caused this server to be a bot machine. This infected server attacked the server www.cq51edu.cn with Common Vulnerability . The user x.x.32.220 attacked the server www. cq51edu.cn with Upload Webshell and Common Vulnerability approaches. The user x.x.21.210 attacked the server www.cq51edu.cn with the nginx parse approach. From the results, we can observe that the speciﬁc shape anomalies are exactly detected from the network.

Efﬁciency Comparison of Query-map and Baseline Datasets: 1) Water Pollution Dataset. By the chemical contaminant plumes are distributed at 4 nodes within different areas (Chen and Zhou 2016), we collected the realworld water pollution network of 12,527 nodes and 14,831 edges. The network is constructed by the K-Nearest Neighbor (KNN) algorithm. The spreads of contaminant plumes were simulated using the water network simulator EPANET for 8 hours (Chen and Zhou 2016). For the observed count attribute, the value at each node v is collected from the corresponding sensor per hour, cv 1 if it is polluted and cv 0 otherwise. For testing the robustness of methods to noises, we randomly ﬂipped K percent sensor binary values, where K {2, 4, 6, 8, 10}. The noise ratio dv K% is considered as the expected count attribute (Chen and Zhou 2016). 2) Respiratory Emergency Department (ED) Dataset. In a grid network of 10,000 nodes and 14,850 edges, for each node v, we collected the T day period of respiratory ED visit data Wv RT (e.g., T = 28, and the time t = 0 denotes the current day). The outbreak linearly increases in cases over the outbreak duration (Neill 2009a). During non-outbreak period, the number of patients visiting ED in v is Wt v Poisson(µ) for t = 0, , T where µ {1, , 34} de-

notes the expected number in v on that days. During outbreak period, we randomly select the outbreak duration of U from {1, , 7}, normalizes the weight wv P

t Wt v so that the total weight is equal to 1 in infected nodes, and set the outbreak severity (e.g., =800) (Neill 2009a). On each day t {0, , U}, we inject cases into each infected node v i.e., Wt v Wt v + Poisson((T t)wv ) for t = 0, , U (medium-size outbreaks injected for 10 percent nodes (Neill 2009a)). For testing the robustness of methods, we ﬂipped values of K {2, 4, 6, 8, 10} percent nodes randomly, i.e., no inject outbreak cases if the nodes are infected, inject outbreak cases otherwise. Let cv = W0 v denote the observed count of infected cases, and dv = 1 T PT t=1 Wt v denote the expected count. The datasets are summarized in Table 2. Comparison Methods. We compared our method Query-map with the two state-of-the-art baselines: Topk (Gupta et al. 2014) and Fast-k (Yang et al. 2016). The baselines are designed speciﬁcally for speciﬁc shape anomaly discovery in attributed graphs. The baselines aim to obtain top k subgraphs with the maximal sum of its node scores, which are isomorphic to the query graph. The parameters k and D are tuned completely based on the author recommendation in the original papers for k = 10, D = 2 to Topk (Gupta et al. 2014) and k = 20, d = 2 to Fast-k (Yang et al. 2016). Let w(v) cv for each vertex v VG. Performance Metrics. 1) Precision. We compute the precision of the target subgraph (i.e., the ratio of the number of correct anomalous nodes and the number of nodes). The recall metric is ignored for the ﬁxed size of returned target subgraph. 2) Function Score and Running Time. The optimization power of our method is examined in the scores of graph scan statistics. We compare our method to the baselines on running times. Results: Precision for target subgraphs detection. Figure 3 illustrates the precisions obtained by our methods Query-map (EBP and KULL) corresponding to red bars and cyan bars, and the competitive methods Fast-K and Top-K corresponding to green bars and blue bars. For the 2% noise level, Figure 3 (a) and (c) illustrate that our methods Querymap (KULL) for Water Pollution dataset and Query-map (EBP) for Emergency dataset outperformed all the baselines on the precision. For the Water Pollution dataset in Figure 3 (a), our methods and baselines perform similar results. Our methods recovered at least 99.5% true target subgraphs on the anomaly queries Q2 and Q3. With the query graph expanding in Figure 3(c), the precisions of baselines decrease quickly, however the precisions of our method Query-map (EBP) even increase to 1.0. Especially for the query graph Q3, we can observe that the precisions of our methods are greater than at least 0.67 for the best baseline. For the 10% noise level, the most results in Figure 3(b) and (d) indicate that our methods outperform the baselines. In Figure 3(b), the precisions of baselines decrease with the query graph size. However, even at the 10% noise level, the precisions of our methods still increase to 1.0. For the Emergency dataset in Figure 3(d), although our Query-map (KULL) is little better than the best baselines, our Query-map (EBP) achieves at least 0.27 improvement in the precision larger than the best

X.X.103.149

zsb2.hlu.edu.cn

xkb.hlu.edu.cn

kydown.hlu.edu.cn

www.cq51edu.cn

tw.hlu.edu.cn

www.hlu.edu.cn

Query Graphs

Cyber Attack Cases Computer Network

xkb.hlu.edu.cn

kydown.hlu.edu.cn

zsb2.hlu.edu.cn

tw.hlu.edu.cn

www.hlu.edu.cn

xkb.hlu.edu.cn

X.X.103.149

tw.hlu.edu.cn

www.hlu.edu.cn

X.X.103.149

tw.hlu.edu.cn

zsb2.hlu.edu.cn X.X.21.210

www.cq51edu.cn X.X.223.66

Figure 2: The cyber attack cases on March 13, 2015 are detected by our method Query-map (EBP) in the edu.cn real-world network dataset (red solid nodes denote the attacking or attacked sites, and blue solid nodes denote normal nodes).

Table 2: Summary of datasets (WP: Water Pollution; ED: Emergency Department).

Attributed Networks Interesting Anomaly Queries Data set

Attribute cv (i.e., observed value) for each node v

Attribute dv (i.e., expected value) for each node v

Q1 query graph

Q2 query graph

Q3 query graph WP 12,527 14,831 Sensor value (0 or 1) Noise level

ED 10,000 14,850 Number of patient visits Average number of visits

baseline. Although Fast-K and Top-K performed not bad in Figure 3(a), these heuristic algorithms do not have any theoretical guarantees on identifying speciﬁc shape anomaly subgraphs. From the results in Figure 3, we can observe that our methods not only have a signiﬁcant theoretical property, but also always have a high precision. In Table 2, the graph anomaly shapes Q1, Q2 and Q3 in Water Pollution dataset are different from graph anomaly shapes in Emergency Department dataset for the pollution area showing strip shapes and the outbreak area showing star shapes (Wu et al. 2017). Query-map achieved the precisions for strip anomaly shapes in Water Pollution data and star anomaly shapes in Emergency Department data illustrated in Figure 3. The Top-K is a path index-based method (Gupta et al. 2014), and the Fast-K is a heuristic method for assembling star components. The baselines tend to perform better on Water Pollution data with strip shape graph anomalies, however, perform not better on Emergency Department data with star shape graph anomaly. Evolving curves of graph scan statistics. Figure 4(a-b) report the graph scan statistic (EBP) scores of detected subgraphs for the {0, , 9} iterations. The results in Figure 4(a-b) illustrate that our method Query-map (EBP) converges in less than 10 iterations. Especially in Figure 4(a) for the Water Pollution dataset, Query-map (EBP) converges in less than 3 iterations. The empirical results show the fast convergence trends of our methods Query-map. Scalability analysis of running time. Table 3 reports the comparison between our methods Query-map and the competitive baseline methods on the running time. In Table 3, the running times were collected from the computer with Intel Xeon E3-1220 (e.g., 4 CPU, 3.1 GHz) and 24GB RAM. The

results in Table 3 show that our proposed method Querymap ran faster than all the baseline methods in most settings, except for the speciﬁc shape anomaly graph Q1. Even though the baseline method Fast-K ran the fastest over the query graph Q1, this method can not detect the target subgraph with high qualities in Figure 3(b-d). In Emergency data, for the Q3, our methods returned results within 97 seconds, however, Top-K did not output any results within 7 hours. The method can performs on the edu.cn data with 131,107 nodes. Results in Table 3 also imply that the running time of our methods is insensitive to the noise level, which is consistent with the time complexity of Query-map as discussed in Theorem 2.

This paper presents an efﬁcient algorithm to optimize nonlinear functions subject to speciﬁc shape anomaly. Our approach is guaranteed to the near-optimal solution under the estimation error upper bound of ||x ||2 + β/(1 α) || Iϕ(x )||2 . A wide variety of attributes can be employed to the speciﬁc shape anomaly discovery in graphs. We will extend Query-map on the best-effort match to the speciﬁc shape graph anomaly.

A1. Proof of Theorem 1

We ﬁrst introduce Lemma 3 to bound residues between the solution xi and the optimal x .

Lemma 3. Let ri = xi x . Given A M(Q), we have

Table 3: Efﬁciency Measure: Comparison on the running times of our methods and the baselines.

Water Pollution Data Emergency Data Run Time (second) 2 % noise 10 % noise 2 % noise 10 % noise Q1 Q2 Q3 Q1 Q2 Q3 Q1 Q2 Q1 Q2 Query-map (EBP) 7.99 13.26 22.57 8.18 14.25 21.92 0.95 0.98 0.73 0.74 Query-map (KULL) 7.90 12.81 22.22 7.10 12.57 19.21 0.81 0.82 0.35 0.36 Fast-K 2.37 34.03 86.13 2.27 46.83 190.54 0.47 73.41 0.47 72.32 Top-K 33.27 389.69 1,343.22 56.62 660.79 2,313.82 315.99 6,433.98 389.73 6,640.92

Q1 Q2 Q3 (a) Water Pollution (2% noise)

Fast-K Top-K Query-map (EBP) Query-map (KULL)

Q1 Q2 Q3 (b) Water Pollution (10% noise)

Q1 Q2 Q3 (c) Emergency Data (2% noise)

Q1 Q2 Q3 (d) Emergency Data (10% noise)

Figure 3: Effective Validation: Precision comparisons of our methods Query-map (EBP) and Query-map (KULL), and the baseline methods Fast-K and Top-K.

0 1 2 3 4 5 6 7 8 9 iteration

(a) Water Pollution

Q1 2% noise Q2 2% noise Q3 2% noise Q1 10% noise Q2 10% noise Q3 10% noise

0 1 2 3 4 5 6 7 8 9 iteration

18 (b) Emergency Data

Figure 4: Efﬁciency Measure: Evolving curves of graph scan statistic scores (EBP) for our method (Query-map) in different iterations. These scores are evaluated on these two datasets with 2% and 10% noises and the query graphs.

the following inequality.

||ri Ac|| 2 q

δ4s δ2 4s||ri||+ 2ξ 1 2δ4s + (1 2δ4s)ξ p

|| Iϕ(x )|| (8)

where I = arg max S M(Q) || Sϕ(x )||2.

Now, we present the proof of Theorem 1.

Proof. Similarly, we denote ri+1 = xi+1 x . We have ||(xi+1 b) + (b x )||2 ||xi+1 b||2 + ||b x ||2 by the triangular inequality property. In Algorithm 1, by Line 8, we have ||xi+1 b||2 = ||b B b||2. According to Problem (2), b B is restricted to the largest elements of b by the projection oracle P, and thus ||b B b||2 ||x b||2. The upper bound of ||ri+1||2 is 2||x b||2, for the deduction of ||ri+1||2 = ||xi+1 x ||2 ||xi+1 b||2 + ||x b||2 = ||b B b||2 + ||x b||2 2||x b||2.

Compute the upper bound of ||(x b)Ω||2 2

=< b x , (b x )Ω> =< b x ξ Ωϕ(b) + ξ Ωϕ(x ), (b x )Ω> < ξ Ωϕ(x ), (b x )Ω> ||b x ξ Ωϕ(b) + ξ Ωϕ(x )||2||(b x )Ω||2+

ξ|| Ωϕ(x )||2||(b x )Ω||2 δ4s||b x ||2||(b x )Ω||2 + ξ|| Ωϕ(x )||2||(b x )Ω||2

where the second equality is derived from Ωϕ(b) = 0 for the function ϕ is minimized at b over the set Ωin Line 7 of Algorithm 1. The last inequality is derived from the condition (ξ, δ4s, M(Q))-WRSC. Thus the upper bound of ||(x b)Ω||2 is δ4s||b x ||2 + ξ|| Ωϕ(x )||2. For the sets Ωand A, we denote the complement sets are Ωc = VG \ Ωand Ac = VG \ A. We have ||x b||2 ||(x b)Ω||2 + ||(x b)Ωc||2. By the upper bound of ||(x b)Ω||2, we have ||(x b)Ω||2 δ4s||x b||2 + ξ|| Ωϕ(x )||2 + ||(x b)Ωc||2. We have ||x b||2 ||(x b)Ωc||2

1 δ4s + ξ|| Ωϕ(x )||2

1 δ4s . By supp(b) Ωat Line 7 and supp(xi) Ωat Line 6 of Algorithm 1, we have ||(x b)Ωc||2

1 δ4s + ξ|| Ωϕ(x )||2

1 δ4s = ||x Ωc||2 1 δ4s + ξ|| Ωϕ(x )||2

||(x xi)Ωc||2

1 δ4s + ξ|| Ωϕ(x )||2

1 δ4s = ||ri Ωc||2 1 δ4s + ξ|| Ωϕ(x )||2

Ωc Ac, we have ||ri Ωc||2 1 δ4s + ξ|| Ωϕ(x )||2

1 δ4s ||ri Ac||2 1 δ4s +

ξ|| Iϕ(x )||2

1 δ4s . At last we obtain the upper bound as follows

||x b||2 ||ri Ac||2 / (1 δ4s)+ξ|| Iϕ(x )||2 / 1 δ4s

Combing ||ri+1||2 2||x b||2, the above inequality, and Lemma 3, we ﬁnish proving this theorem.

A2. Proof of Theorem 2 Proof. As xi, x [0, 1]n and |supp(xi)|, |supp(x )| s, the absolute upper bound to ||xi x || is

2s. For the

small constant || Iϕ(x )||2, we present the acceptable upper bound ||x ||2 + β/(1 α) || Iϕ(x )||2. By Inequality (4), at the i-th iterate of Algorithm 1, we obtain the bound ||x xi||2 αi||x ||2 + β 1 α|| Iϕ(x )||2. Let αi||x ||2 + β 1 α|| Iϕ(x )||2 ||x ||2 + β/(1 α) || Iϕ(x )||2. After log( 1 || Iϕ(x )||2 )/ log 1

α iterations, the estimate ˆx satisﬁes ||x ˆx||2 ||x ||2+β/(1 α) || Iϕ(x )||2. As the time complexity is T for one iteration, the overall time complexity of Query-map follows the result in Theorem 2.

Acknowledgments This work was supported by the National Key R&D Program of China (No. 2018YFC0809800), and partly supported by the NSF grant (No. IIS-1815696). The corresponding author is Wenjun Wang.

References Akoglu, L.; Tong, H.; and Koutra, D. 2015. Graph based anomaly detection and description: a survey. Data mining and knowledge discovery 29(3):626 688. Chen, F., and Zhou, B. 2016. A generalized matching pursuit approach for graph-structured sparsity. In IJCAI, 1389 1395. Cordella, L. P.; Foggia, P.; Sansone, C.; and Vento, M. 2004. A (sub) graph isomorphism algorithm for matching large graphs. IEEE TPAMI 26(10):1367 1372. Gupta, M.; Gao, J.; Yan, X.; Cam, H.; and Han, J. 2014. Topk interesting subgraph discovery in information networks. In ICDE, 820 831. IEEE. Hegde, C.; Indyk, P.; and Schmidt, L. 2015. A nearly-linear time framework for graph-structured sparsity. In ICML, 928 937. Huan, J.; Wang, W.; and Prins, J. 2003. Efﬁcient mining of frequent subgraphs in the presence of isomorphism. In ICDM, 549 552. IEEE. Kulldorff, M. 1997. A spatial scan statistic. Communications in Statistics-Theory and methods 26(6):1481 1496. Neill, D. B. 2009a. An empirical comparison of spatial scan statistics for outbreak detection. International journal of health geographics 8(1):1. Neill, D. B. 2009b. Expectation-based scan statistics for monitoring spatial time series data. International Journal of Forecasting 25(3):498 517. Tropp, J. A., and Needell, D. 2008. Cosamp: Iterative signal recovery from incomplete and inaccurate samples. Co RR abs/0803.2392. Wu, N.; Chen, F.; Li, J.; Huai, J.; and Li, B. 2017. Querydriven discovery of anomalous subgraphs in attributed graphs. In Proceedings of the Twenty-Sixth International Joint Conference on Artiﬁcial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, 3105 3111. Yang, S.; Wu, Y.; Sun, H.; and Yan, X. 2014. Schemaless and structureless graph querying. Proceedings of the VLDB Endowment 7(7):565 576.

Yang, S.; Han, F.; Wu, Y.; and Yan, X. 2016. Fast top-k search in knowledge graphs. In ICDE, 990 1001. Yuan, X.-T., and Liu, Q. 2014. Newton greedy pursuit: A quadratic approximation method for sparsity-constrained optimization. In CVPR, 4122 4129. Yuan, X.; Li, P.; and Zhang, T. 2014. Gradient hard thresholding pursuit for sparsity-constrained optimization. In ICML, 127 135. Zhang, X.; Martin, T.; and Newman, M. E. 2015. Identiﬁcation of core-periphery structure in networks. Physical Review E 91(3):032803. Zou, L.; Chen, L.; and Lu, Y. 2007. Top-k subgraph matching query in a large graph. In the ACM ﬁrst Ph.D. workshop in CIKM, 139 146. ACM.