Sharp Impossibility Results for Hypergraph Testing

Jiashun Jin (Carnegie Mellon University, jiashun@stat.cmu.edu), Zheng Tracy Ke (Harvard University, zke@fas.harvard.edu), Jiajun Liang (Purdue University, liangjj@purdue.edu)

Abstract. In a broad Degree-Corrected Mixed-Membership (DCMM) setting, we test whether a non-uniform hypergraph has only one community or has multiple communities. Since both the null and alternative hypotheses have many unknown parameters, the challenge is, given an alternative, how to identify the null that is hardest to separate from the alternative. We approach this by proposing a degree matching strategy, whose main idea is to leverage the theory of tensor scaling to create a least favorable pair of hypotheses. We present a result on standard minimax lower bound theory and a result on the Region of Impossibility (which is more informative than the minimax lower bound). We show that our lower bounds are tight by introducing a new test that attains the lower bound up to a logarithmic factor. We also discuss the case where the hypergraphs may have mixed memberships.

1 Introduction

The hypergraph is a useful representation of social relationships beyond pairwise interactions [5, 11]. For example, the co-authorship hypergraph is often used to analyze the co-authorship topology of authors, and it provides more information than a co-authorship graph (where an m-author paper is treated as an m-clique). Community detection on a hypergraph [10] is a problem of great interest (communities in a hypergraph are clusters of nodes that have more hyperedges within than across). It has many applications in social network analysis [15] and machine learning [1, 17, 18, 23]. We are interested in the problem of global testing, where we test whether the hypergraph has one community or multiple communities.
It has applications in measuring co-authorship and citation diversity [12] and discovering non-obvious social groups and patterns [4]. It is also useful for understanding other problems such as community detection and change-point detection in dynamic hypergraphs. For instructional purposes only, we start with 3-uniform hypergraphs (i.e., each hyperedge consists of 3 nodes), but our results cover both higher-order hypergraphs and non-uniform hypergraphs. Let A be the adjacency tensor of a uniform and symmetric order-3 hypergraph with n nodes, where

A_{i1i2i3} equals 1 if i1, i2, i3 share a hyperedge, and equals 0 otherwise, (1.1)

for 1 ≤ i1, i2, i3 ≤ n (distinct). Since the tensor is symmetric, A_{i1i2i3} = A_{j1j2j3} for two sets of indices {i1, i2, i3} and {j1, j2, j3} if one is a permutation of the other. We do not consider self hyperedges; hence, A_{i1i2i3} = 0 whenever i1, i2, i3 are non-distinct. Real-world hypergraphs have several noteworthy features. First, there may be severe degree heterogeneity (i.e., the degree of one node is many times higher than that of another). Second, the overall sparsity levels may vary significantly from one hypergraph to another. Last, a node may have mixed memberships across multiple communities (i.e., nonzero weights on more than one community). To accommodate these features, we adopt the Degree-Corrected Mixed-Membership (tensor-DCMM) model. The notations below are frequently used in tensor analysis.

Definition 1.1 (matricization, slicing, and slice aggregation). Let A be a 3-symmetric tensor of dimension n. First, we call the n × n² matrix A the matricization of A, defined by A_{i, j+n(k−1)} = A_{ijk}, 1 ≤ i, j, k ≤ n. Second, for 1 ≤ k ≤ n, we use A_{::k} to denote the n × n matrix whose row-i, column-j entry is A_{ijk}, 1 ≤ i, j ≤ n, and call it the k-th slice of A. Last, for any n × 1 vector x, we use (Ax) to denote the matrix Σ_{k=1}^n x_k A_{::k}, which is an aggregation of the slices.

35th Conference on Neural Information Processing Systems (NeurIPS 2021).
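The operations in Definition 1.1 are straightforward to sketch in code. The NumPy snippet below (0-based indexing, hypothetical dimensions chosen only for illustration) implements matricization and slice aggregation:

```python
import numpy as np

def matricize(A):
    """Matricization of Definition 1.1: the n x n^2 matrix whose entry
    (i, j + n*k) equals A[i, j, k] (0-based analogue of A_{i, j+n(k-1)})."""
    n = A.shape[0]
    # reorder so the C-order flatten of the last two axes gives column j + n*k
    return A.transpose(0, 2, 1).reshape(n, n * n)

def slice_aggregate(A, x):
    """The matrix (A x) = sum_k x[k] * A[:, :, k], an aggregation of slices."""
    return np.einsum('ijk,k->ij', A, x)

# tiny example on a symmetric order-3 tensor
rng = np.random.default_rng(0)
n = 4
T = rng.random((n, n, n))
A = sum(T.transpose(p) for p in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)])
x = rng.random(n)
M = matricize(A)
agg = slice_aggregate(A, x)
```

For a fully symmetric tensor the ordering of the last two axes does not change the matricization, but the transpose above keeps the column convention of Definition 1.1 exact.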
Now, suppose the tensor has K perceivable communities. Let P be a symmetric (3-uniform) tensor of dimension K that models the community structure, let θ = (θ_1, θ_2, ..., θ_n)' be positive parameters that model the degree heterogeneity of nodes, and let π_1, π_2, ..., π_n be K-dimensional membership vectors, where π_i(k) = the weight node i puts on community k, 1 ≤ k ≤ K. We assume, for all 1 ≤ i1, i2, i3 ≤ n that are three distinct indices, the A_{i1i2i3} are independent Bernoulli random variables with

P(A_{i1i2i3} = 1) = θ_{i1}θ_{i2}θ_{i3} Σ_{k1,k2,k3=1}^K P_{k1k2k3} π_{i1}(k1)π_{i2}(k2)π_{i3}(k3),

where by Definition 1.1, the right-hand side equals θ_{i1}θ_{i2}θ_{i3} π_{i1}'(Pπ_{i3})π_{i2}. Introduce a non-stochastic 3-uniform tensor Q of dimension n where Q_{i1i2i3} = θ_{i1}θ_{i2}θ_{i3} π_{i1}'(Pπ_{i3})π_{i2} for 1 ≤ i1, i2, i3 ≤ n. Let diag(Q) be the tensor of the same size as Q where (diag(Q))_{i1i2i3} = Q_{i1i2i3} if i1, i2, i3 are non-distinct, and (diag(Q))_{i1i2i3} = 0 otherwise. It follows that

E[A] = Q − diag(Q), where Q_{i1i2i3} = θ_{i1}θ_{i2}θ_{i3} π_{i1}'(Pπ_{i3})π_{i2}. (1.2)

For identifiability, let P ∈ R^{K×K²} be the matricization of P (see Definition 1.1). We assume

rank(P) = K, and P_{iii} = 1 for all 1 ≤ i ≤ K. (1.3)

Definition 1.2. We call (1.1)-(1.3) the tensor-DCMM model for 3-uniform hypergraphs. We call Q and P the Bernoulli probability tensor and the community structure tensor for DCMM, respectively.

Later, in Section 3, we introduce the non-uniform tensor-DCMM as a more sophisticated model. In tensor-DCMM, if we require all π_i to be degenerate (i.e., one entry is 1, all other entries are 0), then tensor-DCMM reduces to the Degree-Corrected Block Model (tensor-DCBM) [15]. If we further require θ_1 = ... = θ_n (but the second condition in (1.3) can be removed), then tensor-DCBM further reduces to the Stochastic Block Model (tensor-SBM) [9]. For simplicity, we may drop "tensor-" in these terms if there is no confusion. The global testing problem above is then to test

H0 : K = 1 vs. H1 : K > 1.
(1.4)

Our primary goals are (a) to find a sharp information lower bound for 3-uniform DCMM, and especially, to fully characterize the lower bound by a simple quantity to be discovered, and (b) to extend the results to more sophisticated non-uniform hypergraphs (see Section 3). A good understanding of the problem greatly helps us understand the fundamental limits of many other problems (e.g., community detection, determining the number of communities K, dynamic hypergraphs). For example, for community detection (e.g., [10]), we either assume K is known or estimate it first. Note that in parameter regions where we cannot tell whether K = 1 or K > 1, we cannot estimate K consistently, so we cannot have consistent community detection either. Therefore, a lower bound for global testing is always a valid lower bound for estimating K and for community detection. To facilitate the lower bound study, we frequently adopt a Random Mixed-Membership (RMM) model. Introduce a subset V0 = {x ∈ R^K : x_k ≥ 0, Σ_{k=1}^K x_k = 1}, and let F be a K-variate distribution with support contained in V0. We assume

π_i iid∼ F (let h = E_F[π_i]). (1.5)

Moreover, let V0' = {e_1, e_2, ..., e_K} ⊂ V0, where e_k is the k-th standard basis vector of R^K. Similarly, RMM-DCMM reduces to RMM-DCBM if we require supp(F) = V0', and reduces to RMM-SBM if we further require θ_1 = θ_2 = ... = θ_n (but the second condition in (1.3) can be removed). Let θ = (θ_1, θ_2, ..., θ_n)'. We allow (θ, P, h, F) to vary with n to cover a variety of settings where we allow for severe degree heterogeneity, mixed memberships, flexible sparsity levels, and weak signals.

Example 1 (2-parameter SBM [2, 22, 19]). This model is a special case of DCMM, where θ_1 = ... = θ_n = α_n (no degree heterogeneity), all π_i are degenerate (no mixed membership), and P_{ijk} = 1 if i = j = k and P_{ijk} = ρ_n otherwise, for two parameters (α_n, ρ_n). Also, Lin et al.
[20] studied a 3-parameter SBM, which is the same as above except that they assume a different form of P, where P_{ijk} equals 1, ρ_n, or τ_n if i, j, k take 1, 2, or 3 distinct values, respectively, for three parameters (α_n, ρ_n, τ_n). Compared to DCMM, these models are much narrower: they do not accommodate severe degree heterogeneity or mixed memberships, and P is parametrized by 2 or 3 distinct values. How to derive a sharp lower bound for global testing (and especially, how to identify a simple quantity that fully characterizes the lower bound) in our setting is a rather challenging problem. Our model is a non-uniform hypergraph model (see Section 3), which consists of hypergraphs of order 2, 3, ..., M, and each layer involves many unknown parameters (θ, P, h, F). Existing works on lower bounds have focused on uniform hypergraphs, and non-uniform hypergraphs are much less studied. Even for uniform hypergraphs, existing works have focused on the special SBM as in Example 1, not the more general DCMM model. For example, Yuan et al. [22] derived lower bounds for global testing with the 2-parameter SBM, focusing on the extremely sparse case. Ahn et al. [2] provided lower bound results for exactly recovering the communities (see Liang et al. [19] and Kim et al. [16] for related settings) with a similar model. Lin et al. [20] and Chien et al. [7] used the 3-parameter SBM in Example 1 to study lower bounds for community detection. While these papers are very interesting, their lower bounds are characterized by only the 2 or 3 parameters (i.e., α_n, ρ_n, τ_n) assumed in their models. For the much broader tensor-DCMM model considered here, we have many parameters θ, P, h, F (θ is an n-vector and P is a tensor), and how to extend existing results to the tensor-DCMM setting is a challenging problem.

Our contributions.
The main challenge in the lower bound study is that, since each DCMM model (no matter what K is) has a large number of unknown parameters, given an alternative hypothesis (K > 1), it is hard to identify the null hypothesis (K = 1) that is most difficult to distinguish from the alternative hypothesis. As our main contribution, we approach this by proposing a degree matching strategy, where for any given DCMM model with K > 1, we pair it with a DCMM model with K = 1 in such a way that, for each node, the expected degree under the null matches that under the alternative. This way, it is hard to separate the two hypotheses by a naive degree-based statistic. We show that (a) the degree matching is always possible by using a tensor scaling technique [3, 8], and (b) the pair of hypotheses we construct this way leads to sharp results on lower bounds. See Section 2. We have the following results. Consider the 3-uniform hypergraph first. We first present the standard minimax theory. Define a class of RMM-DCMM models with K > 1 and ‖θ‖_1‖θ‖_2²µ_2² → 0 (µ_2 is the second singular value of P). In this class, we can find an RMM-DCMM model (the alternative) and pair it with a DCMM model with K = 1 (the null) such that the χ²-divergence between the pair converges to 0 as n → ∞. Therefore, in this class, there exists an alternative model that is asymptotically inseparable from the null. The standard minimax theory only claims that there exists an alternative model within a specified class that is inseparable from the null. It is desirable to show a much stronger result: for any alternative in the class, we can pair it with a null so that the χ²-divergence of the pair goes to 0 as n → ∞. In detail, we show that in the parameter space (θ, P, h, F) of RMM-DCBM, there is a Region of Impossibility defined by ‖θ‖_1‖θ‖_2²µ_2² → 0; for any alternative in the Region of Impossibility, we can pair it with a null such that the χ²-divergence between the pair goes to 0 as n → ∞.
Compared with existing results on minimax lower bounds, these results are more informative and theoretically more satisfactory. The proof is also different from the proof of minimax lower bounds: we have used the tensor scaling theory [3, 8] and the "degree matching strategy" aforementioned. We also extend such results to the broader RMM-DCMM case (Section 2.3) and discuss some major differences in the Region of Impossibility between DCBM and the more restrictive SBM (Section 2.4). Next, we generalize the results to higher-order and non-uniform hypergraphs. Fix M ≥ 2 and consider a non-uniform hypergraph (e.g., see [10]) that consists of m-uniform hypergraphs for all m = 2, ..., M, each following a DCMM model with individual (θ^(m), P^(m)) but common π_1, ..., π_n. Let ℓ_m = ‖θ^(m)‖_1^{m−2} ‖θ^(m)‖_2² (µ_2^(m))² (to be defined in Section 3). We show that (a) for the m-uniform hypergraph case, the Region of Impossibility for the hypothesis testing (1.4) is fully characterized by the condition ℓ_m → 0, and (b) for the non-uniform hypergraph case, the Region of Impossibility for the hypothesis testing (1.4) is fully characterized by the condition max_{2≤m≤M} ℓ_m → 0. Last, we show that our lower bounds are tight. Consider the non-uniform hypergraph above. We propose a new test statistic and show that the sum of Type I and Type II errors → 0 if max_{2≤m≤M} ℓ_m ≥ log(n)^{1+δ} for a constant δ > 0 (taking δ = 0.1 will work). Therefore, except for a logarithmic factor, our lower bounds are tight. In summary, existing results on lower bounds are largely focused on more restrictive settings (e.g., uniform hypergraphs without degree heterogeneity or mixed membership). We provide sharp lower bounds for a much broader setting.
Our study is highly non-trivial because we need (i) a novel degree matching strategy to construct least favorable hypothesis pairs, (ii) to identify a simple quantity that fully characterizes the lower bounds, (iii) a delicate analysis of the χ²-divergence between the null and the alternative, and (iv) a carefully designed test that leads to tight upper bounds.

2 Sharp Lower Bounds for 3-Uniform Hypergraphs

For notational simplicity, we focus on 3-uniform hypergraphs in this section. The study of higher-order and non-uniform hypergraphs is deferred to Section 3. Consider a (3-uniform) DCMM model. Recall that h = E_F[π_i] and that P ∈ R^{K×K²} is the matricization of the tensor P. In this paper, we use C > 0 as a generic constant which may vary from occurrence to occurrence. We assume

‖P‖_max ≤ C, θ_max ≡ max{θ_1, ..., θ_n} ≤ C, max_{1≤k≤K} h_k ≤ C min_{1≤k≤K} h_k, ‖θ‖_1‖θ‖_2² → ∞. (2.6)

The first condition is mild, because model identifiability already requires that all diagonal entries of P are 1. The second one is also mild (note that while the largest possible value of θ_max is O(1), θ_max is allowed to tend to 0 relatively fast). The third one assumes that the community memberships are balanced, which is also mild. For the last condition, we will see soon that if ‖θ‖_1‖θ‖_2² → 0, then the signal is so weak that successful global testing is impossible, so this condition is also mild.

2.1 Standard minimax lower bounds (RMM-DCMM)

We start with the least favorable configuration. The goal is to find a pair of models (a null model with K = 1, and an alternative model with K > 1) which are hard to distinguish from each other. We use the following degree-matching technique: we choose the pair in such a way that, for each node, the expected degree under the null matches that under the alternative, approximately. The idea is that if the degrees do not match, then we may separate the two hypotheses by a simple degree-based statistic, so we should not expect the χ²-divergence between the pair to be small.
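Numerically, the degree-matching idea can be previewed on a 2-parameter SBM alternative (the values of K, ρ, and θ below are hypothetical, for illustration only): the vector a = (Ph)h is a constant vector c_0³·1_K, and rescaling θ* = θ/c_0 makes the leading-order expected degrees under the alternative coincide with those under the null:

```python
import numpy as np

K, rho = 3, 0.5
P = np.full((K, K, K), rho)
for k in range(K):
    P[k, k, k] = 1.0
h = np.full(K, 1.0 / K)                    # balanced membership proportions

a = np.einsum('abc,b,c->a', P, h, h)       # a = (Ph)h; each entry equals (1/K^2)[1 + (K^2-1)rho]
c0 = a[0] ** (1.0 / 3.0)

theta = np.random.default_rng(2).uniform(0.05, 0.2, size=10)
theta_star = theta / c0                    # degree-matching rescaling

null_deg = theta * np.sum(theta) ** 2                   # leading term theta_i ||theta||_1^2
alt_deg = theta_star * np.sum(theta_star) ** 2 * a[0]   # theta*_i ||theta*||_1^2 (pi_i' a)
```

Since (θ_i/c_0)(‖θ‖_1/c_0)² c_0³ = θ_i‖θ‖_1², the two degree profiles agree exactly in the leading term.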
In detail, consider a pair of models, a DCMM model with K = 1 and an RMM-DCMM model with K > 1, where for all 1 ≤ i1, i2, i3 ≤ n, the Bernoulli probability tensors Q and Q* satisfy

Q_{i1i2i3} = θ_{i1}θ_{i2}θ_{i3}, Q*_{i1i2i3} = θ*_{i1}θ*_{i2}θ*_{i3} π_{i1}'(Pπ_{i3})π_{i2}; (2.7)

here we recall that (Pπ_{i3}) is a K × K matrix (see Definition 1.1). We call the two models the null and the alternative, respectively. In (2.7), the community structure tensor P is as in (1.2), and π_i and h = E_F[π_i] are as in (1.5). Recall that θ = (θ_1, θ_2, ..., θ_n)'. Similarly, let θ* = (θ*_1, θ*_2, ..., θ*_n)'. Fix 1 ≤ i1 ≤ n. By definitions and elementary statistics, the leading terms of the expected degree of node i1, conditional on its own membership π_{i1}, under the null and the alternative are

θ_{i1}‖θ‖_1² and θ*_{i1}‖θ*‖_1² π_{i1}'a, respectively, where a = (Ph)h ∈ R^K. (2.8)

For the least favorable construction, we choose (P, h) in such a way that

a = (Ph)h = c_0³ 1_K, for a scalar c_0 > 0. (2.9)

For broadness, we allow c_0 to depend on n. There are many (P, h) that satisfy (2.9). For example, in the 2-parameter SBM model as in Example 1 with h = (1/K, ..., 1/K)', we have a_k = (1/K²)[1 + (K² − 1)ρ_n] for 1 ≤ k ≤ K, so (2.9) is satisfied with c_0³ = (1/K²)[1 + (K² − 1)ρ_n]. Moreover, we choose θ*_i in the alternative model such that

θ*_i = (1/c_0)θ_i, 1 ≤ i ≤ n. (2.10)

Now, by (2.8)-(2.10), for all 1 ≤ i1 ≤ n, θ*_{i1}‖θ*‖_1² π_{i1}'a = θ*_{i1}‖θ*‖_1² c_0³ = θ_{i1}‖θ‖_1². Therefore, for each node, the expected degree under the alternative matches that under the null (at least in the leading term), making it hard to separate the null and the alternative by any naive degree-based statistic. Only when such a degree matching holds can we hope the two models are asymptotically indistinguishable. This is the key to our least favorable configuration. Recall that P ∈ R^{K×K²} is the matricization of the community tensor P and µ_k is the k-th largest singular value of P.

Theorem 2.1 (Least favorable configuration).
Fix K > 1 and consider a pair of models, a null and an alternative with K communities, given in (2.7), where (2.9)-(2.10) hold. Assume (2.6) holds and ‖θ‖_1‖θ‖_2²µ_2² = o(1). As n → ∞, the χ²-divergence¹ between the pair tends to 0. Therefore, the two models are asymptotically indistinguishable: for any test, the sum of Type I and Type II errors is no smaller than 1 − o(1).

Remark (How degree matching affects the χ²-divergence). Intuitively speaking, the χ²-divergence has many terms, each being the sum (or a function) of terms in a Taylor expansion. We can roughly call these terms the first-order term, second-order term, and so on. When the expected node degrees are not matched between the null and the alternative, the first-order term dominates, and correspondingly, a degree-based χ²-test may have power; see Section 2.4. When the expected degrees are matched, the first-order term vanishes as we have hoped, and the degree-based χ²-test loses power. As a result, the second-order term in the χ²-divergence now dominates and gives the sharp lower bound.

The above least favorable configuration gives rise to the standard minimax theorem. Fix K ≥ 1 and consider a DCMM model (where the π_i are non-random). Introduce a vector g ∈ R^K by g_k = (1/‖θ‖_1) Σ_{i=1}^n θ_i π_i(k), 1 ≤ k ≤ K. For a constant 0 < c_0 < 1 and two positive sequences {α_n}_{n=1}^∞ and {β_n}_{n=1}^∞, we define a class of DCMM models by

M_n(K, c_0, α_n, β_n) = {(θ, Π, P): ‖P‖_max ≤ c_0^{-1}, θ_max ≤ c_0, max_{1≤k≤K} g_k ≤ c_0^{-1} min_{1≤k≤K} g_k, ‖θ‖_1‖θ‖_2² ≥ 1/β_n, ‖θ‖_1‖θ‖_2²µ_2² ≤ α_n}.

Here, we assume max_{1≤k≤K} g_k ≤ C min_{1≤k≤K} g_k, so the tensor is balanced. This is similar to the third condition in (2.6), except that the π_i are random there. For the null case, K = 1, P = 1, and π_i = 1, and the above defines a class of θ, which we write for short as M_n(1, c_0, α_n, β_n) = M*_n(β_n). The following theorem is proved in the supplement.

Theorem 2.2 (Minimax lower bound). Fix K ≥ 2, a constant c_0 > 0, and any sequences {α_n}_{n=1}^∞ and {β_n}_{n=1}^∞ where α_n = o(1) and β_n = o(1).
As n → ∞, inf_ψ { sup_{θ ∈ M*_n(β_n)} P(ψ = 1) + sup_{(θ,Π,P) ∈ M_n(K,c_0,α_n,β_n)} P(ψ = 0) } = 1 − o(1), where the infimum is over all possible tests ψ.

2.2 Region of Impossibility for RMM-DCBM

The standard minimax theorem in Section 2.1 only says that in the class of all alternative models with ‖θ‖_1‖θ‖_2²µ_2² → 0, there exists one that we can pair with a null so that the pair is asymptotically inseparable. A much more satisfactory result is to show that, for any alternative in the same class, we can pair it with a null such that the pair is asymptotically inseparable. In this section, we prove this for the RMM-DCBM case (mixed memberships are not allowed). The discussion for RMM-DCMM (mixed memberships allowed) is in Section 2.3 and the supplement. Consider again a pair of models, a DCBM null model and an RMM-DCBM model with K > 1, where the Bernoulli probability tensors are Q and Q*, respectively. We assume, for all 1 ≤ i1, i2, i3 ≤ n,

Q_{i1i2i3} = θ_{i1}θ_{i2}θ_{i3}, Q*_{i1i2i3} = θ*_{i1}θ*_{i2}θ*_{i3} π_{i1}'(Pπ_{i3})π_{i2}. (2.11)

Here, the community structure tensor P is as in (1.2), π_i and h = E_F[π_i] are as in (1.5), and supp(F) = {e_1, ..., e_K}. Similarly, the goal is degree matching: for any (θ, P, h, F), we construct θ* in such a way that, for each node, the expected degrees under the null and the alternative match each other approximately. Recall that in Section 2.1, in order to have the desired degree matching, it is crucial to pick an alternative model where (P, h) satisfies a = (Ph)h = c_0³1_K for some scalar c_0 > 0; once this holds, we have the desired degree matching by taking θ* = (1/c_0)θ. Unfortunately, for general (P, h), we do not have (Ph)h ∝ 1_K, so we should not expect to achieve the desired degree matching by taking θ* ∝ θ. In short, for our purpose here, the approach in Section 2.1 no longer works, and we must find a new approach to constructing the model pair. Our proposal is as follows. For any diagonal matrix D = diag(d_1, ...
, d_K) with d_k > 0, 1 ≤ k ≤ K, define a K-dimensional vector a_D by (P^D is a 3-tensor of dimension K)

a_D = (P^D h)h, with P^D_{k1k2k3} = d_{k1}d_{k2}d_{k3} P_{k1k2k3}, 1 ≤ k1, k2, k3 ≤ K.

We aim to select a matrix D such that a_D = 1_K. The next lemma states that such a D always exists and is unique. It leverages classical results on tensor scaling (e.g., [3, 8]) and is proved in the supplement.

¹The χ²-divergence between two models, f_0(A) and f_1(A), is defined as ∫ [(f_0(A) − f_1(A))²/f_0(A)] dA. In our setting, the alternative model f_1(A) alone involves an integral over the distribution of π_i in (1.5).

Lemma 2.1. Fix K > 1 and let P, h, D, and a_D be as above. Suppose min{h_1, h_2, ..., h_K} ≥ C. There exists a unique diagonal matrix D = diag(d_1, d_2, ..., d_K) such that a_D = 1_K.

Now, given (θ, P, h, F) and the two models in (2.11), let D = diag(d_1, d_2, ..., d_K) be as in Lemma 2.1. Moreover, in (2.11), we choose θ*_i as follows:

θ*_i = d_k θ_i, if node i belongs to community k. (2.12)

Combining it with (2.11), we have Q*_{i1i2i3} = θ*_{i1}θ*_{i2}θ*_{i3} π_{i1}'(Pπ_{i3})π_{i2} = θ_{i1}θ_{i2}θ_{i3} π_{i1}'(P^D π_{i3})π_{i2}. By similar calculations as in (2.8), for 1 ≤ i ≤ n, under the null and the alternative, the leading terms of the expected degrees of node i are θ_i‖θ‖_1² and θ_i(π_i'a_D)‖θ‖_1², respectively, where a_D = (P^D h)h. By Lemma 2.1, a_D = 1_K. Hence, for each node, the expected degrees match under the null and the alternative, so it is hard to separate the two models by a naive degree-based statistic. Recall that P is the matricization of P and µ_k is the k-th singular value of P. Theorem 2.3 is proved in the supplement.

Theorem 2.3 (Impossibility for RMM-DCBM). Fix K > 1. For any given (θ, P, h, F), consider a pair of models, a null and an alternative with K communities, as in (2.11), where the θ*_i are given by (2.12) with the matrix D as in Lemma 2.1. Suppose (2.6) holds and ‖θ‖_1‖θ‖_2²µ_2² = o(1). As n → ∞, the χ²-divergence between the pair tends to 0.
Therefore, the two models are asymptotically indistinguishable: for any test, the sum of Type I and Type II errors is no smaller than 1 − o(1).

Region of Possibility, Region of Impossibility, and tightness. In the parameter space (θ, P, h, F) for DCBM, we call the region prescribed by ‖θ‖_1‖θ‖_2²µ_2² → 0 the Region of Impossibility: by Theorem 2.3, for any model in this region, we can pair it with a null so that the pair is asymptotically inseparable. We call the region prescribed by ‖θ‖_1‖θ‖_2²µ_2²/log^{1.1}(n) → ∞ the Region of Possibility: for any alternative model in this region, there is a method that can separate it from any null with asymptotically full power (this follows from Theorem 3.2 as a special case). Comparing the Region of Impossibility and the Region of Possibility, except for a log(n) term, our lower bounds are tight.

2.3 Region of Impossibility for RMM-DCMM

The discussion for RMM-DCMM is similar, so for reasons of space, we leave it to Section A of the supplement. In that section, we present a similar theorem for RMM-DCMM, where the hypergraphs may have mixed memberships. The proofs are largely similar, where again the key is to construct a pair of null and alternative models using the degree matching strategy. Since the model is more complicated than RMM-DCBM, we need an extra (but mild) condition. See details therein.

2.4 Major differences in the Region of Impossibility for the more restrictive RMM-SBM

For an alternative RMM-DCBM, we can always pair it with a null using the tensor scaling technique: for each node, the expected degrees under the null and the alternative match each other. For the more restrictive RMM-SBM (where θ_1 = ... = θ_n), such a degree matching is not always possible: a null SBM has only 1 parameter, so we have insufficient flexibility in choosing the null for degree matching. A consequence is that a naive degree-based test may have non-trivial power for SBM.
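Returning briefly to Lemma 2.1: the scaling matrix D can be located numerically. The sketch below uses a simple multiplicative fixed-point iteration; the specific update rule, its convergence on this example, and the hypothetical P and h are our own assumptions (the lemma itself only asserts existence and uniqueness of D):

```python
import numpy as np

def find_scaling(P, h, iters=400):
    """Heuristic fixed-point search for d > 0 with a_D = (P^D h) h = 1_K,
    where P^D_{k1k2k3} = d_{k1} d_{k2} d_{k3} P_{k1k2k3} (cf. Lemma 2.1)."""
    d = np.ones(len(h))
    for _ in range(iters):
        # a_D(k) = d_k * sum_{k2,k3} P_{k,k2,k3} (d_{k2} h_{k2}) (d_{k3} h_{k3})
        a = d * np.einsum('abc,b,c->a', P, d * h, d * h)
        d = d / np.cbrt(a)        # multiplicative correction pushing a_D toward 1_K
    return d

K = 2
P = np.full((K, K, K), 0.5)
P[0, 0, 0] = P[1, 1, 1] = 1.0     # hypothetical community structure tensor
h = np.array([0.6, 0.4])          # hypothetical (unbalanced) membership proportions
d = find_scaling(P, h)
a_D = d * np.einsum('abc,b,c->a', P, d * h, d * h)
```

At a fixed point of the update, a_D = 1_K exactly; by the uniqueness in Lemma 2.1, any convergent positive run reaches the same d.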
Consider a pair of models, where the Bernoulli probability tensors Q and Q* under the two hypotheses are such that, for 1 ≤ i1, i2, i3 ≤ n,

Q_{i1i2i3} = α_n, Q*_{i1i2i3} = π_{i1}'(Pπ_{i3})π_{i2}. (2.13)

Same as before, the π_i are iid generated from a distribution F on {e_1, e_2, ..., e_K} with h = E_F[π_i]. Let α̂_n = [n(n−1)(n−2)]^{-1} 1_n'(A1_n)1_n, η = (1/2)(A1_n)1_n, and let η̄ be the mean of η_1, η_2, ..., η_n. Consider the centered χ²-statistic

ψ_n = (2n)^{-1/2} Σ_{i=1}^n { (η_i − η̄)² / [ C(n−1, 2) α̂_n(1 − α̂_n) ] − 1 }.

Lemma 2.2. Consider the global testing problem (1.4) under the SBM model (2.13) for H0 and H1, respectively. Let α̃_n = E[α̂_n], h̄ = (1/n) Σ_{i=1}^n π_i, and Σ = (1/n) Σ_{i=1}^n (π_i − h̄)(π_i − h̄)'. Let λ_{K−1}(Σ) be the (K−1)-th largest eigenvalue (in magnitude) of Σ. Assume α_n ≤ c_0, max_{1≤i,j,k≤K} P_{ijk} ≤ c_0, n²α_n → ∞, and n²α̃_n → ∞. Also, assume min{h_1, h_2, ..., h_K} ≥ C and λ_{K−1}(Σ) ≥ C. Write δ_n = α̃_n^{-1} ‖(I_K − H_K)(Ph)h‖, where H_K = (1/K)1_K 1_K' and I_K is the identity matrix of the same size. As n → ∞, ψ_n → N(0, 1) if H0 holds, and ψ_n → ∞ if H1 holds and n^{3/2}α̃_n^{1/2}δ_n² → ∞.

By Lemma 2.2, the power of the χ²-test hinges on δ_n = α̃_n^{-1} ‖(I_K − H_K)(Ph)h‖. Note that δ_n = 0 if and only if (Ph)h ∝ 1_K. We call the cases (Ph)h ∝ 1_K and (Ph)h ∦ 1_K the symmetric case and the asymmetric case, respectively. For symmetric SBM, δ_n = 0, and we do not expect the χ²-test to have power. However, for asymmetric SBM, the χ²-test may have non-trivial power, implying a potential shift of the lower bound.

Example 2. Consider an SBM setting where we either have K = 1 (null) or K = 2 (alternative). Also, when K = 2, we assume h = (a, 1−a)' for some 0 < a < 1, and P_{ijk} equals ρ_0 if i = j = k and ρ_1 otherwise. Suppose n²ρ_0 → ∞ and ρ_1/ρ_0 ≤ 1. This can be viewed as a special DCBM with θ_i ≡ ρ_0^{1/3} and off-diagonal entries of P equal to ρ_1/ρ_0. In this case, we have ‖θ‖_1‖θ‖_2² ≍ n²ρ_0, τ_n ≡ ‖θ‖_1‖θ‖_2²µ_2² ≍ n²ρ_0^{-1}(ρ_1 − ρ_0)², and n^{3/2}α̃_n^{1/2}δ_n² ≍ (nρ_0)^{-1/2}(2a − 1)²τ_n. In the symmetric case where a = 1/2, the χ²-test is powerless, and the Region of Impossibility is given by τ_n → 0 (same as in the DCBM case).
In the asymmetric case where |a − 1/2| ≥ c_0, by direct calculations, we have that even when τ_n → 0, we may have n^{3/2}α̃_n^{1/2}δ_n² → ∞ (so the χ²-test has asymptotically full power). Here, the interesting range of ρ_0 is (n^{-2}, 1), so we may have (nρ_0)^{-1/2} → ∞. Therefore, for the asymmetric case, the Region of Impossibility is different from that of DCBM. Most lower bound results for SBM [2, 7, 20, 22] focus on the symmetric case (Ph)h ∝ 1_K. Our lower bound restricted to symmetric SBM agrees with those in the literature. The asymmetric case (Ph)h ∦ 1_K is less studied, except for [14, 21], which focused on the network setting (m = 2). We discover: (i) the Region of Impossibility for symmetric SBM is similar to that of DCBM (see Example 2), and (ii) the Region of Impossibility for asymmetric SBM is quite different from that of DCBM. This is because DCBM is much broader than SBM, where the problem of global testing is much harder, and a naive degree-based test statistic may lose power.

3 Sharp lower bounds for non-uniform hypergraphs

Section 2 discusses lower bounds for 3-uniform hypergraphs. We now first extend the results to more general non-uniform hypergraphs, and then present a tight upper bound. Note that our results include those for m-uniform hypergraphs as special cases (see the paragraph following Theorem 3.1). The notation below is useful:

Definition 3.1. Given an order-m tensor M of dimension K and vectors b_1, b_2, ..., b_m ∈ R^K, let [M; b_1, ..., b_m] denote the summation Σ_{1≤k1,k2,...,km≤K} M_{k1k2...km} b_1(k1) b_2(k2) ··· b_m(km).

Fix M ≥ 2. Consider a general non-uniform hypergraph that consists of m-uniform hypergraphs for all 2 ≤ m ≤ M. Fixing 2 ≤ m ≤ M, let A^(m) be the adjacency tensor of the order-m hypergraph (i.e., A^(m)_{i1i2...im} = 1 if {i1, i2, ..., im} is a hyperedge and 0 otherwise). As before, we model A^(m) with the tensor-DCMM model. Let P^(m) be a symmetric order-m tensor of dimension K, and let θ^(m) = (θ^(m)_1, θ^(m)_2, ..., θ^(m)_n)' be a positive vector of degree parameters.
Let π_1, π_2, ..., π_n be the K-dimensional membership vectors (which do not depend on m). We assume the variables {A^(m)_{i1i2...im}: 1 ≤ i1 < i2 < ... < im ≤ n, 2 ≤ m ≤ M} are independent Bernoulli, with P(A^(m)_{i1i2...im} = 1) = θ^(m)_{i1} ··· θ^(m)_{im} [P^(m); π_{i1}, ..., π_{im}] for distinct indices. As in Section 1, in parameter regions where we cannot tell whether K = 1 or K > 1, it is impossible to estimate K and community labels consistently. Therefore, our lower bound is also a valid lower bound for estimating K and for community detection. We present the Region of Impossibility for testing H0: K = 1 versus H1: K > 1. Similarly as in Section 2.2, we focus on the special case of tensor-DCBM models (i.e., each π_i is degenerate); the study of tensor-DCMM is similar. Fix K > 1. Consider a DCBM null model with probability tensors Q[M] = {Q^(2), ..., Q^(M)} and an RMM-DCBM model with probability tensors Q*[M] = {Q*^(2), ..., Q*^(M)}, where for every 2 ≤ m ≤ M and 1 ≤ i1, i2, ..., im ≤ n,

Q^(m)_{i1,i2,...,im} = θ^(m)_{i1} θ^(m)_{i2} ··· θ^(m)_{im}, (3.15)

Q*^(m)_{i1,i2,...,im} = θ*^(m)_{i1} ··· θ*^(m)_{im} [P^(m); π_{i1}, ..., π_{im}], π_i iid∼ F. (3.16)

The support of F is in V0' = {e_1, e_2, ..., e_K}. Let h = E_F[π_i] and suppose min{h_1, ..., h_K} ≥ C. In the supplemental material, we provide a lemma analogous to Lemma 2.1: for each 2 ≤ m ≤ M, there exists a unique diagonal matrix D^(m) = diag(d^(m)_1, d^(m)_2, ..., d^(m)_K) such that

Σ_{1≤k2,...,km≤K} d^(m)_{k1} P^(m)_{k1k2...km} (d^(m)_{k2} h_{k2}) ··· (d^(m)_{km} h_{km}) = 1, for every 1 ≤ k1 ≤ K.

Its proof leverages the Sinkhorn theorems for higher-order tensors [3, 8]. We choose θ*^(m) in (3.16) by

θ*^(m)_i = d^(m)_k θ^(m)_i, if node i belongs to community k. (3.17)

This is analogous to the degree matching strategy in (2.12), and it is conducted for each m separately. Let P^(m) ∈ R^{K×K^{m−1}} be the matricization of P^(m) and let µ^(m)_2 be the second singular value of P^(m). For short, let ℓ_m = ‖θ^(m)‖_1^{m−2} ‖θ^(m)‖_2² (µ^(m)_2)². Theorem 3.1 is proved in the supplement.

Theorem 3.1 (Impossibility for non-uniform RMM-DCBM). Fix K > 1 and M ≥ 2. For any given (h, F) and {(θ^(m), P^(m))}_{2≤m≤M}, consider a pair of models, a null as in (3.15) and an alternative with K communities as in (3.16), where {θ*^(m)_i}_{1≤i≤n, 2≤m≤M} are as in (3.17).
Suppose ‖P^(m)‖_max ≤ C, max_{1≤i≤n} θ^(m)_i ≤ C, and min_{1≤k≤K} h_k ≥ C. If max_{2≤m≤M} ℓ_m = o(1), then as n → ∞, the χ²-divergence between the pair tends to 0.

Remark (Comparison with [22]). Yuan et al. [22] gave a nice impossibility result for the 2-parameter symmetric SBM. Their model is a special case of our model where θ^(m)_i ≡ α^(m)_n and P^(m) has equal off-diagonal entries ρ^(m) (see Example 1 for m = 3). In this case, |µ^(m)_2| = |1 − ρ^(m)|, and ℓ_m ≍ n^{m−1}(α^(m)_n)^m (1 − ρ^(m))². Hence, ℓ_m → 0 is equivalent to n^{m−1}(α^(m)_n)^m (1 − ρ^(m))² → 0, which matches the results in [22]. Below in Section 3.1, we show that the lower bounds are tight. Theorem 3.1 includes m-uniform hypergraphs as special cases (e.g., to apply the theorem to an m0-uniform hypergraph, we set θ^(m) to be the zero vector for all m ≠ m0). For an m-uniform hypergraph, the Region of Impossibility is given by ℓ_m → 0, which is the same as that in Theorem 2.3 when m = 3. While many lower bound results are available for uniform hypergraphs [2, 7, 20, 22], non-uniform hypergraphs are less studied. Our lower bound in Theorem 3.1 leads to two notable discoveries: (i) the Regions of Possibility/Impossibility for A^(m) are fully characterized by the simple quantity ℓ_m; (ii) the Regions of Possibility/Impossibility for non-uniform hypergraphs are fully characterized by the simple quantity max_{2≤m≤M} ℓ_m.

3.1 Tightness of the lower bounds

We propose a test for the global testing problem (1.4) and show that it attains the lower bounds in Theorems 2.3 and 3.1. For simplicity, we only discuss this for DCBM with moderate degree heterogeneity, but the tightness holds in much broader settings (e.g., DCMM). For each 2 ≤ m ≤ M, we first compute a vector η^(m) ∈ R^n, which serves as an estimate of θ^(m) when the null hypothesis is true. Fix m. Given a positive vector u ∈ R^n, define L(u) ∈ R^n by

L(u)_i = [ Σ_{i2,...,im (distinct)} A^(m)_{i i2...im} + Σ_{i2,...,im (non-distinct)} u_i u_{i2} ··· u_{im} ] / [ Σ_{i1,...,im (distinct)} A^(m)_{i1i2...im} + Σ_{i1,...,im (non-distinct)} u_{i1} ··· u_{im} ]^{(m−1)/m}. (3.18)

Let N_m = ⌈(m−1)/2⌉.
Initialize at $u^{(0)} = 0_n$, compute $u^{(k)} = L(u^{(k-1)})$ iteratively for $k = 1, \ldots, N_m$, and output $u^{(N_m)}$ as $\eta^{(m)}$. We note that each $u^{(1)}_i$ is a simple function of the node degrees, $1 \le i \le n$. For $m \in \{2, 3\}$, since $\eta^{(m)} = u^{(1)}$, we estimate $\theta^{(m)}$ directly from node degrees. However, for $m \ge 4$, $u^{(1)}$ is a biased estimator of $\theta^{(m)}$ under the null, where the bias is caused by the non-distinct (diagonal) entries of $Q^{(m)}$. The iteration serves to reduce this bias, and the required number of iterations depends on $m$ explicitly.

Next, we construct a statistic $Q^{(m)}_n$ to capture the difference between $A^{(m)}$ and a rank-1 estimate of the Bernoulli probability tensor. Let $A'^{(m)}$ be a tensor of the same size as $A^{(m)}$, where $A'^{(m)}_{i_1 \ldots i_m} = A^{(m)}_{i_1 \ldots i_m} - \eta^{(m)}_{i_1} \cdots \eta^{(m)}_{i_m}$ for $1 \le i_1, \ldots, i_m \le n$. We say that $S = (S_1, S_2, \ldots, S_{m+1})$ is an $(m+1)$-partition of $\{1, 2, \ldots, n\}$ if $S_1, \ldots, S_{m+1}$ are disjoint and their union is $\{1, 2, \ldots, n\}$. Let $\mathcal{B}$ be the set of all $(m+1)$-partitions. Define

$$Q^{(m)}_n = \max_{S = (S_1, \ldots, S_{m+1}) \in \mathcal{B}} \ \max_{1 \le k_1, \ldots, k_m \le m+1} \Bigl| \sum_{i_1 \in S_{k_1}, \ldots, i_m \in S_{k_m} (\mathrm{distinct})} A'^{(m)}_{i_1 \ldots i_m} \Bigr|. \qquad (3.19)$$

Finally, we combine $Q^{(2)}_n, \ldots, Q^{(M)}_n$. For each $m$, let $V^{(m)}_n = \binom{n}{m} \hat{\alpha}_n (1 - \hat{\alpha}_n)$, where $\hat{\alpha}_n = [(n-m)!/n!] \sum_{i_1, \ldots, i_m = 1}^{n} A^{(m)}_{i_1 \ldots i_m}$. The test statistic is

$$\phi_n = \max_{2 \le m \le M} \Bigl\{ Q^{(m)}_n / [n \log^{1.1}(n)\, V^{(m)}_n]^{1/2} \Bigr\}. \qquad (3.20)$$

Recall that by Theorem 3.1, for any alternative with $\max_{2 \le m \le M}\{\ell_m\} = o(1)$, we can pair it with a null so that the pair are asymptotically indistinguishable. The next theorem says that the proposed test statistic $\phi_n$ can successfully separate from the null any alternative satisfying $\max_{2 \le m \le M}\{\ell_m\} \ge \log^{1+\delta}(n)$ (for some $\delta > 0$; taking $\delta = 0.1$ is adequate). Therefore, except for a logarithmic factor, our lower bounds are tight. Recall that $\ell_m = \|\theta^{(m)}\|_1^{m-2} \|\theta^{(m)}\|^2 (\mu^{(m)}_2)^2$. Let $\theta^{(m)}_{\max}$ and $\theta^{(m)}_{\min}$ be the maximum and minimum entries of the vector $\theta^{(m)}$, respectively.

Theorem 3.2 (Tightness of lower bounds). Consider the general tensor-DCBM model with $M \ge 2$.
Let $h = \frac{1}{n} \sum_{i=1}^{n} \pi_i$, and let $\bar{P}^{(m)} \in \mathbb{R}^{K \times K^{m-1}}$ be the matricization of $P^{(m)}$. Suppose $\|\bar{P}^{(m)}\| \le C$, $\max_{1 \le k \le K}\{h_k\} \le C \min_{1 \le k \le K}\{h_k\}$, $\theta^{(m)}_{\max} \le C \theta^{(m)}_{\min}$, $\theta^{(m)}_{\max} \le c_0$ for a constant $c_0 < 1$, and $\|\theta^{(m)}\|_1^{m-2} \|\theta^{(m)}\|^2 / \log(n) \to \infty$, for every $2 \le m \le M$. Then, $\phi_n \to 0$ in probability if $H_0$ holds, and $\phi_n \to \infty$ in probability if $H_1$ holds and $\max_{2 \le m \le M}\{\ell_m\} / \log^{1.1}(n) \to \infty$.

To speed up the computation of $\phi_n$ for large $n$, we introduce a proxy statistic by replacing the search over $\mathcal{B}$ with a specific partition $\hat{B}$, obtained as follows: conduct SVD on the matricization of $A^{(m)}$, take the first $(m+1)$ left singular vectors, and apply the spectral algorithm of [13] to partition the nodes into $(m+1)$ groups. We use this partition $\hat{B}$ in place of the maximization over $\mathcal{B}$ in (3.19) to get a proxy of $Q^{(m)}_n$; the test statistic $\phi_n$ is then defined as in (3.20). As long as the hypergraph is not too sparse, this proxy works well and is computationally much faster.

4 Numerical study

We use simulated data to validate our theoretical results. Fix $(n, K, m) = (500, 2, 3)$. In Experiment 1, we consider the SBM model and verify that the Regions of Impossibility are different for symmetric and asymmetric SBM (see Section 2.4). Let $\theta_i = n^{-1/2}$ for $1 \le i \le n$, and let $P_{ijk} = 1$ if $i = j = k$ and $P_{ijk} = 1/4$ otherwise. We consider a symmetric case where each community has 250 nodes and an asymmetric case where the two communities have 375 and 125 nodes, respectively. For each setting, we randomly generate the hypergraph, apply the degree-based $\chi^2$-statistic $\psi_n$ from Section 2.4, and repeat 500 times. The histograms of $\psi_n$ for the two cases are in the left panel of the figure below (green: symmetric alternative; red: asymmetric alternative; blue: density of $N(0,1)$). By Lemma 2.2, $\psi_n \to N(0, 1)$ under the null. Hence, the results suggest that $\psi_n$ is unable to distinguish the symmetric alternative from the null but can distinguish the asymmetric alternative from the null.
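The hypergraphs in Experiment 1 can be generated as follows. This is a minimal Python sketch of sampling a symmetric 3-uniform tensor-DCBM adjacency tensor; the function name `sample_hypergraph` is ours, and we use a small $n$ rather than the $n = 500$ of the experiments.

```python
import numpy as np

def sample_hypergraph(theta, labels, P, rng):
    """Sample a symmetric 3-uniform (tensor-)DCBM adjacency tensor:
    each distinct triple {i, j, k} is a hyperedge independently with
    probability theta_i * theta_j * theta_k * P[labels_i, labels_j, labels_k]."""
    n = len(theta)
    A = np.zeros((n, n, n))
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                p = theta[i] * theta[j] * theta[k] * P[labels[i], labels[j], labels[k]]
                a = rng.binomial(1, min(p, 1.0))
                # store the draw in all 6 permuted positions (symmetry);
                # non-distinct entries stay 0 (no self hyperedges)
                for t in ((i, j, k), (i, k, j), (j, i, k),
                          (j, k, i), (k, i, j), (k, j, i)):
                    A[t] = a
    return A

# Experiment-1-like symmetric setting at a small scale:
n, K = 30, 2
P = np.full((K, K, K), 0.25)        # P_ijk = 1/4 off the diagonal
P[0, 0, 0] = P[1, 1, 1] = 1.0       # P_ijk = 1 when i = j = k
theta = np.full(n, n ** -0.5)       # theta_i = n^{-1/2}
labels = np.repeat([0, 1], n // 2)  # two equal-sized communities
A = sample_hypergraph(theta, labels, P, np.random.default_rng(0))
```

The asymmetric case of Experiment 1 is obtained by changing `labels` to unequal community sizes, and the DCBM settings of Experiments 2 and 3 by replacing the constant `theta` with a heterogeneous vector.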
In Experiment 2, we consider DCBM and use the least-favorable configuration in Section 2.2, where the degree matching strategy is employed. We verify that a degree-based test such as $\psi_n$ indeed has no power. Let $(n, K, m)$ be the same as above. Under the null, we draw $\theta_i$ iid from Pareto(0.5, 5) and then re-normalize the vector $\theta$ so that $n^{-2} \|\theta\|_1 \|\theta\|^2 = c_n$, for $c_n = n^{-1}$; under the alternative, we let $(P, h)$ be the same as in the asymmetric case of Experiment 1 and generate $\theta'_i$ as in (2.12) ($D$ is obtained by treating $D(P(Dh))Dh = 1_K$ as a nonlinear equation and applying the Matlab function solve). The histograms of $\psi_n$ under the null (green) and the alternative (red) are shown in the middle panel of the figure below. The two histograms are inseparable from each other.

[Figure: three histogram panels. Left: $\psi_n$ under SBM, symmetric vs. asymmetric alternatives; middle: $\psi_n$ under DCBM, null vs. alternative; right: $\phi_n$ under DCBM, null vs. alternative.]

In Experiment 3, we study the test statistic $\phi_n$ proposed in Section 3.1. The simulation setting is the same as that in Experiment 2, except that $c_n = n^{-1/2}$. To save computing time, we use the proxy of $\phi_n$, plugging in a $\hat{B}$ from spectral clustering (see Section 3.1). The histograms of the test statistic under the null (green) and the alternative (red) are shown in the right panel of the figure above. We see that $\phi_n$ successfully distinguishes the alternative from the null, which validates our result in Theorem 3.2. The distribution of $\phi_n$ under the alternative has two modes, because the proxy $\hat{B}$ we plug in has two most frequent realizations; however, this does not affect the testing performance.

5 Conclusion

We consider the problem of global testing for non-uniform hypergraphs in a broad DCMM setting. Given an alternative, how to identify the null that is hardest to separate from the alternative is a challenging problem.
We solve this by proposing a degree matching strategy, and we use it to derive a tight lower bound via tensor scaling techniques and a delicate analysis of the $\chi^2$-divergence. We discover that for an $m$-uniform hypergraph, the Regions of Impossibility/Possibility are governed by the simple quantity $\ell_m = \|\theta^{(m)}\|_1^{m-2} \|\theta^{(m)}\|^2 (\mu^{(m)}_2)^2$ (and so those for a non-uniform hypergraph are governed by $\max_{2 \le m \le M}\{\ell_m\}$). We also propose a new test that attains the lower bounds, so our lower bounds are tight.

For future work, we note that the test in Section 3.1 is computationally expensive; it is desirable to find fast algorithms that also achieve the lower bounds. The signed-cycle statistics [6, 14] are polynomial-time statistics that have shown appealing performance for network global testing. It is possible that these statistics can be generalized to hypergraphs to provide polynomial-time tests that are also theoretically optimal. We leave this to future study.

Acknowledgments and Disclosure of Funding

The authors thank the anonymous reviewers for useful comments. J. Jin is supported in part by NSF Grant DMS-2015469. Z. Ke is supported in part by NSF CAREER Grant DMS-1943902.

References

[1] Sameer Agarwal, Jongwoo Lim, Lihi Zelnik-Manor, Pietro Perona, David Kriegman, and Serge Belongie. Beyond pairwise clustering. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 2, pages 838–845. IEEE, 2005.

[2] Kwangjun Ahn, Kangwook Lee, and Changho Suh. Community recovery in hypergraphs. IEEE Transactions on Information Theory, 65(10):6561–6579, 2019.

[3] Ravindra Bapat. D1AD2 theorems for multidimensional matrices. Linear Algebra and its Applications, 48:437–442, 1982.

[4] Javier Béjar, Sergio Álvarez, Dario García, Ignasi Gómez, Luis Oliva, Arturo Tejeda, and Javier Vázquez-Salceda. Discovery of spatio-temporal patterns from location-based social networks. J. Exp. Theor. Artif. Intell., 28(1-2):313–329, 2016.
[5] Austin R Benson, David F Gleich, and Jure Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.

[6] Sébastien Bubeck, Jian Ding, Ronen Eldan, and Miklós Z Rácz. Testing for high-dimensional geometry in random graphs. Random Structures & Algorithms, 49(3):503–532, 2016.

[7] I Chien, Chung-Yi Lin, and I-Hsiang Wang. Community detection in hypergraphs: Optimal statistical limit and efficient algorithms. In International Conference on Artificial Intelligence and Statistics, pages 871–879, 2018.

[8] Shmuel Friedland. Positive diagonal scaling of a nonnegative tensor to one with prescribed slice sums. Linear Algebra and its Applications, 434(7):1615–1619, 2011.

[9] Debarghya Ghoshdastidar and Ambedkar Dukkipati. Consistency of spectral partitioning of uniform hypergraphs under planted partition model. Advances in Neural Information Processing Systems, 27:397–405, 2014.

[10] Debarghya Ghoshdastidar and Ambedkar Dukkipati. Consistency of spectral hypergraph partitioning under planted partition model. The Annals of Statistics, 45(1):289–315, 2017.

[11] Jacopo Grilli, György Barabás, Matthew J Michalska-Smith, and Stefano Allesina. Higher-order interactions stabilize dynamics in competitive network models. Nature, 548(7666):210, 2017.

[12] Pengsheng Ji, Jiashun Jin, Zheng Tracy Ke, and Wanshan Li. Co-citation and co-authorship networks of statisticians. Journal of Business & Economic Statistics, to appear, 2021.

[13] Jiashun Jin. Fast community detection by SCORE. The Annals of Statistics, 43(1):57–89, 2015.

[14] Jiashun Jin, Zheng Tracy Ke, and Shengming Luo. Optimal adaptivity of signed-polygon statistics for network testing. Annals of Statistics, to appear, 2021.

[15] Zheng Tracy Ke, Feng Shi, and Dong Xia. Community detection for hypergraph networks via regularized tensor power iteration. arXiv preprint arXiv:1909.06503, 2019.

[16] Chiheon Kim, Afonso S Bandeira, and Michel X Goemans.
Stochastic block model for hypergraphs: Statistical limits and a semidefinite programming approach. arXiv preprint arXiv:1807.02884, 2018.

[17] Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, and Chang Yoo. Higher-order correlation clustering for image segmentation. Advances in Neural Information Processing Systems, 24:1530–1538, 2011.

[18] Lei Li and Tao Li. News recommendation via hypergraph learning: encapsulation of user behavior and news content. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining, pages 305–314, 2013.

[19] Jiajun Liang, Chuyang Ke, and Jean Honorio. Information theoretic limits of exact recovery in sub-hypergraph models for community detection. In IEEE International Symposium on Information Theory, 2021.

[20] Chung-Yi Lin, I Eli Chien, and I-Hsiang Wang. On the fundamental statistical limit of community detection in random hypergraphs. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 2178–2182. IEEE, 2017.

[21] Louis Cammarata and Zheng Tracy Ke. Power enhancement and phase transitions for global testing of the mixed membership stochastic block model. Manuscript, 2021.

[22] Mingao Yuan, Ruiqi Liu, Yang Feng, and Zuofeng Shang. Testing community structures for hypergraphs. Annals of Statistics, to appear, 2021.

[23] Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. Advances in Neural Information Processing Systems, 19:1601–1608, 2006.