# limits_of_dense_simplicial_complexes__7e2c7267.pdf Journal of Machine Learning Research 24 (2023) 1-42 Submitted 7/22; Revised 3/23; Published 7/23 Limits of Dense Simplicial Complexes T. Mitchell Roddenberry mitch@rice.edu Department of Electrical and Computer Engineering Rice University Houston, TX 77005-1827, USA Santiago Segarra segarra@rice.edu Department of Electrical and Computer Engineering Rice University Houston, TX 77005-1827, USA Editor: Bharath Sriperumbudur We develop a theory of limits for sequences of dense abstract simplicial complexes, where a sequence is considered convergent if its homomorphism densities converge. The limiting objects are represented by stacks of measurable [0, 1]-valued functions on unit cubes of increasing dimension, each corresponding to a dimension of the abstract simplicial complex. We show that convergence in homomorphism density implies convergence in a cut-metric, and vice versa, as well as showing that simplicial complexes sampled from the limit objects closely resemble its structure. Applying this framework, we also partially characterize the convergence of nonuniform hypergraphs. Keywords: Simplicial complex, graphon, stochastic topology, topological data analysis, graph limit 1. Introduction The theory of graph limits has been one of the most fruitful developments in modern combinatorics, with applications ranging from extremal graph theory to statistical physics (Borgs et al., 2008, 2012; Lov asz, 2012). Beyond combinatorics, graphs are undeniably of great interest in machine learning and data science applications, where they are used to reflect geometry in data (Bronstein et al., 2017). Moreover, graphs can be viewed as specific types of more general objects that capture higher-order relationships, such as simplicial complexes (Schaub et al., 2021). Given the fruitful applications of the theory of graphons in data science (Ruiz et al., 2020; Roddenberry et al., 2021; Navarro and Segarra, 2022; Coppini, 2022; Maskey et al., 2023), it is natural to ask how a similar theory can be developed for higher-order relational objects. Viewing graphs as (abstract) simplicial complexes of dimension one, we aim to develop such a theory for limiting objects of simplicial complexes. Indeed, recent work in stochastic topology has studied the topological properties of large random simplicial complexes, characterizing their connectivity and (co)homology (Bobrowski and Krioukov, 2022). An example of such a random simplicial complex is a random geometric ˇCech complex. Let µ be a probability measure on Rd for some integer d 1, and let ϵ > 0 be a real number. c 2023 T. Mitchell Roddenberry and Santiago Segarra. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v24/22-0808.html. Roddenberry and Segarra Denote the support of µ by X = supp(µ), and treat X as a metric probability space, with the metric inherited from Rd. For some fixed ϵ > 0, we consider the ˇCech complex of X in Rd with radius ϵ, which is defined as the infinite simplicial complex ˇCϵ(X, Rd) composed of all finite subsets of X with diameter strictly less than ϵ. Figure 1: ˇCech complexes from a bouquet of circles in R2. Denoting by Tubϵ/2(X, Rd) the set of all points in Rd that are less than ϵ/2-away from X, the set of all open balls of radius ϵ/2 centered at points in X is a cover of Tubϵ/2(X, Rd) by contractible open sets. The nerve theorem then implies that the geometric realization of the ˇCech complex ˇCϵ(X, Rd) is homotopy equivalent to Tubϵ/2(X, Rd). As an example, let X be a bouquet of two unit circles in R2, that is, the union of two circles of unit radius centered at ( 1, 0) and (1, 0) respectively, as pictured in Figure 1, and let µ be the uniform measure on X. Clearly, the space Tubϵ/2(X, R2) (pictured in Figure 1 as the gray shaded region) is homotopy equivalent to X for any ϵ < 2. The field of topological data analysis (TDA) (Carlsson, 2009) has leveraged this idea to great success, replacing troublesome computations about the topological structure of complicated spaces with more tractable computations on approximations formed from point clouds, such as the ˇCech, or more commonly, Vietoris-Rips complex. In practice, TDA applications typically do not have access to the entire space X; rather, a finite set of n points Xn = {X1, . . . , Xn} in X sampled i.i.d. according to the probability measure µ is available, from which the ˇCech complex ˇCϵ(Xn, Rd) is formed. This ˇCech complex is used as an estimate of sorts for ˇCϵ(X, Rd). If the space X is compact, one can show that as n , the space Xn with the Euclidean metric almost surely converges to X in the Gromov-Hausdorffmetric. This implies, for instance, convergence of the persistence diagrams of Xn to that of X in the bottleneck distance (Chazal et al., 2009). Returning to our example of the bouquet of circles, see in Figure 1 that a finite subset Xn X yields a ˇCech complex that preserves the topological features of the space, for example, having a free fundamental group of rank two. Observe that sampling more points would not introduce any additional topological features to the constructed complex. However, convergence of this sort is not as easy to establish when X is not compact. For instance, the case when X = Rd and µ is a radially symmetric power-law, exponential, or Gaussian distribution was considered by Adler et al. (2014), where they showed that under many circumstances, highly nontrivial homological features emerge in ways that vary with n, precluding the convergence results mentioned above. We consider an alternative way to describe the convergence of the ˇCech complexes ˇCϵ(Xn, Rd) to ˇCϵ(X, Rd). Observe that each ˇCϵ(Xn, Rd) is a random induced subcomplex of ˇCϵ(X, Rd), with nodes distributed i.i.d. according to the probability measure µ. This motivates a preliminary definition of convergence for simplicial complexes: we say that a sequence of finite simplicial complexes Kn is convergent if for all k 1 and all finite simplicial complexes F with vertex set [k], the probability that the induced subcomplex of k nodes chosen uniformly at random from Kn is equal to F is convergent. That is to say, the distribution of uniformly sampled Limits of Dense Simplicial Complexes induced subcomplexes of any fixed size converges as n . We indicate this further subsampling operation in red in Figure 1. In this case, for instance, it holds for all n that the probability of subsampling a cycle on k < π/ arcsin(ϵ/2) nodes from ˇCϵ(Xn, R2) is zero. The characterization of convergence via the distribution of induced substructures is closely related to the theory of graphons (Borgs et al., 2008). Following this approach, we develop an analogous limit theory for simplicial complexes. We consider a sequence of simplicial complexes to be convergent if their homomorphism densities converge, which we show to be equivalent to convergence in a cut-metric. The limiting object of such a convergent sequence is described in terms of a collection of symmetric kernel functions on the sequence of unit cubes1 of increasing dimension; we dub these objects complexons.2 These objects closely relate to graphons in the random complex models that they induce, with two representations that correspond to either the usual presentation of a simplicial complex as a collection of finite sets closed under restriction, or the presentation in terms of its facets. The main results of this work relate the cut-metric and homomorphism densities for the limit objects of large, dense simplicial complexes. Theorem (Informal). If two complexons are close in the cut-metric, then they yield similar random simplicial complexes by sampling. Conversely, if they yield similar random simplicial complexes by sampling, then they are close in the cut-metric. This is stated more formally in Theorem 14. As part of the proof, it will also be shown that large simplicial complexes drawn from a complexon concentrate in the cut-metric, and thus have similar homomorphism densities to the complexon. The paper is organized as follows. After covering preliminaries in Section 2, complexons and random sampling from complexons are defined in Section 3. The main results are stated in Section 4, with most proofs relegated to the appendix. Some remarks relating complexons to other notions in the literature, including the convergence of nonuniform hypergraphs under certain conditions, are made in Section 5. 1.1 Related Work In graph limit theory, a key idea is to reduce extremely large graphs into simpler objects by treating them as distributions of random subgraphs. The study of such limits was initiated by Lov asz and Szegedy (2006), and further developed by Borgs et al. (2008, 2012). These notions relate to exchangeable random arrays, considered originally by Aldous (1981); Hoover (1979) and related to random graphs by Diaconis and Janson (2007). The work of Elek and Szegedy (2012) extended this approach to dense, uniform hypergraphs by considering analogous notions of homomorphism convergence and cut-metrics, which was further studied by Zhao (2015). The general approach of relating homomorphism densities and some cut-metric has been applied to form limit theories for other combinatorial objects as well, such as partially ordered sets (Janson, 2011) and random cographs (Stufler, 2021). The theory of large, random simplicial complexes for modeling higher-order interactions in network science has been quite active recently, with approaches from topological data analysis, dynamical systems, and signal processing (Schaub et al., 2021; Battiston and Petri, 1. Or, more generally, powers of a Borel probability space. See Section 5.3 for more details. 2. Following the concluding remarks of Bobrowski and Krioukov (2022). Roddenberry and Segarra 2022). A series of recent papers considered the topological properties of large, random simplicial complexes (Costa and Farber, 2016, 2017a,b). Indeed, a type of limiting object for simplicial complexes similar to the Rado graph was considered by Farber et al. (2021), which arises as the limit of a dense random simplicial complex on countably many nodes, in a certain regime. In the context of TDA, the study of persistent homology of random subsamples of metric measure spaces has been considered in great detail by Chazal et al. (2014, 2015, 2017). In particular, the persistence homology of persistence barcodes of random subsets of compact metric measure spaces is considered. This is in contrast to our work in two ways. First, the simplicial complexes we consider are not required to be modeled by an underlying metric measure space. Second, complexons naturally model simplicial complexes where not only the nodes are random (in that the nodes themselves are positioned in space according to some distribution), but where the faces connecting them are random. Initial studies of random complexes in noncompact spaces have been undertaken by Adler et al. (2014); Owada and Bobrowski (2020), where the (persistent) homological structure of point clouds drawn from distributions of unbounded support in Rd are considered. In the observation of what they dub topological crackle, the robustness of TDA methods to potentially unbounded noise is studied. Both of these works consider simplicial complexes determined via fixed rules applied to random sets of points, which varies from our study of complexes that are random even when conditioned on a fixed set of points. We refer the reader to the excellent survey paper of Bobrowski and Krioukov (2022) for more references on the emerging topic of large, random simplicial complexes. There has also been a body of literature studying regularity lemmas and limit theories for hypergraphs (R odl and Skokan, 2004; Gowers, 2006, 2007; Tao, 2006; Elek and Szegedy, 2012; Zhao, 2015; Balasubramanian, 2021). These works are concerned with studying uniform hypergraphs, where each hyperedge has some fixed cardinality. This stands in contrast to our work, as simplicial complexes are inherently nonuniform (unless of course, they are merely graphs, which are 2-uniform hypergraphs). We also develop a partial theory of limits of nonuniform hypergraphs in Section 5.4, framed by the upper and lower bounds of a hypergraphs by simplicial complexes. 2. Preliminaries The notation [n] denotes the set of integers {1, 2, . . . , n}. A simplicial complex3 K is a finite collection of finite sets that is closed under restriction, that is, the taking of subsets. The collection of sets in K with cardinality d + 1 is denoted by K(d), and we call that set the set of d-simplices. For a d-simplex σ, we say that dim σ = d. The set K(0) is referred to as the set of nodes, with the alternative notation V(K), when convenient. We denote the cardinality of V(K) by v(K). Although K(0) is a set of singleton sets, we will find it useful sometimes to identify said singleton sets with their constituent elements. The sets K(1) and K(2) are sometimes referred to as the sets of edges and triangles, respectively. If K is such that for all c > d, K(c) = , then we say that K is a simplicial complex of dimension d. We denote the smallest such d by dim K. A simple graph, for instance, is a simplicial complex of dimension one. It will also be useful to consider weighted simplicial complexes. 3. Strictly speaking, we are concerned with abstract simplicial complexes. However, we will never refer to the geometric realization, so we will omit the word abstract. Limits of Dense Simplicial Complexes A weighted simplicial complex (H, ω) is taken to be the power set H = 2V(H) for some finite V(H), and a function ω : H [0, 1], where ω is called the weight function. It is assumed that ω(V(H)) = 1 and ω( ) = 1. For a simplicial complex K, the set of facets of K is the collection of maximal sets in K, that is, the sets in K that are subsets of no other sets in K, denoted K . Observe that the set of facets of a simplicial complex completely characterizes the simplicial complex, as the simplicial complex can be reconstructed by taking the set of facets and all of their subsets. The set of antifacets of K is the collection of minimal sets in 2V(K)\K, that is, the collections of nodes that do not constitute a simplex, but whose strict subsets do constitute a simplex, denoted K . Similarly, the set of antifacets of a simplicial complex completely characterize the simplicial complex, as the simplicial complex can be reconstructed by taking the set of all strict subsets of the antifacets. We define the facets of a weighted simplicial complex not by changing the underlying set, but by changing the weight function. For a weighted simplicial complex (H, ω), the faceted weighted simplicial complex (H , ω ) is such that H = H, and for each σ H , ω (σ) = Y 2.1 Homomorphism Densities Much like in the theory of dense graph limits, we characterize simplicial complexes in terms of their homomorphism densities. For two simplicial complexes F, K, a homomorphism from F to K is a map φ : V(F) V(K) such that for any σ F (d), it holds that φ(σ) K(d), that is, a homomorphism is a simplex-preserving map. An induced homomorphism is an injective homomorphism from F to K with the added condition that σ F (d) if and only if φ(σ) K(d), that is, an injective homomorphism that also preserves non-simplices. We denote the number of homomorphisms and induced homomorphisms from F to K by hom(F, K) and ind(F, K), respectively. Normalizing these quantities appropriately yields homomorphism densities. Homomorphism densities answer the following type of question: given a uniformly chosen (injective) random map φ : V(F) V(K), what is the probability that that map is an (induced) homomorphism? More precisely, we respectively define the homomorphism density and induced homomorphism density of F in K as t(F, K) = hom(F, K) tind(F, K) = ind(F, K) P(v(K), v(F)), where P(n, k) is the number of k-permutation of n, that is, the number of injective maps from a set of cardinality k into a set of cardinality n. 2.2 The Cut Distance The homomorphism densities characterize the distribution of subcomplexes in a simplicial complex. This may be thought of as looking at local structures in a simplicial complex. On the other hand, the cut-metric characterizes simplicial complexes in a global way. The Roddenberry and Segarra cut-norm is defined for a multidimensional matrix (Frieze and Kannan, 1999, Section 6) in the following way. For finite sets X1, X2, . . . , Xr, let A : Q j Xj R. For subsets Sj Xj for j [r], put A(S1, . . . , Sr) = X e Q j Sj A(e). The (normalized) cut-norm of A is defined as A = max {|A(S1, . . . , Sr)| : Sj Xj for j [r]} Q j |Xj| . Let K1, K2 be simplicial complexes such that V(K1) = V(K2). For some d 1, let A1,d : Qd+1 j=1 V(K1) {0, 1} be the indicator function of d-simplices in K1, that is, A1,d(i1, . . . , id+1) = 1 if and only if {i1, . . . , id+1} K(d) 1 . Define A2,d in the same way for K2. The d-dimensional labeled cut-distance between K1 and K2 is defined as follows: d ,d(K1, K2) = A1,d A2,d . We extend this definition to weighted simplicial complexes (K3, ω) by putting A3,d(i1, . . . , id+1) = ω({i1, . . . , id+1}), and using the normalized cut-norm for the metric as before. This allows us to compare weighted and unweighted simplicial complexes. Observe that the d-dimensional labeled cut-distance finds the maximal collection of subsets S1, . . . , Sd+1 V(K1) such that the number of d-simplices contained in Q j Sj differs maximally between K1 and K2. More precisely, for arbitrary subsets S1, . . . , Sd+1 V(K1), put K(d) 1 (S1, . . . , Sd+1) = σ K(d) : σ Y and similarly define4 K(d) 2 (S1, . . . , Sd+1). Then, one can see that d ,d(K1, K2) = max Sj Xj:j [d+1] |K(d) 1 (S1, . . . , Sd+1)| |K(d) 2 (S1, . . . , Sd+1)| The d-dimensional labeled cut-distance describes how different two simplicial complexes may look when observing d-simplices across partitions of the node set. The distances d ,d for d 1 are each only able to characterize the difference between two complexes in a single dimension at a time. We define the labeled cut-distance by taking a weighted sum of all such distances. Namely, for a nonnegative, summable sequence (αj)j 1 of real numbers, put d (K1, K2; (αj)) = j=1 αjd ,j(K1, K2). Again, this definition naturally extends to include weighted simplicial complexes as well. 4. We abuse the notation σ Q j Sj here, meaning that some tuple formed by permuting the elements of σ is contained in the Cartesian product. Limits of Dense Simplicial Complexes The family of labeled cut-distances are dependent upon the assumption that the node sets of K1 and K2 are equal in size and correspond to one another. We now consider the case where two simplicial complexes have node sets that do not align. As a preliminary definition, for a nonnegative, summable sequence (αj)j 0, and two simplicial complexes K1, K2 such that v(K1) = v(K2), put bδ (K1, K2; (αj)) = min φ:V(K2) V(K1) d (K1, φ(K2); (αj)), where φ ranges over bijective maps from V(K2) to V(K1). That is to say, bδ first finds the optimal alignment between the two simplicial complexes, and then compares them via d . With this in hand, we now extend the definition to handle simplicial complexes whose node sets differ in size. First, for an integer m > 0, define the m-blowup of a simplicial complex K as the simplicial complex m K where V(m K) = V(K) [m], and a set {(vi, ji)}d+1 i=1 (m K)(d) if and only if {vi}d+1 i=1 K(d), where we assume that {vi}d+1 i=1 is a set of cardinality d + 1 (no duplicate nodes). Then, for two simplicial complex K1, K2 such that v(Ki) = ni, define the unlabeled cut-distance as δ (K1, K2; (αj)) = lim m bδ (mn2K1, mn1K2; (αj)). The unlabeled cut-distance blows both simplicial complexes up by a very large factor, so that it can then find a very fine alignment between them, amounting to a so-called fractional overlay of the node sets. Indeed, δ can be equivalently defined in terms of a minimizing fractional overlay of the node sets (see Lov asz, 2012, Eq. 8.10), but we omit that discussion here. 3. Complexons and Random Sampling As in the theory of graph limits, the convergence of a sequence of simplicial complexes can be defined in terms of homomorphism densities. Let K1, K2, . . . be a sequence of simplicial complexes. We say that the sequence (Kn)n 1 is convergent if for all simplicial complexes F, it holds that (t(F, Kn))n 1 is a convergent sequence. The remainder of this paper is devoted to understanding the appropriate limiting objects for such convergent sequences. The appropriate analogs for graphons in this case are functions on the disjoint union of unit cubes of dimension greater than or equal to 2. A complexon is a measurable5 function d 1 [0, 1]d+1 [0, 1], such that W is measurable and totally symmetric. That is to say, the restriction of W to each [0, 1]d+1 is measurable and symmetric in its coordinates. We find it convenient to assume that W([0, 1]) = 1 and W( ) = 1, that is, W evaluated on the empty tuple has unit value. We denote the set of all complexons by W. In the same way that a simplicial complex as a collection of sets closed under restriction can be equivalently described by its facets, we formulate a faceted complexon. For a 5. In this paper, measurable means Borel-measurable. The properties considered are not sensitive to differences on sets of measure zero, so one could also consider the Lebesgue measure. Roddenberry and Segarra complexon W, the faceted complexon W is a totally symmetric measurable function W : F d 1[0, 1]d+1 [0, 1] obeying the following rule. For all d 1, (x1, . . . , xd+1) [0, 1]d+1, put W (x1, . . . , xd+1) = Y σ [d+1] W(xσ), where xσ denotes the coordinates of (x1, . . . , xd+1) indexed by σ (this is well-defined, by the total symmetry of the complexon W). Prematurely interpreting the values taken by W as probabilities of simplices existing conditioned on the existence of their faces, the faceted complexon W intuitively describes the probability of a simplex existing. For instance, if W(x1, x2, x3) = 1, but W(x1, x2) = W(x1, x3) = W(x2, x3) = 0.5, then W (x1, x2, x3) = 0.125, reflecting the closure of a simplicial complex under restriction. 3.1 Complexons as Random Simplicial Complex Models A complexon induces a distribution of random simplicial complexes via a simple sampling procedure. Let W be a complexon, and Sn = {x1, . . . , xn} be a set of n points contained in [0, 1], for some integer n 1. From W and Sn, define a weighted simplicial complex H = W[Sn] so that V(H) = [n], H = 2[n], and each σ H has weight ω(σ) = W(xσ). From the weighted simplicial complex H, we randomly sample an unweighted simplicial complex in the following way. Let K be a simplicial complex, and put V(K) = [n]. Inductively for d 1, for each d + 1-subset σ [n] such that every strict subset of σ is contained in K, include σ in K with probability ω(σ) (that is, the weight of the simplex σ in H). Upon termination, this yields a finite simplicial complex K. We call the distribution from which K is drawn K(H). Noting that H is defined based on the complexon W via the set Sn, we randomize the set Sn to yield a more general random simplicial complex model. Let Sn be such that x1, . . . , xn are i.i.d. samples from the uniform distribution probability distribution on [0, 1]. Then, the random simplicial complex sampled from W[Sn] with random Sn is denoted K(n, W). The distribution K(n, W) can be thought of in the same way as the lower model of Farber et al. (2022). Indeed, it can be constructed by taking the largest simplicial complex contained in a random hypergraph on node set [n] whose hyperedges σ [n] exist independently each with probability W(xσ). Moreover, the faceted complexon W can be used via the upper model of Farber et al. (2022) to yield the distribution K(n, W). Again, take a random hypergraph H on the node set [n] whose hyperedges σ [n] exist independently each with probability W (xσ). Taking K to be the smallest simplicial complex that contains H, it can be shown that K is distributed according to K(n, W). We go in to more detail on the relationship between complexons and hypergraph limits in Section 5.4. 3.2 Homomorphisms in Complexons Motivated by the random sampling model K(n, W), the homomorphism densities t and tind can be naturally generalized to complexons. Let F be a simplicial complex, and identify V(F) = [n] for some n 1. The homomorphism density of F in a complexon W asks the following question: what is the probability that F is contained in the random simplicial complex K(n, W)? Based on this question, we directly define the homomorphism density Limits of Dense Simplicial Complexes of F in a complexon W. Put t(F, W) = Z σ F W(xσ)dx. (1) The question of the homomorphism density can also be posed in terms of facets. Put t(F , W ) = Z σ F W (xσ)dx. (2) A simple comparison of the integrands of (1) and (2) yields the following result, further justifying the name faceted complexon for W . Lemma 1. For any simplicial complex F and complexon W, it holds that t(F, W) = t(F , W ). In the case of graphons, the induced homomorphism density is computed in a fashion similar to (1), but with a product over all antifacets included in the integrand. For simplicial complexes, taking the product over all non-simplices would be redundant, as the nonexistence of an edge, for instance, implies the non-existence of all higher-order simplices that would contain that edge. Treating a graph with no isolated nodes as a simplicial complex of dimension one, notice that the facets of a graph are its edges, and the antifacets are its non-edges. Based on this, we write the induced homomorphism density of F in W as follows: tind(F, W) = Z σ F W(xσ) Y τ F (1 W(xτ)) dx. The following result can be shown by unrolling definitions, in order to relate the induced homomorphism densities and random samples of a complexon: Lemma 2. For a complexon W and a simplicial complex F such that V(F) = [n], it holds that tind(F, W) = P {K(n, W) = F} . In graph theory, we are typically concerned with properties of graphs that do not change under the relabeling of nodes. Similarly, in graph limit theory, we typically describe properties of complexons up to measure-preserving transformations of [0, 1]. For complexons, a property of this type holds for homomorphism densities. Let φ : [0, 1] [0, 1] be a measure-preserving transformation, and for a complexon W, define W φ(x1, . . . , xd+1) = W(φ(x1), . . . , φ(xd+1)) for all d 1 and x [0, 1]d+1. The following result proceeds directly from the definition of the (induced) homomorphism density. Lemma 3. Let W be a complexon, and φ : [0, 1] [0, 1] a measure-preserving transformation. Then, for all simplicial complexes F, t(F, W) = t(F, W φ) tind(F, W) = tind(F, W φ). Roddenberry and Segarra 3.3 The Cut Distance Similar to the family of cut-distances for simplicial complexes, we define cut-distances for complexons. Let W1, W2 be complexons, and d 1 some integer. The d-dimensional labeled cut-distance between W1 and W2 is defined as follows: d ,d(W1, W2) = sup S1,...,Sd+1 B[0,1] Q j Sj (W1(x) W2(x)) dx where B[0, 1] denotes the Borel σ-field on the interval [0, 1]. This relates to the cut-norm defined for general measurable functions U : F d 1[0, 1]d+1 R: U ,d = sup S1,...,Sd+1 B[0,1] Q j Sj U(x)dx so that d ,d(W1, W2) = W1 W2 ,d. Alternatively, the cut-norm can be written as U ,d = sup f1,...,fd+1:[0,1] [0,1] [0,1]d+1 U(x)f1(x1) fd+1(xd+1)dx where f1, . . . , fd+1 are taken to be measurable functions from [0, 1] to [0, 1]. Following (Borgs et al., 2008, Eq. 7.2), restricting the sets considered in the d-dimensional labeled cut-distance to be pairwise disjoint only changes the distance by a constant factor depending on d. Lemma 4. Let W1, W2 W be complexons, and let d 1. Put S as the collection of pairwise disjoint sets S1, . . . , Sd+1 B[0, 1]. Then, sup (S1,...,Sd+1) S Q j Sj (W1(x) W2(x))dx 1 (d + 1)d+1 d ,d(W1, W2). The proof follows the presentation of Janson (2013, Lemma E.2). Proof Let A = {Ai}n i=1 be an n-equipartition of [0, 1] (that is, a partition of [0, 1] into n measurable subsets all with measure 1/n). Let I be an element of [d+1]n chosen uniformly at random. For each j [d + 1], define Bj = S m:I(m)=j Am. For any measurable sets S1, . . . , Sd+1, the sets S1 B1, . . . , Sd+1 Bd+1 are pairwise disjoint, so that Q j Sj Bj (W1(x) W2(x))dx sup (S1,...,Sd+1) S Q j Sj (W1(x) W2(x))dx Q j Sj Bj (W1(x) W2(x))dx i1 =... =id+1 [n] W1(x) W2(x) (d + 1)d+1 dx. (4) Limits of Dense Simplicial Complexes One can see that the right-hand side of (4) approaches the quantity 1 (d + 1)d+1 Q j Sj (W1(x) W2(x))dx as n . Combining (3) and (4) concludes the proof. As the labeled d-dimensional cut-distance only considers the value of complexons on [0, 1]d+1, we define a cut-distance that considers all dimensions. Let (αj)j 1 be a nonnegative, summable sequence of real numbers. For complexons W1, W2, put d (W1, W2; (αj)j 1) = X j 1 αjd ,j(W1, W2). One can clearly see that for any complexons W1, W2, d (W1, W2; (αj)j 1) X [0,1]j+1 |W1(x) W2(x)|dx. (5) We extend this definition to finite nonnegative sequences α1, . . . , αd by implicitly assuming that αj = 0 for j > d. The labeled cut-distance implicitly assumes a correspondence between the domains of W1 and W2. Intuitively, this corresponds to coupling the distributions K(n, W1) and K(n, W2) (for any n 1) in such a way that the points x1, . . . , xn [0, 1] are always equal for K(n, W1) and K(n, W2). To decouple the domains, we define the unlabeled cut-distance between W1 and W2 by taking the infimum over measure preserving transformations of the domain. Put, again for a (finite or infinite) nonnegative summable sequence (αj)j 1, δ (W1, W2; (αj)j 1) = inf φ:[0,1] [0,1] d (W1, W φ 2 ; (αj)j 1), where φ ranges over measure-preserving transformations. Similar to (5), we have the inequality δ (W1, W2; (αj)j 1) inf φ:[0,1] [0,1] [0,1]j+1 |W1(x) W φ 2 (x)|dx. The definition of the unlabeled cut-distance for simplicial complexes in terms of the limiting distance under blowups motivates an immediate representation of simplicial complexes as complexons, closely resembling the representation of graphs as graphons via pixel pictures. For a simplicial complex K whose nodes are identified as V(K) = [n], define a complexon WK in the following way. Define the sets Pj = [(j 1)/n, j/n) for j [n]. Then, for each {j1, . . . , jd+1} K, put WK(x1, . . . , xd+1) = 1 for all permutations of (x1, . . . , xd+1) Qd+1 ℓ=1 Pjℓ. Otherwise, put WK(x1, . . . , xd+1) = 0. If instead of a simplicial complex K we have a weighted simplicial complex (H, ω), we put WH(x1, . . . , xd+1) = ω({j1, . . . , jd+1}). Roddenberry and Segarra For a simplicial complex K and a complexon W, we will find it convenient to use the notation d (K, W) and δ (K, W) to refer to d (WK, W) and δ (WK, W), respectively. Moreover, the complexon corresponding to a simplicial complex preserves the cut-distance and all homomorphism densities. The following lemma summarizes this, in a way similar to (Lov asz, 2012, Eq. 7.2 and Lemma 8.9), so we omit the proof here. Lemma 5. For any simplicial complexes F, K and any nonnegative, summable sequence (αj)j 1, t(F, K) = t(F, WK) δ (F, K; (αj)) = δ (WF , WK; (αj)). Moreover, if V(F) = V(K), then, under any identification of V(F) with [n], d (F, K; (αj)) = d (WF , WK; (αj)). Furthermore, if (H, ω) is a weighted simplicial complex such that V(F) = V(H), then d (F, H; (αj)) = d (WF , WH; (αj)). 4. Main Results In this section, we characterize the space of complexons equipped with the cut-distance, and show how the cut-distance relates to the homomorphism densities. In particular, we show that they induce equivalent, compact topologies on the space of complexons. Our treatment closely follows that of Lov asz and Szegedy (2006); Borgs et al. (2008). 4.1 The Counting Lemma We begin by showing how closeness in the cut-distance yields closeness in the sense of homomorphism densities. Lemma 6 (Counting Lemma). Let F be a simplicial complex. Define the sequences αj = |(F )(j)|, βj = |F (j)|, γj = |F (j)| + |(F )(j)| for j 1. Then, for complexons U, W, the following three inequalities hold: |t(F, U) t(F, W)| δ (U , W ; (αj)j 1) |t(F, U) t(F, W)| δ (U, W; (βj)j 1) |tind(F, U) tind(F, W)| δ (U, W; (γj)j 1). This result is analogous to (Lov asz and Szegedy, 2006, Lemma 4.1), with a similar proof as well. Proof We establish the first inequality, as the other two follow from a similar argument. Order the elements of F as {σ1, . . . , σm}, where m = |F |. For x [0, 1]V(F), t [m], define st W (xσs) (U (xσt) W (xσt)) . Limits of Dense Simplicial Complexes One can check that |t(F, U) t(F, W)| = t=1 Xt(x)dx [0,1]V(F )\σt [0,1]σt Xt(x) Y j V(F)\σt dxj For all t [m], it holds that Y st W (xσs) [0, 1], This implies via Lemma 1 that |t(F, U) t(F, W)| t=1 d ,dim σt(U , W ) = d (U , W ; (αj)j 1). Taking the infimum over measure-preserving transformations W φ yields the first inequality, via Lemma 3. The proofs of the second and third inequalities proceed similarly. This result is perhaps not too surprising. Thinking of the cut-metric as describing the global differences between complexons, closeness globally forces closeness locally (that is, in the sense of homomorphism densities). More formally, for any simplicial complex F, the homomorphism density t(F, ) : W [0, 1] is continuous with respect to the cut-distance δ ( , ; (αj)j 1) for any strictly positive, summable sequence (αj)j 1. This can be shown using the usual ϵ-δ definition of continuity, choosing for a given simplicial complex F and ϵ > 0 δ = ϵ dim F maxj |F (j)| mink [dim F] αk . Note then that δ (U, W; (αj)j 1) < δ implies |t(F, U) t(F, W)| < ϵ. Similarly, since tind(F, W) = P {K(n, W) = F} for any simplicial complex F with V(F) = [n], closeness in the cut-metric forces two complexons to yield similar random sampling models, particularly for small n. 4.2 The Sampling Lemma Before proving the inverse of Lemma 6, we establish a concentration result for samples K(n, W): namely, we show that sufficiently large samples drawn from a complexon will be close in the cut-distance. Formally, Lemma 7 (Sampling Lemma). Let W be a complexon. Let n 1 be an integer, and α1, . . . , αd be a finite nonnegative sequence. Then, with probability at least 1 exp ( n/(2 log2 n)), it holds that δ (W , K(n, W); (αj)) 12 2d + 1 p Roddenberry and Segarra We leave the proof to Section A. As a corollary of Lemma 7, it can be shown via the Borel-Cantelli Lemma that every faceted complexon W arises as the limit of a sequence of simplicial complexes. That is, Corollary 8. Let α1, . . . , αd be a finite nonnegative sequence. For a complexon W, the sequence of simplicial complexes (K(n, W))n 1 converges to W in the cut-distance δ ( , ; (αj)) with probability 1. 4.3 The Inverse Counting Lemma We are now ready to state the inverse of Lemma 6. Lemma 9 (Inverse Counting Lemma). Let U, W be two complexons, and suppose that for some d 1, n 2, for all simplicial complexes F on n nodes of dimension d, |t(F, U) t(F, W)| 0.999 2 ((1+n)d+2) Then, for all finite nonnegative sequences α1, . . . , αd, δ (U , W ; (αj)) 24 2d + 2 p We leave the proof to Section B. The bound in Lemma 9 is only useful for log2 n 24 2d, with homomorphism densities that are extremely close. However, it does demonstrate that the cut-distance, a quantity determined by a Lebesgue integral on a continuous domain over all possible Borel sets of certain dimension, can be characterized by a collection of quantities determined by finite objects. Keeping in mind that this bound is generic, holding for any possible pair of complexons, one would expect it to be much better if considered on a tamer family of complexons, for example, those that are stepfunctions on some coarse partition of [0, 1]. Indeed, we have the following result for stepfunctions. Proposition 10. Let U, W be two complexons. Suppose for some integer m 1, U and W are both stepfunctions on respective equipartitions PU, PW , each with at most m steps. Furthermore, suppose that for some d 1, n 2, for all simplicial complexes F on n nodes of dimension d, |t(F, U) t(F, W)| 0.999 2 ((1+n)d+2). Then, for all finite nonnegative sequences α1, . . . , αd, δ (U , W ; (αj)) m(4d + 5) + 4 n We omit the proof, as the argument is a mere simplification of the proofs of Lemmas 7 and 9. By making the stronger assumption that the complexons of interest are stepfunctions on coarse equipartitions of [0, 1], Proposition 10 presents an inverse counting lemma that holds for n md2. Limits of Dense Simplicial Complexes 4.4 Topology of the Space of Complexons Denote the set of all complexons by W, and let (αj)j 1 be a nonnegative, summable sequence of real numbers. If two complexons W1, W2 W only differ on a set of measure zero, then d (W1, W2; (αj)j 1) = 0. Indeed, the labeled cut-distance is only a pseudometric on W. Furthermore, the unlabeled cut-distance δ can take value zero if there exists a measure-preserving bijection φ : [0, 1] [0, 1] such that W1 and W φ 2 only differ on a set of measure zero, and is thus also a pseudometric. The following result states that, topologically speaking, the choice of (αj) in this construction does not matter too much. Lemma 11. Let (αj)j 1 and (βj)j 1 be strictly positive, summable sequences of real numbers. Then, the cut-metrics δ ( , ; (αj)j 1) and δ ( , ; (βj)j 1) are topologically equivalent pseudometrics on W. With Lemmas 6, 9 and 11 in mind, we endow W with a topology determined by any strictly positive, summable sequence (αj)j 1. In particular, we define the canonical topology on W as the topology determined by the pseudometric γ(U, W; (αj)j 1) = δ (U , W ; (αj)j 1), for an arbitrary strictly positive, summable sequence (αj)j 1. Observe that the pseudometric γ behaves similarly to the unlabeled cut-distance δ , except it first applies the faceting map to the arguments, to reflect the fact that when considering homomorphism densities, the faceted complexon is a better descriptor indeed, one can check that the faceting map is a contractive mapping with respect to the cut-distance δ . We also find it useful to define the δ-topology on W as the topology determined by the pseudometric δ , again for some strictly positive, summable sequence. The following result about the canonical topology on W is immediate. Proposition 12. Let W0 W be the set of complexons that arise from simplicial complexes, that is, for every W W0, there is some simplicial complex K such that W = WK. Then, W0 is dense in W in the canonical topology. Our primary result of this section establishes the compactness of W with respect to this topology. Theorem 13 (Compactness). The space W with the canonical topology is compact. We leave the proofs of Lemma 11, Proposition 12, and Theorem 13 to Section C. So far, we have established that the space of complexons with the canonical topology is essentially finite, in that it can be approximated by finitely many (Theorem 13) finite simplicial complexes (Proposition 12) in the cut-metric. This implies, for instance, that the space of complexons W with the pseudometric γ is complete. The canonical topology on W can also be defined based on homomorphism densities. Theorem 14 (Equivalence of Cuts and Homomorphisms). Let (aj)j 1 be a strictly positive, summable sequence of real numbers. Noting that there are countably many simplicial Roddenberry and Segarra complexes (up to isomorphism), let (Fj)j 1 be an enumeration of the set of all isomorphism classes of simplicial complexes. Define a pseudometric ρ on W so that for any U, W W, ρ(U, W; (aj)) = j=1 aj |t(Fj, U) t(Fj, W)|. Then, the topology on W induced by ρ is equal to the canonical topology. Proof We prove that a sequence of complexons W1, W2, . . . is convergent in the canonical topology if and only if it is convergent with respect to the pseudometric ρ. (If). Let ϵ > 0 be given, and suppose Wm W in the canonical topology. Let M be such that P j>M aj < ϵ/2. Then, by Lemma 6, there exists some m0 such that for all j M, m > m0, it holds that |t(Fj, Wm) t(Fj, W)| < ϵ 2M maxk M ak . It follows, then, that for all m > m0, ρ(Wm, W; (aj)j 1) < ϵ. Since ϵ was given arbitrarily, this implies that Wm W in the topology induced by ρ, as desired. (Only if). Let ϵ > 0 be given, and let (αj)j 1 be an arbitrary strictly positive, summable sequence. Suppose Wm W in the topology induced by ρ. Let d be such that P j>d αj < ϵ/2. Put n such that 24 2d + 2 p Noting that there are only finitely many isomorphism classes of simplicial complexes on n nodes of dimension d, let M be such that for every simplicial complex F on n nodes of dimension d, there is some j M such that F = Fj. By assumption, there is some m0 such that for all j M, m > m0 |t(Fj, Wm) t(Fj, W)| < 0.999 2 ((1+n)d+2). By Lemma 9, this implies that δ (W m, W ; (αj)j 1) 24 2d + 2 p j>d αj < ϵ. Since ϵ was given arbitrarily, this implies that Wm W in the canonical topology, as desired. Observe that the pseudometric ρ in Theorem 14 is defined for an arbitrary strictly positive, summable sequence (aj)j 1 and enumeration (Fj)j 1: this indicates that, similar to the δ-topology as described by Lemma 11, the topology induced by ρ is not dependent on the particular sequence (aj)j 1 and enumeration (Fj)j 1. In other words, there is a homtopology on W that is pseudometrizable by any such ρ. Moreover, one can see that a sequence (Wm)m 1 converges in the hom-topology if and only if for all simplicial complexes Limits of Dense Simplicial Complexes F, t(F, Wm) is a convergent sequence. Theorem 14, then, indicates that the canonical topology on W is the topology defined by convergence in homomorphism densities. Indeed, by the definition of the canonical topology on W, two complexons U, W are identified with each other if and only if they are indistinguishable by sampling, which is a property that holds modulo sets of measure zero across the faceted complexons, rather than the original complexons themselves. The identification of complexons by the canonical topology not only ignores differences on sets of measure zero, which have no consequence in sampling and homomorphism densities, but also enforces a consistency structure across different dimensions as suggested by Bobrowski and Krioukov (2022), by considering the faceted version. 5. Discussion and Remarks By Theorem 14, homomorphism densities are sufficient to characterize complexons in the cut-metric. This yields a characterization of limit objects for sequences of simplicial complexes. Indeed, recall that sequence (Kn) for n 1 of simplicial complexes is said to be convergent if for all simplicial complexes F, the sequence (t(F, Kn))n 1 is convergent. Equivalently, by Lemma 5, the sequence (Kn) is convergent if for all simplicial complexes F, the sequence (t(F, WKn)) is convergent. Applying Theorem 14, this is equivalent to convergence of the sequence (WKn) in the canonical topology. Since W is compact in the canonical topology (see Theorem 13), the limit of the sequence (WKn) is itself an element W W. It is in this sense that we say W (or any equivalent complexon in the canonical topology) is the limit of the sequence of simplicial complexes (Kn) as n . In the remainder of the paper, we discuss how complexons relate to some other notions in the literature. 5.1 Limits of partially ordered sets Limits of partially ordered sets (posets) in homomorphism density were considered by Janson (2011), where it was shown that any sequence of partially ordered sets whose homomorphism densities converge have a limit representable by a kernel on an ordered probability space. It was conjectured that all such kernels can be taken to be defined on the ordered probability space [0, 1] with the Lebesgue measure, which was affirmed by Hladk y et al. (2015). In the context of this work, one may wonder if limits of sequences of simplicial complexes could be understood as posets. Indeed, a simplicial complex K carries with it a strict partial order determined by set inclusion, yielding the partially ordered set (K, ). Moreover, since the simplices have a notion of dimension, there is a rank function associated with (K, ), yielding a graded poset (K, , dim). A simplicial homomorphism from a simplicial complex F into K is a poset homomorphism as considered by Janson (2011), but has some additional structure. Namely, a simplicial homomorphism φ : V(F) V(K) is rank-preserving as a poset homomorphism, in that dim(σ) = dim(φ(σ)) for all σ F. General poset homomorphisms as considered by Janson (2011) do not require the rank to be preserved. It is interesting to see how the rankpreservation condition on these poset homomorphisms yields a significantly different limit structure, namely a complexon, as opposed to a kernel on an ordered probability space. Indeed, finite graded posets may be treated as abstract cell complexes, of which abstract Roddenberry and Segarra simplicial complexes are a special type this suggests that the symmetry conditions of complexons may be relaxed in order to yield limit objects for more general combinatorial structures. 5.2 Local Consistent Random Simplicial Complexes We have studied the limits of dense sequences of simplicial complexes via complexons, but this is not the only obvious limiting structure. Another such object for the graph case is a locally consistent random graph model (Lov asz, 2012, Chapter 11.2). Analogous structures exist for simplicial complexes. A random complex model is a sequence (µn)n 1 of probability measures such that, for all n 1, µn is a probability measure on the set of simplicial complexes K such that V(K) = [n], and if K and K are isomorphic, then µn(K) = µn(K ). The random complex model (µn) is said to be consistent if for any simplicial complex K on nodes [n 1], we have µn 1(K) = µn F : V(F) = [n], F = K , where F = K indicates that the removal of the node n from F yields the simplicial complex K. Furthermore, if for all n > 1, and disjoint subsets S, T [n], the random simplicial complexes determined by taking the induced subcomplex on S, T of a simplicial complex sampled following µn are independent, we say that (µn) is local. Indeed, any simplicial complex K yields a random complex model. Define, for n 1, the probability measure µK,n such that for any simplicial complex F with V(F) = [n] µK,n(F) = tind(F, K). The following result, following (Lov asz, 2012, Theorem 11.7), links convergent sequences of simplicial complexes to local consistent random complex models. Theorem 15. Let a convergent sequence of simplicial complexes K1, K2, . . . with v(Km) as m be given. Define, for all n 1 and all simplicial complexes F with V(F) = [n], the probability measure µn(F) = lim m µKm,n(F), noting that the sequence K1, K2, . . . being convergent implies the limit exists. Then, the sequence (µn)n 1 forms a local consistent random complex model. Conversely, every local consistent random complex model arises this way. The proof follows that of (Lov asz, 2012, Theorem 11.7) almost exactly, so we omit it here. Furthermore, an initial connection with complexons is stated in the following result (compared to Lemma 2). Proposition 16. Let W be a complexon. Define, for all n 1, the probability measure for simplicial complexes F on nodes [n] µn(F) = P {K(n, W) = F} = tind(F, W). Then, the sequence (µn)n 1 forms a local consistent random complex model. Limits of Dense Simplicial Complexes Viewing the sampled simplicial complexes from a complexon as being equivalently sampled from a local consistent random complex model is a helpful perspective for establishing convergence of a complexon s samples in the cut-metric. In particular, Proposition 17. Let (µn)n 1 be a local consistent random complex model. For each n 1, independently draw a simplicial complex Kn on nodes [n] according to the distribution µn. Then, with probability 1, (i) The sequence (Kn) is convergent (ii) For all n, limm µKm,n = µn. In particular, if (µn = K(n, W))n 1 for some complexon W, we have that a sequence of increasingly large simplicial complexes sampled from a complexon converges to W with probability 1. We also omit the proof, as it resembles that of (Lov asz, 2012, Lemma 11.8) with very little modification. Propositions 16 and 17 together indicate that for a given complexon W and a sequence of increasingly large simplicial complexes (Kn K(n, W))n 1, Kn W with probability 1. That is to say, we can (almost surely) generate a sequence converging to an arbitrary complexon by simply taking a sequence of large simplicial complexes sampled from it. Indeed, this recovers Corollary 8. 5.2.1 Homogeneous random complexes Some simple instances of local consistent random complex models have been described in the literature. We go through the dense variants of these models, specifying both the corresponding complexon and the random complex model. Linial and Meshulam (2006); Meshulam and Wallach (2009) considered the homological connectivity of random simplicial complexes generated in the following way. For some real number p [0, 1] and integers d > 1, n 1, a random simplicial complex Kn,p is formed on the node set [n] by including all d 1 simplices on [n], and including each element of [n] d+1 independently with probability p. This model is known as the Linial-Meshulam model. One can define a complexon whose samples are distributed in this way. Define W W so that for all x [0, 1]c+1 for c < d, we have W(x) = 1, and for all x [0, 1]d+1 we have W(x) = p. It is clear to see that for all n, the random simplicial complex K(n, W) follows the Linial-Meshulam model. If the Linial-Meshulam assumes a fully-connected (d 1)-skeleton and fills in d-simplices at random, the random flag, or clique, complex (Kahle, 2009) does the opposite. The flag complex, again for some p [0, 1], yields a random complex on nodes [n] by beginning with an Erd os-R enyi random graph Gn with edge probability p, and then taking the maximal simplicial complex whose 1-skeleton is equal to Gn. This can also be expressed with an appropriately constructed complexon. Define W W so that for all x [0, 1]2, W(x) = p, and otherwise W(x) = 1. Then, the random simplicial complex K(n, W) is a random flag complex. As a generalization of both of these, Costa and Farber (2016) considers random simplicial complexes parameterized by a sequence of coefficients p1, p2, p3, . . . contained in [0, 1]. The multiparameter, or Costa-Farber, random simplicial complex is distributed according to Roddenberry and Segarra K(n, W), where W is such that for all x [0, 1]d+1, W(x) = pd. Further work showed that in the dense, or medial, regime of p pj P for some 0 < p P < 1 for all j, the dth homology group of such a random complex in a given dimension is nontrivial for n in a logarithmically-sized interval, being trivial elsewhere with high probability (Farber and Mead, 2020). That is to say, as n , the homology group of a fixed dimension for complexes sampled according to K(n, W) vanishes with probability tending to one. For some complexons, particularly those bounded away from zero in correspondence with the medial regime studied by Farber and Mead (2020), it is seemingly difficult then to speak of the homological structure of a complexon in general. As an example, consider a simplicial complex K on n nodes with a nonzero first Betti number. Then, consider some simplicial complex on n n nodes sampled from WK, which will likely have a much larger first Betti number. Thus, the Betti numbers are in some sense destroyed when going from a simplicial complex to its corresponding complexon. These three models are all instances of homogeneous random simplicial complexes, where the probability of a simplex existing is dependent only on its dimension. For a complexon W, this corresponds to W being constant on [0, 1]d+1 for each d 1. 5.3 Other Random Simplicial Complex Models As described in Section 3.1, a random simplicial complex sampled from a complexon following K(n, W) essentially picks nodes [n] with latent positions x1, . . . , xn uniformly distributed in the unit interval. That is to say, the random complex structure is dependent on each node s position in the space [0, 1]. In the ensuing construction, no topological properties of [0, 1] were leveraged, suggesting that the definition of a complexon can be extended to nodes sampled from any Borel probability space. Let (Ω, F, µ) be a Borel probability space, which we denote by Ωfor short. A complexon on Ωis a measurable function d 1 Ωd+1 [0, 1]. All of the basic constructions so far generalize immediately to complexons on Ω: sampling, homomorphism densities, cut-metrics, and so on. In particular, if (Ω, F, µ) is a Borel probability space, then a complexon W on Ωcan be transformed into a complexon W on [0, 1] in a way that preserves all homomorphism densities. That is to say, complexons on [0, 1] are sufficient for describing complexons on Borel probability spaces in general, which arise, for instance, when Ωis a metric space endowed with the Borel σ-field. For example, it is well-known that every probability measure on the Borel σ-field of Rd yields a Borel probability space. We outline the details of this in Section D, with reference to (Lov asz, 2012, Section 13.1). Defining complexons over alternative probability spaces is useful for describing random simplicial complexes tied to a natural underlying space. We consider here dense random geometric ˇCech complexes (Bobrowski and Krioukov, 2022). Let µ be a probability measure on Rd for some integer d 1, and let X denote the support of µ. For some ϵ > 0, define a complexon Wϵ on X so that, for any points x1, . . . , xd+1 X, Wϵ(x1, . . . , xd+1) = 1( j=1 Bϵ/2(xj) = ), Limits of Dense Simplicial Complexes (a) (b) (c) Figure 2: (a) ˇCech complex formed from points uniformly sampled from a bouquet of two circles in R2. (b) The complexon Wϵ as defined on [0, 1]2 (dark regions indicate a value of 1). (c) The complexon Wϵ as defined on [0, 1]3 (dark regions indicate a value of 1). where Bϵ/2(xj) indicates the open ball in Rd of radius ϵ/2 centered about xj. That is to say, Wϵ(x1, . . . , xd+1) = 1 if and only if the collection of points {x1, . . . , xd+1} has diameter less than ϵ. Going back to our definition of the random simplicial complex K(n, Wϵ), let Xn = {x1, . . . , xn} be such that x1, . . . , xn are i.i.d. samples from the probability distribution µ. Form a simplicial complex K such that V(K) = [n]. Inductively for d 1 until termination, for each d + 1-subset σ [n] such that every strict subset of σ is contained in K, include σ in K with probability Wϵ(xσ). The random simplicial complex K formed in this way is, much like before, said to be sampled from the distribution K(n, Wϵ). Moreover, under our particular definition of Wϵ, it is clear that the resulting simplicial complex K is the ˇCech complex ˇCϵ(Xn, Rd). In this way, K(n, Wϵ) is the distribution of ˇCech complexes with fixed radius ϵ formed by sampling n i.i.d. points in X from the probability distribution µ. Revisiting our example of the ˇCech complex formed from the bouquet of two circles in Section 1 (see Figure 1), we can now couch our understanding of convergent subsampled ˇCech complexes in the language of complexons. As before, let X be the bouquet of two unit circles in R2. For any n 1, let Xn be a collection of n i.i.d. samples from the uniform distribution on X. An instance of the random ˇCech complex ˇCϵ(Xn, R2) is pictured in Figure 2 (a). We can equivalently describe the random ˇCech complex as a complexon. For the purpose of illustration, parameterize the space X as a curve c : [0, 1] R2, defined as ( (cos(4πt) 1, sin(4πt)) t < 1/2 (1 cos(4πt), sin(4πt)) t 1/2. The uniform measure µ on X is merely the pushforward measure of the uniform measure on [0, 1] via the map c. Moreover, c is injective almost everywhere, so we can represent the complexon modeling ˇCϵ(X, R2) as a standard complexon on [0, 1], up to a set of measure Roddenberry and Segarra zero. The complexon Wϵ as defined on [0, 1]2 and [0, 1]3 is pictured in Figure 2 (b,c). Indeed, Wϵ can be thought of as the indicator function for simplices in the ˇCech complex ˇCϵ(X, R2). 5.4 Convergence of Nonuniform Hypergraphs An open problem in the theory of graph homomorphisms and graph limits is the characterization of nonuniform hypergraphs with unbounded edges (Lov asz, 2008). As noted in Section 3.1, the distribution K(n, W) can be thought of in terms of lower sampling following (Farber et al., 2022), where a random hypergraph H is drawn according to W, and then taking the largest simplicial complex contained in that hypergraph. On the other hand, the upper sampling approach (Farber et al., 2022) can be reproduced by drawing a random hypergraph H according to W , and then taking the smallest simplicial complex that contains that hypergraph. Both of these approaches yield the same distribution K(n, W). The viewpoint of simplicial complexes formed from the upper or lower bounds of hypergraphs sheds some light on limits of dense, nonuniform hypergraphs. For a hypergraph H, let H be the largest simplicial complex contained in H, and H the smallest simplicial complex containing H. We say that a sequence of hypergraphs (Hn)n 1 is upper-lowerconvergent (or, for brevity, UL-convergent) if the following conditions hold simultaneously for some complexon W: (i) ( Hn )n 1 is convergent with limiting complexon W (ii) ( Hn )n 1 is convergent with limiting complexon W . If this is the case, we say that Hn W as n . Since for all n, Hn W , by Lemmas 33 and 34, it holds that W W . Based on Lemma 7, we know that large simplicial complexes sampled from a complexon are close to the faceted complexon in the cut-metric, with high probability. Based on the relationship between faceted complexons and upper/lower sampling, we establish a similar approach to show that sequences of hypergraphs sampled from faceted complexons are ULconvergent. Let W be a complexon, and n 1 an integer. Similar to the definition of K K(n, W), we define a random hypergraph H as follows. First, take n i.i.d. samples x1, . . . , xn uniformly at random in the unit interval [0, 1]. Let H be a hypergraph with V(H) = [n]. Then, for each σ 2[n], include σ H with probability W(xσ). We call the distribution from which H is drawn H(n, W). Lemma 18. For a complexon W and integer n 1, it holds that K(n, W) D= H(n, W) D= H(n, W ) , where D= denotes equality in distribution. We now consider the relationship between UL-convergence, simplicial homomorphisms, and hypergraph homomorphisms. For a hypergraph H, define a complexon WH in the same way as one does for simplicial complexes, that is, by taking indicator functions of tuples Limits of Dense Simplicial Complexes defined on a partition of [0, 1]. From this, we inherit all the usual notions of homomorphisms and cut-metrics for hypergraphs. For a hypergraph H and a complexon W, define t(H, W) = Z σ H W(xσ) Y j V(H) dxj. With the notion of a homomorphism for hypergraphs, it is natural to ask whether the convergence of a sequence of hypergraphs in all hypergraph homomorphisms implies ULconvergence. Indeed, convergence of all hypergraph homomorphism densities is enough to establish convergence of the sequence ( Hn ), but not to establish convergence of ( Hn ). Counterexample 1. For some p [0, 1] define Wp W so that W(x) = p for x [0, 1]3, and W(x) = 0 otherwise. Define a sequence of hypergraphs (Hn,p)n 1 where each Hn,p is a random hypergraph distributed according to H(n, Wp). One can check that with probability 1, for any hypergraph G, t(G, Hn,p) is convergent, so that (Hn,p) is a convergent sequence in the sense of homomorphisms. Setting p = 1, it is obvious that (Hn,1) is not UL-convergent, since Hn,1 is the empty simplicial complex on n nodes, and Hn,1 is the complete 2-dimensional simplicial complex on n nodes. Then, one can see that Hn,1 W, where W(x) = 1 for x [0, 1]d+1 when d {1, 2}, and W(x) = 0 otherwise. On the other hand, Hn,1 0. It is not true that W = 0 in this case, so UL-convergence does not hold. In the above counterexample, the hypergraphs (Hn,1) were drawn from a complexon W1 W \ W , that is, non-faceted complexons. In light of the UL-limit of a convergent hypergraph sequence necessarily being an element of W , this may imply some utility of the extra structure imposed by W . This is noted in the following proposition. Proposition 19. Let W W be given, and take a sequence (Hn)n 1 of random hypergraphs Hn H(n, W). Then, (Hn) is UL-convergent with limit W with probability 1. Proof For the sequence (Hn), note that for each n, Hn is distributed according to K(n, W) by Lemma 18. By Corollary 8, this implies that ( Hn ) W with probability 1. Similarly, since W W , there is some U W such that W = U . Then, for each n, Hn is distributed according to K(n, U), again by Lemma 18. Thus, ( Hn ) W with probability 1. Proposition 19 and Counterexample 1 together indicate that considering the upper and lower simplicial complexes of a sequence of hypergraphs may help establish a theory of nonuniform hypergraphs, but fails in cases where the limiting object is not a faceted complexon. Acknowledgments We would like to acknowledge support for this project from the National Science Foundation (NSF grant CCF-2008555). Roddenberry and Segarra Appendix A. Proof of Lemma 7 A.1 Equipartitioning Let P be a partition of [0, 1] into finitely many measurable sets. For a complexon W, define the projection WP as follows. For {pj Pj : j [d + 1]}, where each Pj is an element of P, put WP(p1, . . . , pd+1) = Z Qd+1 j=1 Pj W(x1, . . . , xd+1)dx. The complexon WP is an instance of a stepfunction on P, that is, WP(x1, . . . , xd+1) is only dependent on which sets in P the points x1, . . . , xd+1 are contained in. We establish a result stating that any complexon can be approximated in the cut-distance by a piecewise constant complexon with respect to an equipartition of the unit interval. The following is a weak regularity lemma, stating that any complexon can be approximated by a stepfunction over an appropriate partition. It is a trivial extension of (Frieze and Kannan, 1999, Theorem 12). Theorem 20. Let W be a complexon. For given integers n > 1, d 1, there is a partition P of [0, 1] into n measurable sets such that there exists a stepfunction U on P such that for all 1 j d, d + 1 log2 n. Following (Lov asz, 2012), we strengthen Theorem 20 to hold for equipartitions, where an equipartition of a measure space is a partition such that each set has equal measure. Lemma 21 (Equipartitioning). Let W be a totally symmetric, measurable function. For given integers n > 1, d 1, there exists an n-equipartition P of [0, 1] such that for all finite, nonnegative sequences of real numbers α1, . . . , αd, d (W, WP; (αj)) Pd j=1 αj 4 Proof Put m = n1/4 . Let Q = {Qj}m j=1 be an m-partition as guaranteed by Theorem 20, with corresponding stepfunction U. Partition each class Qj into sets of measure 1/n, with at most one exceptional set of measure less than 1/n for each class. Notice that the measure of the union of the exceptional sets is bounded by m/n. Take the union of the exceptional sets, and partition it into sets of measure 1/n, yielding an equipartition P. Take the common refinement R = P Q. Note that P and R only differ over those classes in P formed from exceptional sets. Thus, for all j 1, the restrictions of WP and WR to [0, 1]j+1 only differ on a set of measure at most 2j(m/n). This implies that d ,j(WR, WP) 2j m Via an easily shown variation of (Lov asz, 2012, Lemma 9.12), we see that d ,j(W, WR) 2d ,j(W, U), Limits of Dense Simplicial Complexes which implies via the triangle inequality that d ,j(W, WP) 2d ,j(W, U) + 2j m One can then show via straightforward calculation that d ,j(W, WP) 4 d + 1 + 2j p This yields the bound d(W, WP; (αj)) = j=1 αjd ,j(W, WP) Pd j=1 αj 4 as desired. A.2 Sample Concentration We show that nice parameters of simplicial complexes sampled from a complexon concentrate around their expectation. Definition 22 (Reasonably Smooth Parameter). Let f be a function mapping simplicial complexes to the real numbers. Suppose that for any simplicial complexes F, G such that V(F) = V(G) and whose structure varies only on faces incident to a single node, that is, there exists some v V(F) such that for all σ F, we have |f(F) f(G)| 1. Under these conditions, we say that f is a reasonably smooth parameter.6 We will need the following result (Lov asz, 2012, Corollary A.15), which is a corollary of Azuma s inequality. Proposition 23. Let (Ω, A, π) be a probability space. Let f : Ωn R be a measurable function such that |f(x) f(y)| 1 whenever x and y differ in one coordinate only. Let X be a random point in Ωn distributed according to the product measure. Then, for any ϵ > 0, P {f(X) E[f(X)] + ϵn} exp ϵ2n/2 . 6. F G denotes the symmetric difference between sets. Roddenberry and Segarra Proposition 23 gives us a generic, one-sided tail bound for well-behaved functions on sequences of i.i.d. random variables. We now construct the random simplicial complex K(n, W) in a way that fits this schema. Let W be a complexon, and let n 1 be given. Let Ω= Qn 1 d=0[0, 1]([n]d) with the uniform probability measure. Take i.i.d. samples xj = (αj,0, . . . , αj,n 1) from Ωfor j [n]. We construct a simplicial complex F from the points {xj}n j=1 in the following way. For all σ [n], put σ F if and only if the following condition holds. For any σ σ, denote the ordered elements of σ by 1 i1 . . . id+1 n. Then, we require for all such σ αi1,d(i2, . . . , id+1) W(αi1,0, . . . , αid+1,0). Under this model, a random simplicial complex F distributed according to K(n, W) is taken as a function of the i.i.d. sequence xj and the complexon W. Moreover, changing any coordinate xj in the sequence only affects simplices that contain j. We can now state the main result of this section. Theorem 24 (Sample Concentration). Let f be a reasonably smooth parameter. Then, for a complexon W, an integer n 1, and real number t 0, we have P n f(K(n, W)) E[f(K(n, W))] + 2tn o exp ( t) . Proof Follows from Proposition 23, treating f(K(n, W)) as the composition of functions of the sequence of i.i.d. random variables {xj Ω}n j=1. A.3 Concentration of Norm The following lemma will be the most challenging part of proving Lemma 7. Lemma 25 (Norm Concentration). Let U : [0, 1]d+1 [ 1, 1], and let X be a random ordered n-subset of [0, 1]. Let U[X] be the matrix in [ 1, 1]nd+1 determined by evaluating U on Xd+1. Then with probability at least 1 4 exp( n1/(d+1)/2), 3r(n, d + 1) U[X] ,d U ,d 5(d + 1)2 r(n, d + 1) = 1 P(n, d + 1) We begin with the following result, which is a special case of (Alon et al., 2003, Lemma 3). Lemma 26. Let B : [n]d+1 [ 1, 1]. Let S1, . . . , Sd+1 [n], and for each j [d + 1], let Qj be a random subset of [n][d+1]\{j} with cardinality q. Then, B(S1, . . . , Sd+1) EQj S1, . . . , Sj 1, (Qj Y i =j Si)+, Sj+1, . . . , Sd+1 Limits of Dense Simplicial Complexes where B(S1, . . . , Sd+1) = X vd+1 Sd+1 B(v1, . . . , vd+1), and (Qj Q i =j Si)+ is the collection of elements v [n] such that, q Qj Q i =j Si B(q1, . . . , qj 1, v, qj+1, . . . , qd+1) > 0. Lemma 26 can be iterated over all dimensions of the matrix B in order to yield an expectational bound on a variation of the cut-norm for a matrix. We find it useful to consider the one-sided cut-norm, defined as follows. Definition 27. Let B : [n]d+1 R and U : [0, 1]d+1 R. Define B + ,d = max S1,...,Sd+1 [n] B(S1, . . . , Sd+1) U + ,d = sup S1,...,Sd+1 B[0,1] Q j Sj U(x)dx. Observing that B ,d = max{ B + ,d; B + ,d}, our goal is to find bounds for U[X] + ,d U + ,d and apply them twice, yielding the desired result. Lemma 28. Let B : [n]d+1 [ 1, 1]. Let {Qj}d+1 j=1 each be random subsets of [n][d+1]\{j} with cardinality q. Then, B + ,d 1 nd+1 EQ1,...,Qd+1 max Rj Qj B(R+ 1 , . . . , R+ d+1) + d + 1 q . Proof Fix S1, . . . , Sd+1 [n]. Applying Lemma 26 repeatedly over each coordinate j [d + 1] yields B(S1, . . . , Sd+1) EQ1,...,Qd+1 max Rj Qj B(R+ 1 , . . . , R+ d+1) + (d + 1)nd+1 Taking the supremum of the left-hand side over all possible sets S1, . . . , Sd+1 and normalizing by nd+1 yields the desired bound. The following lemma immediately precedes our desired result. Lemma 29. Let U : [0, 1]d+1 [ 1, 1], and let X be a random ordered n-subset of [0, 1]. Then with probability at least 1 2 exp( n1/(d+1)/2), 3r(n, d + 1) U[X] + ,d U + ,d 5(d + 1)2 Roddenberry and Segarra Proof For a random n-subset X [0, 1], let B = U[X]. (Lower bound) For any measurable subsets S1, . . . , Sd+1 [0, 1] we have B + ,d 1 nd+1 U(S1 X, . . . , Sd+1 X), where U(S1 X, . . . , Sd+1 X) is used as shorthand for U(S1 X, . . . , Sd+1 X) = X xd+1 Sd+1 X U(x1, . . . , xd+1). On average, we see that EX h B + ,d i 1 nd+1 EX [U(S1 X, . . . , Sd+1 X)] = P(n, d + 1) Q j Sj U(x)dx + r(n, d + 1) g(U; S1, . . . , Sd+1), where g is some function taking values with magnitude at most 1. Thus, since the sets S1, . . . , Sd+1 were given arbitrarily, we can choose them to yield the following inequality. EX h B + ,d i Z Q j Sj U(x)dx 2r(n, d + 1) U + ,d 2r(n, d + 1). Applying Proposition 23 yields B + ,d U + ,d 3r(n, d + 1) with probability at least 1 exp( r2(n, d + 1)/2). (Upper bound) Let Q1, . . . , Qd+1 be, for each j, random q-subsets of [n][d+1]\{j}. We will bound the expectation of B + ,d by first fixing some parameters. Fix the sets Rj Qj [n]d, and define Q = S j{Qj,1, . . . , Qj,d}, so that Q [n]. Fix those values of Xi for which i Q. Define for each j [d + 1] the set y [0, 1] : X (i1,...,id) Rj U(Xi1, . . . , Xij 1, y, Xij, . . . , Xid) > 0 Let X = {Xi : i [n]\Q}. For every collection of indices {ij [n]\Q}d+1 j=1, the contribution of U(Xi1, . . . , Xid+1) to EX [B(R+ 1 , . . . , R+ d+1)] is at most Z Q j Yj U(y1, . . . , yd+1)dy U + ,d. There are fewer than nd+1 such choices of indices. The remaining terms where ij Q for at least one j [d + 1] contribute at most (2n|Q|)d (2(d + 1)nq)d in absolute value, yielding the bound EX B(R+ 1 , . . . , R+ d+1) nd+1 U + ,d + 2d ((d + 1)nq)d . Limits of Dense Simplicial Complexes Note that B(R+ 1 , . . . , R+ d+1) is a function of the random variables X . Moreover, B(R+ 1 , . . . , R+ d+1) varies by at most 4nd when a single coordinate of X is changed. Thus, by Proposition 23, we have for a given ϵ > 0, with probability at least 1 exp( ϵ2n/2), B(R+ 1 , . . . , R+ d+1) EX B(R+ 1 , . . . , R+ d+1) + 4ϵnd+1. The number of possible choices of sets R1, . . . , Rd+1 is at most 2q(d+1), so that the above bound holds for all {Rj Qj}d+1 j=1 with probability at least 1 2q(d+1) exp( ϵ2n/2). This property is preserved when taking the expectation over Q1, . . . , Qd+1, so that, by Lemma 28, B + ,d 1 nd+1 EQ1,...,Qd+1 max Rj Qj B(R+ 1 , . . . , R+ d+1) + d + 1 q U + ,d + (2(d + 1)q)d n + 4ϵ + d + 1 q with probability at least 1 2q(d+1) exp ϵ2n/2 . Put q = n1/(d+1) /(2(d+1)) and ϵ = p 2(d + 1)/q, noting that q 1 for n large enough to make the desired bound nontrivial. This yields the bound B + ,d U + ,d 5(d + 1)2 with probability at least 1 exp( n1/(d+1)/2). (Conclusion of Proof) Combining the lower and upper bounds, we see that 3r(n, d + 1) B + ,d U + ,d 5(d + 1)2 with probability at least 1 2 exp( n1/(d+1)/2), as desired. Proof [Proof of Lemma 25] Apply Lemma 29 to both U and U. Since U ,d = max{ U + ,d, U + ,d}, this yields the desired tail bound. A.4 Sampling Lemma With these results established, we now prove Lemma 7. We will first need the following result. Lemma 30. Let (H, ω) be a weighted simplicial complex on n nodes. Let K(H) be the random simplicial complex drawn from H. For every p [0, 1], we have E [d ,d (K(H), H )] 3(d + 1)p + 1 Proof Let ϵ > 0 be given. For {ij [n]}d+1 j=1, define the random variable Xi = 1(i K(H)). Let S1, . . . , Sd+1 be pairwise disjoint subsets of [n], and let AK,d and AH ,d be the (weighted) Roddenberry and Segarra d-dimensional adjacency matrices associated to K(H) and H , respectively. We call such a collection of sets bad if |(AK,d AH ,d)(S1, . . . , Sd+1)| > ϵ n d + 1 Note that this is only possible if Q j Sj > ϵ(n/(d+1))d+1. Then, by the Chernoff-Hoeffding inequality, i Q j Sj (Xi E[Xi]) > ϵ n d + 1 There are (d + 2)n such collections of disjoint subsets, so the probability of a bad collection existing is bounded by 2 (d + 2)n exp By Lemmas 4 and 5, the nonexistence of bad subsets implies d ,d(K(H), H ) ϵ. Therefore, we have E [d ,d(K(H), H )] ϵ + 2 (d + 2)n exp It is sufficient to consider the case when np > 3(d + 1)p + 1, otherwise the bound is trivial. In particular, taking ϵ = 3 d+1 n p yields the desired bound. Proof [Proof of Lemma 7] Let integers d 1, n > 1 be given. We bound the expected cut-distance for any finite, nonnegative sequence α1, . . . , αd as follows. For some partition P of [0, 1] and a random n-subset S [0, 1]: E [δ (W , K(W[S]); (αj))] d (W , W P; (αj)) (6a) + E [δ (W P, W P[S]; (αj))] (6b) + E [d (W P[S], W [S]; (αj))] (6c) + E [d (W [S], K(W[S]); (αj))] , (6d) by the triangle inequality. We now bound each of the above summands individually. Bound (6a): Let m = n1/4 . By Lemma 21, there is an m-equipartition P = {Pj}m j=1 of [0, 1] such that d (W , W P; (αj)) 2 Pd j=1 αj 4 for all finite, nonnegative sequences α1, . . . , αd, where W P denotes the projection of W to a stepfunction on P (as opposed to the faceted complexon corresponding to WP). Bound (6b): Let H = W P[S], and denote the corresponding complexon by W H. Following the proof of (Lov asz, 2012, Lemma 10.16), observe that both W H and W P are stepfunctions Limits of Dense Simplicial Complexes on m-partitions of [0, 1], with the same function values on corresponding steps. Each step of W P has measure 1/m, while each step of W H has measure |Pi S|/n = 1/m + ri for i [m], where ri denotes some residual. It is easy to see that δ (W P, W H; (αj)) j=1 αj(j + 1). By the estimate in the proof of (Lov asz, 2012, Lemma 10.16), we have E [δ (W P, W H; (αj))] E j=1 αj(j + 1) < Pd j=1 αj(j + 1) Bound (6c): By Lemma 25, we have |d ,d(W [S], W P[S]) d ,d(W , W P)| 5(d + 1)2 with probability at least 1 8 exp( n1/(d+1)/2). This implies that E [|d ,d(W [S], W P[S]) d ,d(W , W P)|] 5(d + 1)2 + 5 Applying the triangle inequality yields E [d ,d(W [S], W P[S])] E [|d ,d(W [S], W P[S]) d ,d(W , W P)|] + d ,d(W , W P) 5(d + 1)2 + 5 n1/(d+1) + 2d + 4 where the final inequality holds by assuming that n is large enough so that the bound in Lemma 7 is nontrivial. By linearity of expectation, this yields E [d (W [S], W P[S]; (αj))] Pd j=1 αj 2j+2 Bound (6d): By Lemma 30, we have E [d ,d(W [S], K(W[S]))] 3(d + 1)3/8 + 1 which implies by linearity of expectation E [d (W [S], K(W[S]); (αj))] Pd j=1 αj 3(j + 1)3/8 + 1 Roddenberry and Segarra Bounding expected cut-distance: Combining the respective bounds of (6a) to (6d) yields the following: E [δ (W , K(W[S]); (αj))] d + 1 + 6 2j log2 n + j + 2 + 3(j + 1)3/8 Since Pd j=1 αj is always a bound for δ(W , K(W[S]); (αj)), it is sufficient to consider n sufficiently large so that the bound in Lemma 7 is nontrivial. In this regime, we have E [δ (W , K(W[S]); (αj))] 12 2d Observing that f(F) = v(F) δ(F, W ; (αj))/ Pd j=1 αj is a reasonably smooth parameter for any (nonzero) finite, nonnegative sequences α1, . . . , αd, Theorem 24 implies δ (W , K(W[S]); (αj)) 12 2d + 1 p with probability at least 1 exp( n/(2 log2 n)). Since K(W[S]) is identically distributed to K(n, W), this yields the desired bound. Appendix B. Proof of Lemma 9 Given our interest in cut-distances that only consider the first d dimensions, we denote by Kd(n, W) the d-skeleton of the random simplicial complex K(n, W). Equivalently, if we define Wd as the complexon that is equal to W in all dimensions less than or equal to d, and 0 otherwise, Kd(n, W) is equal to K(n, Wd) in distribution. We state a weaker version of Lemma 31 in terms of total variation distances first, which we then strengthen to be in terms of homomorphism densities. Lemma 31. Let U, W be two complexons, and suppose that for some d 1, n 2, we have dvar(Kd(n, U), Kd(n, W)) < 1 2 exp n 2 log2 n where dvar denotes the total variation distance between probability distributions. Then, for all finite nonnegative sequences α1, . . . , αd, δ (U , W ; (αj)) 24 2d + 2 p Proof We prove this bound via the probabilistic method. By assumption, we can couple Kd(n, U) and Kd(n, W) so that with probability at least 2 exp( n/(2 log2 n)), Kd(n, U) = Kd(n, W). (7) Limits of Dense Simplicial Complexes By Lemma 7, we have with probability at least 1 exp( n/(2 log2 n)), δ (U , Kd(n, U); (αj)) 12 2d + 1 p j=1 αj. (8) Similarly, with probability at least 1 exp( n/(2 log2 n)), δ (W , Kd(n, W); (αj)) 12 2d + 1 p j=1 αj. (9) With positive probability, then, (7) to (9) hold simultaneously, which implies that δ (U , W ; (αj)) δ (U , Kd(n, U); (αj)) + δ (Kd(n, U), Kd(n, W); (αj)) + δ (Kd(n, W), W ; (αj)) 24 2d + 2 p as desired. The inverse counting lemma follows, by showing that closeness in homomorphism densities implies the condition of Lemma 31. Proof [Proof of Lemma 9] For convenience, put f(n, d) = 0.999 2 ((1+n)d+2). Let F be a finite, simplicial complex on n nodes of dimension d. By the inclusion-exclusion principle, the induced homomorphism density can be written as tind(F, Wd) = X G F :dim G d ( 1)|G| t(F G, W), with a similar expression holding for the complexon U. Since each F G yields a simplicial complex of dimension d, we have |t(F G, U) t(F G, W)| f(n, d) for all such F, G, by assumption. Observe that for any simplicial complex F on n nodes of dimension d, the number of antifacets of F of dimension at most d is bounded by n d+1 . This implies that |tind(F, Ud) tind(F, Wd)| 2( n d+1) f(n, d). Then, by Lemma 2, for any such F, |P {Kd(n, U) = F} P {Kd(n, W) = F}| 2( n d+1) f(n, d), keeping in mind that there exists at least one F such that strict inequality holds. Therefore, assuming n is sufficiently large so that the bound on δ (U , W ; (αj)) is nontrivial (that is, Roddenberry and Segarra less than P j αj), summing over all simplicial complexes F on [n] of dimension d yields the bound dvar(Kd(n, U), Kd(n, W)) = X F |P {K(n, U) = F} P {K(n, W) = F}| < 2 Pd+1 j=1 (n j)2( n d+1) f(n, d) 1 2 exp n 2 log2 n The lemma follows from Lemma 31. Appendix C. Proofs from Section 4.4 Proof [Proof of Lemma 11] Let ϵ > 0 and U W be given. Let Bϵ(U; (αj)) be the open ball centered at U of radius ϵ with respect to the pseudometric δ ( , ; (αj)). Similarly, let Bϵ(U; (βj)) be defined in the same way using the sequence (βj) to define the pseudometric. By assumption, there exists an M such that X j>M αj < ϵ/2 j>M βj < ϵ/2. Put ϵ = ϵ minj M βj 2M maxj M αj , noting that ϵ > 0, since both sequences are strictly positive. Let W Bϵ (U; (βj)) be given. Then, there exists a measure-preserving bijection φ : [0, 1] [0, 1] such that d (U, W φ; (βj)j 1) < ϵ . This implies that for all j M, d ,j(U, W φ) ϵ 2Mαj . One can then check that this implies d (U, W φ; (αj)j 1) < ϵ, so that Bϵ (U; (βj)) Bϵ(U; (αj)). Applying the same argument symmetrically and for arbitrary values of ϵ yields the fact that the topologies on W induced by both metrics are equal, as desired. We use a similar technique to prove Proposition 12. Proof [Proof of Proposition 12] We show that every complexon W W arises as the limit of a sequence in W0 in the canonical topology. By Lemma 11, we can choose an arbitrary strictly positive, summable sequence (αj)j 1 with which to define the pseudometric δ ( , ; (αj)). Let ϵ > 0 be given, and consider the sequence of random simplicial Limits of Dense Simplicial Complexes complexes (K(n, W))n 1. Let d be such that P j>d αj < ϵ/2. By Corollary 8, the sequence K(n, W) converges to W with respect to the metric δ ( , ; (αj)j d) with probability 1. Therefore, there is some m0 such that for all m > m0, it holds (with probability 1) that δ (K(n, W), W ; (αj)j d) < ϵ/2. Thus, we also have for all m > m0 that δ (K(n, W), W ; (αj)j 1) < ϵ. Observe that WK(n,W) = W K(n,W). Since ϵ was given arbitrarily, this establishes the convergence of K(n, W) to W in the canonical topology. Of course, by Lemma 5, each K(n, W) can be equivalently represented by an element of W0, establishing the desired result. Before proceeding with the proof of Theorem 13, we state the following useful result, which is a corollary of Theorem 20. Corollary 32. Let W be a complexon, and let 1 m < n, d 1. For every m-partition Q of [0, 1], there is an n-partition P refining Q such that, for all 0 j d, d ,j(W, WP) d + 1 log2 n/m. Lemma 33. The space W with the δ-topology is compact. Proof By Lemma 11, the δ-topology on W is the same for all strictly positive, summable sequences (αj)j 1. Thus, we can assume without loss of generality that αj = 2 j for all j 1, so that P j 1 αj = 1. To prove compactness, we show that every sequence W1, W2, . . . W has a convergent subsequence. Following the proof of (Lov asz and Szegedy, 2007, Theorem 5.1), for every n 1, we can construct via Theorem 20 partitions Pn,k of [0, 1] for k 1, with corresponding stepfunctions Wn,k = (Wn)Pn,k that satisfy (i) d (Wn Wn,k; (αj)) 1/k (ii) Pn,k is refined by Pn,k+1 for all n, k (iii) |Pn,k| = mk depends only on k. To do so, assume that for some k 0, Pn,k is an mk-partition of [0, 1] satisfying conditions (i) and (iii). Put ϵ = 1 2(k + 1) Mϵ = log2 ϵ + 1 mk+1 = mk 24(k+1)2(Mϵ+1). Applying Corollary 32 with d = Mϵ, there exists an mk+1-partition Pn,k+1 satisfying (i) and (ii). Moreover, mk+1 does not depend on the complexon Wn, so that Pn,k+1 also satisfies (iii). For the base case, let Pn,1 be the trivial (indiscrete) partition, which satisfies conditions (i) and (iii) with mk = 1. By induction, then, there exists a sequence of partitions {Pn,k}k 1 satisfying conditions (i) (iii). Roddenberry and Segarra Following the approach of (Lov asz and Szegedy, 2007, Theorem 5.1), we can rearrange the points of [0, 1] for each n by a measure-preserving bijection so that every partition in every Pn,k is an interval, while preserving the properties (i) (iii). Having done this, we replace the sequence (Wn)n 1 by a subsequence so that for every k, the sequence Wn,k converges almost everywhere to a stepfunction Uk with mk steps as n . To do so, select a subsequence of Wn such that the length of the ith interval in Pn,1 converges for all i [m1]. Then, take a further subsequence such that the value of Wn,1 converges on the product of the ith 1 and ith 2 interval, for i1, i2 [m1]. Repeat this for the products of the intervals indexed by i1, . . . , id+1 [m1] for all d 1. It follows that Wn,1 converges almost everywhere to a limit stepfunction U1 with m1 steps. Repeat this procedure for all k > 1, taking further and further subsequences so that for all k, Wn,k Uk almost everywhere, where each Uk is a stepfunction on mk steps. This yields the desired subsequence. The proof proceeds identically to that of (Lov asz and Szegedy, 2007, Theorem 5.1). Let Pk denote the partition of [0, 1] into the steps of Uk. By the condition (ii) of the array Wn,k, one can show that for every k < l, the partition Pn,l refines Pn,k, and furthermore that Pl refines Pk. Moreover, it holds that Uk = (Ul)Pk. (10) We now seek to establish the almost everywhere convergence of the sequence Uk. Define a function g : F d 1[0, 1]d+1 [0, 1] so that g([0, 1]d+1) = αd, for all d 1. Define a probability measure µ on F d 1[0, 1]d+1 such that, for any Borel set A F d 1[0, 1]d+1, where λ is the Lebesgue measure. Let X be a random point in F d 1[0, 1]d+1 chosen according to the measure µ. Then, the equation (10) implies that the sequence U1(X), U2(X), . . . is a martingale with bounded terms. By the martingale convergence theorem (see, for instance, Williams, 1991), this sequence is convergent with probability 1. That is to say, the sequence of functions U1, U2, . . . converges almost everywhere, with respect to the measure µ. Since µ is absolutely continuous with respect to λ, this implies convergence almost everywhere with respect to the Lebesgue measure. Denote the pointwise limit of this sequence by U. By the dominated convergence theorem, we have [0,1]j+1 |U(x) Uk(x)|dx = lim k Z |U Uk|dµ = 0, that is, Uk L1 U with respect to the probability measure µ. We now show that the subsequence Wn converges to U in the cut-distance. Let ϵ > 0 be given arbitrarily. Then, there is some k > 3/ϵ such that [0,1]j+1 |U(x) Uk(x)|dx = Z |U Uk|dµ < ϵ/3. Limits of Dense Simplicial Complexes For this k, there is an n0 such that for all n > n0, [0,1]j+1 |Uk(x) Wn,k(x)|dx = Z |Uk Wn,k|dµ < ϵ/3, again by dominated convergence. Then, for all n > n0, δ (U, Wn; (αj)j 1) δ (U, Uk; (αj)) + δ (Uk, Wn,k; (αj)) + δ (Wn,k, Wn; (αj)) + Z |Uk Wn,k|dµ + δ (Wn,k, Wn; (αj)) thus proving convergence of the subsequence, as desired. We define the map ( ) : W W that sends any W W to W , that is, the faceting map. Lemma 34. The faceting map ( ) : W W is continuous with respect to the δ-topology on W. Proof By Lemma 11, we can consider an arbitrary strictly positive summable sequence (αj)j 1, so that the cut-distance δ ( , ; (αj)j 1) induces the δ-topology on W. Let W1, W2, . . . be a convergent sequence W with respect to the cut-metric, with limiting complexon W. By Lemma 6, this implies that for all n 1, the sequence of distributions K(n, Wm) converges to K(n, W) in the total variation distance as m . We now show that W 1 , W 2 , . . . W in the cut-metric. Let ϵ > 0 be given arbitrarily. Let Mϵ be an integer such that P j>Mϵ αj < ϵ/2, and put n = l 2(8 5Mϵ+1/ϵ) 2m . By Lemma 7, we have that for any complexon W, δ (W , K(n, W); (αj)j 1) < ϵ with probability at least 1 exp( n/(2 log2 n)). Since the sequence K(n, Wm) K(n, W) converges in the total variation distance as m , this implies that for some sufficiently large m0, for each m > m0, there is some simplicial complex F such that simultaneously δ (W m, F; (αj)j 1) < ϵ 2 δ (W , F; (αj)j 1) < ϵ Roddenberry and Segarra By the triangle inequality, this implies δ (W m, W ; (αj)j 1) < ϵ for all m > m0. Thus, W 1 , W 2 , . . . W in the cut-distance, so that the map ( ) is continuous, as desired. Defining W as the image of W under ( ) , Lemmas 33 and 34 imply that W is a compact subspace of W with respect to the δ-topology. The compactness of the space W with the canonical topology follows. Proof [Proof of Theorem 13] From the pseudometric space (W, δ ), where δ is defined in terms of an arbitrary strictly positive, summable sequence, take the metric identification (W, δ ), and define the δ-topology on W be the topology induced by the metric δ . Since the metric identification preserves compactness W is a compact space under the δ-topology, by Lemma 33. Moreover, taking W to be the image of W under the map ( ) , we have that (W , δ ) is a compact metric space by Lemma 34. In other words, ( ) : W W is a quotient map. It is clear that the canonical topology on W is equal to the quotient topology by the map ( ) : W W . Since the space W with the quotient topology is compact, the canonical topology is compact, as desired. Appendix D. Complexons on Borel Probability Spaces We consider here how to transform complexons on Borel probability spaces to complexons on [0, 1] in a way that preserves reasonable notions of sampling and homomorphism density. One can extend the following discussion to more general spaces as discussed in (Lov asz, 2012, Section 13.1,Appendix A.3), but we restrict our attention to Borel probability spaces for convenience. A probability space (Ω, F, µ) is said to be a Borel probability space if for some sequence of constants {mn}n 0, it is isomorphic (up to sets of measure zero) to a probability space ( Ω, F, µ) such that 1. Ω= [0, m0] S n 1{pn}, where each {pn} is considered as an atom 2. µ is the Lebesgue measure on the Borel sets of the closed interval [0, m0], and for each n 1 we have µ(pn) = mn 3. F is the Borel σ-field on Ω. For instance, any probability measure on the Borel σ-field of a separable complete metric space yields a Borel probability space. With this definition, we now outline a procedure, as described by (Lov asz, 2012, Section 13.1), for describing a complexon on a Borel probability space by a complexon on [0, 1]. Let a Borel probability space (Ω, F, µ) be given with corresponding constants {mn}n 0. Let φ : Ω Ωbe the bijective isomorphism guaranteed by the definition of a Borel probability space. For a complexon WΩon Ω, define a complexon W on [0, 1] in the following way. For each n 0, define the interval Limits of Dense Simplicial Complexes so that I0 = [0, m0), S n 0 In = [0, 1), and each In has Lebesgue measure mn. For any x [0, 1), define the corresponding x Ωso that ( x x I0 pn x In. Then, define the complexon W so that for any points x1, . . . , xd+1 [0, 1) W(x1, . . . , xd+1) = WΩ(φ( x1), . . . , φ( xd+1)) . Since we characterize complexons in a way that does not depend on sets of measure zero, W can be trivially extended to points in [0, 1]. To justify this definition, we will show that this preserves a reasonable notion of homomorphism density. Let F be a simplicial complex. The (induced) homomorphism density of F in WΩis defined as (see (1)) t(F, WΩ) = Z σ F WΩ(xσ)dµV(F)(x) tind(F, WΩ) = Z σ F WΩ(xσ) Y τ F (1 WΩ(xτ)) dµV(F)(x), where µV(F) indicates the usual product measure on ΩV(F). By simple unrolling of definitions, one can check that t(F, WΩ) = t(F, W) tind(F, WΩ) = tind(F, W), for all simplicial complexes F. Thus, it is justifiable to use W as a representative of the complexon WΩ, particularly for the purposes of modeling sampling random simplicial complexes from WΩ. We refer to (Lov asz, 2012, Section 13.1) for a further discussion on how this can be extended to a broader class of probability spaces, as well as some typical conditions under which a probability space is guaranteed to be Borel. Robert J Adler, Omer Bobrowski, and Shmuel Weinberger. Crackle: The homology of noise. Discrete & Computational Geometry, 52(4):680 704, 2014. David J Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11(4):581 598, 1981. Noga Alon, W Fernandez De La Vega, Ravi Kannan, and Marek Karpinski. Random sampling and approximation of MAX-CSPs. Journal of Computer and System Sciences, 67(2):212 243, 2003. Krishnakumar Balasubramanian. Nonparametric modeling of higher-order interactions via hypergraphons. The Journal of Machine Learning Research, 22(1):6464 6498, 2021. Roddenberry and Segarra Federico Battiston and Giovanni Petri. Higher-Order Systems. Springer, 2022. Omer Bobrowski and Dmitri Krioukov. Random Simplicial Complexes: Models and Phenomena, pages 59 96. Springer, 2022. Referenced quotations come from the preprint version https://arxiv.org/abs/2105.12914v1. Christian Borgs, Jennifer T Chayes, L aszl o Lov asz, Vera T S os, and Katalin Vesztergombi. Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing. Advances in Mathematics, 219(6):1801 1851, 2008. Christian Borgs, Jennifer T Chayes, L aszl o Lov asz, Vera T S os, and Katalin Vesztergombi. Convergent sequences of dense graphs II. Multiway cuts and statistical physics. Annals of Mathematics, pages 151 219, 2012. Michael M Bronstein, Joan Bruna, Yann Le Cun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18 42, 2017. Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46 (2):255 308, 2009. Fr ed eric Chazal, David Cohen-Steiner, Leonidas J Guibas, Facundo M emoli, and Steve Y Oudot. Gromov-Hausdorffstable signatures for shapes using persistence. In Computer Graphics Forum, volume 28, pages 1393 1403. Wiley Online Library, 2009. Fr ed eric Chazal, Marc Glisse, Catherine Labru ere, and Bertrand Michel. Convergence rates for persistence diagram estimation in topological data analysis. In International Conference on Machine Learning, pages 163 171. PMLR, 2014. Fr ed eric Chazal, Brittany Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman. Subsampling methods for persistent homology. In International Conference on Machine Learning, pages 2143 2151. PMLR, 2015. Fr ed eric Chazal, Brittany Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman. Robust topological inference: Distance to a measure and kernel distance. The Journal of Machine Learning Research, 18(1):5845 5884, 2017. Fabio Coppini. A note on Fokker Planck equations and graphons. Journal of Statistical Physics, 187(2):15, 2022. A Costa and M Farber. Large random simplicial complexes, I. Journal of Topology and Analysis, 8(03):399 429, 2016. A Costa and M Farber. Large random simplicial complexes, II; the fundamental group. Journal of Topology and Analysis, 9(03):441 483, 2017a. A Costa and M Farber. Large random simplicial complexes, III the critical dimension. Journal of Knot Theory and Its Ramifications, 26(02):1740010, 2017b. Persi Diaconis and Svante Janson. Graph limits and exchangeable random graphs. ar Xiv preprint ar Xiv:0712.2749, 2007. Limits of Dense Simplicial Complexes G abor Elek and Bal azs Szegedy. A measure-theoretic approach to the theory of dense hypergraphs. Advances in Mathematics, 231(3-4):1731 1772, 2012. Michael Farber and Lewis Mead. Random simplicial complexes in the medial regime. Topology and its Applications, 272:107065, 2020. Michael Farber, Lewis Mead, and Lewin Strauss. The Rado simplicial complex. Journal of Applied and Computational Topology, 5(2):339 356, 2021. Michael Farber, Lewis Mead, and Tahl Nowik. Random simplicial complexes, duality and the critical dimension. Journal of Topology and Analysis, 14(01):1 31, 2022. Alan Frieze and Ravi Kannan. Quick approximation to matrices and applications. Combinatorica, 19(2):175 220, 1999. W Timothy Gowers. Quasirandomness, counting and regularity for 3-uniform hypergraphs. Combinatorics, Probability and Computing, 15(1-2):143 184, 2006. W Timothy Gowers. Hypergraph regularity and the multidimensional Szemer edi theorem. Annals of Mathematics, 166(3):897 946, 2007. Jan Hladk y, Andr as M ath e, Viresh Patel, and Oleg Pikhurko. Poset limits can be totally ordered. Transactions of the American Mathematical Society, pages 4319 4337, 2015. Douglas N Hoover. Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ, 2:275, 1979. Svante Janson. Poset limits and exchangeable random posets. Combinatorica, 31(5):529 563, 2011. Svante Janson. Graphons, cut norm and distance, couplings and rearrangements. NYJM Monographs, 4, 2013. Matthew Kahle. Topology of random clique complexes. Discrete mathematics, 309(6): 1658 1671, 2009. Nathan Linial and Roy Meshulam. Homological connectivity of random 2-complexes. Combinatorica, 26(4):475 487, 2006. L aszl o Lov asz. Graph homomorphisms: Open problems. 2008. URL http://www.cs.elte. hu/lovasz/problems.pdf. L aszl o Lov asz. Large networks and graph limits, volume 60. American Mathematical Society, 2012. L aszl o Lov asz and Bal azs Szegedy. Limits of dense graph sequences. Journal of Combinatorial Theory, Series B, 96(6):933 957, 2006. L aszl o Lov asz and Bal azs Szegedy. Szemer edi s lemma for the analyst. GAFA Geometric And Functional Analysis, 17(1):252 270, 2007. Roddenberry and Segarra Sohir Maskey, Ron Levie, and Gitta Kutyniok. Transferability of graph neural networks: an extended graphon approach. Applied and Computational Harmonic Analysis, 63:48 83, 2023. Roy Meshulam and Nathan Wallach. Homological connectivity of random k-dimensional complexes. Random Structures & Algorithms, 34(3):408 417, 2009. Madeline Navarro and Santiago Segarra. Joint network topology inference via a shared graphon model. IEEE Transactions on Signal Processing, 70:5549 5563, 2022. Takashi Owada and Omer Bobrowski. Convergence of persistence diagrams for topological crackle. Bernoulli, 26(3):2275 2310, 2020. T Mitchell Roddenberry, Madeline Navarro, and Santiago Segarra. Network topology inference with graphon spectral penalties. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 5390 5394. IEEE, 2021. Vojtˇech R odl and Jozef Skokan. Regularity lemma for k-uniform hypergraphs. Random Structures & Algorithms, 25(1):1 42, 2004. Luana Ruiz, Luiz Chamon, and Alejandro Ribeiro. Graphon neural networks and the transferability of graph neural networks. Advances in Neural Information Processing Systems, 33:1702 1712, 2020. Michael T Schaub, Yu Zhu, Jean-Baptiste Seby, T Mitchell Roddenberry, and Santiago Segarra. Signal processing on higher-order networks: Livin on the edge... and beyond. Signal Processing, 187:108 149, 2021. Benedikt Stufler. Graphon convergence of random cographs. Random Structures & Algorithms, 59(3):464 491, 2021. Terence Tao. A variant of the hypergraph removal lemma. Journal of combinatorial theory, Series A, 113(7):1257 1280, 2006. David Williams. Probability with martingales. Cambridge university press, 1991. Yufei Zhao. Hypergraph limits: a regularity approach. Random Structures & Algorithms, 47(2):205 226, 2015.