# optimizing_persistent_homology_based_functions__4dec5bdd.pdf Optimizing persistent homology based functions Mathieu Carri ere 1 Fr ed eric Chazal 2 Marc Glisse 2 Yuichi Ike 3 Hariprasad Kannan 2 Yuhei Umeda 3 Solving optimization tasks based on functions and losses with a topological flavor is a very active, growing field of research in data science and Topological Data Analysis, with applications in non-convex optimization, statistics and machine learning. However, the approaches proposed in the literature are usually anchored to a specific application and/or topological construction, and do not come with theoretical guarantees. To address this issue, we study the differentiability of a general map associated with the most common topological construction, that is, the persistence map. Building on real analytic geometry arguments, we propose a general framework that allows us to define and compute gradients for persistence-based functions in a very simple way. We also provide a simple, explicit and sufficient condition for convergence of stochastic subgradient methods for such functions. This result encompasses all the constructions and applications of topological optimization in the literature. Finally, we provide associated code, that is easy to handle and to mix with other non-topological methods and constraints, as well as some experiments showcasing the versatility of our approach. 1. Introduction Persistent homology is a central tool in Topological Data Analysis that allows to efficiently infer relevant topological features of complex data in a descriptor called persistence diagram. It has found many applications in Machine Learning (ML) where it initially played the role of a feature engineering tool, either through the direct use of persistence diagrams or through dedicated ML architectures that handle them see, e.g., (Hofer et al., 2017; Umeda, 2017; Carri ere et al., 2020; Dindin et al., 2020; Kim et al., 2020). For 1Universit e Cˆote d Azur, Inria, France 2Universit e Paris-Saclay, CNRS, Inria, Laboratoire de Math ematiques d Orsay, France 3Fujitsu Ltd., Kanagawa, Japan. Correspondence to: Mathieu Carri ere . Proceedings of the 38 th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s). the last few years, a growing number of works have successfully been using persistence theory in different ways, focusing on, for instance, better understanding, designing and improvement of neural network architectures see, e.g., (Rieck et al., 2019; Moor et al., 2020; Carlsson & Gabrielsson, 2020; Gabrielsson & Carlsson, 2019) or design regularization and loss functions incorporating topological terms and penalties for various ML tasks see, e.g., (Chen et al., 2019; Hofer et al., 2019; 2020; Clough et al., 2020). These new use cases of persistence generally involve minimizing functions that depend on persistence diagrams. Such functions are in general non-convex and not differentiable, and thus their theoretical and practical minimization can be difficult. In some specific cases, persistence-based functions can be designed to be differentiable and/or some effort have to be made to compute their gradient, so that standard gradient descent techniques can be used to minimize them see e.g., (Wang et al., 2020; Poulenard et al., 2018; Br uel-Gabrielsson et al., 2020). In the general case, recent attempts have been made to better understand their differential structures (Leygonie et al., 2021). Moreover, building on powerful tools provided by software libraries such as Py Torch or Tensor Flow, practical methods allowing to encode and optimize a large family of persistence-based functions have been proposed and experimented with (Br uel Gabrielsson et al., 2020; Solomon et al., 2021). However, in all these cases, the algorithms used to minimize these functions do not come with theoretical guarantees of convergence to a global or local minimum. Contributions and organization of the article. The aim of this article is to provide a general framework that includes almost all persistence-based functions from the literature, and for which stochastic subgradient descent algorithms are easy to implement and come with convergence guarantees. More precisely, we first observe that the persistence map, converting a filtration over a given simplicial complex 1 into a persistence diagram, can be thought of as a map between Euclidean spaces (Section 2). This observation allows us to prove that the persistence map is semi-algebraic and, using classical arguments from o-minimal geometry, to study the 1The presentation is restricted to simplicial complexes for simplicity, but this generalizes to other complexes as well. We present an example with cubical complexes in Supplementary Material. Optimizing persistent homology based functions differentiability of the persistence of parametrized families of filtrations (Section 3). Then, building on the recent work of (Davis et al., 2020), we consider the minimization problem of persistence-based functions and show that under mild assumptions, stochastic subgradient descent algorithms applied to such functions converge almost surely to a critical point (Section 4). We also provide a simple corresponding Python implementation for minimizing functions of persistence2, and we illustrate it with several examples from the literature (Section 5). 2. Filtrations and persistence diagrams In this section, we show that the persistence map is nothing but a permutation of the coordinates of a vector containing the filtration values. 2.1. Simplicial complexes and filtrations Recall that given a set V , a (finite) simplicial complex K is a collection of finite subsets of V that satisfies (1) {v} K for any v V , and (2) if σ K and τ σ then τ K. An element σ K with |σ| = k + 1 is called a k-simplex. Given a simplicial complex K and a subset R of R, a filtration of K is an increasing sequence (Kr)r R of subcomplexes of K with respect to the inclusion, i.e., Kr Ks for any r s, and such that S r R Kr = K. To each simplex σ K, one can associate its filtering index Φσ = inf{r R : σ Kr}. Thus, when K is finite, a filtration of K can be conveniently represented as a filtering function Φ: K R. Equivalently, it can be represented as a |K|-dimensional vector Φ = (Φσ)σ K in R|K| whose coordinates are the indices of the simplices of K and that satisfies the following condition: if σ, τ K and τ σ, then Φτ Φσ. As a consequence, if the vectorized filtration Φ depends on a parameter, the corresponding family of filtrations can be represented as a map from the space of parameters to R|K| in the following way. Definition 2.1. Let K be a simplicial complex and A a set. A map Φ: A R|K| is said to be a parametrized family of filtrations if for any x A and σ, τ K with τ σ, one has Φτ(x) Φσ(x). 2.2. Persistence computation from filtrations We briefly recall how the computation of the persistence diagram of a filtered simplicial complex decomposes into: (i) a purely combinatorial part only relying on the order on the simplices induced by the filtration, and (ii) a part relying on the filtration values. A detailed introduction to persistent homology and its computation can be found in, 2It is publicly available at https://github.com/ Mathieu Carriere/difftda e.g., (Edelsbrunner & Harer, 2010; Boissonnat et al., 2018). Let K be a simplicial complex endowed with a filtration and corresponding filtering function Φ R|K|, where |K| is the number of non-empty simplices of K. First part: combinatorial part (persistence pairs). The filtering function Φ induces a total preorder on the elements of K as follows: τ σ if Φτ Φσ. This preorder can be refined into a total order by breaking ties in some fairly arbitrary way, as long as it is consistent with the face relation, i.e., if τ σ, then τ σ. One way to break ties is to sort simplices that have the same filtration value by dimension, and then order the ones that are still equivalent according to some arbitrary indexing of the simplices. For instance, one can represent simplices by their decreasing list of vertices, and sort equivalent simplices using the lexicographic order on those lists. In the following, we will assume that the total order is a function of the preorder, in particular it is deterministic and does not depend on the exact values of Φ. Note that while different orders may yield different pairings, they all translate to the same persistence diagram in the second part. The basic algorithm to compute persistence iterates over the ordered set of simplices σ1 σ|K| is according to Algorithm 1 below see Section 11.5.2 in (Boissonnat et al., 2018) for a detailed description of the algorithm. Algorithm 1 Persistence pairs computation (sketch) Input: Ordered sequence of simplices σ1 σ|K| K0 = Pairs0 = Pairs1 = = Pairsd 1 = for j = 1 to |K| do k = dim σj Kj = Kj 1 σj if σj does not create a new k-dimensional homology class in Kj then A (k 1)-dimensional homology class created in Kl(j) by σl(j) for some l(j) < j becomes homologous to 0 in Kj. Pairsk 1 Pairsk 1 {(σl(j), σj)}; end if end for Output: Persistence pairs in each dimension Pairs0, Pairs1, . . . , Pairsd 1 Note that for each dimension k, some k-dimensional simplices may remain unpaired at the end of the algorithm; their number is equal to the k-dimensional Betti number of K. Second part: associated filtration values. The persistence diagram of the filter function Φ is now obtained by associating to each persistent pair (σl(j), σj) the point (Φσl(j), Φσj). Moreover, to each unpaired simplex σl is Optimizing persistent homology based functions associated the point (Φσl, + ). If p is the number of persistence pairs and q is the number of unpaired simplices, then |K| = 2p + q and the persistence diagram D(Φ) of the filtration Φ of K is made of p points in R2 (counted with multiplicity) and q points (also counted with multiplicity) with infinite second coordinate. Choosing the lexicographical order on R (R {+ }), the persistence diagram D(Φ) can be represented as a vector in R|K| and the output of the persistence algorithm can be simply seen as a permutation of the coordinates of the input vector Φ. Moreover, this permutation only depends on the order on the simplices of K induced by Φ. Definition 2.2. The subset of points of a persistence diagram D with finite coordinates (resp. infinite second coordinate) is called the regular part (resp. essential part) of D and denoted by Dreg (resp. Dess). With the notations defined above, Dreg and Dess can be represented as vectors in R2p and Rq, respectively. Note that, in practice, the above construction is usually done dimension by dimension to get a persistence diagram for each dimension in homology, by restricting to the subset of simplices of dimension k and k + 1. Without loss of generality, and to avoid unnecessary heavy notation, in the following we consider the whole persistence diagram, made of the union of the persistence diagrams in all dimensions k. 3. Differentiability of functions of persistence o-minimal geometry provides a well-suited setting to describe the parametrized families of filtrations encountered in practice and to exhibit interesting differentiability properties of their composition with the persistence map. 3.1. Background on o-minimal geometry In this section, we recall some elements of o-minimal geometry, which are needed in the next sections see, e.g., (Coste, 2000) for a more detailed introduction. Definition 3.1 (o-minimal structure). An o-minimal structure on the field of real numbers R is a collection (Sn)n N, where each Sn is a set of subsets of Rn such that: 1. S1 is exactly the collection of finite unions of points and intervals; 2. all algebraic subsets3 of Rn are in Sn; 3. Sn is a Boolean subalgebra of Rn for any n N; 4. if A Sn and B Sm, then A B Sn+m; 5. if π: Rn+1 Rn is the linear projection onto the first n coordinates and A Sn+1, then π(A) Sn. 3Recall that an algebraic set is the 0-level set of a polynomial. An element A Sn for some n N is called a definable set in the o-minimal structure. For a definable set A Rn, a map f : A Rm is said to be definable if its graph is a definable set in Rn+m. Definable sets are stable under various geometric operations. The complement, closure and interior of a definable set are definable sets. The finite unions and intersections of definable sets are definable. The image of a definable set by a definable map is itself definable. Sums and products of definable functions as well as compositions of definable functions are definable see Section 1.3 in (Coste, 2000). In particular, the max and min of finite sets of real-valued definable functions are also definable. An important property of definable sets and definable maps is that they admit a finite Whitney stratification (van den Dries & Miller, 1996). This implies that (i) any definable set A Rn can be decomposed into a finite disjoint union of smooth submanifolds of Rn and (ii) for any definable map Φ: A Rm, A can also be decomposed into a finite union of smooth manifolds such that the restriction of Φ on each of these manifolds is a differentiable function. The simplest example of o-minimal structures is given by the family of semi-algebraic subsets4 of Rn (n N). Although most of the classical parametrized families of filtrations are semi-algebraic, the o-minimal framework actually allows to consider larger families. In particular, the result of (Wilkie, 1996) says that the family of images of the sublevel sets of functions in R[x1, . . . , x N, exp(x1), . . . , exp(x N)] for some N N under linear projections is an o-minimal structure, which allows us to mix exponential functions with semi-algebraic functions. 3.2. Persistence diagrams of definable parametrized families of filtrations Let K be a simplicial complex and Φ: A R|K| be a parametrized family of filtrations that is definable in a given o-minimal structure. If for any x, x A, the preorders induced by Φ(x) and Φ(x ) on the simplices of K are the same, i.e., for any σ1, σ2 K, Φσ1(x) Φσ2(x) if and only if Φσ1(x ) Φσ2(x ), then the pairs of simplices (σi1, σj1), . . . , (σip, σjp), and the unpaired simplices σip+1, . . . , σip+q that are computed by the persistence Algorithm 1 are independent of x. Then, x A, the persistence diagram D = D(Φ(x)) of Φ(x) is k=1 (Φσik (x), Φσjk (x)) k=1 (Φσip+k (x), + ), where |K| = 2p + q. 4It is the family of all finite unions and intersections of level sets and sublevel sets of polynomials (Benedetti & Risler, 1991). Optimizing persistent homology based functions Given the lexicographic order on R (R {+ }), the points of any finite multi-set D R (R {+ }) with p points in R2 and q points in R {+ } can be ordered in non-decreasing order, and D can be represented as a vector in R2p+q. As a consequence, denoting by Filt K the set of vectors in R|K| that define a filtration on K, the persistence map Pers: Filt K R|K| that assigns to each filtration of K its persistence diagram consists of a permutation of the coordinates of R|K|. This permutation is constant on the set of filtrations that define the same preorder on the simplices of K. This leads to the following statement. Proposition 3.2. Given a simplicial complex K, the map Pers: Filt K R|K| R|K| is semi-algebraic, and thus definable in any o-minimal structure. Moreover, there exists a semi-algebraic partition of Filt K such that the restriction of Pers to each element of this partition is a Lipschitz map. Proof. See Supplementary Material. Since there exists a finite semi-algebraic partition of Filt K on which Pers is a locally constant permutation, the subdifferential (see Section 4 for the definition) of Pers is welldefined and obvious to compute: each coordinate in the output (i.e., the persistence diagram) is a copy of a coordinate in the input (i.e., the filtration values of the simplices). This implies that every partial derivative is either 1 or 0. The output can be seen as a reindexing of the input, and this is indeed how we implement it in our code, so that automatic differentiation frameworks (Py Torch, Tensor Flow, etc.) can process the function Pers directly and do not need explicit gradient formulas see Section 5. Note that the subdifferential depends on the arbitrary refinement of the preorder in Subsection 2.2. Corollary 3.3. Let K be a simplicial complex and Φ: A R|K| be a definable (in a given o-minimal structure) parametrized family of filtrations. The map Pers Φ: A R|K| is definable. Note that according to the remark following Proposition 3.2, if Φ is differentiable, the subdifferential of Pers Φ can be easily computed in terms of the partial derivatives of Φ using, for example, Equation (1). It also follows from standard finiteness and stratifiability properties of definable sets and maps that Pers Φ is differentiable almost everywhere. More precisely: Proposition 3.4. Let K be a simplicial complex and Φ: A R|K| a definable parametrized family of filtrations, where dim A = m. Then there exists a finite definable partition of A, A = S O1 Ok such that dim S < dim A := m and, for any i = 1, . . . , k, Oi is a definable manifold of dimension m and Pers Φ: Oi D is differentiable. 3.3. Examples of definable families of filtrations Vietoris-Rips filtrations. The family of Vietoris-Rips filtrations built on top of sets of n points x1, . . . , xn Rd is the semi-algebraic parametrized family of filtrations Φ: A = (Rd)n R| n| = R2n 1, where n is the simplicial complex made of all the faces of the (n 1)-dimensional simplex. It is defined, for any x = (x1, . . . , xn) A and any simplex σ {1, . . . , n}, by Φσ(x) = max i,j σ xi xj . One easily checks that the permutation induced by Pers is constant on the connected components of the complement of the union of the subspaces Si,j,k,l = {(x1, . . . , xn) : xi xj = xk xl } over all the 4-tuples (i, j, k, l) such that at least 3 of the 4 indices i, j, k, l are distinct. This example naturally extends to Vietoris-Rips-like filtrations in the following way. Let A Mn(R) be the set of n n symmetric matrices with non-negative entries and 0 on the diagonal. This is a semi-algebraic subset of the space of n-by-n matrices Mn(R) Rn2, of dimension m = (n 1)(n 2)/2. The map Φ: A R| n| = R2n defined by Φσ(M) = maxi,j σ mi,j for any M = (mi,j)1 i,j n A, is a semi-algebraic family of filtrations. Note that the set S of Proposition 3.4 can be chosen to be the set of matrices with at least 2 entries that are equal. Weighted Rips filtrations. Given a function f : Rd R, the family of weighted Rips filtrations Φ: A = (Rd)n R| n| = R2n associated with f is defined, for any x = (x1, . . . , xn) A and any simplex σ {1, . . . , n}, by Φσ(x) = 2f(xj) if σ = [j]; Φσ(x) = max(2f(xi), 2f(xj), xi xj + f(xi) + f(xj)), if σ = [i, j], i = j; Φσ(x) = max(Φ[i,j](x), i, j σ) if |σ| 3. Since Euclidean distances and max function are semialgebraic, this family of filtrations is definable as soon as the weight function f is definable. This example easily extends to the case where the weight function depends on the set of points x = (x1, . . . , xn): the weight at vertex y is defined by f(x, y) with f : (Rd)n Rd R. A particular example of such a family is given by the so-called DTM filtration (Anai et al., 2020), where f(x, y) is the average distance from y to its k-nearest neighbors in x. In this case, f is semi-algebraic, and the family of DTM filtrations is semi-algebraic. Optimizing persistent homology based functions The o-minimal framework also allows us to consider weight functions involving exponential functions (Wilkie, 1996), such as, for instance, kernel-based density estimates with Gaussian kernels. Sublevel sets filtrations. Let K be a simplicial complex with n vertices v1, . . . , vn. Any real-valued function f defined on the vertices of K can be represented as a vector (f(v1), . . . , f(vn)) Rn. The family of sublevel sets filtrations Φ: A = Rn R|K| of functions on the vertices of K is defined by Φσ(f) = maxi σ fi for any f = (f1, . . . , fn) A and any simplex σ {1, . . . , n}. This filtration is also known as the lower-star filtration of f. The function Φ is obviously semi-algebraic, and for Proposition 3.4 to hold it is sufficient to choose S = S 1 i