
Optimizing persistent homology based functions

Mathieu Carrière¹, Frédéric Chazal², Marc Glisse², Yuichi Ike³, Hariprasad Kannan², Yuhei Umeda³

Solving optimization tasks based on functions and losses with a topological ﬂavor is a very active, growing ﬁeld of research in data science and Topological Data Analysis, with applications in non-convex optimization, statistics and machine learning. However, the approaches proposed in the literature are usually anchored to a speciﬁc application and/or topological construction, and do not come with theoretical guarantees. To address this issue, we study the differentiability of a general map associated with the most common topological construction, that is, the persistence map. Building on real analytic geometry arguments, we propose a general framework that allows us to deﬁne and compute gradients for persistence-based functions in a very simple way. We also provide a simple, explicit and sufﬁcient condition for convergence of stochastic subgradient methods for such functions. This result encompasses all the constructions and applications of topological optimization in the literature. Finally, we provide associated code, that is easy to handle and to mix with other non-topological methods and constraints, as well as some experiments showcasing the versatility of our approach.

1. Introduction

Persistent homology is a central tool in Topological Data Analysis that makes it possible to efficiently infer relevant topological features of complex data in a descriptor called the persistence diagram. It has found many applications in Machine Learning (ML), where it initially played the role of a feature engineering tool, either through the direct use of persistence diagrams or through dedicated ML architectures that handle them; see, e.g., (Hofer et al., 2017; Umeda, 2017; Carrière et al., 2020; Dindin et al., 2020; Kim et al., 2020). For the last few years, a growing number of works have successfully used persistence theory in different ways, focusing on, for instance, better understanding, designing and improving neural network architectures; see, e.g., (Rieck et al., 2019; Moor et al., 2020; Carlsson & Gabrielsson, 2020; Gabrielsson & Carlsson, 2019), or designing regularization and loss functions incorporating topological terms and penalties for various ML tasks; see, e.g., (Chen et al., 2019; Hofer et al., 2019; 2020; Clough et al., 2020). These new use cases of persistence generally involve minimizing functions that depend on persistence diagrams. Such functions are in general non-convex and not differentiable, and thus their theoretical and practical minimization can be difficult. In some specific cases, persistence-based functions can be designed to be differentiable and/or some effort has to be made to compute their gradient, so that standard gradient descent techniques can be used to minimize them; see, e.g., (Wang et al., 2020; Poulenard et al., 2018; Brüel-Gabrielsson et al., 2020). In the general case, recent attempts have been made to better understand their differential structures (Leygonie et al., 2021). Moreover, building on powerful tools provided by software libraries such as PyTorch or TensorFlow, practical methods that make it possible to encode and optimize a large family of persistence-based functions have been proposed and experimented with (Brüel-Gabrielsson et al., 2020; Solomon et al., 2021). However, in all these cases, the algorithms used to minimize these functions do not come with theoretical guarantees of convergence to a global or local minimum.

¹ Université Côte d'Azur, Inria, France. ² Université Paris-Saclay, CNRS, Inria, Laboratoire de Mathématiques d'Orsay, France. ³ Fujitsu Ltd., Kanagawa, Japan. Correspondence to: Mathieu Carrière <mathieu.carriere@inria.fr>.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

Contributions and organization of the article. The aim of this article is to provide a general framework that includes almost all persistence-based functions from the literature, and for which stochastic subgradient descent algorithms are easy to implement and come with convergence guarantees.

More precisely, we first observe that the persistence map, converting a filtration over a given simplicial complex¹ into a persistence diagram, can be thought of as a map between Euclidean spaces (Section 2). This observation allows us to prove that the persistence map is semi-algebraic and, using classical arguments from o-minimal geometry, to study the differentiability of the persistence of parametrized families of filtrations (Section 3). Then, building on the recent work of (Davis et al., 2020), we consider the minimization problem of persistence-based functions and show that under mild assumptions, stochastic subgradient descent algorithms applied to such functions converge almost surely to a critical point (Section 4). We also provide a simple corresponding Python implementation for minimizing functions of persistence², and we illustrate it with several examples from the literature (Section 5).

¹ The presentation is restricted to simplicial complexes for simplicity, but this generalizes to other complexes as well. We present an example with cubical complexes in Supplementary Material.

2. Filtrations and persistence diagrams

In this section, we show that the persistence map is nothing but a permutation of the coordinates of a vector containing the ﬁltration values.

2.1. Simplicial complexes and ﬁltrations

Recall that given a set $V$, a (finite) simplicial complex $K$ is a collection of finite subsets of $V$ that satisfies (1) $\{v\} \in K$ for any $v \in V$, and (2) if $\sigma \in K$ and $\tau \subseteq \sigma$ then $\tau \in K$. An element $\sigma \in K$ with $|\sigma| = k + 1$ is called a $k$-simplex.

Given a simplicial complex $K$ and a subset $R$ of $\mathbb{R}$, a filtration of $K$ is an increasing sequence $(K_r)_{r \in R}$ of subcomplexes of $K$ with respect to the inclusion, i.e., $K_r \subseteq K_s$ for any $r \leq s$, and such that $\bigcup_{r \in R} K_r = K$.

To each simplex $\sigma \in K$, one can associate its filtering index $\Phi_\sigma = \inf\{r \in R : \sigma \in K_r\}$. Thus, when $K$ is finite, a filtration of $K$ can be conveniently represented as a filtering function $\Phi\colon K \to \mathbb{R}$. Equivalently, it can be represented as a $|K|$-dimensional vector $\Phi = (\Phi_\sigma)_{\sigma \in K}$ in $\mathbb{R}^{|K|}$ whose coordinates are the filtering indices of the simplices of $K$ and that satisfies the following condition: if $\sigma, \tau \in K$ and $\tau \subseteq \sigma$, then $\Phi_\tau \leq \Phi_\sigma$. As a consequence, if the vectorized filtration $\Phi$ depends on a parameter, the corresponding family of filtrations can be represented as a map from the space of parameters to $\mathbb{R}^{|K|}$ in the following way.

Definition 2.1. Let $K$ be a simplicial complex and $A$ a set. A map $\Phi\colon A \to \mathbb{R}^{|K|}$ is said to be a parametrized family of filtrations if for any $x \in A$ and $\sigma, \tau \in K$ with $\tau \subseteq \sigma$, one has $\Phi_\tau(x) \leq \Phi_\sigma(x)$.
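The monotonicity condition of Definition 2.1 is easy to check numerically. The following is a minimal sketch, assuming a filtration is stored as a Python dictionary mapping each simplex (a sorted tuple of 0-based vertex indices) to its filtration value; the names `faces` and `is_filtration` are hypothetical and are not taken from the authors' code.

```python
from itertools import combinations


def faces(simplex):
    """Yield all non-empty proper faces of a simplex given as a sorted tuple."""
    for size in range(1, len(simplex)):
        for face in combinations(simplex, size):
            yield face


def is_filtration(phi):
    """Check that phi[tau] <= phi[sigma] whenever tau is a face of sigma.

    phi is a dict {simplex: value}; this storage format is an assumption
    made for illustration only.
    """
    for sigma, value in phi.items():
        for tau in faces(sigma):
            # Every face must belong to the complex and appear no later.
            if tau not in phi or phi[tau] > value:
                return False
    return True


# Filled triangle on vertices 0, 1, 2 with increasing edge values.
phi = {
    (0,): 0.0, (1,): 0.0, (2,): 0.0,
    (0, 1): 1.0, (0, 2): 2.0, (1, 2): 3.0,
    (0, 1, 2): 3.0,
}
print(is_filtration(phi))
```

Lowering the value of an edge below the value of one of its vertices makes the check fail, which is exactly the situation excluded by the definition.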

2.2. Persistence computation from ﬁltrations

We briefly recall how the computation of the persistence diagram of a filtered simplicial complex decomposes into: (i) a purely combinatorial part only relying on the order on the simplices induced by the filtration, and (ii) a part relying on the filtration values. A detailed introduction to persistent homology and its computation can be found in, e.g., (Edelsbrunner & Harer, 2010; Boissonnat et al., 2018).

² It is publicly available at https://github.com/MathieuCarriere/difftda

Let $K$ be a simplicial complex endowed with a filtration and corresponding filtering function $\Phi \in \mathbb{R}^{|K|}$, where $|K|$ is the number of non-empty simplices of $K$.

First part: combinatorial part (persistence pairs). The filtering function $\Phi$ induces a total preorder on the elements of $K$ as follows: $\tau \preceq \sigma$ if $\Phi_\tau \leq \Phi_\sigma$. This preorder can be refined into a total order by breaking ties in some fairly arbitrary way, as long as it is consistent with the face relation, i.e., if $\tau \subseteq \sigma$, then $\tau \preceq \sigma$. One way to break ties is to sort simplices that have the same filtration value by dimension, and then order the ones that are still equivalent according to some arbitrary indexing of the simplices. For instance, one can represent simplices by their decreasing list of vertices, and sort equivalent simplices using the lexicographic order on those lists. In the following, we will assume that the total order is a function of the preorder; in particular, it is deterministic and does not depend on the exact values of $\Phi$. Note that while different orders may yield different pairings, they all translate to the same persistence diagram in the second part. The basic algorithm to compute persistence iterates over the ordered set of simplices $\sigma_1 \prec \cdots \prec \sigma_{|K|}$ according to Algorithm 1 below; see Section 11.5.2 in (Boissonnat et al., 2018) for a detailed description of the algorithm.
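The tie-breaking rule described above can be sketched as a sort key: filtration value first, then dimension, then the decreasing vertex list compared lexicographically. This is only an illustration under the assumption that simplices are stored as sorted tuples of 0-based vertex indices in a dictionary; `filtration_order` is a hypothetical helper name.

```python
def filtration_order(phi):
    """Return the simplices of phi sorted into a total order.

    Ties in filtration value are broken by dimension, then by the
    lexicographic order on the decreasing list of vertices.
    """
    def key(sigma):
        return (phi[sigma], len(sigma) - 1, sorted(sigma, reverse=True))

    return sorted(phi, key=key)


# Filled triangle; the edge (1, 2) and the triangle share the value 3.0.
phi = {
    (0, 1, 2): 3.0, (1, 2): 3.0, (0, 2): 2.0, (0, 1): 1.0,
    (2,): 0.0, (1,): 0.0, (0,): 0.0,
}
order = filtration_order(phi)
print(order)
# [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
```

Because a proper face always has strictly smaller dimension than the simplex containing it, sorting by dimension among equal values keeps the order consistent with the face relation.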

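Algorithm 1 Persistence pairs computation (sketch)

Input: Ordered sequence of simplices $\sigma_1 \prec \cdots \prec \sigma_{|K|}$
$K_0 = \emptyset$; $\mathrm{Pairs}_0 = \mathrm{Pairs}_1 = \cdots = \mathrm{Pairs}_{d-1} = \emptyset$
for $j = 1$ to $|K|$ do
    $k = \dim \sigma_j$
    $K_j = K_{j-1} \cup \{\sigma_j\}$
    if $\sigma_j$ does not create a new $k$-dimensional homology class in $K_j$ then
        A $(k-1)$-dimensional homology class created in $K_{l(j)}$ by $\sigma_{l(j)}$ for some $l(j) < j$ becomes homologous to 0 in $K_j$.
        $\mathrm{Pairs}_{k-1} \leftarrow \mathrm{Pairs}_{k-1} \cup \{(\sigma_{l(j)}, \sigma_j)\}$
    end if
end for
Output: Persistence pairs in each dimension $\mathrm{Pairs}_0, \mathrm{Pairs}_1, \ldots, \mathrm{Pairs}_{d-1}$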

Note that for each dimension k, some k-dimensional simplices may remain unpaired at the end of the algorithm; their number is equal to the k-dimensional Betti number of K.
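Algorithm 1 is stated at the level of homology classes. One common way to realize it in practice is the standard column reduction of the boundary matrix over Z/2, which is sketched below. This is a swapped-in textbook technique given for illustration; it is not claimed to be the authors' or Gudhi's implementation, and `persistence_pairs` is a hypothetical name. Simplices are assumed to be sorted tuples of vertex indices listed in filtration order.

```python
from itertools import combinations


def persistence_pairs(ordered_simplices):
    """Pair simplices by reducing boundary columns with Z/2 coefficients.

    Returns (pairs, unpaired), both expressed as positions in the input list.
    """
    index = {s: j for j, s in enumerate(ordered_simplices)}
    reduced = {}  # lowest row index of a reduced column -> that column
    pairs, paired = [], set()
    for j, sigma in enumerate(ordered_simplices):
        # Boundary of sigma: positions of its codimension-1 faces.
        column = set()
        if len(sigma) > 1:
            column = {index[f] for f in combinations(sigma, len(sigma) - 1)}
        # Add earlier reduced columns until the lowest entry is new or gone.
        while column and max(column) in reduced:
            column ^= reduced[max(column)]
        if column:
            # sigma kills the class created by the simplex at position `low`.
            low = max(column)
            reduced[low] = column
            pairs.append((low, j))
            paired.update((low, j))
        # An empty column means sigma creates a new homology class.
    unpaired = [j for j in range(len(ordered_simplices)) if j not in paired]
    return pairs, unpaired


order = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
pairs, unpaired = persistence_pairs(order)
print(pairs, unpaired)
# [(1, 3), (2, 4), (5, 6)] [0]
```

On the filled triangle, three pairs and one unpaired vertex are found, in agreement with the count $|K| = 2p + q$ and with the 0-dimensional Betti number of a connected complex being 1.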

Second part: associated filtration values. The persistence diagram of the filtering function $\Phi$ is now obtained by associating to each persistence pair $(\sigma_{l(j)}, \sigma_j)$ the point $(\Phi_{\sigma_{l(j)}}, \Phi_{\sigma_j})$. Moreover, to each unpaired simplex $\sigma_l$ is associated the point $(\Phi_{\sigma_l}, +\infty)$.

If $p$ is the number of persistence pairs and $q$ is the number of unpaired simplices, then $|K| = 2p + q$ and the persistence diagram $D(\Phi)$ of the filtration $\Phi$ of $K$ is made of $p$ points in $\mathbb{R}^2$ (counted with multiplicity) and $q$ points (also counted with multiplicity) with infinite second coordinate. Choosing the lexicographical order on $\mathbb{R} \times (\mathbb{R} \cup \{+\infty\})$, the persistence diagram $D(\Phi)$ can be represented as a vector in $\mathbb{R}^{|K|}$ and the output of the persistence algorithm can be simply seen as a permutation of the coordinates of the input vector $\Phi$. Moreover, this permutation only depends on the order on the simplices of $K$ induced by $\Phi$.

Definition 2.2. The subset of points of a persistence diagram $D$ with finite coordinates (resp. infinite second coordinate) is called the regular part (resp. essential part) of $D$ and denoted by $D_{\mathrm{reg}}$ (resp. $D_{\mathrm{ess}}$).

With the notations defined above, $D_{\mathrm{reg}}$ and $D_{\mathrm{ess}}$ can be represented as vectors in $\mathbb{R}^{2p}$ and $\mathbb{R}^q$, respectively.
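To make the "permutation of coordinates" point of view concrete, the sketch below builds the index list that sends the vector of filtration values to the diagram vector (regular part first, then essential part, each sorted lexicographically). The pairs and unpaired positions are assumed to be given, for instance by a pairing routine; `persistence_permutation` is a hypothetical name used only here.

```python
def persistence_permutation(phi_values, pairs, unpaired):
    """Return the positions in phi_values that make up the diagram vector.

    phi_values: filtration values listed in filtration order.
    pairs: list of (birth_position, death_position).
    unpaired: list of positions of unpaired simplices.
    """
    # Sort regular points lexicographically by (birth value, death value).
    regular = sorted(pairs, key=lambda bd: (phi_values[bd[0]], phi_values[bd[1]]))
    indices = [i for pair in regular for i in pair]
    # Essential points are represented by their birth value only.
    indices += sorted(unpaired, key=lambda u: phi_values[u])
    return indices


# Filled triangle: values of the 7 simplices in filtration order.
phi_values = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0]
indices = persistence_permutation(phi_values, [(1, 3), (2, 4), (5, 6)], [0])
diagram = [phi_values[i] for i in indices]
print(indices)  # [1, 3, 2, 4, 5, 6, 0]
print(diagram)  # [0.0, 1.0, 0.0, 2.0, 3.0, 3.0, 0.0]
```

Here the first $2p = 6$ coordinates encode the regular part and the last $q = 1$ coordinate encodes the essential part; every input position is used exactly once.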

Note that, in practice, the above construction is usually done dimension by dimension to get a persistence diagram for each dimension in homology, by restricting to the subset of simplices of dimension k and k + 1. Without loss of generality, and to avoid unnecessary heavy notation, in the following we consider the whole persistence diagram, made of the union of the persistence diagrams in all dimensions k.

3. Differentiability of functions of persistence

o-minimal geometry provides a well-suited setting to describe the parametrized families of ﬁltrations encountered in practice and to exhibit interesting differentiability properties of their composition with the persistence map.

3.1. Background on o-minimal geometry

In this section, we recall some elements of o-minimal geometry, which are needed in the next sections; see, e.g., (Coste, 2000) for a more detailed introduction.

Definition 3.1 (o-minimal structure). An o-minimal structure on the field of real numbers $\mathbb{R}$ is a collection $(S_n)_{n \in \mathbb{N}}$, where each $S_n$ is a set of subsets of $\mathbb{R}^n$ such that:

1. $S_1$ is exactly the collection of finite unions of points and intervals;

2. all algebraic subsets³ of $\mathbb{R}^n$ are in $S_n$;

3. $S_n$ is a Boolean subalgebra of the power set of $\mathbb{R}^n$ for any $n \in \mathbb{N}$;

4. if $A \in S_n$ and $B \in S_m$, then $A \times B \in S_{n+m}$;

5. if $\pi\colon \mathbb{R}^{n+1} \to \mathbb{R}^n$ is the linear projection onto the first $n$ coordinates and $A \in S_{n+1}$, then $\pi(A) \in S_n$.

An element $A \in S_n$ for some $n \in \mathbb{N}$ is called a definable set in the o-minimal structure. For a definable set $A \subseteq \mathbb{R}^n$, a map $f\colon A \to \mathbb{R}^m$ is said to be definable if its graph is a definable set in $\mathbb{R}^{n+m}$.

³ Recall that an algebraic set is the 0-level set of a polynomial.

Definable sets are stable under various geometric operations. The complement, closure and interior of a definable set are definable sets. The finite unions and intersections of definable sets are definable. The image of a definable set by a definable map is itself definable. Sums and products of definable functions as well as compositions of definable functions are definable; see Section 1.3 in (Coste, 2000). In particular, the max and min of finite sets of real-valued definable functions are also definable. An important property of definable sets and definable maps is that they admit a finite Whitney stratification (van den Dries & Miller, 1996). This implies that (i) any definable set $A \subseteq \mathbb{R}^n$ can be decomposed into a finite disjoint union of smooth submanifolds of $\mathbb{R}^n$ and (ii) for any definable map $\Phi\colon A \to \mathbb{R}^m$, $A$ can also be decomposed into a finite union of smooth manifolds such that the restriction of $\Phi$ to each of these manifolds is a differentiable function.

The simplest example of an o-minimal structure is given by the family of semi-algebraic subsets⁴ of $\mathbb{R}^n$ ($n \in \mathbb{N}$). Although most of the classical parametrized families of filtrations are semi-algebraic, the o-minimal framework actually allows one to consider larger families. In particular, the result of (Wilkie, 1996) says that the family of images of the sublevel sets of functions in $\mathbb{R}[x_1, \ldots, x_N, \exp(x_1), \ldots, \exp(x_N)]$ for some $N \in \mathbb{N}$ under linear projections is an o-minimal structure, which allows us to mix exponential functions with semi-algebraic functions.

3.2. Persistence diagrams of deﬁnable parametrized families of ﬁltrations

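Let $K$ be a simplicial complex and $\Phi\colon A \to \mathbb{R}^{|K|}$ be a parametrized family of filtrations that is definable in a given o-minimal structure. If for any $x, x' \in A$, the preorders induced by $\Phi(x)$ and $\Phi(x')$ on the simplices of $K$ are the same, i.e., for any $\sigma_1, \sigma_2 \in K$, $\Phi_{\sigma_1}(x) \leq \Phi_{\sigma_2}(x)$ if and only if $\Phi_{\sigma_1}(x') \leq \Phi_{\sigma_2}(x')$, then the pairs of simplices $(\sigma_{i_1}, \sigma_{j_1}), \ldots, (\sigma_{i_p}, \sigma_{j_p})$, and the unpaired simplices $\sigma_{i_{p+1}}, \ldots, \sigma_{i_{p+q}}$ that are computed by the persistence Algorithm 1 are independent of $x$. Then, for any $x \in A$, the persistence diagram $D = D(\Phi(x))$ of $\Phi(x)$ is

$$D = \bigcup_{k=1}^{p} \left\{ \left( \Phi_{\sigma_{i_k}}(x), \Phi_{\sigma_{j_k}}(x) \right) \right\} \cup \bigcup_{k=1}^{q} \left\{ \left( \Phi_{\sigma_{i_{p+k}}}(x), +\infty \right) \right\}, \tag{1}$$

where $|K| = 2p + q$.

⁴ It is the family of all finite unions and intersections of level sets and sublevel sets of polynomials (Benedetti & Risler, 1991).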


Given the lexicographic order on $\mathbb{R} \times (\mathbb{R} \cup \{+\infty\})$, the points of any finite multi-set $D \subseteq \mathbb{R} \times (\mathbb{R} \cup \{+\infty\})$ with $p$ points in $\mathbb{R}^2$ and $q$ points in $\mathbb{R} \times \{+\infty\}$ can be ordered in non-decreasing order, and $D$ can be represented as a vector in $\mathbb{R}^{2p+q}$. As a consequence, denoting by $\mathrm{Filt}_K$ the set of vectors in $\mathbb{R}^{|K|}$ that define a filtration on $K$, the persistence map $\mathrm{Pers}\colon \mathrm{Filt}_K \to \mathbb{R}^{|K|}$ that assigns to each filtration of $K$ its persistence diagram consists of a permutation of the coordinates of $\mathbb{R}^{|K|}$. This permutation is constant on the set of filtrations that define the same preorder on the simplices of $K$. This leads to the following statement.

Proposition 3.2. Given a simplicial complex $K$, the map $\mathrm{Pers}\colon \mathrm{Filt}_K \subseteq \mathbb{R}^{|K|} \to \mathbb{R}^{|K|}$ is semi-algebraic, and thus definable in any o-minimal structure. Moreover, there exists a semi-algebraic partition of $\mathrm{Filt}_K$ such that the restriction of $\mathrm{Pers}$ to each element of this partition is a Lipschitz map.

Proof. See Supplementary Material.

Since there exists a finite semi-algebraic partition of $\mathrm{Filt}_K$ on which $\mathrm{Pers}$ is a locally constant permutation, the subdifferential (see Section 4 for the definition) of $\mathrm{Pers}$ is well-defined and straightforward to compute: each coordinate in the output (i.e., the persistence diagram) is a copy of a coordinate in the input (i.e., the filtration values of the simplices). This implies that every partial derivative is either 1 or 0. The output can be seen as a reindexing of the input, and this is indeed how we implement it in our code, so that automatic differentiation frameworks (PyTorch, TensorFlow, etc.) can process the function $\mathrm{Pers}$ directly and do not need explicit gradient formulas; see Section 5. Note that the subdifferential depends on the arbitrary refinement of the preorder in Subsection 2.2.
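The remark above can be illustrated without any automatic differentiation library. In the sketch below, the Jacobian of the reindexing is written out as a 0/1 matrix, and a gradient with respect to diagram coordinates is pulled back to the filtration values by scattering it along the index list. The helper names `pers_jacobian` and `pullback` are hypothetical, and the index list is assumed to come from a pairing computation.

```python
def pers_jacobian(indices):
    """Jacobian of v -> [v[i] for i in indices]: one 1 per row, 0 elsewhere."""
    n = len(indices)
    return [[1.0 if col == indices[row] else 0.0 for col in range(n)]
            for row in range(n)]


def pullback(indices, grad_diagram):
    """Gradient with respect to the filtration values, given the gradient
    with respect to the diagram coordinates (transpose-Jacobian product)."""
    grad_phi = [0.0] * len(indices)
    for row, col in enumerate(indices):
        grad_phi[col] += grad_diagram[row]
    return grad_phi


indices = [1, 3, 2, 4, 5, 6, 0]          # reindexing for a filled triangle
jacobian = pers_jacobian(indices)
grad_diagram = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
grad_phi = pullback(indices, grad_diagram)
print(grad_phi)
# [7.0, 1.0, 3.0, 2.0, 4.0, 5.0, 6.0]
```

This scatter step is what a framework performs automatically when the diagram is obtained by indexing into a tensor of filtration values.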

Corollary 3.3. Let $K$ be a simplicial complex and $\Phi\colon A \to \mathbb{R}^{|K|}$ be a definable (in a given o-minimal structure) parametrized family of filtrations. The map $\mathrm{Pers} \circ \Phi\colon A \to \mathbb{R}^{|K|}$ is definable.

Note that according to the remark following Proposition 3.2, if $\Phi$ is differentiable, the subdifferential of $\mathrm{Pers} \circ \Phi$ can be easily computed in terms of the partial derivatives of $\Phi$ using, for example, Equation (1).

It also follows from standard finiteness and stratifiability properties of definable sets and maps that $\mathrm{Pers} \circ \Phi$ is differentiable almost everywhere. More precisely:

Proposition 3.4. Let $K$ be a simplicial complex and $\Phi\colon A \to \mathbb{R}^{|K|}$ a definable parametrized family of filtrations, where $\dim A = m$. Then there exists a finite definable partition of $A$, $A = S \sqcup O_1 \sqcup \cdots \sqcup O_k$, such that $\dim S < \dim A = m$ and, for any $i = 1, \ldots, k$, $O_i$ is a definable manifold of dimension $m$ and $\mathrm{Pers} \circ \Phi\colon O_i \to \mathbb{R}^{|K|}$ is differentiable.

3.3. Examples of deﬁnable families of ﬁltrations

Vietoris-Rips filtrations. The family of Vietoris-Rips filtrations built on top of sets of $n$ points $x_1, \ldots, x_n \in \mathbb{R}^d$ is the semi-algebraic parametrized family of filtrations

$$\Phi\colon A = (\mathbb{R}^d)^n \to \mathbb{R}^{|\Delta_n|} = \mathbb{R}^{2^n - 1},$$

where $\Delta_n$ is the simplicial complex made of all the faces of the $(n-1)$-dimensional simplex. It is defined, for any $x = (x_1, \ldots, x_n) \in A$ and any simplex $\sigma \subseteq \{1, \ldots, n\}$, by

$$\Phi_\sigma(x) = \max_{i,j \in \sigma} \|x_i - x_j\|.$$

One easily checks that the permutation induced by $\mathrm{Pers}$ is constant on the connected components of the complement of the union of the subspaces $S_{i,j,k,l} = \{(x_1, \ldots, x_n) : \|x_i - x_j\| = \|x_k - x_l\|\}$ over all the 4-tuples $(i, j, k, l)$ such that at least 3 of the 4 indices $i, j, k, l$ are distinct. This example naturally extends to Vietoris-Rips-like filtrations in the following way. Let $A \subseteq M_n(\mathbb{R})$ be the set of $n \times n$ symmetric matrices with non-negative entries and 0 on the diagonal. This is a semi-algebraic subset of the space of $n$-by-$n$ matrices $M_n(\mathbb{R}) \simeq \mathbb{R}^{n^2}$, of dimension $m = n(n-1)/2$. The map $\Phi\colon A \to \mathbb{R}^{|\Delta_n|} = \mathbb{R}^{2^n - 1}$ defined by $\Phi_\sigma(M) = \max_{i,j \in \sigma} m_{i,j}$ for any $M = (m_{i,j})_{1 \leq i,j \leq n} \in A$, is a semi-algebraic family of filtrations. Note that the set $S$ of Proposition 3.4 can be chosen to be the set of matrices with at least 2 entries above the diagonal that are equal.
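As an illustration of the formula $\Phi_\sigma(x) = \max_{i,j \in \sigma} \|x_i - x_j\|$, here is a brute-force sketch that enumerates all non-empty faces of the full simplex. It uses 0-based vertex indices (the text uses $1, \ldots, n$), is exponential in $n$, and is meant only for tiny examples; `rips_filtration` is a hypothetical name and this is not how Gudhi computes Rips complexes.

```python
import math
from itertools import combinations


def rips_filtration(points):
    """Map every non-empty subset of point indices to its diameter.

    Vertices get value 0.0 since the max over an empty set of pairs
    defaults to 0.0.
    """
    n = len(points)
    phi = {}
    for size in range(1, n + 1):
        for sigma in combinations(range(n), size):
            phi[sigma] = max(
                (math.dist(points[i], points[j])
                 for i, j in combinations(sigma, 2)),
                default=0.0,
            )
    return phi


# Right triangle with side lengths 3, 4, 5.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
phi = rips_filtration(points)
print(len(phi))          # number of non-empty faces of the full simplex
print(phi[(0, 1, 2)])    # diameter of the whole point set
```

For $n = 3$ points the dictionary has $2^3 - 1 = 7$ entries, and each higher-dimensional simplex takes the value of its longest edge, so the monotonicity condition of a filtration holds by construction.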

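Weighted Rips filtrations. Given a function $f\colon \mathbb{R}^d \to \mathbb{R}$, the family of weighted Rips filtrations $\Phi\colon A = (\mathbb{R}^d)^n \to \mathbb{R}^{|\Delta_n|} = \mathbb{R}^{2^n - 1}$ associated with $f$ is defined, for any $x = (x_1, \ldots, x_n) \in A$ and any simplex $\sigma \subseteq \{1, \ldots, n\}$, by

$$\Phi_\sigma(x) = \begin{cases} 2 f(x_j) & \text{if } \sigma = [j]; \\ \max\left(2 f(x_i),\, 2 f(x_j),\, \|x_i - x_j\| + f(x_i) + f(x_j)\right) & \text{if } \sigma = [i, j],\ i \neq j; \\ \max\left\{\Phi_{[i,j]}(x) : i, j \in \sigma\right\} & \text{if } |\sigma| \geq 3. \end{cases}$$

Since Euclidean distances and the max function are semi-algebraic, this family of filtrations is definable as soon as the weight function $f$ is definable.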

This example easily extends to the case where the weight function depends on the set of points $x = (x_1, \ldots, x_n)$: the weight at vertex $y$ is defined by $f(x, y)$ with $f\colon (\mathbb{R}^d)^n \times \mathbb{R}^d \to \mathbb{R}$. A particular example of such a family is given by the so-called DTM filtration (Anai et al., 2020), where $f(x, y)$ is the average distance from $y$ to its $k$-nearest neighbors in $x$. In this case, $f$ is semi-algebraic, and the family of DTM filtrations is semi-algebraic.


The o-minimal framework also allows us to consider weight functions involving exponential functions (Wilkie, 1996), such as, for instance, kernel-based density estimates with Gaussian kernels.

Sublevel sets filtrations. Let $K$ be a simplicial complex with $n$ vertices $v_1, \ldots, v_n$. Any real-valued function $f$ defined on the vertices of $K$ can be represented as a vector $(f(v_1), \ldots, f(v_n)) \in \mathbb{R}^n$. The family of sublevel sets filtrations $\Phi\colon A = \mathbb{R}^n \to \mathbb{R}^{|K|}$ of functions on the vertices of $K$ is defined by $\Phi_\sigma(f) = \max_{i \in \sigma} f_i$ for any $f = (f_1, \ldots, f_n) \in A$ and any simplex $\sigma \subseteq \{1, \ldots, n\}$. This filtration is also known as the lower-star filtration of $f$. The function $\Phi$ is obviously semi-algebraic, and for Proposition 3.4 to hold it is sufficient to choose $S = \bigcup_{1 \leq i < j \leq n} \{f = (f_1, \ldots, f_n) \in A : f_i = f_j\}$.
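The lower-star filtration is short enough to write in one line. The sketch below assumes the complex is given as a list of simplices (tuples of 0-based vertex indices) and the function as a list of vertex values; `lower_star_filtration` is a hypothetical name.

```python
def lower_star_filtration(simplices, f):
    """Assign to each simplex the largest value of f on its vertices."""
    return {sigma: max(f[i] for i in sigma) for sigma in simplices}


# Boundary of a triangle (three vertices, three edges).
simplices = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
f = [0.3, 0.1, 0.7]
phi = lower_star_filtration(simplices, f)
print(phi[(0, 1)], phi[(1, 2)])
# 0.3 0.7
```

Each output value is a copy of one input value, chosen by an argmax. As long as the entries of `f` are pairwise distinct, that choice is locally constant, which is why the exceptional set $S$ can be taken to be the set of vectors with two equal coordinates.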

4. Minimization of functions of persistence

Using the same notation as in the previous section, recall that the space of persistence diagrams associated with a filtration of $K$ is identified with $\mathbb{R}^{|K|} = (\mathbb{R}^2)^p \times \mathbb{R}^q$, where each point in the $p$ copies of $\mathbb{R}^2$ is a point with finite coordinates in the persistence diagram and each coordinate in $\mathbb{R}^q$ is the x-coordinate of a point with infinite persistence.

Definition 4.1. A function $E\colon \mathbb{R}^{|K|} = (\mathbb{R}^2)^p \times \mathbb{R}^q \to \mathbb{R}$ is said to be a function of persistence if it is invariant to permutations of the points of the persistence diagram, i.e., for any $(p_1, \ldots, p_p, e_1, \ldots, e_q) \in (\mathbb{R}^2)^p \times \mathbb{R}^q$ and any permutations $\alpha, \beta$ of the sets $\{1, \ldots, p\}$ and $\{1, \ldots, q\}$, respectively, one has

$$E(p_{\alpha(1)}, \ldots, p_{\alpha(p)}, e_{\beta(1)}, \ldots, e_{\beta(q)}) = E(p_1, \ldots, p_p, e_1, \ldots, e_q).$$

It follows from this permutation invariance and Proposition 3.2 that if a function of persistence $E\colon \mathbb{R}^{2p+q} = \mathbb{R}^{|K|} \to \mathbb{R}$ is locally Lipschitz, then the composition $E \circ \mathrm{Pers}$ is also locally Lipschitz. Moreover, if $E$ is definable in an o-minimal structure, then for any definable parametrized family of filtrations $\Phi\colon A \subseteq \mathbb{R}^d \to \mathbb{R}^{|K|}$, the composition $L = E \circ \mathrm{Pers} \circ \Phi\colon A \to \mathbb{R}$ is also definable. As a consequence, $L$ has a well-defined Clarke subdifferential $\partial L(z) := \mathrm{Conv}\{\lim_{z_i \to z} \nabla L(z_i) : L \text{ is differentiable at } z_i\}$, since it is differentiable almost everywhere thanks to Proposition 3.4.

4.1. Stochastic gradient descent

To minimize $L$, we consider the differential inclusion

$$\frac{dz}{dt} \in -\partial L(z(t)) \quad \text{for almost every } t,$$

whose solutions $z(t)$ are the trajectories of the subgradient of $L$. They can be approximated by the standard stochastic subgradient algorithm given by the iterations of

$$x_{k+1} = x_k - \alpha_k (y_k + \zeta_k), \quad y_k \in \partial L(x_k), \tag{2}$$

where the sequence $(\alpha_k)_k$ is the learning rate and $(\zeta_k)_k$ is a sequence of random variables. In (Davis et al., 2020), the authors prove that under mild technical conditions on these two sequences, the stochastic subgradient algorithm converges almost surely to a critical point of $L$ as soon as $L$ is locally Lipschitz.

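More precisely, consider the following assumptions, which correspond to Assumption C in (Davis et al., 2020):

1. for any $k$, $\alpha_k \geq 0$, $\sum_{k=1}^{\infty} \alpha_k = +\infty$, and $\sum_{k=1}^{\infty} \alpha_k^2 < +\infty$;

2. $\sup_k \|x_k\| < +\infty$, almost surely;

3. denoting by $\mathcal{F}_k$ the increasing sequence of σ-algebras $\mathcal{F}_k = \sigma(x_j, y_j, \zeta_j, j < k)$, there exists a function $p\colon \mathbb{R}^d \to \mathbb{R}$ which is bounded on bounded sets such that almost surely, for any $k$,

$$\mathbb{E}[\zeta_k \mid \mathcal{F}_k] = 0 \quad \text{and} \quad \mathbb{E}[\|\zeta_k\|^2 \mid \mathcal{F}_k] < p(x_k).$$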

These assumptions are standard and not very restrictive. Assumption 1 depends on the choice of the learning rate by the user and is easily satisfied, e.g., by taking $\alpha_k = 1/k$. Assumption 2 is usually easy to check for most of the functions $L$ encountered in practice. Assumption 3 is a standard condition, which states that, conditioned upon the past, the variables $\zeta_k$ have zero mean and controlled moments; e.g., this can be achieved by taking a sequence of independent and centered variables with bounded variance that are also independent of the $x_k$'s and $y_k$'s.
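The iteration of Equation (2) with the learning rate $\alpha_k = 1/k$ can be sketched in a few lines. The example below is a toy illustration only: the objective $L(x) = |x - 1|$, the uniform noise model, and the function name `stochastic_subgradient` are assumptions made here and do not come from the paper or its code.

```python
import random


def stochastic_subgradient(subgradient, x0, n_iter, noise_scale=0.1, seed=0):
    """Iterate x_{k+1} = x_k - alpha_k * (y_k + zeta_k) with alpha_k = 1/k.

    subgradient(x) must return one element y_k of the Clarke subdifferential.
    zeta_k is drawn uniformly in [-noise_scale, noise_scale] per coordinate:
    centered, bounded, and independent of the iterates (Assumption 3).
    """
    rng = random.Random(seed)
    x = list(x0)
    for k in range(1, n_iter + 1):
        alpha = 1.0 / k  # sum diverges, sum of squares converges (Assumption 1)
        y = subgradient(x)
        x = [xi - alpha * (yi + rng.uniform(-noise_scale, noise_scale))
             for xi, yi in zip(x, y)]
    return x


def abs_subgradient(x):
    """A subgradient of the toy objective L(x) = |x - 1| (0 is chosen at the kink)."""
    v = x[0] - 1.0
    return [float((v > 0) - (v < 0))]


x_final = stochastic_subgradient(abs_subgradient, [5.0], 2000)
print(abs(x_final[0] - 1.0) < 0.05)
```

The toy objective is locally Lipschitz and semi-algebraic, and its only critical point is $x = 1$, so the iterates are expected to settle near that value.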

Under these assumptions, the following result is an immediate consequence of Corollary 5.9 in (Davis et al., 2020).

Theorem 4.2. Let $K$ be a simplicial complex, $A \subseteq \mathbb{R}^d$, and $\Phi\colon A \to \mathbb{R}^{|K|}$ a parametrized family of filtrations of $K$ that is definable in an o-minimal structure. Let $E\colon \mathbb{R}^{|K|} \to \mathbb{R}$ be a definable function of persistence such that $L = E \circ \mathrm{Pers} \circ \Phi$ is locally Lipschitz. Then, under the above assumptions 1, 2, and 3, almost surely the limit points of the sequence $(x_k)_k$ obtained from the iterations of Equation (2) are critical points of $L$ and the sequence $(L(x_k))_k$ converges.

The above theorem provides explicit conditions ensuring the convergence of stochastic subgradient descent for functions of persistence. The main criterion to be checked is the locally Lipschitz condition for $L$. From the remark following Definition 4.1, it is sufficient to check that $\Phi$ and $E$ are Lipschitz. Regarding $\Phi$, it is obvious for the examples of Subsection 3.3. However, this is not the case for some other examples, such as the so-called alpha-complex filtration, which can nevertheless be made locally Lipschitz using a simple technical trick; see Supplementary Material.


4.2. Examples of deﬁnable locally Lipschitz functions of persistence

Total persistence. Let $E$ be the sum of the distances to the diagonal of each point of a persistence diagram with finite coordinates: given a persistence diagram represented as a vector in $\mathbb{R}^{2p+q}$, $D = ((b_1, d_1), \ldots, (b_p, d_p), e_1, \ldots, e_q)$,

$$E(D) = \sum_{i=1}^{p} |d_i - b_i|.$$

Then E is obviously semi-algebraic, and thus deﬁnable in any o-minimal structure. It is also Lipschitz.
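A minimal sketch of total persistence on the regular part of a diagram, together with one choice of subgradient with respect to the diagram coordinates, is given below. The list-of-pairs representation and the names `total_persistence` and `total_persistence_subgradient` are assumptions made for illustration.

```python
def total_persistence(diagram):
    """Sum of |d - b| over the finite points (b, d) of the diagram."""
    return sum(abs(d - b) for b, d in diagram)


def total_persistence_subgradient(diagram):
    """One element of the subdifferential with respect to each (b, d).

    On the diagonal (b == d) the subgradient 0 is chosen.
    """
    grads = []
    for b, d in diagram:
        s = float((d > b) - (d < b))  # sign of d - b
        grads.append((-s, s))
    return grads


diagram = [(0.0, 1.0), (0.0, 2.0), (3.0, 3.0)]
print(total_persistence(diagram))
# 3.0
```

Reordering the points leaves the value unchanged, so this is a function of persistence in the sense of Definition 4.1, and each partial derivative has absolute value at most 1, in line with the Lipschitz property stated above.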

Wasserstein and bottleneck distance. Given a persistence diagram $D$, and another target persistence diagram $D'$, the bottleneck distance between the regular part of $D$ and that of $D'$ (see Definition 2.2) is given by

$$E(D) = d_B(D_{\mathrm{reg}}, D'_{\mathrm{reg}}) = \min_{m} \max_{(p, p') \in m} \|p - p'\|_\infty,$$

where $m$ is a partial matching between $D_{\mathrm{reg}}$ and $D'_{\mathrm{reg}}$, i.e., a subset of $(D_{\mathrm{reg}} \cup \Delta) \times (D'_{\mathrm{reg}} \cup \Delta)$, with $\Delta = \{(x, x) : x \in \mathbb{R}\}$ being the diagonal in $\mathbb{R}^2$, such that every point of $D_{\mathrm{reg}} \setminus \Delta$ and $D'_{\mathrm{reg}} \setminus \Delta$ appears exactly once in $m$. One can easily check that the map $E$ is semi-algebraic, and thus definable in any o-minimal structure. It is also Lipschitz. This property also extends to the case where the bottleneck distance is replaced by the so-called Wasserstein distance $W_p$ with $p \in \mathbb{N}$ (Cohen-Steiner et al., 2010), or its approximation, the Sliced Wasserstein distance (Carrière et al., 2017). Optimization of these functions and other functions of bottleneck and Wasserstein distances has been used, for example, in shape matching (Poulenard et al., 2018). See also the example on 3D shape in Supplementary Material.

Persistence landscapes (Bubenik, 2015). To any given point $p = (x, y) \in \mathbb{R}^2$ with $x = \frac{b+d}{2}$ and $y = \frac{d-b}{2}$, associate the function $\Lambda_p\colon \mathbb{R} \to \mathbb{R}$ defined by

$$\Lambda_p(t) = \begin{cases} t - b & \text{if } t \in \left[b, \frac{b+d}{2}\right], \\ d - t & \text{if } t \in \left(\frac{b+d}{2}, d\right], \\ 0 & \text{otherwise.} \end{cases}$$

Given a persistence diagram $D$, the persistence landscape of $D$ is a summary of the arrangement of the graphs of the functions $\Lambda_p$, $p \in D$:

$$\lambda_D(k, t) = k\text{-}\max_{p \in D} \Lambda_p(t), \quad t \in [0, T],\ k \in \mathbb{Z}^+,$$

where $k$-max is the $k$th largest value in the set, or 0 when the set contains fewer than $k$ points. Given a positive integer $k$, a finite set $\{t_1, \ldots, t_n\} \subseteq \mathbb{R}$, and a simplicial complex $K$, the map that associates the vector $(\lambda_D(k, t_1), \ldots, \lambda_D(k, t_n))$ to each persistence diagram $D$ of a filtration of $K$ is Lipschitz (Bubenik, 2015) and clearly semi-algebraic.
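The landscape values can be evaluated directly from the definitions above. The sketch below represents a diagram as a list of (birth, death) pairs $(b, d)$; the names `tent` and `landscape` are hypothetical, and the sketch makes no attempt at efficiency.

```python
def tent(b, d, t):
    """Piecewise-linear function attached to the diagram point born at b, dying at d."""
    mid = (b + d) / 2.0
    if b <= t <= mid:
        return t - b
    if mid < t <= d:
        return d - t
    return 0.0


def landscape(diagram, k, t):
    """k-th largest tent value at t, or 0.0 if there are fewer than k points."""
    values = sorted((tent(b, d, t) for b, d in diagram), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0


diagram = [(0.0, 4.0), (1.0, 3.0)]
sample = [landscape(diagram, 1, t) for t in (0.0, 1.0, 2.0, 3.0, 4.0)]
print(sample)
# [0.0, 1.0, 2.0, 1.0, 0.0]
```

Evaluating `landscape` at a fixed finite set of abscissas $t_1, \ldots, t_n$ yields the vectorization discussed in the text.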

Other classical ways to vectorize persistence diagrams are the linear representations (Chazal & Divol, 2018), such as, e.g., persistence images (Adams et al., 2017), which are also definable in o-minimal structures; see Supplementary Material. In (Divol & Lacombe, 2020), the authors give explicit conditions for such representations to be locally Lipschitz.

5. Numerical illustrations

We showed in Sections 3 and 4 that the usual stochastic gradient descent procedure of Equation (2) enjoys some convergence properties for persistence-based functions. This means in particular that the algorithms available in standard libraries such as TensorFlow and PyTorch, which implement stochastic gradient descent (among other optimization methods), can be leveraged and used as is for differentiating persistence diagrams, while still ensuring convergence. The purpose of this section is to illustrate that our code, which implements the general gradient defined in Proposition 3.4 for persistence-based functions, and which is based on Gudhi⁵ and TensorFlow, can be readily used for studying several different persistence optimization tasks. Along the way, we also suggest regularization terms that one can add to topological losses in order to avoid unwanted behaviors. We only present a few applications due to lack of space, and we refer the interested reader to Supplementary Material and the publicly available code for more examples.

⁵ See https://gudhi.inria.fr/

Point cloud optimization. A toy example in persistence optimization is to modify the positions of the points in a point cloud so that its homology is maximized (Brüel-Gabrielsson et al., 2020; Gameiro et al., 2016). In this experiment, we start with a point cloud $X$ sampled uniformly from the unit square $S = [0, 1]^2$, and then optimize the point coordinates so that the loss $L(X) = P(X) + T(X)$ is minimized. Here $T(X) := -\sum_{p \in D} \|p - \pi_\Delta(p)\|_\infty^2$ is a topological penalty, $D$ is the 1-dimensional persistence diagram associated with the Vietoris-Rips filtration of $X$, $\pi_\Delta$ stands for the projection onto the diagonal $\Delta$, and $P(X) := \sum_{x \in X} d(x, S)$ is a penalty term ensuring that the point coordinates stay in the unit square. The topological penalty $T(X)$ was used in (Brüel-Gabrielsson et al., 2020), and ensures that points in the persistence diagram $D$ are as far away from the diagonal as possible, which in turn means that the corresponding holes in the point cloud are as large as possible. However, we point out that if one uses $T(X)$ alone without the penalty $P(X)$, as in (Brüel-Gabrielsson et al., 2020), then convergence is very difficult to reach since inflating the point cloud with dilations can make the topological penalty $T(X)$ arbitrarily small. In contrast, using our second term $P(X)$ in addition to $T(X)$ constrains the points to stay in a fixed region $S$ of the Euclidean plane. Another effect of the penalty $P(X)$ is to flatten the boundary of the created holes along the boundary of $S$. See Figure 1 for an illustration.
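The regularization term $P(X) = \sum_{x \in X} d(x, S)$ only needs the Euclidean distance from a point to the unit square, which has a simple closed form. The sketch below is an illustration written in plain Python rather than TensorFlow; `distance_to_unit_square` and `square_penalty` are hypothetical names, not functions from the authors' repository.

```python
import math


def distance_to_unit_square(point):
    """Euclidean distance from a 2D point to the square [0, 1]^2."""
    # Per-coordinate excess outside [0, 1]; zero when the coordinate is inside.
    excess = [max(-c, 0.0, c - 1.0) for c in point]
    return math.hypot(*excess)


def square_penalty(points):
    """P(X): sum of the distances of all points to the unit square."""
    return sum(distance_to_unit_square(p) for p in points)


cloud = [(0.5, 0.5), (2.0, 0.5), (-0.5, 0.5)]
print(square_penalty(cloud))
# 1.5
```

The penalty vanishes for points inside the square and grows linearly with the distance outside it, so it is Lipschitz and semi-algebraic, and it counteracts the dilation effect described above.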

Figure 1. Illustration of point cloud optimization. We initialize with a random point cloud (upper left), and we show the optimized point cloud (upper right) when optimization is done with topological and regularization losses. We also show the convergence of the total loss (lower right). When only topological loss is used, the optimized point cloud inﬂated some loops to minimize the loss (lower left). Note how the coordinates are now much larger.

Dimensionality reduction. In this experiment, we show how our general setup can be used to reduce dimension with the so-called topological autoencoders introduced in (Moor et al., 2020). In this family of autoencoders, a topological loss $T(X, Z)$ between the input space $X$ and latent space $Z$ is used in addition to the usual loss $D(X, Z) = \sum_i \|x_i - z_i\|_2^2$. This topological loss was computed in (Moor et al., 2020) by (i) computing the permutations induced by the persistence map (see Subsection 3.2) of the Vietoris-Rips complexes built from the input space $X$ and the latent space $Z$, (ii) computing, for each simplex in these permutations, the corresponding edge that induces its filtration value, and (iii) measuring, for all those edges, the differences between the edge lengths in $X$ and the same edge lengths in $Z$. To sum up, the loss function is defined as

$$L(X, Z) = \|M_X[\pi_X] - M_Z[\pi_X]\|_2^2 + \|M_X[\pi_Z] - M_Z[\pi_Z]\|_2^2,$$

where $M_X, M_Z$ are the distance matrices of the input and the latent spaces respectively, and where $\pi_X, \pi_Z$ denote the indices of the entries in $M_X, M_Z$ that are picked by the permutation induced by the persistence map to generate the Vietoris-Rips persistence diagrams of $X$ and $Z$. Note that $L$ is obviously semi-algebraic and thus fits in our framework. Moreover, in our setup we can directly use the bottleneck and Wasserstein distances between the Vietoris-Rips persistence diagrams of the input and latent spaces as the topological loss. This is relevant since in (Moor et al., 2020) the authors pointed out that looking at homology in dimension larger than 1 was not adding anything for their loss, and stuck to 0-dimensional homology. We show in Figure 2 an example in which 1-dimensional homology is also important, that is, a point cloud in $\mathbb{R}^3$ that is comprised of two nested circles, which is then non-linearly embedded in $\mathbb{R}^9$ by converting each point $p = (x, y, z)$ into the exponential of the $3 \times 3$ anti-symmetric matrix whose coefficients are $x$, $y$ and $z$. We then train an autoencoder made of four fully-connected layers with 32 neurons and ReLU activations, using the usual loss, the usual loss plus the topological loss described above, and the usual loss plus a topological loss computed as $L(X, Z) = W_1(D_X, D_Z)$, i.e., the 1-Wasserstein distance between the 1-dimensional Vietoris-Rips persistence diagrams of the input and latent spaces. It can be seen from Figure 2 that autoencoders without the Wasserstein loss cannot embed the point cloud in the plane perfectly, while using the Wasserstein loss between the 1-dimensional Vietoris-Rips persistence diagrams improves on the result by better separating the two intrinsic circles.

Figure 2. Example of dimension reduction with autoencoders. An initial point cloud made of two circles (upper left) is embedded in $\mathbb{R}^9$, and then fed to autoencoders that either do not use topology, or only use the distances induced by the persistence maps in dimension 0. The resulting embeddings (lower left, we only show one but the two are similar) cannot separate the circles, while using 1-dimensional topology induces a better embedding (lower right). Convergence of the loss function is also provided (upper right).
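The non-linear embedding of the point cloud into $\mathbb{R}^9$ can be sketched with Rodrigues' formula, which gives the exponential of a $3 \times 3$ anti-symmetric matrix in closed form. This is only an illustration: the sign convention used to place $x$, $y$, $z$ in the matrix, the radii and position of the two circles, and the function names are all assumptions, since the text does not specify them.

```python
import math

def exp_antisymmetric(x, y, z):
    """Exponential of the 3x3 anti-symmetric matrix built from (x, y, z).

    Uses Rodrigues' formula exp(A) = I + (sin t / t) A + ((1 - cos t) / t^2) A^2
    with t = ||(x, y, z)||. The sign convention for A is an assumption.
    """
    A = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]
    t = math.sqrt(x * x + y * y + z * z)
    if t < 1e-12:
        a, b = 1.0, 0.5  # limits of sin(t)/t and (1 - cos t)/t^2 as t -> 0
    else:
        a, b = math.sin(t) / t, (1.0 - math.cos(t)) / (t * t)
    A2 = [[sum(A[i][k] * A[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    return [[(1.0 if i == j else 0.0) + a * A[i][j] + b * A2[i][j]
             for j in range(3)] for i in range(3)]

def embed_in_r9(p):
    """Flatten exp(A(p)) into a vector of R^9."""
    return [v for row in exp_antisymmetric(*p) for v in row]

def two_nested_circles(n, r_inner=0.5, r_outer=1.0):
    """n points on each of two concentric circles in the plane z = 0 (assumed layout)."""
    pts = []
    for r in (r_inner, r_outer):
        for k in range(n):
            angle = 2.0 * math.pi * k / n
            pts.append((r * math.cos(angle), r * math.sin(angle), 0.0))
    return pts

# 100 points of R^9 that could be fed to an autoencoder.
cloud_r9 = [embed_in_r9(p) for p in two_nested_circles(50)]
```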

Filter selection. In this experiment, we address a very common issue in Topological Data Analysis, ﬁlter selection. Indeed, when computing persistence diagrams in order to


| Dataset | Baseline | Before | After | Difference |
|---------|----------|--------|-------|------------|
| vs01 | 100.0 | 61.3 | 99.0 | +37.6 |
| vs02 | 99.4 | 98.8 | 97.2 | -1.6 |
| vs06 | 99.4 | 87.3 | 98.2 | +10.9 |
| vs09 | 99.4 | 86.8 | 98.3 | +11.5 |
| vs16 | 99.7 | 89.0 | 97.3 | +8.3 |
| vs19 | 99.6 | 84.8 | 98.0 | +13.2 |
| vs24 | 99.4 | 98.7 | 98.7 | 0.0 |
| vs25 | 99.4 | 80.6 | 97.2 | +16.6 |
| vs26 | 99.7 | 98.8 | 98.2 | -0.6 |
| vs28 | 99.1 | 96.8 | 96.8 | 0.0 |
| vs29 | 99.1 | 91.6 | 98.6 | +7.0 |
| vs34 | 99.8 | 99.4 | 99.1 | -0.3 |
| vs36 | 99.7 | 99.3 | 99.3 | -0.1 |
| vs37 | 98.9 | 94.9 | 97.5 | +2.6 |
| vs57 | 99.7 | 90.5 | 97.2 | +6.7 |
| vs79 | 99.1 | 85.3 | 96.9 | +11.5 |

Table 1. Accuracy scores obtained from persistence diagrams before and after performing our optimization over the image filtration. Note that the difference between the scores is almost always positive, i.e., there is almost always improvement after our optimization process. Scores do not have standard deviations since we use the train/test splits of the mnist.load_data function in TensorFlow 2.

generate topological features from a data set for further data analysis, the ﬁlter function that is being used to ﬁlter the data set always has to be speciﬁed a priori. Here, we provide a very simple heuristic to tune it if it comes from a parametrized family F of ﬁlters and if the learning task is supervised, which is the case in, e.g., classiﬁcation. We simply start from a random guess in F and then optimize the following criterion, inspired from (Zhao & Wang, 2019):

$$\sum_{l} \frac{\sum_{i,j:\, y_i = y_j = l} W_p(D_i(f), D_j(f))}{\sum_{i,j:\, y_i = l} W_p(D_i(f), D_j(f))}, \qquad (3)$$

which amounts to minimizing the distances between persistence diagrams that share the same label, and increasing the distances between persistence diagrams with different labels. Note that the batch size that we use in this optimization process has a big influence on the computation time, since the larger the batch size, the more Wasserstein distances we will have to compute in our cost. To cope with this issue, we actually used the Sliced Wasserstein distance SW (Carrière et al., 2017) instead of $W_p$, which, since it is computed with projections onto lines, can be defined entirely with matrix operations that are usually available in any library with autodifferentiation. This drastically improves on computation time, while remaining in our framework since the Sliced Wasserstein distance is also a semi-algebraic function.
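The following sketch illustrates how the criterion could be evaluated with an approximate Sliced Wasserstein distance. It makes several assumptions: the criterion is read as a sum over labels $l$ of within-label distances divided by the distances from label-$l$ diagrams to all diagrams, the Sliced Wasserstein distance is approximated by averaging over a fixed number of directions (so it is only defined up to a normalization constant), and the function names are hypothetical.

```python
import math

def sliced_wasserstein(D1, D2, n_directions=10):
    """Approximate Sliced Wasserstein distance between two persistence diagrams.

    D1 and D2 are lists of (birth, death) pairs. Each diagram is augmented with
    the diagonal projections of the other one so that both have the same size.
    """
    def diagonal_projection(p):
        m = (p[0] + p[1]) / 2.0
        return (m, m)

    A = list(D1) + [diagonal_projection(q) for q in D2]
    B = list(D2) + [diagonal_projection(p) for p in D1]
    total = 0.0
    for k in range(n_directions):
        theta = -math.pi / 2.0 + k * math.pi / n_directions
        c, s = math.cos(theta), math.sin(theta)
        # Project both point sets onto the line of angle theta and sort them.
        a = sorted(c * x + s * y for x, y in A)
        b = sorted(c * x + s * y for x, y in B)
        total += sum(abs(u - v) for u, v in zip(a, b))
    return total / n_directions

def filter_criterion(diagrams, labels, dist=sliced_wasserstein):
    """Sum over labels of (within-label distances) / (distances to all diagrams)."""
    n = len(diagrams)
    D = [[dist(diagrams[i], diagrams[j]) for j in range(n)] for i in range(n)]
    value = 0.0
    for label in set(labels):
        num = sum(D[i][j] for i in range(n) for j in range(n)
                  if labels[i] == label and labels[j] == label)
        den = sum(D[i][j] for i in range(n) for j in range(n)
                  if labels[i] == label)
        if den > 0:  # skip degenerate labels whose diagrams equal all others
            value += num / den
    return value
```

Because sorting and absolute differences are the only non-linear steps, the same computation can be written with the array operations of an autodifferentiation library, which is what makes this distance convenient here.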

We classify images from the MNIST data set. We assign values to the pixels using a height function given by a direction (parametrized by an angle in the Euclidean plane), and we use 0-dimensional persistence diagrams computed after optimizing this direction using loss (3). See Figure 3.
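A minimal sketch of this construction is given below, assuming binary images stored as nested lists, a filtration restricted to foreground pixels, and 4-neighbour pixel connectivity. These choices and the function names `height_filtration` and `persistence_0d` are illustrative assumptions, not a description of the released code. The angle is measured with the column index as the first coordinate and the row index as the second.

```python
import math

def height_filtration(image, angle):
    """Height of each foreground pixel in the direction given by `angle`.

    `image` is a list of rows of 0/1 values; background pixels are left out
    of the filtration (an illustrative choice, not specified in the text).
    """
    c, s = math.cos(angle), math.sin(angle)
    return {(i, j): c * j + s * i
            for i, row in enumerate(image) for j, v in enumerate(row) if v}

def persistence_0d(values):
    """0-dimensional sublevel-set persistence of a function on pixels.

    Pixels are added by increasing value and merged with their 4-neighbours
    using union-find; at each merge the younger component dies (elder rule).
    Returns finite (birth, death) pairs plus (birth, inf) per surviving component.
    """
    parent, birth, diagram = {}, {}, []

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for p in sorted(values, key=values.get):
        parent[p], birth[p] = p, values[p]
        i, j = p
        for q in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if q in parent:
                rp, rq = find(p), find(q)
                if rp != rq:
                    old, young = (rp, rq) if birth[rp] <= birth[rq] else (rq, rp)
                    if birth[young] < values[p]:  # skip zero-length intervals
                        diagram.append((birth[young], values[p]))
                    parent[young] = old
    roots = {find(p) for p in parent}
    diagram += [(birth[r], math.inf) for r in roots]
    return sorted(diagram)
```

On a U-shaped image, sweeping along the columns yields a single connected component, whereas sweeping along the rows from the top sees two branches that merge at the bottom, so the two directions produce different diagrams.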

We then compute the accuracy scores obtained with a random forest classifier for the (binary) classification tasks digit x vs. digit y for all pairs $0 \le x, y \le 9$, using the first five persistence landscapes with resolution 100 associated with the persistence diagrams before and after optimization. Even though our primary goal is to demonstrate that optimizing the filter almost always leads to an improvement, we also add a baseline score obtained by training a random

Figure 3. Example of images and directions inducing different height functions. Different directions generate different height functions and ﬁltrations and thus different persistence diagrams. In this experiment, we optimize over the direction so that the persistence diagrams are the most efﬁcient for image classiﬁcation.

forest classifier directly on the images for proper comparison. Some of the scores are displayed in Table 1 (the full table can be found in the Supplementary Material). Interestingly, when starting with a random direction, scores can be much worse than the baseline, but our optimization process is then able to select the best direction that induces the best persistence diagrams (with respect to the classification task) without prior knowledge of the data set.
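The persistence landscape features used for classification (first five landscapes, resolution 100) can be sketched as follows, using the definition of (Bubenik, 2015): the k-th landscape at t is the k-th largest value of max(0, min(t - b, d - t)) over the points (b, d) of the diagram. The function name, the sampling interval `[t_min, t_max]`, and the restriction to finite points are assumptions made for this illustration.

```python
def persistence_landscapes(diagram, num_landscapes=5, resolution=100,
                           t_min=0.0, t_max=1.0):
    """Sample the first `num_landscapes` persistence landscapes on a regular grid.

    `diagram` is a list of finite (birth, death) pairs. Returns a list of
    `num_landscapes` lists, each of length `resolution`.
    """
    step = (t_max - t_min) / (resolution - 1)
    grid = [t_min + k * step for k in range(resolution)]
    landscapes = [[0.0] * resolution for _ in range(num_landscapes)]
    for idx, t in enumerate(grid):
        # Tent function of every diagram point at t, largest first.
        tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram),
                       reverse=True)
        for k, v in enumerate(tents[:num_landscapes]):
            landscapes[k][idx] = v
    return landscapes

def landscape_features(diagram, **kwargs):
    """Concatenate the sampled landscapes into one feature vector."""
    return [v for row in persistence_landscapes(diagram, **kwargs) for v in row]
```

With the default parameters each diagram is mapped to a vector of 500 values, which can then be passed to any standard classifier.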

6. Conclusion

In this article we introduced a theoretical framework that encompasses most of the previous methods for optimizing topology-based functions. In particular, we obtained convergence results for very general classes of functions with topological ﬂavor computed with persistence theory, and provided corresponding code that one can use to reproduce previously introduced topological optimization tasks. For future work, we are planning to further investigate tasks related to classiﬁer regularization in ML (Chen et al., 2019), and to improve on computation time using, e.g., vineyards (Cohen-Steiner et al., 2006).

References

Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., and Ziegelmeier, L. Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research, 18(1):218-252, 2017.

Anai, H., Chazal, F., Glisse, M., Ike, Y., Inakoshi, H., Tinarrage, R., and Umeda, Y. DTM-based Filtrations. In Topological Data Analysis, pp. 33-66. Springer, 2020.

Benedetti, R. and Risler, J.-J. Real algebraic and semialgebraic sets. Hermann, 1991.

Boissonnat, J.-D., Chazal, F., and Yvinec, M. Geometric and topological inference, volume 57. Cambridge University Press, 2018.

Brüel-Gabrielsson, R., Ganapathi-Subramanian, V., Skraba, P., and Guibas, L. J. Topology-aware surface reconstruction for point clouds. In Computer Graphics Forum, volume 39, pp. 197-207. Wiley Online Library, 2020.

Brüel-Gabrielsson, R., Nelson, B., Dwaraknath, A., Skraba, P., Guibas, L., and Carlsson, G. A topology layer for machine learning. In 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020), pp. 1553-1563. PMLR, 2020.

Bubenik, P. Statistical topological data analysis using persistence landscapes. Journal of Machine Learning Research, 16(1):77-102, 2015.

Carlsson, G. and Gabrielsson, R. B. Topological approaches to deep learning. In Topological Data Analysis, pp. 119-146. Springer, 2020.

Carrière, M., Cuturi, M., and Oudot, S. Sliced Wasserstein kernel for persistence diagrams. In 34th International Conference on Machine Learning (ICML 2017), volume 70, pp. 664-673. JMLR.org, 2017.

Carrière, M., Chazal, F., Ike, Y., Lacombe, T., Royer, M., and Umeda, Y. PersLay: a neural network layer for persistence diagrams and new graph topological signatures. In 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020), pp. 2786-2796. PMLR, 2020.

Chazal, F. and Divol, V. The density of expected persistence diagrams and its kernel based estimation. In 34th International Symposium on Computational Geometry (SoCG 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.

Chen, C., Ni, X., Bai, Q., and Wang, Y. A topological regularizer for classifiers via persistent homology. In 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019), volume 89, pp. 2573-2582. PMLR, 2019.

Clough, J., Byrne, N., Oksuz, I., Zimmer, V. A., Schnabel, J. A., and King, A. A topological loss function for deep-learning based image segmentation using persistent homology. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.

Cohen-Steiner, D., Edelsbrunner, H., and Morozov, D. Vines and vineyards by updating persistence in linear time. In Amenta, N. and Cheong, O. (eds.), 22nd Annual Symposium on Computational Geometry (SoCG 2006), pp. 119-126. Association for Computing Machinery, 2006.

Cohen-Steiner, D., Edelsbrunner, H., Harer, J., and Mileyko, Y. Lipschitz functions have Lp-stable persistence. Foundations of Computational Mathematics, 10(2):127-139, 2010.

Coste, M. An introduction to o-minimal geometry. Istituti editoriali e poligrafici internazionali Pisa, 2000.

Davis, D., Drusvyatskiy, D., Kakade, S. M., and Lee, J. D. Stochastic subgradient method converges on tame functions. Foundations of Computational Mathematics, 20(1):119-154, 2020.

Dindin, M., Umeda, Y., and Chazal, F. Topological data analysis for arrhythmia detection through modular neural networks. In Canadian Conference on Artificial Intelligence, pp. 177-188. Springer, 2020.

Divol, V. and Lacombe, T. Understanding the topology and the geometry of the persistence diagram space via optimal partial transport. Journal of Applied and Computational Topology, 5:1-53, 2020.

Edelsbrunner, H. and Harer, J. Computational topology: an introduction. American Mathematical Soc., 2010.

Gabrielsson, R. B. and Carlsson, G. Exposition and interpretation of the topology of neural networks. In 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), pp. 1069-1076. IEEE, 2019.

Gameiro, M., Hiraoka, Y., and Obayashi, I. Continuation of point clouds via persistence diagrams. Physica D: Nonlinear Phenomena, 334:118-132, 2016.

Hofer, C., Kwitt, R., Niethammer, M., and Uhl, A. Deep learning with topological signatures. In Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 1634-1644. Curran Associates, Inc., 2017.

Hofer, C., Kwitt, R., Niethammer, M., and Dixit, M. Connectivity-optimized representation learning via persistent homology. In 36th International Conference on Machine Learning (ICML 2019), pp. 2751-2760. PMLR, 2019.

Hofer, C., Graf, F., Niethammer, M., and Kwitt, R. Topologically densified distributions. In 37th International Conference on Machine Learning (ICML 2020), volume 119, pp. 4304-4313. PMLR, 2020.

Kim, K., Kim, J., Zaheer, M., Kim, J., Chazal, F., and Wasserman, L. Efficient topological layer based on persistent landscapes. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.

Leygonie, J., Oudot, S., and Tillmann, U. A framework for differential calculus on persistence barcodes. Foundations of Computational Mathematics (to appear), 2021.

Moor, M., Horn, M., Rieck, B., and Borgwardt, K. Topological autoencoders. In 37th International Conference on Machine Learning (ICML 2020), volume 119, pp. 7045-7054, 2020.

Poulenard, A., Skraba, P., and Ovsjanikov, M. Topological function optimization for continuous shape matching. In Computer Graphics Forum, volume 37, pp. 13-25. Wiley Online Library, 2018.

Rieck, B., Togninalli, M., Bock, C., Moor, M., Horn, M., Gumbsch, T., and Borgwardt, K. Neural persistence: a complexity measure for deep neural networks using algebraic topology. In 7th International Conference on Learning Representations (ICLR 2019). OpenReview.net, 2019.

Solomon, Y., Wagner, A., and Bendich, P. A fast and robust method for global topological functional optimization. In 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021), volume 130, pp. 109-117. PMLR, 2021.

Umeda, Y. Time series classification via topological data analysis. Information and Media Technologies, 12:228-239, 2017.

van den Dries, L. and Miller, C. Geometric categories and o-minimal structures. Duke Mathematical Journal, 84(2):497-540, 1996.

Wang, F., Liu, H., Samaras, D., and Chen, C. Topogan: A topology-aware generative adversarial network. In European Conference on Computer Vision (ECCV), 2020.

Wilkie, A. J. Model completeness results for expansions of the ordered field of real numbers by restricted pfaffian functions and the exponential function. Journal of the American Mathematical Society, 9(4):1051-1094, 1996.

Zhao, Q. and Wang, Y. Learning metrics for persistence-based summaries and applications for graph classification. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), pp. 9855-9866. Curran Associates, Inc., 2019.