# Unsupervised Deep Haar Scattering on Graphs

Xu Chen¹·², Xiuyuan Cheng², and Stéphane Mallat²

¹ Department of Electrical Engineering, Princeton University, NJ, USA
² Département d'Informatique, École Normale Supérieure, Paris, France

The classification of high-dimensional data defined on graphs is particularly difficult when the graph geometry is unknown. We introduce a Haar scattering transform on graphs, which computes invariant signal descriptors. It is implemented with a deep cascade of additions, subtractions and absolute values, which iteratively compute orthogonal Haar wavelet transforms. Multiscale neighborhoods of unknown graphs are estimated by minimizing an average total variation, with a pair matching algorithm of polynomial complexity. Supervised classification with dimension reduction is tested on databases of scrambled images, and on signals sampled on unknown irregular grids on a sphere.

## 1 Introduction

The geometric structure of a data domain can be described with a graph [11], where neighboring data points are represented by vertices related by an edge. For sensor networks, this connectivity depends upon the physical locations of the sensors, but in social networks it may correspond to strong interactions or similarities between two nodes. In many applications, the connectivity graph is unknown and must therefore be estimated from data. We introduce an unsupervised learning algorithm to classify signals defined on an unknown graph.

An important source of variability on graphs results from displacements of signal values. These may be due to movements of physical sources in a sensor network, or to propagation phenomena in social networks. Classification problems are often invariant to such displacements: image pattern recognition and the characterization of communities in social networks are examples of invariant problems. They require computing locally or globally invariant descriptors, which are sufficiently rich to discriminate complex signal classes.

Section 2 introduces a Haar scattering transform which builds an invariant representation of graph data by cascading additions, subtractions and absolute values in a deep network. It can be factorized as a product of Haar wavelet transforms on the graph. Haar wavelet transforms are flexible representations which characterize multiscale signal patterns on graphs [6, 10, 11]. Haar scattering transforms are extensions to graphs of wavelet scattering transforms, previously introduced for uniformly sampled signals [1].

For unstructured signals defined on an unknown graph, recovering the full graph geometry is an NP-complete problem. We avoid this complexity by only learning connected multiresolution graph approximations. This is sufficient to compute Haar scattering representations. Multiscale neighborhoods are calculated by minimizing an average total signal variation over training examples, which involves a pair matching algorithm of polynomial complexity. We show that this unsupervised learning algorithm computes sparse scattering representations.

This work was supported by the ERC grant Invariant Class 320959.

Figure 1: A Haar scattering network computes each coefficient of a layer $S_{j+1}x$ by adding or subtracting a pair of coefficients in the previous layer $S_j x$. (The figure shows the successive layers $x$, $S_1x$, $S_2x$, $S_3x$.)

For classification, the dimension of unsupervised Haar scattering representations is reduced with supervised partial least squares regression [12]. It amounts to computing a last layer of reduced dimensionality, before applying a Gaussian kernel SVM classifier.
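As an illustration of this last step (not the authors' implementation), the following scikit-learn sketch reduces hypothetical scattering features with partial least squares against one-hot labels and classifies the reduced representation with a Gaussian kernel SVM; the feature matrix, labels, and all hyperparameters below are placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC

# Placeholder inputs: each row of `features` stands for the Haar scattering
# coefficients S_J x of one training signal, `labels` are its class indices.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 512))
labels = rng.integers(0, 10, size=200)

# Supervised dimension reduction: PLS regression against one-hot encoded labels,
# keeping a reduced "last layer" of n_components directions.
pls = PLSRegression(n_components=40, scale=False)
pls.fit(features, label_binarize(labels, classes=np.arange(10)))
reduced = pls.transform(features)

# Gaussian kernel SVM trained on the reduced representation.
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(reduced, labels)
print(clf.score(pls.transform(features), labels))  # training accuracy of this toy run
```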
The performance of a Haar scattering classification is tested on scrambled images, whose graph geometry is unknown. Results are provided for the MNIST and CIFAR-10 image databases. Classification experiments are also performed on scrambled signals whose samples lie on an irregular grid on a sphere. All computations can be reproduced with software available at www.di.ens.fr/data/scattering/haar.

## 2 Orthogonal Haar Scattering on a Graph

### 2.1 Deep Networks of Permutation Invariant Operators

We consider signals $x$ defined on an unweighted graph $G = (V, E)$, with $V = \{1, ..., d\}$. Edges relate neighboring vertices. We suppose that $d$ is a power of 2 to simplify explanations. A Haar scattering is calculated by iteratively applying the following permutation invariant operator:

$$(\alpha, \beta) \longrightarrow (\alpha + \beta, \, |\alpha - \beta|). \qquad (1)$$

Its values are not modified by a permutation of $\alpha$ and $\beta$, and both values are recovered by

$$\max(\alpha, \beta) = \tfrac{1}{2}\big(\alpha + \beta + |\alpha - \beta|\big) \quad \text{and} \quad \min(\alpha, \beta) = \tfrac{1}{2}\big(\alpha + \beta - |\alpha - \beta|\big). \qquad (2)$$

An orthogonal Haar scattering transform computes progressively more invariant signal descriptors by applying this invariant operator at multiple scales. It is implemented along the deep network illustrated in Figure 1. The network layer $j$ is a two-dimensional array $S_j x(n, q)$ of $d = 2^{-j}d \times 2^j$ coefficients, where $n$ is a node index and $q$ is a feature type. The input network layer is $S_0 x(n, 0) = x(n)$. We compute $S_{j+1}x$ by regrouping the $2^{-j}d$ nodes of $S_j x$ into $2^{-j-1}d$ pairs $(a_n, b_n)$, and applying the permutation invariant operator (1) to each pair $(S_j x(a_n, q), S_j x(b_n, q))$:

$$S_{j+1}x(n, 2q) = S_j x(a_n, q) + S_j x(b_n, q) \qquad (3)$$

and

$$S_{j+1}x(n, 2q + 1) = |S_j x(a_n, q) - S_j x(b_n, q)|. \qquad (4)$$

This transform is iterated up to a maximum depth $J \leq \log_2(d)$. It computes $S_J x$ with $Jd/2$ additions, subtractions and absolute values. Since $S_j x \geq 0$ for $j > 0$, one can put an absolute value on the sum in (3) without changing $S_{j+1}x$. It results that $S_{j+1}x$ is calculated from the previous layer $S_j x$ by applying a linear operator followed by a non-linearity, as in most deep neural network architectures. In our case this non-linearity is an absolute value, as opposed to the rectifiers used in most deep networks [4].

For each $n$, the $2^j$ scattering coefficients $\{S_j x(n, q)\}_{0 \leq q < 2^j}$ are calculated from the values of $x$ in a vertex set $V_{j,n}$ of size $2^j$. One can verify by induction on (3) and (4) that $V_{0,n} = \{n\}$ for $0 \leq n < d$, and for any $j \geq 0$

$$V_{j+1,n} = V_{j,a_n} \cup V_{j,b_n}. \qquad (5)$$

Figure 2: A connected multiresolution is a partition of vertices with embedded connected sets $V_{j,n}$ of size $2^j$. (a): Example of a partition for the graph of a square image grid, for $1 \leq j \leq 3$. (b): Example on an irregular graph.

The embedded subsets $\{V_{j,n}\}_{j,n}$ form a multiresolution approximation of the vertex set $V$. At each scale $2^j$, different pairings $(a_n, b_n)$ define different multiresolution approximations. A small graph displacement propagates signal values from a node to its neighbors. To build representations that are nearly invariant to such displacements, a Haar scattering transform must regroup connected vertices. It is thus computed over multiresolution vertex sets $V_{j,n}$ which are connected in the graph $G$. It results from (5) that a necessary and sufficient condition is that each pair $(a_n, b_n)$ regroups two connected sets $V_{j,a_n}$ and $V_{j,b_n}$.

Figure 2 shows two examples of connected multiresolution approximations. Figure 2(a) illustrates the graph of an image grid, where pixels are connected to 8 neighbors. In this example, each $V_{j+1,n}$ regroups two subsets $V_{j,a_n}$ and $V_{j,b_n}$ which are connected horizontally if $j$ is even and vertically if $j$ is odd. Figure 2(b) illustrates a second example of a connected multiresolution approximation on an irregular graph.
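To make the cascade concrete, here is a minimal NumPy sketch of equations (3)-(4) for a given sequence of pairings. The function name, the pairing format, and the toy pairings on a cycle of 8 vertices are illustrative choices of ours, not the paper's released software.

```python
import numpy as np

def haar_scattering(x, pairings):
    """Orthogonal Haar scattering: cascade the operator (3)-(4) over given pairings.

    x        : 1-D array of length d (a power of 2), one value per graph vertex.
    pairings : list of J integer arrays; pairings[j] has shape (d / 2**(j+1), 2)
               and lists the node pairs (a_n, b_n) grouped at layer j.
    Returns the layers [S_0 x, ..., S_J x]; layer j has shape (d / 2**j, 2**j),
    indexed by (node n, feature q).
    """
    S = np.asarray(x, dtype=float).reshape(-1, 1)   # S_0 x(n, 0) = x(n)
    layers = [S]
    for pairs in pairings:
        a, b = pairs[:, 0], pairs[:, 1]
        sums = S[a] + S[b]                  # S_{j+1} x(n, 2q)   = S_j x(a_n, q) + S_j x(b_n, q)
        diffs = np.abs(S[a] - S[b])         # S_{j+1} x(n, 2q+1) = |S_j x(a_n, q) - S_j x(b_n, q)|
        S = np.empty((sums.shape[0], 2 * sums.shape[1]))
        S[:, 0::2] = sums                   # even features: sums
        S[:, 1::2] = diffs                  # odd features: absolute differences
        layers.append(S)
    return layers

# Toy example: d = 8 vertices on a cycle, pairing neighbouring nodes at every scale
# so that each V_{j,n} stays connected (an illustrative pairing, not a learned one).
d = 8
x = np.random.randn(d)
pairings = [np.arange(d).reshape(-1, 2),
            np.arange(d // 2).reshape(-1, 2),
            np.arange(d // 4).reshape(-1, 2)]
for j, S in enumerate(haar_scattering(x, pairings)):
    # Each layer doubles the squared energy, so ||S_j x|| = 2^(j/2) ||x||.
    assert np.isclose(np.linalg.norm(S), 2 ** (j / 2) * np.linalg.norm(x))
```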
There are many different connected multiresolution approximations resulting from different pairings at each scale $2^j$. Different multiresolution approximations correspond to different Haar scattering transforms. In the following, we compute several Haar scattering transforms of a signal $x$ by defining different multiresolution approximations.

The following theorem proves that a Haar scattering preserves the norm, and that it is contractive up to a normalization factor $2^{j/2}$. The contraction is due to the absolute value, which suppresses the sign and hence reduces the amplitude of differences. The proof is in Appendix A.

Theorem 2.1. For any $j \geq 0$, and any $x, x'$ defined on $V$,

$$\|S_j x - S_j x'\| \leq 2^{j/2} \|x - x'\| \quad \text{and} \quad \|S_j x\| = 2^{j/2} \|x\|.$$

### 2.2 Iterated Haar Wavelet Transforms

We show that a Haar scattering transform can be written as a cascade of orthogonal Haar wavelet transforms and absolute value non-linearities. It is a particular example of the scattering transforms introduced in [1]. It computes coefficients measuring signal variations at multiple scales and multiple orders. We prove that the signal can be recovered from Haar scattering coefficients computed over enough multiresolution approximations.

A scattering operator is contractive because of the absolute value. When coefficients have an arbitrary sign, suppressing the sign reduces the volume of the signal space by a factor 2. We say that $S_J x(n, q)$ is a coefficient of order $m$ if its computation includes $m$ absolute values of differences. The amplitude of scattering coefficients typically decreases exponentially as the scattering order $m$ increases, because of the contraction produced by the absolute value. One can verify from (3) and (4) that $S_J x(n, q)$ is a coefficient of order $m = 0$ if $q = 0$, and of order $m > 0$ if

$$q = \sum_{k=1}^{m} 2^{J - j_k} \quad \text{with} \quad 0 \leq j_k < j_{k+1} \leq J.$$

It results that there are $\binom{J}{m} 2^{-J} d$ coefficients $S_J x(n, q)$ of order $m$.

We now show that Haar scattering coefficients of order $m$ are obtained by cascading $m$ orthogonal Haar wavelet transforms defined on the graph $G$. A Haar wavelet at scale $2^j$ is defined over each $V_{j,n} = V_{j-1,a_n} \cup V_{j-1,b_n}$ by

$$\psi_{j,n} = \mathbf{1}_{V_{j-1,a_n}} - \mathbf{1}_{V_{j-1,b_n}}.$$

For any $J \geq 0$, one can verify [10, 6] that

$$\{\mathbf{1}_{V_{J,n}}\}_{0 \leq n < 2^{-J}d} \cup \{\psi_{j,n}\}_{0 \leq n < 2^{-j}d, \; 0 < j \leq J}$$

is a non-normalized orthogonal Haar basis of the space of signals defined on $V$.
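As a small numerical check of this construction, the following sketch builds the vectors $\mathbf{1}_{V_{J,n}}$ and $\psi_{j,n}$ for an assumed dyadic multiresolution on $d = 8$ vertices (an illustrative pairing of consecutive indices, chosen by us) and verifies that their Gram matrix is diagonal, i.e. that the family is orthogonal but not normalized.

```python
import numpy as np

d, J = 8, 3   # toy vertex set of size d with a dyadic multiresolution (illustrative choice)

def indicator(j, n):
    """Indicator vector of V_{j,n} = {n*2^j, ..., (n+1)*2^j - 1}."""
    return np.where(np.arange(d) // 2 ** j == n, 1.0, 0.0)

# Scaling vectors 1_{V_{J,n}} plus wavelets psi_{j,n} = 1_{V_{j-1,a_n}} - 1_{V_{j-1,b_n}},
# with the pairs (a_n, b_n) = (2n, 2n+1) of this dyadic example.
family = [indicator(J, n) for n in range(d // 2 ** J)]
for j in range(1, J + 1):
    family += [indicator(j - 1, 2 * n) - indicator(j - 1, 2 * n + 1)
               for n in range(d // 2 ** j)]

B = np.stack(family)          # d vectors of dimension d
G = B @ B.T                   # Gram matrix of the family
# Off-diagonal entries vanish: the family is an orthogonal (non-normalized) Haar basis.
assert np.allclose(G, np.diag(np.diag(G)))
```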