# Tensor Balancing on Statistical Manifold

Mahito Sugiyama 1 2, Hiroyuki Nakahara 3, Koji Tsuda 4 5 6

We solve tensor balancing, rescaling an $N$th order nonnegative tensor by multiplying $N$ tensors of order $N-1$ so that every fiber sums to one. This generalizes a fundamental process of matrix balancing used to compare matrices in a wide range of applications from biology to economics. We present an efficient balancing algorithm with quadratic convergence using Newton's method and show in numerical experiments that the proposed algorithm is several orders of magnitude faster than existing ones. To theoretically prove the correctness of the algorithm, we model tensors as probability distributions in a statistical manifold and realize tensor balancing as projection onto a submanifold. The key to our algorithm is that the gradient of the manifold, used as a Jacobian matrix in Newton's method, can be analytically obtained using the Möbius inversion formula, the heart of combinatorial mathematics. Our model is not limited to tensor balancing but has wide applicability, as it includes various statistical and machine learning models such as weighted DAGs and Boltzmann machines.

1. Introduction

Matrix balancing is the problem of rescaling a given square nonnegative matrix $A \in \mathbb{R}^{n \times n}_{\geq 0}$ to a doubly stochastic matrix $RAS$, where every row and column sums to one, by multiplying two diagonal matrices $R$ and $S$. This is a fundamental process for analyzing and comparing matrices in a wide range of applications, including input-output analysis in economics, called the RAS approach (Parikh, 1979; Miller & Blair, 2009; Lahr & de Mesnard, 2004), seat assignments in elections (Balinski, 2008; Akartunalı & Knight, 2016), Hi-C data analysis (Rao et al., 2014; Wu & Michor, 2016), the Sudoku puzzle (Moon et al., 2009), and the optimal transportation problem (Cuturi, 2013; Frogner et al., 2015; Solomon et al., 2015). An excellent review of this theory and its applications is given by Idel (2016).

1 National Institute of Informatics 2 JST PRESTO 3 RIKEN Brain Science Institute 4 Graduate School of Frontier Sciences, The University of Tokyo 5 RIKEN AIP 6 NIMS. Correspondence to: Mahito Sugiyama.

Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

Figure 1. Overview of our approach: a given tensor $A$ is modeled as a probability distribution $P$ in the statistical manifold $\mathcal{S}$ (a dually flat Riemannian manifold); tensor balancing is realized as projection of $P$ onto the submanifold $\mathcal{S}(\beta)$ of multistochastic tensors, in which every fiber sums to 1, and the projected distribution $P_\beta$ gives the balanced tensor.

The standard matrix balancing algorithm is the Sinkhorn-Knopp algorithm (Sinkhorn, 1964; Sinkhorn & Knopp, 1967; Marshall & Olkin, 1968; Knight, 2008), a special case of Bregman's balancing method (Lamond & Stewart, 1981), which iterates rescaling of each row and column until convergence. The algorithm is widely used in the above applications due to its simple implementation and theoretically guaranteed convergence. However, the algorithm converges only linearly (Soules, 1991), which is prohibitively slow for recently emerging large and sparse matrices. Although Livne & Golub (2004) and Knight & Ruiz (2013) tried to achieve faster convergence by approximating each step of Newton's method, the exact Newton's method with quadratic convergence has not been intensively studied yet.
Another open problem is tensor balancing, a generalization of balancing from matrices to higher-order multidimensional arrays, or tensors. The task is to rescale an $N$th order nonnegative tensor to a multistochastic tensor, in which every fiber sums to one, by multiplying $N$ tensors of order $N-1$. There are some results about mathematical properties of multistochastic tensors (Cui et al., 2014; Chang et al., 2016; Ahmed et al., 2003). However, no tensor balancing algorithm with guaranteed convergence that transforms a given tensor into a multistochastic tensor has been known until now.

Here we show that Newton's method with quadratic convergence can be applied to tensor balancing while avoiding solving a linear system on the full tensor. Our strategy is to realize matrix and tensor balancing as projection onto a dually flat Riemannian submanifold (Figure 1), which is a statistical manifold and known to be the essential structure for probability distributions in information geometry (Amari, 2016). Using a partially ordered outcome space, we generalize the log-linear model (Agresti, 2012) used to model higher-order combinations of binary variables (Amari, 2001; Ganmor et al., 2011; Nakahara & Amari, 2002; Nakahara et al., 2003), which allows us to model tensors as probability distributions in the statistical manifold. The remarkable property of our model is that the gradient of the manifold can be analytically computed using the Möbius inversion formula (Rota, 1964), the heart of combinatorial mathematics (Ito, 1993), which enables us to directly obtain the Jacobian matrix in Newton's method. Moreover, we show that $(n-1)^N$ of the $n^N$ entries of a tensor are invariant with respect to one of the two coordinate systems of the statistical manifold. Thus the number of equations in Newton's method is $O(n^{N-1})$.

The remainder of this paper is organized as follows: We begin with a low-level description of our matrix balancing algorithm in Section 2 and demonstrate its efficiency in numerical experiments in Section 3. To guarantee the correctness of the algorithm and extend it to tensor balancing, we provide theoretical analysis in Section 4. In Section 4.1, we introduce a generalized log-linear model associated with a partially ordered outcome space, followed by introducing the dually flat Riemannian structure in Section 4.2. In Section 4.3, we show how to use Newton's method to compute the projection of a probability distribution onto a submanifold. Finally, we formulate the matrix and tensor balancing problems in Section 5 and summarize our contributions in Section 6.

2. The Matrix Balancing Algorithm

Given a nonnegative square matrix $A = (a_{ij}) \in \mathbb{R}^{n \times n}_{\geq 0}$, the task of matrix balancing is to find $r, s \in \mathbb{R}^n$ that satisfy

\[ (RAS)\mathbf{1} = \mathbf{1}, \qquad (RAS)^{\top}\mathbf{1} = \mathbf{1}, \tag{1} \]

where $R = \mathrm{diag}(r)$ and $S = \mathrm{diag}(s)$. The balanced matrix $A' = RAS$ is called doubly stochastic, in which each entry $a'_{ij} = a_{ij} r_i s_j$ and all the rows and columns sum to one. The most popular algorithm is the Sinkhorn-Knopp algorithm, which repeats updating $r$ and $s$ as $r = 1/(As)$ and $s = 1/(A^{\top} r)$. We write $[n] = \{1, 2, \dots, n\}$ hereafter.

In our algorithm, instead of directly updating $r$ and $s$, we update two parameters $\theta$ and $\eta$ defined as

\[ \log p_{ij} = \sum_{i' \leq i} \sum_{j' \leq j} \theta_{i'j'}, \qquad \eta_{ij} = \sum_{i' \geq i} \sum_{j' \geq j} p_{i'j'} \tag{2} \]

for each $i, j \in [n]$, where we normalize entries as $p_{ij} = a_{ij} / \sum_{ij} a_{ij}$ so that $\sum_{ij} p_{ij} = 1$. We assume for simplicity that each entry is strictly larger than zero; this assumption will be removed in Section 5.

Figure 2. Matrix balancing with two parameters $\theta$ and $\eta$: for an $n \times n$ matrix $(a_{ij})$, the balancing constraints are placed on the first row $\eta_{1j}$ and the first column $\eta_{i1}$, while $\theta_{ij}$ with $i, j \neq 1$ is kept fixed.
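To make the coordinate transformations in Equation (2) concrete, here is a minimal C++/Eigen sketch (our own illustration with hypothetical function names, assuming a strictly positive matrix and 0-based indices; it is not the authors' released code): $\eta$ is obtained by suffix sums, $\theta$ by Möbius inversion of $\log p$, which on this grid poset reduces to a second-order finite difference, and the inverse map by prefix sums.

```cpp
#include <Eigen/Dense>

using Eigen::MatrixXd;

// eta(i, j) = sum of p(i', j') over all i' >= i and j' >= j (Equation (2)).
MatrixXd eta_from_p(const MatrixXd& p) {
  MatrixXd eta = p;
  for (int i = static_cast<int>(p.rows()) - 2; i >= 0; --i) eta.row(i) += eta.row(i + 1);
  for (int j = static_cast<int>(p.cols()) - 2; j >= 0; --j) eta.col(j) += eta.col(j + 1);
  return eta;
}

// theta by Moebius inversion of log p on the grid poset, i.e., the
// second-order finite difference: theta(i, j) =
// log p(i, j) - log p(i-1, j) - log p(i, j-1) + log p(i-1, j-1),
// with out-of-range terms dropped.
MatrixXd theta_from_p(const MatrixXd& p) {
  const MatrixXd L = p.array().log().matrix();
  MatrixXd th = L;
  for (int i = 0; i < L.rows(); ++i)
    for (int j = 0; j < L.cols(); ++j) {
      if (i > 0)          th(i, j) -= L(i - 1, j);
      if (j > 0)          th(i, j) -= L(i, j - 1);
      if (i > 0 && j > 0) th(i, j) += L(i - 1, j - 1);
    }
  return th;
}

// Inverse map: log p(i, j) = sum of theta(i', j') over i' <= i and j' <= j.
MatrixXd p_from_theta(const MatrixXd& th) {
  MatrixXd L = th;
  for (int i = 1; i < L.rows(); ++i) L.row(i) += L.row(i - 1);
  for (int j = 1; j < L.cols(); ++j) L.col(j) += L.col(j - 1);
  return L.array().exp().matrix();
}
```

Round-tripping p_from_theta(theta_from_p(p)) returns p, since the prefix sums invert the finite difference.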
The key to our approach is that we update $\theta^{(t)}_{ij}$ with $i = 1$ or $j = 1$ by Newton's method at each iteration $t = 1, 2, \dots$ while fixing $\theta_{ij}$ with $i, j \neq 1$, so that $\eta^{(t)}_{ij}$ satisfies the following condition (Figure 2):

\[ \eta^{(t)}_{i1} = \frac{n - i + 1}{n}, \qquad \eta^{(t)}_{1j} = \frac{n - j + 1}{n}. \]

Note that the rows and columns sum not to $1$ but to $1/n$ due to the normalization. The update formula is described as

\[ \begin{pmatrix} \theta^{(t+1)}_{11} \\ \vdots \\ \theta^{(t+1)}_{1n} \\ \theta^{(t+1)}_{21} \\ \vdots \\ \theta^{(t+1)}_{n1} \end{pmatrix} = \begin{pmatrix} \theta^{(t)}_{11} \\ \vdots \\ \theta^{(t)}_{1n} \\ \theta^{(t)}_{21} \\ \vdots \\ \theta^{(t)}_{n1} \end{pmatrix} - J^{-1} \begin{pmatrix} \eta^{(t)}_{11} - (n - 1 + 1)/n \\ \vdots \\ \eta^{(t)}_{1n} - (n - n + 1)/n \\ \eta^{(t)}_{21} - (n - 2 + 1)/n \\ \vdots \\ \eta^{(t)}_{n1} - (n - n + 1)/n \end{pmatrix}, \tag{3} \]

where $J$ is the Jacobian matrix given as

\[ J_{(ij)(i'j')} = \frac{\partial \eta^{(t)}_{ij}}{\partial \theta^{(t)}_{i'j'}} = \eta^{(t)}_{\max\{i,i'\} \max\{j,j'\}} - n^2 \eta^{(t)}_{ij} \eta^{(t)}_{i'j'}, \tag{4} \]

which is derived from our theoretical result in Theorem 3. Since $J$ is a $(2n - 1) \times (2n - 1)$ matrix, the time complexity of each update is $O(n^3)$, which is needed to compute the inverse of $J$. After updating to $\theta^{(t+1)}_{ij}$, we can compute $p^{(t+1)}_{ij}$ and $\eta^{(t+1)}_{ij}$ by Equation (2). Since this update does not ensure the condition $\sum_{ij} p^{(t+1)}_{ij} = 1$, we again update $\theta^{(t+1)}_{11}$ as

\[ \theta^{(t+1)}_{11} \leftarrow \theta^{(t+1)}_{11} - \log \sum_{ij} p^{(t+1)}_{ij} \]

and recompute $p^{(t+1)}_{ij}$ and $\eta^{(t+1)}_{ij}$ for each $i, j \in [n]$. By iterating the above update process in Equation (3) until convergence, $A' = (a'_{ij})$ with $a'_{ij} = n p_{ij}$ becomes doubly stochastic.
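The step above can then be sketched as follows; this is our own illustration under stated assumptions (strictly positive input, 0-based indices, and the helpers eta_from_p, theta_from_p, and p_from_theta from the sketch in Section 2), not the authors' released implementation linked in the footnote below.

```cpp
#include <Eigen/Dense>
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

using Eigen::MatrixXd;
using Eigen::VectorXd;

// One Newton update (Equation (3)) over the 2n - 1 parameters theta_{1j} and
// theta_{i1}. Targets are eta_{i1} = (n-i+1)/n and eta_{1j} = (n-j+1)/n
// (1-based), i.e., (n - max(i, j))/n here because one 0-based index is 0.
void newton_step(MatrixXd& p) {
  const int n = static_cast<int>(p.rows()), m = 2 * n - 1;
  MatrixXd th = theta_from_p(p);
  const MatrixXd eta = eta_from_p(p);

  std::vector<std::pair<int, int>> idx;  // parameter order as in Equation (3)
  for (int j = 0; j < n; ++j) idx.emplace_back(0, j);
  for (int i = 1; i < n; ++i) idx.emplace_back(i, 0);

  VectorXd residual(m);
  MatrixXd J(m, m);
  for (int a = 0; a < m; ++a) {
    const int i = idx[a].first, j = idx[a].second;
    residual(a) = eta(i, j) - static_cast<double>(n - std::max(i, j)) / n;
    for (int b = 0; b < m; ++b) {
      const int k = idx[b].first, l = idx[b].second;
      // Jacobian entry from Equation (4).
      J(a, b) = eta(std::max(i, k), std::max(j, l))
                - static_cast<double>(n) * n * eta(i, j) * eta(k, l);
    }
  }

  const VectorXd delta = J.partialPivLu().solve(residual);
  for (int a = 0; a < m; ++a) th(idx[a].first, idx[a].second) -= delta(a);

  // Renormalize through theta_{11} so that the entries sum to one again.
  p = p_from_theta(th);
  th(0, 0) -= std::log(p.sum());
  p = p_from_theta(th);
}
```

Iterating newton_step until the residual norm drops below the tolerance and returning $a'_{ij} = n p_{ij}$ yields the doubly stochastic matrix.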
3. Numerical Experiments

We evaluate the efficiency of our algorithm against two prominent balancing methods: the standard Sinkhorn-Knopp algorithm (Sinkhorn, 1964) and the state-of-the-art algorithm BNEWT (Knight & Ruiz, 2013), which uses Newton's-method-like iterations with conjugate gradients. All experiments were conducted on Amazon Linux AMI release 2016.09 with a single core of a 2.3 GHz Intel Xeon CPU E5-2686 v4 and 256 GB of memory. All methods were implemented in C++ with the Eigen library and compiled with gcc 4.8.3.¹ We carefully implemented BNEWT by directly translating the MATLAB code provided in (Knight & Ruiz, 2013) into C++ with the Eigen library for fair comparison, and used the default parameters. We measured the residual of a matrix $A' = (a'_{ij})$ by the squared norm $\lVert (A'\mathbf{1} - \mathbf{1},\, A'^{\top}\mathbf{1} - \mathbf{1}) \rVert^2$, where each entry $a'_{ij}$ is obtained as $n p_{ij}$ in our algorithm, and ran each of the three algorithms until the residual fell below the tolerance threshold $10^{-6}$.

¹ An implementation of algorithms for matrices and third order tensors is available at: https://github.com/mahito-sugiyama/newton-balancing

Hessenberg Matrix. The first set of experiments used a Hessenberg matrix, which has been a standard benchmark for matrix balancing (Parlett & Landis, 1982; Knight & Ruiz, 2013). Each entry of an $n \times n$ Hessenberg matrix $H_n = (h_{ij})$ is given as $h_{ij} = 0$ if $j < i - 1$ and $h_{ij} = 1$ otherwise. We varied the size $n$ from 10 to 5,000 and measured the running time (in seconds) and the number of iterations of each method.

Figure 3. Results on Hessenberg matrices (running time in seconds and number of iterations versus $n$). The BNEWT algorithm (green) failed to converge for $n \geq 200$.

Figure 4. Convergence graph on $H_{20}$ (residual versus number of iterations for Sinkhorn and Newton).

Results are plotted in Figure 3. Our balancing algorithm with Newton's method (plotted in blue in the figures) is clearly the fastest: It is three to five orders of magnitude faster than the standard Sinkhorn-Knopp algorithm (plotted in red). Although the BNEWT algorithm (plotted in green) is competitive if $n$ is small, it suddenly fails to converge whenever $n \geq 200$, which is consistent with results in the original paper (Knight & Ruiz, 2013), where there is no result for the setting $n \geq 200$ on the same matrix. Moreover, our method converges in around 10 to 20 steps, which is about three and seven orders of magnitude fewer than BNEWT and Sinkhorn-Knopp, respectively, at $n = 100$. To see the behavior of the rate of convergence in detail, we plot the convergence graph in Figure 4 for $n = 20$, where we observe the slow convergence rate of the Sinkhorn-Knopp algorithm and the unstable convergence of the BNEWT algorithm, which contrasts with our quick convergence.

Trefethen Matrix. Next, we collected a set of Trefethen matrices from a collection website², which are sparse nonnegative matrices with prime numbers on the diagonal. Results are plotted in Figure 5, where we observe the same trend as before: Our algorithm is the fastest, about four orders of magnitude faster than the Sinkhorn-Knopp algorithm. Note that larger matrices with $n > 300$ do not have total support, which is a necessary condition for matrix balancing (Knight & Ruiz, 2013), while the BNEWT algorithm fails to converge if $n = 200$ or $n = 300$.

Figure 5. Results on Trefethen matrices (running time in seconds and number of iterations). The BNEWT algorithm (green) failed to converge for $n \geq 200$.

² http://www.cise.ufl.edu/research/sparse/matrices/

4. Theoretical Analysis

In the following, we provide theoretical support for our algorithm by formulating the problem as projection within a statistical manifold, in which a matrix corresponds to an element, that is, a probability distribution, in the manifold. We show that a balanced matrix forms a submanifold and that matrix balancing is projection of a given distribution onto the submanifold, where the Jacobian matrix in Equation (4) is derived from the gradient of the manifold.

4.1. Formulation

We introduce our log-linear probabilistic model, where the outcome space is a partially ordered set, or a poset (Gierz et al., 2003). We prepare basic notation and the key mathematical tool for posets, the Möbius inversion formula, followed by formulating the log-linear model.

4.1.1. Möbius Inversion

A poset $(S, \leq)$, a set of elements $S$ together with a partial order $\leq$ on $S$, is a fundamental structured space in computer science. A partial order is a relation between elements in $S$ that satisfies the following three properties: for all $x, y, z \in S$, (1) $x \leq x$ (reflexivity), (2) $x \leq y$ and $y \leq x$ imply $x = y$ (antisymmetry), and (3) $x \leq y$ and $y \leq z$ imply $x \leq z$ (transitivity). In what follows, $S$ is always finite and includes the least element (bottom) $\bot \in S$; that is, $\bot \leq x$ for all $x \in S$. We denote $S \setminus \{\bot\}$ by $S^+$.

Rota (1964) introduced the Möbius inversion formula on posets by generalizing the inclusion-exclusion principle. Let $\zeta \colon S \times S \to \{0, 1\}$ be the zeta function defined as

\[ \zeta(s, x) = \begin{cases} 1 & \text{if } s \leq x, \\ 0 & \text{otherwise.} \end{cases} \]

The Möbius function $\mu \colon S \times S \to \mathbb{Z}$ satisfies $\zeta \mu = I$ and is inductively defined as

\[ \mu(x, y) = \begin{cases} 1 & \text{if } x = y, \\ -\sum_{x \leq s < y} \mu(x, s) & \text{if } x < y, \\ 0 & \text{otherwise.} \end{cases} \]
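As a concrete illustration of this inductive definition (our own sketch; the function name moebius and the encoding of $\leq$ as a boolean matrix are ours), the following self-contained program computes $\mu$ directly from the zeta function and, for the Boolean lattice on two elements, prints $\mu(\bot, \top) = 1$, reproducing the inclusion-exclusion signs.

```cpp
#include <cstdio>
#include <vector>

// Moebius function of a finite poset, computed from the inductive definition:
// mu(x, x) = 1 and mu(x, y) = -sum over all s with x <= s < y of mu(x, s).
// leq[a][b] encodes the zeta function, i.e., whether a <= b.
int moebius(int x, int y, const std::vector<std::vector<bool>>& leq) {
  if (!leq[x][y]) return 0;  // mu vanishes when x is not below y
  if (x == y) return 1;
  int sum = 0;
  for (int s = 0; s < static_cast<int>(leq.size()); ++s)
    if (leq[x][s] && leq[s][y] && s != y)  // all s with x <= s < y
      sum -= moebius(x, s, leq);
  return sum;
}

int main() {
  // Boolean lattice on two points: 0 = bottom, 1 = {a}, 2 = {b}, 3 = top.
  const std::vector<std::vector<bool>> leq = {
      {1, 1, 1, 1}, {0, 1, 0, 1}, {0, 0, 1, 1}, {0, 0, 0, 1}};
  std::printf("mu(bottom, top) = %d\n", moebius(0, 3, leq));  // prints 1
}
```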
For example, a Boltzmann machine with a set of vertices $V$ and a set of edges $E$ corresponds to the submanifold $\mathcal{S}_B = \{ P \in \mathcal{S} \mid \theta_P(x) = 0 \text{ if } |x| > 2 \text{ or } x \notin E \}$, with $S = 2^V$. Let $\hat{P}$ be an empirical distribution estimated from a given dataset. The learned model is the $m$-projection of the empirical distribution $\hat{P}$ onto $\mathcal{S}_B$, where the resulting distribution $P_\beta$ is given as

\[ \begin{cases} \theta_{P_\beta}(x) = 0 & \text{if } |x| > 2 \text{ or } x \notin E, \\ \eta_{P_\beta}(x) = \eta_{\hat{P}}(x) & \text{if } |x| = 1 \text{ or } x \in E. \end{cases} \]

4.3.2. Computation

Here we show how to compute the projection of a given probability distribution. Newton's method can be used to efficiently compute the projected distribution $P_\beta$ by iteratively updating from $P^{(0)}_\beta = P$, producing $P^{(1)}_\beta, P^{(2)}_\beta, \dots$ until convergence to $P_\beta$.

Let us start with the $m$-projection, initializing $P^{(0)}_\beta = P$. In each iteration $t$, we update $\eta^{(t)}_{P_\beta}(x)$ for all $x \in \mathrm{dom}(\beta)$ so that $\theta^{(t)}_{P_\beta}(x)$ approaches $\beta(x)$, while fixing $\eta^{(t)}_{P_\beta}(x) = \eta_P(x)$ for all $x \in S^+ \setminus \mathrm{dom}(\beta)$, which is possible due to the orthogonality of $\theta$ and $\eta$. Using Newton's method, $\eta^{(t+1)}_{P_\beta}(x)$ should satisfy

\[ \bigl( \theta^{(t)}_{P_\beta}(x) - \beta(x) \bigr) + \sum_{y \in \mathrm{dom}(\beta)} J_{xy} \bigl( \eta^{(t+1)}_{P_\beta}(y) - \eta^{(t)}_{P_\beta}(y) \bigr) = 0 \]

for every $x \in \mathrm{dom}(\beta)$, where $J_{xy}$ is an entry of the $|\mathrm{dom}(\beta)| \times |\mathrm{dom}(\beta)|$ Jacobian matrix $J$, given as

\[ J_{xy} = \frac{\partial \theta^{(t)}_{P_\beta}(x)}{\partial \eta^{(t)}_{P_\beta}(y)} = \sum_{s \in S} \mu(s, x)\, \mu(s, y)\, p^{(t)}_\beta(s)^{-1} \]

from Theorem 3. Therefore, we have the update formula for all $x \in \mathrm{dom}(\beta)$ as

\[ \eta^{(t+1)}_{P_\beta}(x) = \eta^{(t)}_{P_\beta}(x) - \sum_{y \in \mathrm{dom}(\beta)} (J^{-1})_{xy} \bigl( \theta^{(t)}_{P_\beta}(y) - \beta(y) \bigr). \]

In $e$-projection, we instead update $\theta^{(t)}_{P_\beta}(x)$ for $x \in \mathrm{dom}(\beta)$ so that $\eta^{(t)}_{P_\beta}(x)$ approaches $\beta(x)$, while fixing $\theta^{(t)}_{P_\beta}(x) = \theta_P(x)$ for all $x \in S^+ \setminus \mathrm{dom}(\beta)$. To ensure $\eta^{(t)}_{P_\beta}(\bot) = 1$, we add $\bot$ to $\mathrm{dom}(\beta)$ and set $\beta(\bot) = 1$. We update $\theta^{(t)}_{P_\beta}(x)$ at each step $t$ as

\[ \theta^{(t+1)}_{P_\beta}(x) = \theta^{(t)}_{P_\beta}(x) - \sum_{y \in \mathrm{dom}(\beta)} (J'^{-1})_{xy} \bigl( \eta^{(t)}_{P_\beta}(y) - \beta(y) \bigr), \]

\[ J'_{xy} = \frac{\partial \eta^{(t)}_{P_\beta}(x)}{\partial \theta^{(t)}_{P_\beta}(y)} = \sum_{s \in S} \zeta(x, s)\, \zeta(y, s)\, p^{(t)}_\beta(s) - |S|\, \eta^{(t)}_{P_\beta}(x)\, \eta^{(t)}_{P_\beta}(y). \]

In this case, we also need to update $\theta^{(t)}_{P_\beta}(\bot)$, as it is not guaranteed to be fixed. Let us define

\[ \tilde{p}^{(t+1)}_\beta(x) = p^{(t)}_\beta(x) \prod_{s \in \mathrm{dom}(\beta) \setminus \{\bot\}} \left( \frac{\exp\bigl( \theta^{(t+1)}_{P_\beta}(s) \bigr)}{\exp\bigl( \theta^{(t)}_{P_\beta}(s) \bigr)} \right)^{\zeta(s, x)}. \]

Since we have

\[ p^{(t+1)}_\beta(x) = \frac{\exp\bigl( \theta^{(t+1)}_{P_\beta}(\bot) \bigr)}{\exp\bigl( \theta^{(t)}_{P_\beta}(\bot) \bigr)}\, \tilde{p}^{(t+1)}_\beta(x), \]

it follows that

\[ \theta^{(t+1)}_{P_\beta}(\bot) = \theta^{(t)}_{P_\beta}(\bot) - \log \Bigl( \exp\bigl( \theta^{(t)}_{P_\beta}(\bot) \bigr) + \sum_{x \in S^+} \tilde{p}^{(t+1)}_\beta(x) \Bigr). \]

The time complexity of each iteration is $O(|\mathrm{dom}(\beta)|^3)$, which is required to compute the inverse of the Jacobian matrix.

Global convergence of the projection algorithm is always guaranteed by the convexity of the submanifold $\mathcal{S}(\beta)$ defined in Equation (14). Since $\mathcal{S}(\beta)$ is always convex with respect to the $\theta$- and $\eta$-coordinates, it is straightforward to see that our $e$-projection is an instance of the Bregman algorithm onto a convex region, which is well known to always converge to the global solution (Censor & Lent, 1981).
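As a sanity check that we add here (not a passage from the original), the $e$-projection Jacobian $J'$ specializes to the matrix formula of Equation (4) on the model of Section 2, where $S$ is the full grid $[n] \times [n]$ (so $|S| = n^2$) and every pair of elements has a join: for $x = (i, j)$ and $y = (i', j')$ we have $\zeta(x, s)\,\zeta(y, s) = 1$ exactly when $s \geq (\max\{i, i'\}, \max\{j, j'\})$, hence

\[ \sum_{s \in S} \zeta(x, s)\, \zeta(y, s)\, p^{(t)}_\beta(s) = \eta^{(t)}_{\max\{i,i'\} \max\{j,j'\}}, \qquad J'_{xy} = \eta^{(t)}_{\max\{i,i'\} \max\{j,j'\}} - n^2\, \eta^{(t)}_{ij}\, \eta^{(t)}_{i'j'}, \]

which is exactly the Jacobian used in Equation (4).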
5. Balancing Matrices and Tensors

Now we are ready to solve the problem of matrix and tensor balancing as projection on a dually flat manifold.

5.1. Matrix Balancing

Recall that the task of matrix balancing is to find $r, s \in \mathbb{R}^n$ that satisfy $(RAS)\mathbf{1} = \mathbf{1}$ and $(RAS)^{\top}\mathbf{1} = \mathbf{1}$ with $R = \mathrm{diag}(r)$ and $S = \mathrm{diag}(s)$ for a given nonnegative square matrix $A = (a_{ij}) \in \mathbb{R}^{n \times n}_{\geq 0}$. Let us define $S$ as

\[ S = \{ (i, j) \mid i, j \in [n] \text{ and } a_{ij} \neq 0 \}, \tag{15} \]

where we remove zero entries from the outcome space $S$, as our formulation cannot treat zero probabilities, and give each probability as $p((i, j)) = a_{ij} / \sum_{ij} a_{ij}$. The partial order $\leq$ on $S$ is naturally introduced as

\[ x = (i, j) \leq y = (k, l) \iff i \leq k \text{ and } j \leq l, \tag{16} \]

resulting in $\bot = (1, 1)$. In addition, we define $\iota_{k,m}$ for each $k \in [n]$ and $m \in \{1, 2\}$ such that

\[ \iota_{k,m} = \min\{\, x = (i_1, i_2) \in S \mid i_m = k \,\}, \]

where the minimum is taken with respect to the order $\leq$. If $\iota_{k,m}$ does not exist, we simply remove the entire $k$th row if $m = 1$ or the $k$th column if $m = 2$ from $A$. Then we permute the rows and columns of $A$ so that the condition

\[ \iota_{1,m} \leq \iota_{2,m} \leq \dots \leq \iota_{n,m} \tag{17} \]

is satisfied for each $m \in \{1, 2\}$, which is possible for any matrix. Since we have

\[ \eta(\iota_{k,m}) - \eta(\iota_{k+1,m}) = \begin{cases} \sum_{j=1}^{n} p((k, j)) & \text{if } m = 1, \\ \sum_{i=1}^{n} p((i, k)) & \text{if } m = 2 \end{cases} \]

if the condition (17) is satisfied, the probability distribution is balanced if

\[ \eta(\iota_{k,m}) = \frac{n - k + 1}{n} \]

for all $k \in [n]$ and $m \in \{1, 2\}$. Therefore, we obtain the following result.

Matrix balancing as e-projection: Given a matrix $A \in \mathbb{R}^{n \times n}_{\geq 0}$ with its normalized probability distribution $P \in \mathcal{S}$ such that $p((i, j)) = a_{ij} / \sum_{ij} a_{ij}$. Define the poset $(S, \leq)$ by Equations (15) and (16), and let $\mathcal{S}(\beta)$ be the submanifold of $\mathcal{S}$ such that

\[ \mathcal{S}(\beta) = \{ P \in \mathcal{S} \mid \eta_P(x) = \beta(x) \text{ for all } x \in \mathrm{dom}(\beta) \}, \]

where the function $\beta$ is given as

\[ \mathrm{dom}(\beta) = \{ \iota_{k,m} \in S \mid k \in [n],\ m \in \{1, 2\} \}, \qquad \beta(\iota_{k,m}) = \frac{n - k + 1}{n}. \]

Matrix balancing is the $e$-projection of $P$ onto the submanifold $\mathcal{S}(\beta)$; that is, the balanced matrix $(RAS)/n$ is the distribution $P_\beta$ such that

\[ \begin{cases} \theta_{P_\beta}(x) = \theta_P(x) & \text{if } x \in S^+ \setminus \mathrm{dom}(\beta), \\ \eta_{P_\beta}(x) = \beta(x) & \text{if } x \in \mathrm{dom}(\beta), \end{cases} \]

which is unique and always exists in $\mathcal{S}$, thanks to its dually flat structure. Moreover, the two balancing vectors $r$ and $s$ are obtained as

\[ \exp\Bigl( \sum_{k=1}^{i} \bigl( \theta_{P_\beta}(\iota_{k,m}) - \theta_P(\iota_{k,m}) \bigr) \Bigr) = \begin{cases} r_i & \text{if } m = 1, \\ s_i & \text{if } m = 2 \end{cases} \]

for every $i \in [n]$, with $r$ rescaled as $r \leftarrow r\, n / \sum_{ij} a_{ij}$.

5.2. Tensor Balancing

Next, we generalize our approach from matrices to tensors. For an $N$th order tensor $A = (a_{i_1 i_2 \dots i_N}) \in \mathbb{R}^{n_1 \times n_2 \times \dots \times n_N}$ and a vector $b \in \mathbb{R}^{n_m}$, the $m$-mode product of $A$ and $b$ is defined as

\[ (A \times_m b)_{i_1 \dots i_{m-1} i_{m+1} \dots i_N} = \sum_{i_m = 1}^{n_m} a_{i_1 i_2 \dots i_N}\, b_{i_m}. \]

We define tensor balancing as follows: Given a tensor $A \in \mathbb{R}^{n_1 \times n_2 \times \dots \times n_N}$ with $n_1 = \dots = n_N = n$, find $N$ tensors $R^1, R^2, \dots, R^N$ of order $N - 1$ such that

\[ A' \times_m \mathbf{1} = \mathbf{1}\ \bigl( \in \mathbb{R}^{n_1 \times \dots \times n_{m-1} \times n_{m+1} \times \dots \times n_N} \bigr) \tag{18} \]

for all $m \in [N]$, i.e., $\sum_{i_m = 1}^{n} a'_{i_1 i_2 \dots i_N} = 1$, where each entry $a'_{i_1 i_2 \dots i_N}$ of the balanced tensor $A'$ is given as

\[ a'_{i_1 i_2 \dots i_N} = a_{i_1 i_2 \dots i_N} \prod_{m \in [N]} R^m_{i_1 \dots i_{m-1} i_{m+1} \dots i_N}. \]

A tensor $A'$ that satisfies Equation (18) is called multistochastic (Cui et al., 2014). Note that this is exactly the same as the matrix balancing problem if $N = 2$.

It is straightforward to extend matrix balancing to tensor balancing as $e$-projection onto a submanifold. Given a tensor $A \in \mathbb{R}^{n_1 \times n_2 \times \dots \times n_N}$, its normalized probability distribution $P$ is such that

\[ p(x) = \frac{a_{i_1 i_2 \dots i_N}}{\sum_{j_1 j_2 \dots j_N} a_{j_1 j_2 \dots j_N}} \tag{19} \]

for all $x = (i_1, i_2, \dots, i_N)$. The objective is to obtain $P_\beta$ such that

\[ \sum_{i_m = 1}^{n} p_\beta((i_1, \dots, i_N)) = \frac{1}{n^{N-1}} \]

for all $m \in [N]$ and $i_1, \dots, i_N \in [n]$. In the same way as matrix balancing, we define $S$ as

\[ S = \bigl\{ (i_1, i_2, \dots, i_N) \in [n]^N \mid a_{i_1 i_2 \dots i_N} \neq 0 \bigr\} \]

with zero entries removed, and the partial order $\leq$ as

\[ x = (i_1, \dots, i_N) \leq y = (j_1, \dots, j_N) \iff i_m \leq j_m \text{ for all } m \in [N]. \]

In addition, we introduce $\iota_{k,m}$ as

\[ \iota_{k,m} = \min\{\, x = (i_1, i_2, \dots, i_N) \in S \mid i_m = k \,\} \]

and require the condition in Equation (17).

Tensor balancing as e-projection: Given a tensor $A \in \mathbb{R}^{n_1 \times n_2 \times \dots \times n_N}$ with its normalized probability distribution $P \in \mathcal{S}$ given in Equation (19). The submanifold $\mathcal{S}(\beta)$ of multistochastic tensors is given as

\[ \mathcal{S}(\beta) = \{ P \in \mathcal{S} \mid \eta_P(x) = \beta(x) \text{ for all } x \in \mathrm{dom}(\beta) \}, \]

where the domain of the function $\beta$ is given as

\[ \mathrm{dom}(\beta) = \{\, \iota_{k,m} \mid k \in [n],\ m \in [N] \,\} \]

and each value is described using the zeta function as

\[ \beta(\iota_{k,m}) = \sum_{l \in [n]} \zeta(\iota_{k,m}, \iota_{l,m})\, \frac{1}{n^{N-1}}. \]
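To make this definition of $\beta$ concrete, consider the special case of a strictly positive tensor (a worked example we add for illustration): then $S = [n]^N$, $\iota_{k,m}$ is the point with $i_m = k$ and all other indices equal to $1$, and $\iota_{k,m} \leq \iota_{l,m}$ holds exactly when $k \leq l$, so

\[ \beta(\iota_{k,m}) = \sum_{l \in [n]} \zeta(\iota_{k,m}, \iota_{l,m})\, \frac{1}{n^{N-1}} = \frac{n - k + 1}{n^{N-1}}, \]

which for $N = 2$ reduces to the balancing target $\eta(\iota_{k,m}) = (n - k + 1)/n$ used in Sections 2 and 5.1.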
Tensor balancing is the $e$-projection of $P$ onto the submanifold $\mathcal{S}(\beta)$; that is, the multistochastic tensor is the distribution $P_\beta$ such that

\[ \begin{cases} \theta_{P_\beta}(x) = \theta_P(x) & \text{if } x \in S^+ \setminus \mathrm{dom}(\beta), \\ \eta_{P_\beta}(x) = \beta(x) & \text{if } x \in \mathrm{dom}(\beta), \end{cases} \]

which is unique and always exists in $\mathcal{S}$, thanks to its dually flat structure. Moreover, each balancing tensor $R^m$ is obtained as

\[ R^m_{i_1 \dots i_{m-1} i_{m+1} \dots i_N} = \exp\Bigl( \sum_{k=1}^{i_{m'}} \bigl( \theta_{P_\beta}(\iota_{k,m'}) - \theta_P(\iota_{k,m'}) \bigr) \Bigr), \qquad m' = (m \bmod N) + 1, \]

for every $m \in [N]$, with $R^1$ rescaled as $R^1 \leftarrow R^1\, n^{N-1} / \sum_{j_1 \dots j_N} a_{j_1 \dots j_N}$ to recover a multistochastic tensor. Our result means that the $e$-projection algorithm based on Newton's method proposed in Section 4.3 converges to the unique balanced tensor whenever $\mathcal{S}(\beta) \neq \emptyset$ holds.

6. Conclusion

In this paper, we have solved the open problem of tensor balancing and presented an efficient balancing algorithm using Newton's method. Our algorithm converges quadratically, while the popular Sinkhorn-Knopp algorithm converges only linearly. We have examined the efficiency of our algorithm in numerical experiments on matrix balancing and showed that the proposed algorithm is several orders of magnitude faster than the existing approaches.

We have analyzed the theory behind the algorithm and proved that balancing is $e$-projection in a special type of statistical manifold, in particular, a dually flat Riemannian manifold studied in information geometry. Our key finding is that the gradient of the manifold, equivalent to the Riemannian metric or the Fisher information matrix, can be analytically obtained using the Möbius inversion formula.

Our information geometric formulation can model several machine learning applications such as statistical analysis on a DAG structure. Thus, we can perform efficient learning as projection using information of the gradient of manifolds by reformulating such models, which we will study in future work.

Acknowledgements

The authors sincerely thank Marco Cuturi for his valuable comments. This work was supported by JSPS KAKENHI Grant Numbers JP16K16115, JP16H02870 (MS), JP26120732 and JP16H06570 (HN). The research of K.T. was supported by JST CREST JPMJCR1502, RIKEN PostK, KAKENHI Nanostructure and KAKENHI JP15H05711.

References

Agresti, A. Categorical Data Analysis. Wiley, 3rd edition, 2012.

Ahmed, M., De Loera, J., and Hemmecke, R. Polyhedral Cones of Magic Cubes and Squares, volume 25 of Algorithms and Combinatorics, pp. 25–41. Springer, 2003.

Akartunalı, K. and Knight, P. A. Network models and biproportional rounding for fair seat allocations in the UK elections. Annals of Operations Research, pp. 1–19, 2016.

Amari, S. Information geometry on hierarchy of probability distributions. IEEE Transactions on Information Theory, 47(5):1701–1711, 2001.

Amari, S. Information geometry and its applications: Convex function and dually flat manifold. In Nielsen, F. (ed.), Emerging Trends in Visual Computing: LIX Fall Colloquium, ETVC 2008, Revised Invited Papers, pp. 75–102. Springer, 2009.

Amari, S. Information geometry of positive measures and positive-definite matrices: Decomposable dually flat structure. Entropy, 16(4):2131–2145, 2014.

Amari, S. Information Geometry and Its Applications. Springer, 2016.

Balinski, M. Fair majority voting (or how to eliminate gerrymandering). American Mathematical Monthly, 115(2):97–113, 2008.

Censor, Y. and Lent, A. An iterative row-action method for interval convex programming. Journal of Optimization Theory and Applications, 34(3):321–353, 1981.

Chang, H., Paksoy, V. E., and Zhang, F. Polytopes of stochastic tensors. Annals of Functional Analysis, 7(3):386–393, 2016.

Cui, L.-B., Li, W., and Ng, M. K. Birkhoff–von Neumann theorem for multistochastic tensors. SIAM Journal on Matrix Analysis and Applications, 35(3):956–973, 2014.

Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems 26, pp. 2292–2300, 2013.

Frogner, C., Zhang, C., Mobahi, H., Araya, M., and Poggio, T. A. Learning with a Wasserstein loss. In Advances in Neural Information Processing Systems 28, pp. 2053–2061, 2015.

Ganmor, E., Segev, R., and Schneidman, E. Sparse low-order interaction network underlies a highly correlated and learnable neural population code. Proceedings of the National Academy of Sciences, 108(23):9679–9684, 2011.
Gierz, G., Hofmann, K. H., Keimel, K., Lawson, J. D., Mislove, M., and Scott, D. S. Continuous Lattices and Domains. Cambridge University Press, 2003.

Idel, M. A review of matrix scaling and Sinkhorn's normal form for matrices and positive maps. arXiv:1609.06349, 2016.

Ito, K. (ed.). Encyclopedic Dictionary of Mathematics. The MIT Press, 2nd edition, 1993.

Knight, P. A. The Sinkhorn–Knopp algorithm: Convergence and applications. SIAM Journal on Matrix Analysis and Applications, 30(1):261–275, 2008.

Knight, P. A. and Ruiz, D. A fast algorithm for matrix balancing. IMA Journal of Numerical Analysis, 33(3):1029–1047, 2013.

Lahr, M. and de Mesnard, L. Biproportional techniques in input-output analysis: Table updating and structural analysis. Economic Systems Research, 16(2):115–134, 2004.

Lamond, B. and Stewart, N. F. Bregman's balancing method. Transportation Research Part B: Methodological, 15(4):239–248, 1981.

Livne, O. E. and Golub, G. H. Scaling by binormalization. Numerical Algorithms, 35(1):97–120, 2004.

Marshall, A. W. and Olkin, I. Scaling of matrices to achieve specified row and column sums. Numerische Mathematik, 12(1):83–90, 1968.

Miller, R. E. and Blair, P. D. Input-Output Analysis: Foundations and Extensions. Cambridge University Press, 2nd edition, 2009.

Moon, T. K., Gunther, J. H., and Kupin, J. J. Sinkhorn solves Sudoku. IEEE Transactions on Information Theory, 55(4):1741–1746, 2009.

Nakahara, H. and Amari, S. Information-geometric measure for neural spikes. Neural Computation, 14(10):2269–2316, 2002.

Nakahara, H., Nishimura, S., Inoue, M., Hori, G., and Amari, S. Gene interaction in DNA microarray data is decomposed by information geometric measure. Bioinformatics, 19(9):1124–1131, 2003.

Nakahara, H., Amari, S., and Richmond, B. J. A comparison of descriptive models of a single spike train by information-geometric measure. Neural Computation, 18(3):545–568, 2006.

Parikh, A. Forecasts of input-output matrices using the R.A.S. method. The Review of Economics and Statistics, 61(3):477–481, 1979.

Parlett, B. N. and Landis, T. L. Methods for scaling to doubly stochastic form. Linear Algebra and its Applications, 48:53–79, 1982.

Rao, S. S. P., Huntley, M. H., Durand, N. C., Stamenova, E. K., Bochkov, I. D., Robinson, J. T., Sanborn, A. L., Machol, I., Omer, A. D., Lander, E. S., and Aiden, E. L. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell, 159(7):1665–1680, 2014.

Rota, G.-C. On the foundations of combinatorial theory I: Theory of Möbius functions. Z. Wahrscheinlichkeitstheorie, 2:340–368, 1964.

Sinkhorn, R. A relationship between arbitrary positive matrices and doubly stochastic matrices. The Annals of Mathematical Statistics, 35(2):876–879, 1964.

Sinkhorn, R. and Knopp, P. Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 21(2):343–348, 1967.

Solomon, J., de Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., and Guibas, L. Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics, 34(4):66:1–66:11, 2015.

Soules, G. W. The rate of convergence of Sinkhorn balancing. Linear Algebra and its Applications, 150:3–40, 1991.

Sugiyama, M., Nakahara, H., and Tsuda, K. Information decomposition on structured space. In 2016 IEEE International Symposium on Information Theory, pp. 575–579, 2016.

Wu, H.-J. and Michor, F. A computational strategy to adjust for copy number in tumor Hi-C data.
Bioinformatics, 32(24):3695–3701, 2016.