# Representation Learning of Compositional Data

Marta Avalos-Fernandez, Richard Nock, Cheng Soon Ong, Julien Rouar, Ke Sun (authors in alphabetical order)
Université de Bordeaux, Data61, the Australian National University and the University of Sydney
first.last@{u-bordeaux.fr,data61.csiro.au}

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Abstract

We consider the problem of learning a low dimensional representation for compositional data. Compositional data consist of a collection of nonnegative data that sum to a constant value. Since the parts of the collection are statistically dependent, many standard tools cannot be directly applied; instead, compositional data must first be transformed before analysis. Focusing on principal component analysis (PCA), we propose an approach that allows low dimensional representation learning directly from the original data. Our approach combines the benefits of the log-ratio transformation from compositional data analysis and exponential family PCA. A key tool in its derivation is a generalization of the scaled Bregman theorem, which relates the perspective transform of a Bregman divergence to the Bregman divergence of a perspective transform plus a remainder conformal divergence. Our proposed approach includes a convenient surrogate (upper bound) of the exponential family PCA loss which has an easy-to-optimize form. We also derive the corresponding formulation for nonlinear autoencoders. Experiments on simulated data and microbiome data show the promise of our method.

1 Introduction

Compositional data analysis (CoDA) is a subfield of statistics introduced more than three decades ago [3, 2, 1, 29]. Compositional data consist of a collection of nonnegative measurements that sum to a constant value, typically proportions that sum to 1. Because the total is known, any one component can be determined from the sum of the remaining ones, so the parts that make up the composition are mathematically and statistically dependent. This distinct structure complicates analysis and precludes the direct use of standard statistical tools; ignoring the underlying nature of the data may give rise to misleading conclusions. Among others, [1] and [13] provided a framework to perform CoDA by mapping data from the constrained simplex space to Euclidean space using nonlinear log-ratio transforms. In this paper, we focus on Principal Component Analysis (PCA), one of the main tools for exploratory analysis of compositional data. Just like for standard Euclidean data, it is particularly useful when the first few principal components explain enough variability to be considered representative. Unfortunately, any operation of centering or scaling destroys the compositional nature of the data, which complicates a direct application of PCA.

Our motivation for studying CoDA comes from the recent explosion of microbiome studies [14, 15]. Indeed, spectacular advances in 16S rRNA gene sequencing of the bacterial component of the human microbial community (microbiota) have enabled researchers to investigate human health and disease, leading to new insights into the role of these microbial communities. Microbiota sequencing data are measured as read counts interpreted as species abundances in a microbial community. To make the microbial abundances comparable across samples, data are normalized to the relative abundances
of all bacteria observed. On the other hand, because high-throughput experiments produce large amounts of data, multivariate analysis is indispensable [27, 21], and there is a pressing need to understand the soundness of the models used [5].

In this paper, we propose to learn a low dimensional representation of compositional data directly from the original data. To account for the nonlinearity due to the compositional nature of the data, we start from exponential family PCA [12], which we augment with the compositional constraint; we then simplify the loss to be optimized via a generalization of a recent result [25] on Bregman divergences, which may be of independent interest. We also propose a nonlinear autoencoder (AE) version to learn the low dimensional representation.

Let us examine a toy example to illustrate our approach. We generate the arms dataset in a 20-part simplex by evenly interpolating between the simplex center and each of the 20 vertices with 100 points per arm, yielding a matrix $X \in \Re^{20 \times 2000}$. Figure 1 shows the 2D representation $A \in \Re^{2 \times 2000}$ computed by five methods: standard PCA; clr-PCA, which computes standard PCA after performing the clr transform; CoDA-PCA and CoDA-AE, our proposed methods; and t-SNE [32], a popular nonlinear dimensionality reduction method applied on X directly. In the PCA plot, the black segments indicate that the PCA reconstruction is outside of the simplex: PCA cannot be directly adapted to CoDA because the projection on the principal components may go beyond the convex hull of the vertices. It is clear that only CoDA-PCA and CoDA-AE uncover the true structure, where all the arms and their connections are faithfully represented.

[Figure 1: 2D visualization of the low dimensional representation A on the arms dataset, for PCA, clr-PCA, CoDA-PCA, CoDA-AE and t-SNE.]

2 Compositional Data Analysis

We briefly review some definitions of CoDA. Compositional data are proportions: X is a compositional dataset if and only if $X \in \Re^{d \times m}$ such that for all $i \in \{1, \dots, m\}$ the column vector $x_i$ of X lies in the simplex $S^d = \{x \in \Re^d : \forall j,\ x_j > 0;\ \sum_{j=1}^d x_j = \kappa\}$, where $\kappa > 0$ is a constant, classically 1. Here the superscript d does not denote the dimensionality, as $\dim(S^d) = d - 1$. For a dataset $X'$ which contains counts of strictly positive values, we reduce it to a compositional dataset by dividing out the totals, that is, we compute the CoDA set X such that $x_i = x'_i / \sum_{j=1}^d x'_{ji}$ is the vector of proportions for individual i. Using Bregman divergences makes explicit a dual affine [6] coordinate space which is in fact the log coordinates of Aitchison [1]. It is in this space that we have affine constraints, which are therefore non-linear in the "primal", ambient space.
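As a concrete illustration of "dividing out the totals", the following minimal Python/NumPy sketch normalizes the columns of a count matrix to compositions; the function name `closure` and the toy matrix are ours, not from the paper.

```python
import numpy as np

def closure(counts, kappa=1.0):
    """Divide out the totals: map a (d x m) matrix of positive counts to
    compositions whose columns sum to the constant kappa."""
    counts = np.asarray(counts, dtype=float)
    return kappa * counts / counts.sum(axis=0, keepdims=True)

# toy usage: d = 3 parts, m = 4 samples of raw counts
X_counts = np.array([[10.,  2.,  5.,  1.],
                     [ 5.,  8.,  5.,  1.],
                     [85., 90., 90., 98.]])
X = closure(X_counts)                      # each column now lies in S^3
assert np.allclose(X.sum(axis=0), 1.0)
```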
To manage this nonlinear structure, it has been proposed [4] to first apply a log-ratio transformation to map the data into real Euclidean space. For instance, the additive log-ratio transformation (alr) applies a log-ratio between each component and a reference component; the centered log-ratio transformation (clr) scales each subject vector by its geometric mean; and the isometric log-ratio transformation (ilr) is associated with an orthogonal coordinate system in the simplex. Afterwards, standard PCA is performed. By definition, the clr transformation is
$$c_{KL}(x) = \log\frac{x}{g(x)} = C_{clr}\log(x) = \log(x) - \overline{\log(x)}\,\mathbf{1}_d, \qquad (1)$$
where $g(x) = (\prod_{j=1}^d x_j)^{1/d}$ is the geometric mean of x, $\bar{x} = \frac{1}{d}\sum_{j=1}^d x_j$ is the arithmetic mean of x, $\mathbf{1}_d$ is the d-dimensional vector of all ones, and $C_{clr} = I_d - \frac{1}{d}\mathbf{1}_d\mathbf{1}_d^\top$. The purpose of the log-ratio transformation (centered or not) is to go back to $\Re^d$ from $S^d$ without losing information. Notice that $\log(x_j), \log(\bar{x}) \in (-\infty, 0)$, so the compositional data are embedded in $\Re^d$ under the clr transformation. The reverse operation $x = c_{KL}^{-1}(x^*) = \exp(x^*) / \sum_{j=1}^d \exp(x^*_j)$ embeds $\Re^d$ into $S^d$. See the table below for a comparison of clr, alr and ilr, presented as different transformations $C\log(x)$; they are equivalent up to linear transformations. Without loss of generality we focus on the clr.

| transform | matrix C (size) |
|---|---|
| clr | $C_{clr} = I_d - \frac{1}{d}\mathbf{1}_d\mathbf{1}_d^\top$ ($d \times d$) |
| alr | $C_{alr} = [I_{d-1}, -\mathbf{1}_{d-1}]$ ($(d-1) \times d$) |
| ilr | $C_{ilr} = R\,C_{clr}$ with $RR^\top = I_{d-1}$ ($(d-1) \times d$) |

However, interpreting the resulting coordinates is still challenging [13, 24]: the alr transformation is not distance-preserving; clr leads to degenerate distributions and singular covariance matrices; ilr avoids the preceding drawbacks, but results from complicated nonlinear transformations remain difficult to interpret. Currently, there seems to be no consensus about best practices ([16] versus [31]) and, in all cases, log-transforming is not a remedy for all the difficulties raised by CoDA [20].

3 Exponential Family Principal Component Analysis

Another way to apply dimension reduction is to perform a generalized PCA on crude count data. Based on the same ideas as the generalised linear model, [12] described a generalized PCA model for distributions from the exponential family. We first recall the standard PCA setting.

3.1 Principal Component Analysis

For simplicity, suppose that the data matrix X is already centered; this can easily be achieved by appending a row of ones to the matrix A.

(Traditional PCA) We have a dataset $X \in \Re^{d \times m}$ that we approximate as $X \approx V^\top A$ by minimizing the following loss subject to the constraints $A \in \Re^{\ell \times m}$, $V \in \Re^{\ell \times d}$, $VV^\top = I_\ell$:
$$\ell_{PCA}(X; A, V) \doteq \|X - V^\top A\|_F^2. \qquad (2)$$
Hence, observations are column-wise. $V : \Re^d \to \Re^\ell$ is surjective, with $V^\top V$ defining a rank-$\ell$ projection, assuming in general $\ell < d$. A is the representation of the data points. The goodness of fit of the representation is measured by the squared Frobenius norm. We summarise the different transformations and loss functions in Table 1. Observe that instead of finding a linear representation A and its corresponding linear loadings V, we can consider nonlinear functions for encoding and decoding the latent representation. When the nonlinear encoder and decoder are implemented as feed-forward neural networks, we arrive at the autoencoder setting.

3.2 Bregman Divergence and ϕ-PCA

As mentioned in the introduction, compositional data do not live in a Euclidean space. Count data are naturally linked to the Poisson distribution, and therefore we should consider an exponential family model for count data.
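The clr transform in (1) is straightforward to implement column-wise; below is a minimal NumPy sketch (with our own helper names `clr` and `clr_inv`), where the inverse is taken as the closure of the element-wise exponential. clr-PCA is then simply standard PCA applied to `clr(X)`.

```python
import numpy as np

def clr(X):
    """Centered log-ratio transform c_KL of Eq. (1), applied column-wise to a
    (d x m) matrix of strictly positive compositions: log(x) - mean(log(x))."""
    logX = np.log(X)
    return logX - logX.mean(axis=0, keepdims=True)

def clr_inv(Xstar):
    """Map clr coordinates back into the simplex via the closure of exp(.)."""
    E = np.exp(Xstar)
    return E / E.sum(axis=0, keepdims=True)

X = np.array([[0.2, 0.1],
              [0.3, 0.4],
              [0.5, 0.5]])                      # columns in S^3 (kappa = 1)
assert np.allclose(clr(X).sum(axis=0), 0.0)     # clr coordinates are centered
assert np.allclose(clr_inv(clr(X)), X)          # round trip recovers x
```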
From the Bayesian viewpoint, the PCA goal is to minimize a distance (L2 for the usual PCA), which is equivalent to minimizing a Bregman divergence (or maximizing a likelihood function).

Definition 1 (Bregman Divergence) Let $\phi : \Re^d \to \Re$ be convex differentiable. The Bregman divergence $D_\phi$ with generator ϕ is
$$D_\phi(x \,\|\, x') \doteq \phi(x) - \phi(x') - (x - x')^\top \nabla\phi(x'). \qquad (3)$$
A Bregman divergence is the difference between ϕ(x) and its first-order Taylor expansion around x'. It can therefore be defined for any differentiable function, not just convex ones. If ϕ is not convex, we call $D_\phi$ a Bregman distortion, which is a signed dissimilarity. We denote by $\phi^*(x) \doteq \sup_y \{x^\top y - \phi(y)\}$ the convex conjugate of the generator ϕ [9].

Table 1: Summary of methods in this paper

| Method | Original | Reconstruction | Distortion | Notes |
|---|---|---|---|---|
| PCA | $X$ | $V^\top A$ | $\|\cdot\|_F^2$ | classical PCA (2) |
| ϕ-PCA | $X$ | $\nabla\phi^*(V^\top A)$ | $D_\phi(\cdot\,\|\,\cdot)$ | exponential family PCA (4) |
| clr-PCA | $c_{KL}(X)$ | $V^\top A$ | $\|\cdot\|_F^2$ | CoDA with clr (5) |
| gauged-ϕ-PCA | $\nabla\phi(\check{X})$ | $V^\top A$ | $D_{\phi^*}(\cdot\,\|\,\cdot)$ | general Bregman PCA (8) |
| CoDA-PCA | $c_{KL}(X)$ | $V^\top A$ | $D_{\exp}(\cdot\,\|\,\cdot)$ | (11) is a special case of (8) |
| S-CoDA-PCA | $\check{x}_i$ | $\nabla\check{KL}(\exp(V^\top a_i))$ | inner product | upper bound (17) |
| CoDA-AE | $X$ | $h_\Phi(g_\theta(X))$ | $D_{\exp}(\cdot\,\|\,\cdot)$ | neural networks $g_\theta$ and $h_\Phi$ |

PCA has been generalized to the exponential families in a way that makes fitting occur in the natural parameter space [12, 19] (and references therein). The optimization problem is non-convex. The algorithmic strategy proposed by [12] is to use an alternating sequence of convex minimizations under constraints. Alternatively, [19] proposed maximizing the deviance (as a generalized notion of variance) and [10] proposed maximizing the likelihood function via a variational algorithm and gradient descent. We denote exponential family PCA as ϕ-PCA, where ϕ is the cumulant of the exponential family, which is strictly convex differentiable with convex conjugate $\phi^*$, and uniquely determines the exponential family under mild conditions [7]. Note that for ϕ-PCA, X is not necessarily in a vector space (e.g. we may have $X \notin \Re^{d \times m}$).

(ϕ-PCA) We have a dataset $X_{d \times m}$ that we approximate as $X \approx \nabla\phi^*(V^\top A)$ with $A \in \Re^{\ell \times m}$, $V \in \Re^{\ell \times d}$, $VV^\top = I_\ell$, through minimizing the Bregman loss
$$\ell_{\phi\text{-PCA}}(X; A, V) \doteq \sum_i D_\phi(x_i \,\|\, \nabla\phi^*(V^\top a_i)) = D_\phi(X \,\|\, \nabla\phi^*(V^\top A)). \qquad (4)$$
Vectors are column vectors: $x_i$ and $a_i$ are column observation i in the ambient and principal spaces, respectively. This formulation has the major advantage that linear algebra may be used to fit A, V while X may not lie in a vector space; see for example [12, 19] and references therein. We remark that because of the dual symmetry of Bregman divergences, we have $D_\phi(X \,\|\, \nabla\phi^*(V^\top A)) = D_{\phi^*}(V^\top A \,\|\, \nabla\phi(X))$ [8]. Notice there is a little "hole" in the ϕ-PCA definition, as X is not necessarily easy to center when it is not in a vector space. ϕ-PCA includes standard PCA as a special case: when $\phi(x) = \frac{1}{2}\|x\|_F^2$, the corresponding Bregman divergence becomes $D_\phi(x \,\|\, x') = \frac{1}{2}\|x - x'\|_F^2$.

4 Exponential family PCA on Compositional Data

CoDA has found a workaround for the centering problem: centered log-ratio coordinates. From [3, Def. 4.6, Chap. 8] the associated loss is the standard PCA loss on clr-transformed data:
$$\ell_{clr\text{-PCA}}(X; A, V) \doteq \frac{1}{2}\|c_{KL}(X) - V^\top A\|_F^2 = D_\phi(c_{KL}(X) \,\|\, \nabla\phi^*(V^\top A)), \qquad (5)$$
where $c_{KL}(X)$ is the centered log-ratio transform defined in Equation (1) and $\phi(x) = \frac{1}{2}\|x\|_F^2$. Recall from the previous section that we could deal with crude count data by using exponential family PCA. However, if we wish to perform PCA on the crude count data while maintaining the clr transform, we need an additional normalization term, which requires us to obtain a gauged version of the Bregman divergence.
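To make Definition 1 and the ϕ-PCA loss (4) concrete, here is a minimal NumPy sketch of a separable Bregman divergence and of the loss for the KL generator ϕ(z) = z log z − z, whose conjugate gradient is the element-wise exponential; the helper names are ours and no optimization is performed.

```python
import numpy as np

def bregman(phi, grad_phi, X, Y):
    """Separable Bregman divergence D_phi(X || Y) of Definition 1,
    summed over all entries."""
    return np.sum(phi(X) - phi(Y) - (X - Y) * grad_phi(Y))

phi           = lambda z: z * np.log(z) - z   # KL generator
grad_phi      = np.log                        # nabla phi
grad_phi_star = np.exp                        # nabla phi* = (nabla phi)^{-1}

def phi_pca_loss(X, A, V):
    """Exponential family PCA loss of Eq. (4): D_phi(X || nabla phi*(V^T A))."""
    return bregman(phi, grad_phi, X, grad_phi_star(V.T @ A))

# tiny example: d = 4 dimensions, m = 5 samples, ell = 2 components
rng = np.random.default_rng(0)
X = rng.random((4, 5)) + 0.1
A = rng.normal(size=(2, 5))
V = rng.normal(size=(2, 4))
print(phi_pca_loss(X, A, V))
```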
4.1 Scaled Bregman Theorem with Remainder

In this section we generalize the Scaled Bregman Theorem from [25, Theorem 1] to allow for a remainder term. We use it in this paper to deal with the perspective transform required for CoDA, but it may be of independent interest. Recall that ϕ is the generator of the Bregman distortion (Definition 1). We additionally define a perspective (or gauge) function g to deal with the fact that we are considering data on the simplex. Whenever ϕ and g are differentiable, the following is immediate from [25, Theorem 1].

Theorem 2 (Scaled Bregman Theorem with Remainder) For any $\phi : \mathcal{X} \to \Re$ and $g : \mathcal{X} \to \Re^*$ ($\Re^* = \Re \setminus \{0\}$) that are both differentiable, denoting
$$\check{x} \doteq \frac{x}{g(x)}, \qquad \check{\phi}(x) \doteq g(x)\,\phi\!\left(\frac{x}{g(x)}\right), \qquad (6)$$
the following holds true:
$$g(x)\, D_\phi(\check{x} \,\|\, \check{y}) = D_{\check{\phi}}(x \,\|\, y) + R_{\phi,g}(x \,\|\, y), \qquad \forall x, y \in \mathcal{X}, \qquad (7)$$
where $R_{\phi,g}(x \,\|\, y) \doteq \phi^*(\nabla\phi(\check{y}))\, D_g(x \,\|\, y)$ is called the remainder.

We can abstract Theorem 2 by saying that for any ϕ, g differentiable, we have
perspective-Bregman(ϕ, g) = Bregman(perspective(ϕ)) + conformal-Bregman(g, ϕ),
where perspective(ϕ) is $\check{\phi}$ in (6), and conformal divergences are defined and analyzed in [26]. General classes of perspective transforms of convex functions are introduced in [22, 23]. The notion of perspective transform of a Bregman divergence was introduced in [25]. In [25, Theorem 1], conditions are assumed that make $R_{\phi,g}(x \,\|\, y) = 0$, resulting in the scaled Bregman theorem. Notice that $D_{\check{\phi}}$ is a Bregman distortion but not necessarily a Bregman divergence if $\check{\phi}$ is not convex. For reasons explained in [25], we call g a gauge. In the following we assume that ϕ is separable, so that we can use the notations $\nabla\phi$ and $\phi'$ interchangeably for the gradient and derivatives involving ϕ.

By Theorem 2, as long as g(x) is homogeneous of degree one, $D_\phi(\check{x} \,\|\, \check{y})$ and $\frac{1}{g(x)}[D_{\check{\phi}}(x \,\|\, y) + R_{\phi,g}(x \,\|\, y)]$ are both invariant to re-scaling of x and y and can therefore be used to deal with compositional data. A general formulation of g satisfying this condition is $g(x) = \prod_{j=1}^d x_j^{w_j}$, where $\forall j,\ w_j \geq 0$ and $\sum_{j=1}^d w_j = 1$. In this paper, we focus on the special case $\forall j,\ w_j = \frac{1}{d}$, so that $D_\phi(\check{x} \,\|\, \check{y})$ can be expressed in terms of the widely used clr transformation. Setting w to be a one-hot vector $(1, 0, \dots, 0)$ expresses $D_\phi(\check{x} \,\|\, \check{y})$ with the alr; this latter case is omitted here.

4.2 Exponential Family CoDA

We are now in a position to derive the exponential family version of the loss in (5). Let $\check{X}$ denote the matrix of the column vectors $\check{x}_i$. It turns out that, in the same way as (2) is an approximation of (4), the loss in (5) is an approximation of the gauged loss:
$$\ell_{gauged\text{-}\phi\text{-PCA}}(X; A, V) \doteq D_{\phi^*}(V^\top A \,\|\, \nabla\phi(\check{X})) = D_\phi(\check{X} \,\|\, \nabla\phi^*(V^\top A)). \qquad (8)$$
Note that the above expression is in terms of the normalised matrix $\check{X}$. To unpack it in terms of the original data X, we apply Theorem 2. In the CoDA case, $\phi^*(z) \doteq \exp z$, the convex dual of $\phi(z) \doteq z \log z - z$. Indeed, after remarking that $\nabla\phi(\check{X}) = c_{KL}(X)$, it follows that
$$\ell_{gauged\text{-KL-PCA}}(X; A, V) = D_{\exp}(V^\top A \,\|\, c_{KL}(X)) = D_{KL}(\check{X} \,\|\, \exp(V^\top A)) = \mathbf{1}^\top \exp(V^\top A)\,\mathbf{1} - \mathrm{trace}(\check{X}^\top V^\top A) + \text{constant}. \qquad (9)$$
In other words, CoDA PCA is in fact fitting natural parameters, the centered log-ratios being natural coordinates as well. From (9) we observe that both of them live in the same space. Therefore $V^\top A$ is centered in the same way as $c_{KL}(X)$, and so
$$V\mathbf{1}_d \in \ker(A^\top) \iff A^\top V \mathbf{1} = \mathbf{0}_m. \qquad (10)$$
Remark that a centering assumption is also explicit in [3, Chapter 8, Eq. 8.1]. Hence, we can define the CoDA PCA problem as follows.
(CoDA PCA) We have a dataset $X \in (S^d)^m$ that we approximate as $c_{KL}(X) \approx V^\top A$ by minimizing the following loss subject to the constraints $A \in \Re^{\ell \times m}$, $V \in \Re^{\ell \times d}$, $VV^\top = I_\ell$, $A^\top V \mathbf{1} = \mathbf{0}$:
$$\ell_{CoDA\text{-PCA}}(X; A, V) = D_{\exp}(V^\top A \,\|\, c_{KL}(X)). \qquad (11)$$

Regarding $c_{KL}(x)$, $\check{x}$ and x as different coordinate systems of $S^d$, we use the Fisher information metric (FIM) [6], whose formulation is well studied in the x coordinates, to define the corresponding pullback metric G under the $c_{KL}(x)$ and $\check{x}$ coordinates, meaning that these metrics correspond to the same underlying geometry of $S^d$. We have the following proposition (proof omitted; see [30] for similar derivations).

Proposition 3 The FIM that uniquely defines the geometry of $c \doteq c_{KL}(x)$, $x \in S^d$, is given by
$$G_{ij}(c) = \delta_{ij}\,\frac{\exp(c_i)}{\sum_{i=1}^d \exp(c_i)} - \frac{\exp(c_i + c_j)}{\left(\sum_{i=1}^d \exp(c_i)\right)^2};$$
the FIM under the coordinates $\check{x}$ is given by
$$G_{ij}(\check{x}) = \delta_{ij}\,\frac{1}{\check{x}_i \sum_{i=1}^d \check{x}_i} - \frac{1}{\left(\sum_{i=1}^d \check{x}_i\right)^2},$$
where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise.

Intuitively, the metric G measures the local distance $d\check{x}^\top G(\check{x})\, d\check{x}$ of a tiny shift $d\check{x}$. It is not everywhere the identity, as it would be in a Euclidean space. Therefore the distance should not be measured by the Frobenius norm as in (5). In contrast, our loss $\ell_{CoDA\text{-PCA}}(X; A, V)$ is based on the KL divergence, which locally agrees with the FIM [6].

4.3 Relating CoDA PCA to ϕ-PCA

We now define and analyze a generalized perspective transform of the generator of the KL divergence: let $\check{KL}(x) \doteq g(x) \sum_{j=1}^d \phi(x_j / g(x))$, where $\phi(z) \doteq z \log(z) - z$ and $g(x) \doteq (\prod_{j=1}^d x_j)^{1/d}$.

Lemma 4 (Properties of $\check{KL}$) $\check{KL}$ satisfies the following properties: (1) $\check{KL}$ is convex; (2) the general term of the Hessian H of $\check{KL}$ is
$$H_{ij} \doteq H_{ij}(\check{KL}(x)) = \frac{1}{d x_j} \cdot \begin{cases} -u_{ji} & \text{if } j \neq i \\ \sum_{k \neq j} u_{kj} & \text{otherwise} \end{cases}, \qquad (12)$$
where $u_{ab} \doteq 1 + x_a / x_b$. Furthermore,
$$z^\top H z = \frac{1}{2d} \sum_{ij} (x_i + x_j) \left(\frac{z_i}{x_i} - \frac{z_j}{x_j}\right)^2, \qquad \forall z \in \Re^d. \qquad (13)$$
Hence, $z^\top H z \geq 0$ for all $x \in \Re^d_{++}$, $z \in \Re^d$, and $z^\top H z = 0$ only when $z \propto x$; (3) the function $\check{KL} \circ \exp$ is 1-homogeneous on $\mathrm{span}(\{\mathbf{1}\})^\perp$. (Proof in SM, Section D)

A consequence of Theorem 2 is the following Corollary.

Corollary 5 For any A, V such that $A^\top V \mathbf{1} = \mathbf{0}$, we have
$$\ell_{CoDA\text{-PCA}}(X; A, V) \leq \tilde{\ell} \doteq \sum_i \frac{1}{g(x_i)}\, D_{\check{KL}}(x_i \,\|\, \exp(V^\top a_i)). \qquad (14)$$
Hence, the CoDA PCA loss is upper bounded by a weighted generalized ϕ-PCA loss. Furthermore,
$$D_{\check{KL}}(x_i \,\|\, \exp(V^\top a_i)) = \check{KL}(x_i) - x_i^\top \nabla\check{KL}(\exp(V^\top a_i)). \qquad (15)$$

Proof Since g is concave (Example 1 in the Supplement), $-D_g(x \,\|\, y) = D_{-g}(x \,\|\, y) \geq 0$, and (14) follows from Theorem 2 and the fact that $r_i \geq 0$ for all i. (15) is a consequence of the analytical construct of Bregman divergences (Definition 1), point (3) in Lemma 4, and the fact that $V^\top a_i \in \mathrm{span}(\{\mathbf{1}\})^\perp$ by assumption.

Remark In [3, Chapter 8], CoDA PCA is presented as a (centered) regular PCA over data that has been subject to two transforms via the centered log-ratio coordinates. What Corollary 5 shows is that we can solve the problem via a surrogate formulation using non-transformed data, but minimizing a loss which is that of a ϕ-PCA transformed twice: first taking a perspective transform of the KL generator ($\check{KL}$), and then carrying out a weighted Bregman divergence minimization (weights $g^{-1}(\cdot)$). We remark that the weights can also be folded into the arguments, as we have
$$\tilde{\ell} = \sum_i \left[\check{KL}(\check{x}_i) - \check{x}_i^\top \nabla\check{KL}(\exp(V^\top a_i))\right]. \qquad (16)$$
Furthermore, the leftmost term in (16) plays no role in its minimization, and therefore we get the Surrogate CoDA PCA (S-CoDA-PCA) by replacing (11) with a simple inner product:
$$\ell_{S\text{-CoDA-PCA}}(X; A, V) \doteq -\sum_i \check{x}_i^\top \nabla\check{KL}(\exp(V^\top a_i)). \qquad (17)$$
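The quantities in (11), (14) and (17) can be checked numerically. The sketch below (our own NumPy helpers, written per sample for readability) evaluates the CoDA-PCA loss, its surrogate upper bound from Corollary 5, and the S-CoDA-PCA inner-product loss, with the centering constraint enforced by using a V whose rows sum to zero.

```python
import numpy as np
rng = np.random.default_rng(1)

def geo_mean(x):                      # gauge g(x): the geometric mean
    return np.exp(np.log(x).mean())

def clr(x):                           # c_KL(x) = log(x / g(x))
    return np.log(x) - np.log(x).mean()

def check_kl(x):                      # perspective generator KL-check(x)
    return np.sum(x * np.log(x / geo_mean(x)) - x)

def grad_check_kl(x):                 # gradient of KL-check (cf. Lemma 4)
    d = len(x)
    return np.log(x / geo_mean(x)) - x.sum() / (d * x)

def d_exp(a, b):                      # Bregman divergence with generator exp
    return np.sum(np.exp(a) - np.exp(b) - (a - b) * np.exp(b))

def d_check_kl(x, y):                 # Bregman divergence with generator KL-check
    return check_kl(x) - check_kl(y) - (x - y) @ grad_check_kl(y)

d, m, ell = 6, 8, 2
X = rng.dirichlet(np.ones(d), size=m).T           # columns in S^d
A = rng.normal(size=(ell, m))
V = rng.normal(size=(ell, d))
V = V - V.mean(axis=1, keepdims=True)             # V 1_d = 0, hence A^T V 1 = 0

coda_pca  = sum(d_exp(V.T @ A[:, i], clr(X[:, i])) for i in range(m))          # Eq. (11)
surrogate = sum(d_check_kl(X[:, i], np.exp(V.T @ A[:, i])) / geo_mean(X[:, i])
                for i in range(m))                                             # Eq. (14)
s_coda    = -sum((X[:, i] / geo_mean(X[:, i])) @ grad_check_kl(np.exp(V.T @ A[:, i]))
                 for i in range(m))                                            # Eq. (17)
assert coda_pca <= surrogate + 1e-9               # Corollary 5 upper bound
```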
5 Implementations

Both the CoDA-PCA in (11) and the S-CoDA-PCA in (17) can be equivalently written as the following unconstrained problems:

(CoDA-PCA)  argmin_{B,U}  1_d⊤ exp(Y) 1_m − trace(X̌⊤Y),   (18)

(S-CoDA-PCA)  argmin_{B,U}  trace( X̌⊤ ( exp(−Y) ⊙ (1_d 1_d⊤ / d) exp(Y) − Y ) ),   (19)

where ⊙ denotes the element-wise product, exp(·) the element-wise exponential, and Y = CU⊤B with C ∈ ℜ^{d×d}, U ∈ ℜ^{ℓ×d}, B ∈ ℜ^{ℓ×m}. C is a constant centering matrix satisfying rank(C) = d − 1 and C⊤1 = 0, so that Y's columns are automatically centered and Y⊤1 = B⊤UC⊤1 = 0. Any C satisfying this condition corresponds to a valid re-parametrization of the feasible space, for example C = I_d − (1/d) 1_d 1_d⊤ or C = I_d − I_d↑ (where I_d↑ circularly raises the diagonal entries of I_d by one row). U's rows form a non-orthogonal basis (of an ℓ-dimensional subspace of ℜ^d), and B's columns are the sample coordinates in this basis. After optimization, we take the QR decomposition C(U*)⊤ = (V*)⊤T, where V*'s rows are orthonormal. Therefore C(U*)⊤B* = (V*)⊤TB*, and A* = TB* gives the corresponding coordinates. An optimal solution of the original constrained PCA problem is given by (V*, A*).

Although the losses in (18) and (19) are non-convex, they are both bi-convex. Fixing U, the loss is a strictly convex function of B that decomposes into a sum of per-sample convex functions of the b_i; fixing B, it is a strictly convex function of U. These convex functions have the general form f(ξ) = Σ_i exp(α_i⊤ξ + β_i) + ζ⊤ξ. Its gradient and Hessian are both in simple closed form: ∇f = Σ_i exp(α_i⊤ξ + β_i) α_i + ζ and ∇²f = Σ_i exp(α_i⊤ξ + β_i) α_i α_i⊤. One can apply an off-the-shelf convex optimizer, in the simplest case the Newton method, to alternately minimize over B and U until convergence. Our implementation simply uses L-BFGS [9] based on the gradient of the loss. In summary, we have the following result.

Proposition 6 The CoDA-PCA and the S-CoDA-PCA problems are both equivalent to unconstrained bi-convex optimization problems.

As an alternative implementation, we assume a parametric mapping b_i = g_Θ(x_i), the ℓ-dimensional output of a feed-forward neural network with input c_KL(x_i) and x_i (or x̌_i) and connection weights Θ. We then minimize the cost function in (18) with respect to U and Θ. If g_Θ is flexible enough, the minimization recovers the CoDA-PCA projection. This approach can be favored because it learns an out-of-sample mapping g_Θ(·) with a compact parametric structure that does not scale with the sample size m, and because it can be adapted to an online learning scenario. However, it requires tuning of the neural network architecture and of the optimizer. In our experiments, the encoding map is modeled by a feed-forward neural network with two hidden layers of ELU [11] units, each of size 100. To distinguish between the two implementations, the method that directly optimizes U and B without assuming the neural network mapping is called non-parametric CoDA-PCA, and the latter parametric version is simply called CoDA-PCA.

[Figure 2 plots curves for clr-PCA, CoDA-PCA, SCoDA-PCA, clr-AE and CoDA-AE, with the number of principal components (#PCs) on the x-axis and measures such as L2-clr (test) on the y-axis.] Figure 2: Testing errors (y-axis) against the number of principal components (x-axis) based on three different distance measures (from left to right) on the Atlas data (first row) and the diet swap data (second row). The numbers along the clr-PCA curves show the percentage of improvement (green) or disimprovement (red), comparing CoDA-PCA against clr-PCA.
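For concreteness, the non-parametric variant of (18) can be optimized in a few lines of NumPy/SciPy. The sketch below is our own minimal reimplementation under assumed naming (coda_pca and its arguments are ours), using a joint L-BFGS run over (U, B) rather than explicit alternation; it is not the released code and omits practical details such as initialization and stopping heuristics:

import numpy as np
from scipy.optimize import minimize

# Minimal sketch of non-parametric CoDA-PCA, i.e. the unconstrained problem (18).
# X is a d x m matrix of strictly positive compositions (columns on the simplex).
def coda_pca(X, ell, iters=500, seed=0):
    d, m = X.shape
    Xc = X / np.exp(np.mean(np.log(X), axis=0))          # \check{X}
    C = np.eye(d) - np.ones((d, d)) / d                  # centering matrix, C^T 1 = 0
    rng = np.random.default_rng(seed)
    z0 = 0.01 * rng.standard_normal(ell * (d + m))       # flattened (U, B)

    def unpack(z):
        return z[:ell * d].reshape(ell, d), z[ell * d:].reshape(ell, m)

    def loss_and_grad(z):
        U, B = unpack(z)
        Y = C @ U.T @ B                                  # columns automatically centered
        E = np.exp(Y)
        loss = E.sum() - np.sum(Xc * Y)                  # eq. (18)
        G = E - Xc                                       # gradient of the loss wrt Y
        gU = B @ G.T @ C                                 # chain rule wrt U
        gB = U @ C.T @ G                                 # chain rule wrt B
        return loss, np.concatenate([gU.ravel(), gB.ravel()])

    res = minimize(loss_and_grad, z0, jac=True, method="L-BFGS-B",
                   options={"maxiter": iters})
    U, B = unpack(res.x)
    Q, T = np.linalg.qr(C @ U.T)                         # C U^T = (V*)^T T, as above
    V, A = Q.T, T @ B                                    # orthonormal loadings and scores
    return V, A, res.fun

Under these assumptions, a call such as V, A, loss = coda_pca(X, ell=2) would return two-dimensional scores A and orthonormal loadings V; this sketch covers only the non-parametric variant, not the parametric (neural-network) one.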
The parametric implementation above resembles an auto-encoder structure: x_i is encoded as b_i = g_Θ(x_i) and decoded as y_i, where the decoder is simply the linear mapping y_i = CU⊤b_i. In the general case, we apply a non-linear decoder of the form y_i = C h_Φ(b_i), where h_Φ(·) is a neural network with parameters Φ and d output dimensions. At the same time, we add small random noise to the encoder input so as to avoid overfitting. In this way we obtain a denoising CODA-AUTOENCODER. In contrast to CoDA-PCA, the CODA-AUTOENCODER can only be trained by gradient-based optimizers.

In practice, the input matrix X may contain zeros, which lie on the boundary of S_d. In this case c_KL(x) and x̌ are undefined. A simple way to tackle the zero entries is to replace them with a small positive number ϵ > 0. Alternatively, one can redefine the gauge as g(x) = ∏_{j: x_j > 0} x_j^{1/ρ}, where ρ = |{j : x_j > 0}|, so that g(x) is always positive and x̌ is well defined on the closed simplex, including its boundary.

6 Experiments

We compare the following methods: clr-PCA is PCA applied to the centered log-ratio coordinates; CoDA-PCA is the proposed CODA-PCA in (11); SCoDA-PCA is the proposed S-CODA-PCA in (17); clr-AE is an autoencoder with L2 loss applied to the clr transformation; CoDA-AE is the proposed CODA-AUTOENCODER of Section 5. Both clr-AE and CoDA-AE use exactly the same structure, with one hidden layer of 100 ELU [11] units in their decoders. The methods are assessed with an array of measures including (L2-clr) the L2 distance ‖c_KL(x) − c_KL(x′)‖ between the input data x ∈ S_d and the reconstruction x′ ∈ S_d in the clr space; (JSD) the Jensen-Shannon divergence ½ KL(x : (x + x′)/2) + ½ KL(x′ : (x + x′)/2); and (TV) the total variation distance ½ Σ_{i=1}^d |x_i − x′_i|. These measures are all invariant to scaling or permutation of x and x′. See the supplementary material for more baselines and performance indicators.

We consider the following datasets, available in the microbiome R package [18], each of which is randomly split into a training set (90%) and a testing set (10%). The HITChip Atlas dataset [17] contains 130 genus-level taxonomic groups that cover the majority of the known bacterial diversity of the human intestine. The data come from 1006 western adults from 15 western countries (Europe and the United States). Sample sets were analysed with three different DNA extraction methods. The two-week diet swap study between western (USA) and traditional (rural Africa) diets was reported in [28]. In this study, a two-week food exchange was performed in subjects from the same populations: African Americans were fed a high-fibre, low-fat African-style diet, and rural Africans a high-fat, low-fibre western-style diet. The group diet is indicated by HE (home environment days), DI (dietary intervention days) and ED (initial and final endoscopy days). Each subject served as his/her own control, given the known wide individual variation in colonic microbiota composition.

Fig. 2 shows typical testing results. We observe that on most performance indicators CoDA-PCA and CoDA-AE show a much smaller testing error than clr-PCA and clr-AE, respectively. The only exception is L2-clr, where clr-PCA and clr-AE appear to be favored over our CoDA variants; this is expected, because L2-clr is exactly the cost function of those two methods. We found that CoDA-AE is more robust against overfitting than clr-AE. The performance of SCoDA-PCA is close to that of CoDA-PCA on most indicators, and is better than CoDA-PCA on L2-clr.
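For reference, the three reconstruction measures used above admit straightforward implementations. The sketch below is ours (function names are illustrative, and the conventions in the released code may differ slightly); inputs are assumed to be strictly positive compositions, with replace_zeros available as a simple pre-processing step:

import numpy as np

def clr(x):                               # centered log-ratio of a composition x
    return np.log(x) - np.mean(np.log(x))

def l2_clr(x, x_rec):                     # L2 distance in the clr space
    return np.linalg.norm(clr(x) - clr(x_rec))

def jsd(x, x_rec):                        # Jensen-Shannon divergence
    m = (x + x_rec) / 2
    kl = lambda p, q: np.sum(p * np.log(p / q))
    return 0.5 * kl(x, m) + 0.5 * kl(x_rec, m)

def tv(x, x_rec):                         # total variation distance
    return 0.5 * np.sum(np.abs(x - x_rec))

def replace_zeros(x, eps=1e-6):           # simple zero handling before taking logs
    x = np.where(x > 0, x, eps)
    return x / x.sum()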
The source code to reproduce our experimental results is available online².

7 Conclusion

We propose an approach for learning a low dimensional representation directly from raw count data, which is compositional in nature. Our proposed algorithm generalizes PCA in two ways: first by going to the exponential family via the Bregman divergence, and second by converting the normalization of the data into a change of the Bregman divergence. The key theorem used for transforming the Bregman divergence generalizes a recent result, and may be of independent interest.

Acknowledgements

The authors gratefully thank Perrine Soret, Frank Nielsen, Xinhua Zhang, and the anonymous NIPS reviewers for their helpful and constructive feedback. This work was done while MAF was visiting Data61, CSIRO in Canberra, Australia.

² https://bitbucket.org/RichardNock/coda

References

[1] J. Aitchison. The statistical analysis of compositional data (with discussion). Journal of the Royal Statistical Society B, 44(2):139–177, 1982.
[2] J. Aitchison. Principal component analysis of compositional data. Biometrika, 70(1):57–65, 1983.
[3] J. Aitchison. The Statistical Analysis of Compositional Data. Chapman and Hall, New York, 1986.
[4] J. Aitchison. Principles of compositional data analysis. Multivariate Analysis and its Applications, 24:73–81, 1994.
[5] J. Aitchison and J.-J. Egozcue. Compositional data analysis: Where are we and where should we be heading? Mathematical Geology, 37:829–850, 2005.
[6] S.-I. Amari. Information Geometry and Its Applications. Springer-Verlag, Berlin, 2016.
[7] O. Barndorff-Nielsen. Information and Exponential Families in Statistical Theory. Wiley, 1978.
[8] J.-D. Boissonnat, F. Nielsen, and R. Nock. Bregman Voronoi diagrams. Discrete Comput. Geom., 44(2):281–307, 2010.
[9] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[10] J. Chiquet, M. Mariadassou, and S. Robin. Variational inference for probabilistic Poisson PCA. Annals of Applied Statistics (to appear), 2018.
[11] D.-A. Clevert, T. Unterthiner, and S. Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). In 4th ICLR, 2016.
[12] M. Collins, S. Dasgupta, and R. Schapire. A generalization of principal components analysis to the exponential family. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, NIPS*15, 2002.
[13] J.-J. Egozcue, V. Pawlowsky-Glahn, G. Mateu-Figueras, and C. Barceló-Vidal. Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35:279–300, 2003.
[14] G. B. Gloor and G. Reid. Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data. Can. J. Microbiol., 12:1–12, 2016.
[15] G. B. Gloor, J. R. Wu, V. Pawlowsky-Glahn, and J. J. Egozcue. It's all relative: analyzing microbiome data as compositions. Ann. Epidemiol., 26:322–329, 2016.
[16] M. Greenacre. Compositional Data Analysis in Practice. Chapman and Hall, New York, 2018.
[17] L. Lahti, J. Salojärvi, A. Salonen, M. Scheffer, and W. M. de Vos. Tipping elements in the human intestinal ecosystem. Nat. Commun., 5:4344, 2014.
[18] L. Lahti, S. Sudarshan, T. Blake, and J. Salojärvi. Microbiome R package, version 1.1.10012, 2017.
[19] A. J. Landgraf. Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters. PhD thesis, Ohio State University, 2015.
[20] D. Lovell, W. Müller, J. Taylor, A. Zwart, and C. Helliwell. Caution! Compositions!
Can constraints on omics data lead analyses astray? Technical Report EP10994, CSIRO, 2010.
[21] L. W. Hugerth and A. F. Andersson. Analysing microbial community composition through amplicon sequencing: From sampling to hypothesis testing. Frontiers in Microbiology, 8:1561, 2017.
[22] P. Maréchal. On a functional operation generating convex functions, part 1: duality. J. of Optimization Theory and Applications, 126:175–189, 2005.
[23] P. Maréchal. On a functional operation generating convex functions, part 2: algebraic properties. J. of Optimization Theory and Applications, 126:357–366, 2005.
[24] J. A. Martín-Fernández, V. Pawlowsky-Glahn, J. J. Egozcue, and R. Tolosana-Delgado. The statistical analysis of compositional data (with discussion). Math. Geosci., 50:273–298, 2018.
[25] R. Nock, A.-K. Menon, and C.-S. Ong. A scaled Bregman theorem with applications. In NIPS*29, pages 19–27, 2016.
[26] R. Nock, F. Nielsen, and S.-I. Amari. On conformal divergences and their population minimizers. IEEE Trans. IT, 62:1–12, 2016.
[27] O. Paliy and V. Shankar. Application of multivariate statistical techniques in microbial ecology. Molecular Ecology, 25:1032–1057, 2016.
[28] S. J. D. O'Keefe, J. V. Li, L. Lahti, J. Ou, F. Carbonero, K. Mohammed, J. M. Posma, J. Kinross, E. Wahl, E. Ruder, K. Vipperla, V. Naidoo, L. Mtshali, S. Tims, P. G. B. Puylaert, J. DeLany, A. Krasinskas, A. C. Benefiel, H. O. Kaseb, K. Newton, J. K. Nicholson, W. M. de Vos, H. R. Gaskins, and E. G. Zoetendal. Fat, fiber and cancer risk in African Americans and rural Africans. Nat. Commun., 6:6342, 2015.
[29] V. Pawlowsky-Glahn and A. Buccianti. Compositional Data Analysis: Theory and Applications. Wiley, 2011.
[30] K. Sun and S. Marchand-Maillet. An information geometry of statistical manifold learning. In 31st ICML, pages 1–9, 2014.
[31] V. Pawlowsky-Glahn, J. J. Egozcue, and R. Tolosana-Delgado. Lecture notes on compositional data analysis. Technical report, Girona University, 2007.
[32] L. van der Maaten and G. E. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.