# TreeMoCo: Contrastive Neuron Morphology Representation Learning

Hanbo Chen, Tencent AI Lab (hanbochen@tencent.com); Jiawei Yang, University of California, Los Angeles (jiawei118@ucla.edu); Daniel Maxim Iascone, University of Pennsylvania, Philadelphia (daniel.iascone@pennmedicine.upenn.edu); Lijuan Liu, Southeast University, Nanjing (juan-liu@seu.edu.cn); Lei He, University of California, Los Angeles (lhe@ee.ucla.edu); Hanchuan Peng, Southeast University, Nanjing (h@braintell.org); Jianhua Yao, Tencent AI Lab (jianhuayao@tencent.com)

The morphology of neuron trees is a key indicator for delineating neuronal cell types, analyzing brain development, and evaluating pathological changes in neurological diseases. Traditional analysis mostly relies on heuristic features and visual inspection. A quantitative, informative, and comprehensive representation of neuron morphology is largely absent but much desired. To fill this gap, we adopt a Tree-LSTM network to encode neuron morphology and introduce a self-supervised learning framework named TreeMoCo to learn features without the need for labels. We test TreeMoCo on 2403 high-quality 3D neuron reconstructions of mouse brains from three different public resources. Our results show that TreeMoCo is effective both in classifying major brain cell types and in identifying sub-types. To the best of our knowledge, TreeMoCo is the first to explore learning representations of neuron tree morphology with contrastive learning. It has great potential to shed new light on quantitative neuron morphology analysis. Code is available at https://github.com/TencentAILabHealthcare/NeuronRepresentation.

## 1 Introduction

The central nervous system is composed of hundreds to thousands of neuronal subtypes. Subtype is the critical determinant of neuronal function within local and long-range circuits and is identified by a combination of gene expression patterns, dendrite/axon morphology, synaptic connectivity, and electrophysiological properties. Dendritic morphology in particular is a central node of neuronal subtype identity, as dendritic branches receive the vast majority of the presynaptic input onto each neuron and directly shape the electrophysiological properties that govern its firing output. Delineating neuron cell types based on their dendritic morphology has been one of the core problems of neuroscience for the past century [51].

∗ Equal contribution. † Work done when the authors were at Tencent AI Lab. ‡ Corresponding author.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Figure 1: Distribution of neurons in the embedding space of TreeMoCo, visualized by t-SNE [39]. Thumbnails of dendrites are randomly colored. (b–e) Zoom-in views of regions in (a), showing clusters of isocortex layer 5 cells (b), layer 2/3 cells (c), basket cells (d), and Martinotti cells (e).

Neuron dendritic morphology can be digitally represented by a 3D neuron reconstruction (Fig. 2(a)): a tree graph with 3D coordinates at each node, with the root placed at the soma. With recent advances in imaging [9, 11, 47] and computational methods [20, 27, 42], high-throughput 3D reconstruction of whole neuron trees is now accessible for the mouse brain [28, 44]. However, due to the lack of effective measurements, most analyses of these 3D reconstructions are still largely based on biologists' visual inspection. Considering the enormous number of neurons in the brain and the 3D nature of neuron morphology, such visual inspection is subjective, biased, and low-throughput.
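For concreteness, such reconstructions are typically exchanged in the standard SWC format, in which each row stores a node id, a structure type, the 3D coordinates, a radius, and the parent node's id (-1 marking the root). Below is a minimal sketch of loading an SWC file into a linked tree; it is an illustration assuming the standard SWC column layout, not part of the authors' released code.

```python
from dataclasses import dataclass, field

@dataclass
class SWCNode:
    node_id: int
    node_type: int               # SWC structure type (1 = soma, 3 = basal dendrite, ...)
    xyz: tuple                   # (x, y, z) coordinates
    radius: float
    parent: int                  # parent node id; -1 marks the root (soma)
    children: list = field(default_factory=list)

def load_swc(path):
    """Parse an SWC reconstruction into a dict of nodes keyed by node id."""
    nodes = {}
    with open(path) as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue                       # skip comments and blank lines
            i, t, x, y, z, r, p = line.split()[:7]
            nodes[int(i)] = SWCNode(int(i), int(t),
                                    (float(x), float(y), float(z)),
                                    float(r), int(p))
    for n in nodes.values():                   # link children for tree traversal
        if n.parent != -1 and n.parent in nodes:
            nodes[n.parent].children.append(n.node_id)
    return nodes
```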
The past decade has witnessed the success of deep learning in learning representations of almost all kinds of data, such as images, language, sound, and graphs [2]. However, although neuron trees are a special type of graph data, most quantitative analysis of them is still based on simple heuristic features [33]. Only a few attempts have been made at learning deep representations of neuron morphology. In [52], the authors proposed to encode neuron morphology with a tree recurrent neural network (TRNN). However, this method relies on manual annotations, limiting its application to supervised training only. In [22], the authors proposed a generative self-supervised learning (SSL) method for neuron morphology representation learning, named MorphVAE, which reconstructs the original neuron trees via a variational autoencoder over random-walk samples. Since this method cannot encode the tree topology, its classification performance is limited, and it still relies on label information to generate reasonable predictions. In [32], the authors adopt multi-view SSL to learn representations of neuron morphology. Although that work focuses mainly on local neuron-branch components, which are not available in whole-neuron reconstruction data, its promising results support the feasibility of learning morphology representations without labels.

Inspired by the success of contrastive learning in visual representation, we adapt MoCo to neuron morphology representation learning, yielding TreeMoCo. It learns embeddings of 3D neuron reconstructions without requiring prior knowledge such as cell-type annotations. It first maps input neuron trees to a deep embedding space with our proposed neuron encoder, and then applies an SSL pipeline composed of neuron tree augmentations and a contrastive loss to learn representations. Testing the framework on 2403 neuron reconstructions from three different data resources, we are excited to find that TreeMoCo clearly delineates neuron morphology in the embedding space (Fig. 1). Comparison with prior art shows that TreeMoCo achieves better results than supervised methods even without annotations.

## 2 Background and Related Works

**Heuristic features of neuron morphology.** To quantitatively describe neuron morphology, a set of heuristic features and an associated tool named L-Measure [33] have been proposed and widely used. These features fall into four categories: (1) distance-based measurements (e.g., branch length, distance to soma); (2) angle-based measurements (e.g., bifurcation angle); (3) topology-based measurements (e.g., number of bifurcations, branch order); and (4) size-based measurements (e.g., branch radius, surface area). Statistics of these features, such as the mean or standard deviation, are usually adopted to describe neuron morphology. Such representations are easy to compute and explain, but they cannot differentiate complex morphologies (examples in Appx. A.1).
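As an illustration of the distance- and topology-based categories, a few such statistics can be computed directly from the SWC tree of the previous sketch. This is a simplified sketch reusing the hypothetical `load_swc` helper above, not the L-Measure implementation.

```python
import math

def heuristic_features(nodes):
    """Compute a few L-Measure-style statistics from an SWC node dict."""
    total_length = sum(math.dist(n.xyz, nodes[n.parent].xyz)
                       for n in nodes.values() if n.parent != -1)
    n_bifurcations = sum(1 for n in nodes.values() if len(n.children) >= 2)
    n_tips = sum(1 for n in nodes.values() if n.parent != -1 and not n.children)

    def branch_order(n):
        # Number of bifurcations passed on the way from the node up to the soma.
        order = 0
        while n.parent != -1:
            n = nodes[n.parent]
            if len(n.children) >= 2:
                order += 1
        return order

    max_order = max(branch_order(n) for n in nodes.values() if not n.children)
    return {"total_length": total_length,      # distance-based
            "n_bifurcations": n_bifurcations,  # topology-based
            "n_tips": n_tips,                  # topology-based
            "max_branch_order": max_order}     # topology-based
```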
**Cell-type and neuron morphology classification.** As the primary function of dendritic branches is to compartmentalize functionally related synaptic inputs, dendritic morphology can be used to categorize neuronal subtypes based on total dendritic length, the amount of branching across the dendritic tree, and the localization of dendritic branch fields to different brain regions (such as layers of the neocortex). Most existing cell-type classification algorithms rely on hand-crafted features [33] to predict or cluster cell types [40, 24]. In some recent works, researchers have started to adopt tree-based models [52, 23] and sequence-based models [22] to classify neuron morphology. Although these models perform competitively against classic methods, their training requires manual cell-type annotations.

**Contrastive learning.** Contrastive learning (CL) is an SSL method that aims to bring positive sample pairs closer and push negative sample pairs apart in the feature embedding space. The idea dates back to [13] or earlier and has recently drawn significant interest across different communities [45, 25, 17, 48, 16, 6, 5]. Several research efforts have addressed graph contrastive learning (GCL) [37, 30, 15, 49, 46]. In particular, GraphCL [49] systematically studies graph data augmentations for GCL, and InfoGCL [46] provides a theoretical analysis of it. However, to the best of our knowledge, no work has studied tree graphs, a special instance of graphs, within the context of CL.

## 3.1 Tree graph representation of neuron

We define a neuron reconstruction as a tree graph $T(V, E)$, where $V = \{v_k\}$, with $v_k = (x_k, y_k, z_k)$, are nodes with 3D coordinates, and $E = \{e_k\}$, with $e_k = (v_k, v_k^{\mathrm{parent}})$, are edges connecting each node to its parent. As shown in Fig. 2(e), the number of 1-degree nodes can vary with the reconstruction and the imaging resolution, while the bifurcation and tip nodes preserve the major topology information. For computational efficiency, we remove the 1-degree nodes and directly connect the root node, the bifurcation nodes, and the tip nodes of $T$ (Fig. 2(b)). To preserve the shape information of the 1-degree node sequences (a branch in Fig. 2(e)), we compute branch features and concatenate them with the coordinates of the branch's downstream node, as illustrated in Fig. 2(e,f). The processed tree graph is defined as $\Upsilon = \mathrm{Process}(T) = (X, \mathcal{G})$, where $\mathcal{G}$ represents the tree topology and $X = \{x_k\}_{k=1}^{N}$ is an $N \times (3 + Q)$ matrix formed from the $N$ nodes' 3D coordinates and $Q$-dimensional branch features.

[Figure 2 panel labels: branch features built from the Euclidean distance $D$ and the shifted-and-rotated edge vector sequence $F_E$; node feature vector $[x, y, z, L, C, F_E]$; bifurcation/tip nodes vs. 1-degree nodes at low and high resolution.]

Figure 2: Illustration of (a) a neuron reconstruction, (b) the processed tree graph, (c) branch feature computation, and (d) topology augmentations, where (ii) dropped, (iii) skipped, and (iv) swapped correspond to (9) to (11) in Table 1. (e) and (f) illustrate the neuron tree processing performed for computational efficiency and resolution normalization.

[Figure 3 panel labels: input tree $\Upsilon = (X, \mathcal{G})$ → augmentation → neuron encoder ($\tilde{X} = \phi(X)$, $z = \eta(\tilde{X}, \mathcal{G})$) → projection $h(z)/\lVert h(z)\rVert_2$ → contrasting.]

Figure 3: Overview of TreeMoCo: (a) the TreeMoCo pipeline; (b) a zoom-in view of how the LSTM-based tree path processor $\eta$ propagates information from child nodes $i$ and $j$ to a parent node $k$. Blue and green denote the information flows of the two LSTM cells, respectively, and $\Sigma$ denotes summation. The dashed rectangles mark our double-cell LSTM processor.
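Returning to the preprocessing of Sec. 3.1, the simplification step can be sketched as follows: drop all 1-degree nodes and connect each remaining key node (root, bifurcation, or tip) directly to its nearest key ancestor, while remembering the collapsed path so that branch features can still be computed from it. This is our reading of the procedure described above (reusing the hypothetical `SWCNode` structure), not the authors' implementation.

```python
def simplify_tree(nodes):
    """Keep only root, bifurcation, and tip nodes of an SWC node dict.

    Returns {kept_node_id: (key_parent_id, path)}, where path lists the
    original node ids along the collapsed branch in root-to-tip order;
    branch features (e.g., L and C below) are computed from this path.
    """
    def is_key(n):
        # Root, tip (0 children), or bifurcation (>= 2 children).
        return n.parent == -1 or len(n.children) != 1

    simplified = {}
    for n in nodes.values():
        if not is_key(n):
            continue
        if n.parent == -1:
            simplified[n.node_id] = (None, [])           # root has no incoming branch
            continue
        path = [n.node_id]                                # walk up through 1-degree nodes
        p = nodes[n.parent]
        while not is_key(p):
            path.append(p.node_id)
            p = nodes[p.parent]
        path.append(p.node_id)
        simplified[n.node_id] = (p.node_id, path[::-1])   # root-to-tip order
    return simplified
```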
**Branch feature** is a $Q$-dimensional vector, two dimensions of which denote length and contraction. The length $L$ is the accumulated distance between adjacent nodes $v_i$ along a branch:

$$L = \sum_{i=\mathrm{start}}^{\mathrm{end}-1} \lVert v_{i+1} - v_i \rVert_2 .$$
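A sketch of the two features for one collapsed branch from the `simplify_tree` output above. The text is truncated at this point, so the contraction $C$ is computed here under the usual L-Measure convention (straight-line distance between the branch endpoints divided by the path length $L$); treat that definition as an assumption.

```python
import math

def branch_length_and_contraction(nodes, path):
    """L: accumulated distance between adjacent nodes along the branch.
    C: endpoint-to-endpoint distance divided by L (assumed convention);
    C = 1 for a perfectly straight branch and decreases as it meanders."""
    pts = [nodes[i].xyz for i in path]
    L = sum(math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1))
    D = math.dist(pts[0], pts[-1])   # Euclidean distance between branch endpoints
    return L, (D / L if L > 0 else 1.0)

# Example usage with the earlier hypothetical helpers:
# nodes = load_swc("neuron.swc")
# branches = simplify_tree(nodes)
# feats = {k: branch_length_and_contraction(nodes, p)
#          for k, (parent, p) in branches.items() if p}
```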