Published as a conference paper at ICLR 2023

SIMPLICIAL HOPFIELD NETWORKS

Thomas F. Burns
Neural Coding and Brain Computing Unit, OIST Graduate University, Okinawa, Japan
thomas.burns@oist.jp

Tomoki Fukai
Neural Coding and Brain Computing Unit, OIST Graduate University, Okinawa, Japan
tomoki.fukai@oist.jp

ABSTRACT

Hopfield networks are artificial neural networks which store memory patterns on the states of their neurons by choosing recurrent connection weights and update rules such that the energy landscape of the network forms attractors around the memories. How many stable, sufficiently-attracting memory patterns can we store in such a network using N neurons? The answer depends on the choice of weights and update rule. Inspired by setwise connectivity in biology, we extend Hopfield networks by adding setwise connections and embedding these connections in a simplicial complex. Simplicial complexes are higher-dimensional analogues of graphs which naturally represent collections of pairwise and setwise relationships. We show that our simplicial Hopfield networks increase memory storage capacity. Surprisingly, even when connections are limited to a small random subset of equivalent size to an all-pairwise network, our networks still outperform their pairwise counterparts. Such scenarios include non-trivial simplicial topology. We also test analogous modern continuous Hopfield networks, offering a potentially promising avenue for improving the attention mechanism in Transformer models.

1 INTRODUCTION

Hopfield networks (Hopfield, 1982)¹ store memory patterns in the weights of connections between neurons. In the case of pairwise connections, these weights translate to the synaptic strength between pairs of neurons in biological neural networks. In such a Hopfield network with $N$ neurons, there will be $\binom{N}{2}$ of these pairwise connections, forming a complete graph. Each edge is weighted by a procedure which considers $P$ memory patterns and which, based on these patterns, seeks to minimise a defined energy function such that the network's dynamics are attracted to, and ideally exactly settle in, the memory pattern which is nearest to the current states of the neurons. The network therefore acts as a content-addressable memory: given a partial or noise-corrupted memory, the network can update its states through recurrent dynamics to retrieve the full memory. Since its introduction, the Hopfield network has been extended and studied widely by neuroscientists (Griniasty et al., 1993; Schneidman et al., 2006; Sridhar et al., 2021; Burns et al., 2022), physicists (Amit et al., 1985; Agliari et al., 2013; Leonetti et al., 2021), and computer scientists (Widrich et al., 2020; Millidge et al., 2022). Of particular interest to the machine learning community is the recent development of modern Hopfield networks (Krotov & Hopfield, 2016) and their close correspondence (Ramsauer et al., 2021) to the attention mechanism of Transformers (Vaswani et al., 2017). An early (Amit et al., 1985; McEliece et al., 1987) and ongoing (Hillar & Tran, 2018) theme in the study of Hopfield networks has been their memory storage capacity, i.e., determining the number of memory patterns which can be reliably stored and later recalled via the dynamics.
As discussed in Appendix A.1, this theoretical and computational exercise serves two purposes: (i) improving the memory capacity of such models for theoretical purposes and computational applications; and (ii) gaining an abstract understanding of neurobiological mechanisms and their implications for biological memory systems. Traditional Hopfield networks with binary neuron states, in the limit of $N$ and $P \to \infty$, maintain associative memories for up to approximately $0.14N$ patterns (Amit et al., 1985; McEliece et al., 1987), and fewer if the patterns are statistically or spatially correlated (Löwe, 1998). However, by a clever reformulation of the update rule based on the network energy, this capacity can be improved to $N^{d-1}$, where $d \geq 2$ (Krotov & Hopfield, 2016), and even further to $2^{N/2}$ (Demircigil et al., 2017). Networks using these types of energy-based update rules are called modern Hopfield networks. Krotov & Hopfield (2016) (like Hopfield (1984)) also investigated neurons which took on continuous states. Upon generalising this model by using the softmax activation function, Ramsauer et al. (2021) showed a connection to the attention mechanism of Transformers (Vaswani et al., 2017). However, to the best of our knowledge, these modern Hopfield networks have not been extended further to include explicit setwise connections between neurons, as has been studied and shown to improve memory capacity in traditional Hopfield networks (Peretto & Niez, 1986; Lee et al., 1986; Baldi & Venkatesh, 1987; Newman, 1988). Indeed, Krotov & Hopfield (2016), who introduced modern Hopfield networks, make a mathematical analogy between their energy-based update rule and setwise connections, given that their energy-based update rule can be interpreted as allowing individual pairs of pre- and post-synaptic neurons to make multiple synapses with each other, making pairwise connections mathematically as strong as equivalently-ordered setwise connections². Demircigil et al. (2017) later proved this analogy to be accurate in terms of theoretical memory capacity. By adding explicit setwise connections to modern Hopfield networks, we essentially allow all connections (pairwise and higher) to increase their strength following the same interpretation; this can be thought of as allowing both pairwise and setwise connections between all neurons, any of which may be precisely controlled.

Functionally, setwise connections appear in abundance in biological neural networks. What's more, these setwise interactions often modulate and interact with one another in highly complex and nonlinear fashions, adding to their potential computational expressiveness. We discuss these biological mechanisms in Appendix A.2. There are many contemporary models in deep learning which implicitly model particular types of setwise interactions (Jayakumar et al., 2020). To explicitly model such interactions, we have multiple options. For reasons we discuss in Appendix A.3, we choose to model our setwise connections using a simplicial complex.

¹After the proposal of Marr (1971), many similar models of associative memory were proposed, e.g., those of Nakano (1972), Amari (1972), Little (1974), and Stanley (1976), all before Hopfield (1982). Nevertheless, much of the research literature refers to and seems more proximally inspired by Hopfield (1982). Many of these models can also be considered instances of the Lenz-Ising model (Brush, 1967) with infinite-range interactions.
We therefore develop and study Simplicial Hopfield Networks. We weight the simplices of the simplicial complex to store memory patterns and generalise the energy functions and update rules of traditional and modern Hopfield networks. Our main contributions are:

- We introduce extensions of various Hopfield networks with setwise connections. In addition to generalising Hopfield networks to include explicit, controllable setwise connections based on an underlying simplicial structure, we also study whether the topological features of the underlying structure influence performance.
- We prove and discuss higher memory capacity in the general case of simplicial Hopfield networks. For the fully-connected simplicial Hopfield network, we prove a larger memory capacity than previously shown by Newman (1988); Demircigil et al. (2017) for higher-degree Hopfield networks.
- We empirically show improved performance under parameter constraints. By restricting the total number of connections to that of pairwise Hopfield networks with a mixture of pairwise and setwise connections, we show simplicial Hopfield networks retain a surprising amount of improved performance over pairwise networks but with fewer parameters, and are robust to topological variability.

2 SIMPLICIAL HOPFIELD NETWORKS

2.1 SIMPLICIAL COMPLEXES

Simplicial complexes are mathematical objects which naturally represent collections of setwise relationships. Here we use the combinatorial form, called an abstract simplicial complex, although, to build intuition and visualise the simplicial complex, we also refer to their geometric realisations.

Definition 2.1. Let $K$ be a subset of $2^{[N]}$. The subset $K$ is an abstract simplicial complex if, for any $\sigma \in K$ and any $\rho \subseteq \sigma$, we have $\rho \in K$.

²Horn & Usher (1988) study almost the same system but with a slight modification to the traditional update rule, whereas Krotov & Hopfield (2016) use their modern, energy-based update rule.

Figure 1: A. Comparative illustrations of connections in a pairwise Hopfield network (left) and a simplicial Hopfield network (right) with N = 4. In a simplicial Hopfield network, $\sigma = \{i, j\}$ is an edge (1-simplex), $\sigma = \{i, j, k\}$ is a triangle (2-simplex), $\sigma = \{i, j, k, l\}$ is a tetrahedron (3-simplex), and so on. B. Connection weight histograms of 1-, 2-, and 3-simplices in a simplicial Hopfield network. In the binary case, the x-axis range is $[-P/N, +P/N]$. Here, N = 100 and P = 10, thus the range is $[-0.1, +0.1]$. Note that each dimension shows a similar, Gaussian distribution of weights (although there are different absolute numbers of these weights; see Mixed diluted networks in Section 2.2). C. Illustration of the hierarchical relationship between elements in the complex, up to 3-simplices, with arrows indicating potential sources of weight modulation or interaction, e.g., between (co)faces or using Hodge Laplacians within the same dimension. Such modulations and interactions (including their biological interpretations) are discussed in Appendices A.2 and A.3.

In other words, an abstract simplicial complex $K$ is a collection of finite sets closed under taking subsets. A member of $K$ is called a simplex $\sigma$. A $k$-dimensional simplex (or $k$-simplex) has cardinality $k + 1$ and $k + 1$ faces, which are $(k-1)$-simplices (obtained by omitting one element from $\sigma$). If a simplex $\sigma$ is a face of another simplex $\tau$, we say that $\tau$ is a coface of $\sigma$.
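To make Definition 2.1 concrete, the short Python sketch below (our own illustration, not code from the paper's repository) checks the downward-closure property for a candidate collection of neuron sets and lists the codimension-one faces of a simplex.

```python
from itertools import combinations

def is_simplicial_complex(K):
    """Definition 2.1: every non-empty subset of every member of K must also be in K."""
    K = {frozenset(s) for s in K}
    for sigma in K:
        for r in range(1, len(sigma)):
            for rho in combinations(sigma, r):
                if frozenset(rho) not in K:
                    return False
    return True

def faces(sigma):
    """The k+1 faces of a k-simplex, each obtained by omitting one element."""
    sigma = tuple(sigma)
    return [sigma[:i] + sigma[i + 1:] for i in range(len(sigma))]

# A 2-simplex {0, 1, 2} together with all of its subsets is a valid complex;
# the 2-simplex on its own is not, because its faces are missing.
K = [{0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
print(is_simplicial_complex(K))            # True
print(is_simplicial_complex([{0, 1, 2}]))  # False
print(faces((0, 1, 2)))                    # [(1, 2), (0, 2), (0, 1)]
```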
We denote the set of all $k$-simplices in $K$ as $K_k$. Geometrically, for $k = 0, 1, 2$, and $3$, a $k$-simplex is, respectively, a point, line segment, triangle, and tetrahedron. Therefore, one may visualise a simplicial complex as being constructed by gluing together simplices such that every finite set of vertices in $K$ forms the vertices of at most one simplex. This structure makes it possible to associate every setwise relationship uniquely with a $k$-simplex identified by its elements, which in our case are neurons (see Figure 1A). Simplices in $K$ which are contained in no higher-dimensional simplices, i.e., they have no cofaces, are called the facets of $K$. The dimension of $K$, $\dim(K)$, is the dimension of its largest facet. We call a simplicial complex $K$ a $k$-skeleton when all possible faces of dimension at most $k$ exist and $\dim(K) = k$.

A network of $N$ neurons is modelled by $N$ spins. Let $K$ be a simplicial complex on $N$ vertices. In the binary neuron case, $S^{(t)}_j = \pm 1$ at time-step $t$. Given a set of neurons $\sigma$ (which contains the neuron $i$ and is a unique $(|\sigma| - 1)$-simplex in $K$), $w(\sigma)$ is the associated simplicial weight and $S^{(t)}_\sigma$ the product of their spins. Spin configurations correspond to patterns of neural firing, with dynamics governed by a defined energy. The traditional model is defined by energy and weight functions

$$E = -\sum_{\sigma \in K} w(\sigma) S^{(t)}_\sigma, \qquad w(\sigma) = \frac{1}{N} \sum_{\mu=1}^{P} \xi^\mu_\sigma, \qquad (1)$$

with $\xi^\mu_i (= \pm 1)$ static variables being the $P$ binary memory patterns stored in the simplicial weights. Similarly to the spins, $\xi^\mu_\sigma$ is the product of the static pattern variables for the set of neurons $\sigma$ in the pattern $\mu$. Figure 1B shows examples of the resulting Gaussian distributions of weights at each dimension of the simplicial complex. We use these weights to update the state of a neuron $i$ by applying the traditional Hopfield update rule

$$S^{(t)}_i = \Theta\!\left(\sum_{\sigma \in K,\, i \in \sigma} w(\sigma) S^{(t-1)}_{\sigma \setminus i}\right), \qquad \Theta(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ -1 & \text{if } x < 0 \end{cases}. \qquad (2)$$

When $K$ is a 1-skeleton, this becomes the traditional pairwise Hopfield network (Hopfield, 1982). In the modern Hopfield case, the energy function and update rule are

$$E = -\sum_{\mu=1}^{P} \sum_{\sigma \in K} F\!\left(\xi^\mu_\sigma S^{(t)}_\sigma\right), \qquad (3)$$

$$S^{(t)}_i = \mathrm{sgn}\!\left[\sum_{\mu=1}^{P}\left( F\!\left(\xi^\mu_i + \sum_{\sigma \in K,\, i \in \sigma} \xi^\mu_{\sigma \setminus i} S^{(t-1)}_{\sigma \setminus i}\right) - F\!\left(-\xi^\mu_i + \sum_{\sigma \in K,\, i \in \sigma} \xi^\mu_{\sigma \setminus i} S^{(t-1)}_{\sigma \setminus i}\right)\right)\right], \qquad (4)$$

where the function $F$ can be chosen, for example, to be of a polynomial $F(x) = x^n$ or exponential $F(x) = e^x$ form. When $K$ is a 1-skeleton, this becomes the modern pairwise Hopfield network (Krotov & Hopfield, 2016).

In the continuous modern Hopfield case, spins and patterns take real values $S_j, \xi^\mu_j \in \mathbb{R}$. Patterns are arranged in a matrix $\Xi = (\xi^1, ..., \xi^P)$ and we define the log-sum-exp function (lse) for $T^{-1} > 0$ as

$$\mathrm{lse}(T^{-1}, \Xi^T S^{(t)}, K) = T \log \sum_{\mu=1}^{P} \sum_{\sigma \in K} \exp\!\left(T^{-1} \Xi^\mu_\sigma S^{(t)}_\sigma\right). \qquad (5)$$

The energy function is

$$E = -\mathrm{lse}(T^{-1}, \Xi^T S^{(t)}, K) + \frac{1}{2} S^{(t)T} S^{(t)}. \qquad (6)$$

For each simplex $\sigma \in K$, we denote the submatrix of the patterns stored on that simplex as $\Xi_\sigma$ (which has dimensions $P \times |\sigma|$). Using the dot product to measure the similarity between the patterns and spins, the update rule is

$$S^{(t)} = \Xi\, \mathrm{softmax}\!\left(T^{-1} \sum_{\sigma \in K} \Xi_\sigma S^{(t-1)}_\sigma\right). \qquad (7)$$

In practice, however, the dot product has been found to under-perform in modern continuous Hopfield networks compared to Euclidean or Manhattan distances (Millidge et al., 2022). Transformer models in natural language tasks have also seen performance improvements by replacing the dot product with cosine similarity (Henry et al., 2020), again a measure with a more geometric flavour. However, these similarity measures generalise distances between pairs of elements rather than sets of elements.
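As a concrete illustration of the dot-product-based retrieval, here is a minimal NumPy sketch of a softmax update that accumulates per-simplex similarities, in the spirit of Eq. (7) as reconstructed above. It is our own sketch rather than the authors' released implementation, and the names `Xi`, `simplices`, and `beta` are ours.

```python
import numpy as np

def simplicial_softmax_update(S, Xi, simplices, beta):
    """One retrieval step: dot-product similarity between the state and each stored
    pattern, accumulated over the weighted simplices, followed by a softmax-weighted
    combination of the patterns (a Ramsauer-style update; our reading of Eq. (7))."""
    P = Xi.shape[0]
    sims = np.zeros(P)
    for sigma in simplices:
        idx = list(sigma)
        sims += Xi[:, idx] @ S[idx]        # per-pattern dot product restricted to sigma
    logits = beta * sims
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()               # softmax over the P patterns
    return Xi.T @ weights                  # convex combination of the stored patterns

# Xi has shape (P, N), one continuous pattern per row; swapping the per-simplex dot
# product for a negative ced or cmd gives distance-based variants in the spirit of
# those compared in Section 3.2.
```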
We therefore use higher-dimensional geometric similarity measures, cumulative Euclidean distance (ced) and Cayley-Menger distance (cmd). Let $d_\rho$ be the (Euclidean or Manhattan) distance between pattern $\xi^\mu_\rho$ and spins $S^{(t)}_\rho$ for pattern $\mu$ and $\rho \subseteq \sigma$. Let $K^\sigma_1$ be the subset of $K$ such that all elements in $K^\sigma_1$ are 1-simplex faces of $\sigma$. We define the cumulative Euclidean distance as

$$\mathrm{ced}(\xi^\mu_\sigma, S^{(t)}_\sigma) = \sqrt{\sum_{\rho \in K^\sigma_1} (d_\rho)^2}. \qquad (8)$$

We define $\mathrm{cmd}(\xi^\mu_\sigma, S^{(t)}_\sigma)$ as the Cayley-Menger determinant of all $\rho \in K^\sigma_1$, with distances set as $d_\rho$.

Mixed diluted networks. A computational concern in the above models is that the number of unique possible $k$-simplices is $\binom{N}{k+1}$, e.g., with $N = 100$ there are approximately $9.89 \times 10^{28}$ possible 50-simplices, compared to just 4,950 edges (1-simplices) found in a pairwise Hopfield network. If we allow all possible simplices for a simplicial Hopfield network with $N$ neurons, the total number of simplices (excluding 0-simplices, i.e., autapses) will be $\sum_{d=2}^{N} \binom{N}{d}$. Simultaneously, there is also an open question as to how many setwise connections it is biologically realistic to model. We also note that setwise connections can be functionally built from combinations of pairwise connections by introducing additional hidden neurons, as shown by Krotov & Hopfield (2021). Therefore, we might in fact be under-estimating the total number of functional setwise connections, which may appear via common network motifs or synapsembles (Buzsáki, 2010). Conservatively, we evaluate classes of simplicial Hopfield networks which are low-dimensional, i.e., $\dim(K)$ is small, and where the total number of weighted simplices is not greater than that normally found in a pairwise Hopfield network, i.e., the number of non-zero weights is $\binom{N}{2}$. We randomly choose weights to be non-zero, with each weight of a given dimension having an equal probability, according to Table 1. (See Appendix A.4 for a small worked example.) Such random networks have previously been studied in the traditional pairwise case as diluted networks (Treves & Amit, 1988; Bovier & Gayrard, 1993a;b; Löwe & Vermet, 2011). Here we study mixed diluted networks, since we use a mixture of connections of different degrees. We believe we are also the first to study such networks beyond pairwise connections, as well as in modern and continuous cases.

Topology. Different collections of simplices in a simplicial complex can result in different Euler characteristics (a homotopy-invariant property). Table 1 shows this from a parameter perspective via counting only the simplices with non-zero weights. However, even when using the same proportion of 1- and 2-simplices, the choices of which vertices those simplices contain can be different due to randomness. Therefore, the topologies of each network may vary (and so too may their subsequent dynamics and performance). One well-studied and often important topological property in the context of simplicial complexes, homology, counts the number of holes in each dimension. In the 0th dimension, this is the number of connected components; in the 1st dimension, this is the number of triangles formed by edges which don't also have a 2-simplex filling in the interior surface of that triangle; in the 2nd dimension, this is the number of tetrahedra formed by triangles which don't also have a 3-simplex filling in the interior volume of that tetrahedron; and so on.
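The hole counts described above can be computed from boundary matrices. The sketch below (our own illustration over real coefficients, not necessarily the method used in Appendix A.5) takes a complex given as sorted tuples of vertex indices and returns its Betti numbers.

```python
import numpy as np

def boundary_matrix(k_simplices, km1_simplices):
    """Signed incidence matrix sending each k-simplex to its (k-1)-dimensional faces."""
    index = {s: i for i, s in enumerate(km1_simplices)}
    D = np.zeros((len(km1_simplices), len(k_simplices)))
    for j, s in enumerate(k_simplices):
        for pos in range(len(s)):
            face = s[:pos] + s[pos + 1:]
            D[index[face], j] = (-1) ** pos
    return D

def betti_numbers(K, max_dim):
    """beta_k = n_k - rank(boundary_k) - rank(boundary_{k+1}), over the reals."""
    by_dim = {k: sorted(s for s in K if len(s) == k + 1) for k in range(max_dim + 2)}
    ranks = {}
    for k in range(1, max_dim + 2):
        if by_dim[k] and by_dim[k - 1]:
            ranks[k] = np.linalg.matrix_rank(boundary_matrix(by_dim[k], by_dim[k - 1]))
        else:
            ranks[k] = 0
    return [len(by_dim[k]) - ranks.get(k, 0) - ranks.get(k + 1, 0) for k in range(max_dim + 1)]

# A hollow triangle (three edges, no filled 2-simplex) is connected and has one 1-dimensional hole.
K = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
print(betti_numbers(K, max_dim=1))   # [1, 1]
```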
The exact number of these holes in dimension $k$ can be calculated by the $k$-th Betti number, $\beta_k$ (see Appendix A.5). We calculate these for our networks to observe the relationship between homology and memory capacity.

2.3 THEORETICAL MEMORY CAPACITY

Mixed networks. Much is already known about the theoretical memory capacity of various Hopfield networks, including those with explicit (Newman, 1988) or implicit (Demircigil et al., 2017) setwise connectivity. However, we wish to point out a somewhat underappreciated relationship between memory capacity and the explicit or implicit number of network connections which, in the fully-connected network, is determined by the degree of the connections (see Appendix A.6 for proof).

Corollary 2.2 (Memory capacity is proportional to the number of network connections). If the connection weights in a Hopfield network are symmetric, then the order of the network's memory capacity is proportional to the number of its connections.

What happens when there are connections between the same neurons at multiple degrees, i.e., what we call a mixed network? To the best of our knowledge, the theoretical memory capacity of such networks has not been well-studied. However, we found one classical study by Dreyfus et al. (1987) which showed, numerically, that adding triplet connections to a pairwise model improved attractivity and memory capacity. Most prior formal studies have only considered connections at single higher degrees (Newman, 1988; Bengtsson, 1990). Higher-order neural networks have historically considered such mixtures of interactions on different degrees simultaneously (Zhang, 2012), but as regular neural networks (e.g., feed-forward networks), not Hopfield networks. Higher-order Boltzmann machines (HOBMs) (Sejnowski, 1986) have also been studied with mixed connections (Amari et al., 1992; Leisink & Kappen, 1999)³. However, HOBMs are unlike Hopfield networks in that they typically have hidden units, are trained differently, and have stochastic neural activations⁴. Modern Hopfield networks also include an implicit mixture of connections of different degrees⁵ (but the mixture is unbalanced and not particularly natural, especially for $F(x) = x^n$ when $n$ is odd; see also Theorem 1 of Demircigil et al. (2017), which remains unproven). Therefore, we include the following result demonstrating fixed points, large basins of attraction (i.e., convergence) to those fixed points in mixed networks, and memory capacity which is linear in the number of fully-connected degrees of connections (a proof is provided in Appendix A.6).

Lemma 2.3 (Fully-connected mixed Hopfield networks). A fully-connected mixed Hopfield network based on a $D$-skeleton with $N$ neurons and $P$ patterns has, when $N \to \infty$ and $P$ is finite: (i) fixed point attractors at the memory patterns; and (ii) dynamic convergence towards the fixed point attractors within a finite Hamming distance $\delta$. When $P \to \infty$ with $N \to \infty$, the network has capacity to store up to $\left(\sum_{d=1}^{D} N^d\right)/(2 \ln N)$ memory patterns (with small retrieval errors) or $\left(\sum_{d=1}^{D} N^d\right)/(4 \ln N)$ (without retrieval errors).

This naturally comports with Theorem 2 from Demircigil et al. (2017), except here we show an increased capacity in the mixed network, courtesy of Corollary 2.2.

Mixed diluted networks. As mentioned earlier, full setwise connectivity is not necessarily tractable nor realistic.
Löwe & Vermet (2011) show for pairwise diluted networks constructed as Erdös-Renyi graphs (constructed by including each possible edge on the vertex set with probability $p$) that the memory capacity is proportional to $pN$. Crucial for this result is that the random graph must be asymptotically connected. This makes sense, given that if any vertex was disconnected its dynamics could never be influenced. Empirically, it does seem that a certain threshold of mean connectivity in pairwise random networks is crucial for attractor dynamics (Treves & Amit, 1988).

Remark 2.4. By a straightforward generalisation of Löwe & Vermet (2011)'s result, diluted networks constructed as pure Erdös-Renyi hypergraphs may store on the order of $pN^{d-1}$ memory patterns, where $d$ is the degree of the connections.

In the case of an unbounded number of allowable connections, Remark 2.4 would suggest picking as many higher-degree connections as possible when choosing between connections of lower or higher degrees in our mixed diluted networks. However, in the bounded case (our case), we are non-trivially changing the asymptotic behaviour in terms of connectivity and dynamics when we use a mixture of connection degrees. We also need to beware of asymmetries which may arise (Kanter, 1988). This makes the analysis of mixed diluted networks not particularly straightforward (also see Section 4).

2.4 NUMERICAL SIMULATIONS AND PERFORMANCE METRICS

Given the large space of possible network settings, in the main text we focus primarily on the conditions listed in Table 1. Additional experiments are also shown in Appendix A.8.

Table 1: List of network condition keys (top row), their number of non-zero weights for 1- and 2-simplices (second and third rows), and their functional Euler characteristic ($\chi$, bottom row). $N$ is the number of neurons. $C = (N-1)N$. For simulation, the number of simplices at each dimension is rounded to the nearest integer.

Condition   | K1             | R12                | R12               | R12                | R2
1-simplices | $\binom{N}{2}$ | $0.75\binom{N}{2}$ | $0.5\binom{N}{2}$ | $0.25\binom{N}{2}$ | 0
2-simplices | 0              | $0.25\binom{N}{2}$ | $0.5\binom{N}{2}$ | $0.75\binom{N}{2}$ | $\binom{N}{2}$
$\chi$      | $N - (1/2)C$   | $N - 0.25C$        | $N$               | $N + 0.25C$        | $N + (1/2)C$

³HOBMs also suffer the same problem as we face here, one of having many high-order parameters between the neurons to keep track of. Possibly a factoring trick like in Memisevic & Hinton (2010) for HOBMs could be helpful in simplicial Hopfield networks.
⁴Despite this, there are equivalences (Leonelli et al., 2021; Marullo & Agliari, 2021; Smart & Zilman, 2021).
⁵Recall that P

In our numerical simulations, we perform updates synchronously until $E$ is non-decreasing or until a maximum number of steps is reached, whichever comes first. When a simulation concludes, we compute the overlap (for binary patterns) or mean squared error (MSE) (for continuous patterns) of the final spin configuration with respect to all stored patterns using

$$m^\mu = \frac{1}{N} \sum_{i=1}^{N} S^{(t)}_i \xi^\mu_i, \qquad \mathrm{MSE}^\mu = \frac{1}{N} \sum_{i=1}^{N} \left(S^{(t)}_i - \xi^\mu_i\right)^2. \qquad (9)$$

We say the network recalls (or attempts to recall) whichever pattern has the largest overlap (where $m^\mu = 1$ indicates perfect recall) or smallest MSE (where $\mathrm{MSE}^\mu = 0$ indicates perfect recall).

3 NUMERICAL SIMULATIONS

3.1 BINARY MEMORY PATTERNS

After embedding random binary patterns, we started the network in random initial states and recorded the final overlap of the closest pattern. Table 2 shows the final overlaps for traditional simplicial Hopfield networks ($N = 100$). Our simplicial Hopfield networks significantly outperform the pairwise Hopfield networks (K1).
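For concreteness, the diluted conditions compared here are built by randomly selecting which simplices receive non-zero weights while keeping the total number of weighted simplices at $\binom{N}{2}$. A minimal sketch of that sampling step (ours, with the per-condition proportions following Table 1; the faces of chosen 2-simplices implicitly belong to the complex with zero weight):

```python
import numpy as np
from itertools import combinations

def sample_mixed_diluted(N, frac_triangles, rng):
    """Pick which 1- and 2-simplices get non-zero weights, keeping the total number
    of weighted simplices equal to that of a pairwise network, as in Table 1."""
    budget = N * (N - 1) // 2                    # (N choose 2) weighted simplices in total
    n2 = round(frac_triangles * budget)          # number of 2-simplices (triangles)
    n1 = budget - n2                             # remaining budget goes to 1-simplices (edges)
    edges = list(combinations(range(N), 2))
    triangles = list(combinations(range(N), 3))
    chosen_edges = [edges[i] for i in rng.choice(len(edges), size=n1, replace=False)]
    chosen_triangles = [triangles[i] for i in rng.choice(len(triangles), size=n2, replace=False)]
    return chosen_edges + chosen_triangles

rng = np.random.default_rng(0)
weighted_simplices = sample_mixed_diluted(20, 0.5, rng)  # a 50/50 R12-style condition on N = 20
print(len(weighted_simplices))                            # 190, the same count as K1
```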
In fact, the R12 model performs as well at $0.3N$ patterns as the pairwise network performs on $0.05N$ patterns, a six-fold increase in the number of patterns and more than double the theoretical capacity of the pairwise network, $0.14N$ (Amit et al., 1985). Surprisingly, Table 3 shows homology accounts for very little of the variance in network performance.

Table 2: Mean ± standard deviation of overlap distributions (n = 100) from traditional simplicial Hopfield networks with varying numbers (top row) of random binary patterns. K1 is the traditional pairwise Hopfield network. R12 significantly outperforms K1 at all tested levels (one-way t-tests $p < 10^{-11}$, $F > 50.13$). At all pattern loadings, a one-way ANOVA showed significant variance between the networks ($p < 10^{-20}$, $F > 26.35$). Box and whisker plots are shown in Figure 6.

No. patterns | 0.05N       | 0.1N        | 0.15N       | 0.2N        | 0.3N
K1           | 0.87 ± 0.18 | 0.81 ± 0.16 | 0.66 ± 0.10 | 0.65 ± 0.10 | 0.59 ± 0.08
R12          | 0.96 ± 0.10 | 0.94 ± 0.14 | 0.82 ± 0.20 | 0.71 ± 0.17 | 0.64 ± 0.13
R12          | 0.98 ± 0.10 | 0.99 ± 0.03 | 0.97 ± 0.10 | 0.91 ± 0.15 | 0.76 ± 0.16
R12          | 1 ± 0       | 0.99 ± 0.04 | 0.99 ± 0.05 | 0.98 ± 0.08 | 0.87 ± 0.16
R2           | 1 ± 0       | 0.99 ± 0.18 | 0.94 ± 0.18 | 0.74 ± 0.29 | 0.53 ± 0.23

3.2 CONTINUOUS MEMORY PATTERNS

Energy landscape. Using Equation 3 and given a set of patterns, a simplicial complex $K$, and an inverse temperature $T^{-1}$, we may calculate the energy of network states. To inspect changes in the energy landscapes of different network conditions, we set $N = 10$ and $P = 10$ random patterns. We performed principal component analysis (PCA) to create a low-dimensional projection of the patterns. Then, we generated network states evenly spaced in a $10 \times 10$ grid which spanned the projected memory patterns in the first two dimensions of PCA space. We calculated each state's energy by transforming these points from PCA space back into the $N$-dimensional space, across the network conditions at $T^{-1} = 1, 2, 10$ (Figure 7). At $T^{-1} = 1$, differences between the network conditions' energy landscapes are very subtle. However, at $T^{-1} = 2$ and $T^{-1} = 10$, we see a clear change: those with more 2-simplices possess more sophisticated, pattern-separating landscapes.

Recall as a function of memory loading. We tested the performance of our simplicial Hopfield networks by embedding data from the MNIST (LeCun et al., 2010), CIFAR-10 (Krizhevsky & Hinton, 2009), and Tiny ImageNet (Le & Yang, 2015) datasets as memories. We followed the protocol of Millidge et al. (2022) to test recall under increasing memory load as an indication of the networks' memory capacities. To embed the memories, we normalise the pixel values between 0 and 1, and treat them as continuous-valued neurons, e.g., for MNIST we have $N = 28 \times 28 = 784$ neurons. We initialise $S$ as one of the memory patterns corrupted by Gaussian noise with variance 0.5. After allowing the network to settle in an energy minimum, we measured the performance as the fraction of correctly recalled memories (over all tested memories) of the uncorrupted patterns, where correct recall was defined as a sum of the squared difference being < 50. In all tests, we used $T^{-1} = 100$. Also see Appendix A.7 for further simulation details. Figure 2 compares a pairwise architecture, K1, with a higher-order architecture, R12. The performance of the K1 networks is comparable to that shown in Millidge et al. (2022); however, R12 significantly outperforms K1 across all datasets. Since the MNIST dataset is relatively simple and K1 already performs well, the performance improvement is small, albeit significant.
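The continuous-recall protocol just described can be summarised in a short evaluation loop. This is a schematic sketch of our own; `settle` is a hypothetical stand-in for running the network dynamics to an energy minimum (e.g., repeated applications of the update in Eq. (7)) and is not a function from the released code.

```python
import numpy as np

def recall_fraction(memories, settle, noise_var=0.5, threshold=50.0, seed=0):
    """Fraction of memories correctly recalled: each memory is corrupted with Gaussian
    noise, the network is allowed to settle, and recall counts as correct when the
    summed squared difference to the uncorrupted pattern is below the threshold."""
    rng = np.random.default_rng(seed)
    correct = 0
    for xi in memories:
        cue = xi + rng.normal(0.0, np.sqrt(noise_var), size=xi.shape)  # corrupted initial state
        s_final = settle(cue)                                          # hypothetical dynamics hook
        correct += np.sum((s_final - xi) ** 2) < threshold
    return correct / len(memories)

# For MNIST, `memories` would be pixel arrays normalised to [0, 1] and flattened to N = 784.
```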
In the CIFAR-10 and Tiny ImageNet datasets, the performance improvements are more noticeable, with most distance functions seeing improvements of 10% in the fraction of correctly retrieved memories.

Figure 2: Recall (mean ± S.D. over 10 trials) as a function of memory loading using the MNIST, CIFAR-10, and Tiny ImageNet datasets, using different distance functions (Euclidean, Manhattan, dot product, ced, and cmd; see legend). Here we compare the performance of modern continuous pairwise networks (top row) and modern continuous simplicial networks (bottom row). The simplicial networks are R12 networks (see Table 1 for information). R12 significantly outperforms the pairwise network (K1) at all tested levels where there was not already perfect recall (one-way t-tests $p < 10^{-9}$, $F > 16.01$). At all memory loadings, a one-way ANOVA showed significant variance between the networks ($p < 10^{-5}$, $F > 11.95$). Tabulated results are shown in Tables 6, 7, and 8.

Also noticeable in the results for CIFAR-10 and Tiny ImageNet (see Figure 2) is the relatively high performance of the ced and cmd distance measures. Indeed, cmd performs as well as or better than the Manhattan distance in our experiments. And both ced and cmd (along with the Euclidean and Manhattan distances) outperform the dot product in CIFAR-10 and Tiny ImageNet at high memory loadings. This further supports the intuition and results of Millidge et al. (2022), that more geometric distances perform better as similarity measures in modern Hopfield networks.

4 DISCUSSION

We have introduced a new class of Hopfield networks which generalises and extends traditional, modern, and continuous Hopfield networks. Our main finding is that mixed diluted networks can improve performance in terms of memory recall, even when there is no increase in the number of parameters. This improvement therefore comes from the topology rather than additional information in the form of parameters. We also show how distance measures of a more geometric flavour can further improve performance in these networks.

This simplicial framework (in diluted or undiluted forms) now opens up new avenues for researchers in neuroscience and machine learning. In neuroscience, we can now model how setwise connections, such as those provided by astrocytes and dendrites, may improve memory function and may interact to form important topological structures to guide memory dynamics. In machine learning, such topological structures may now be utilised in improved attention mechanisms or Transformer models, such as in Ramsauer et al. (2021). At the intersection of these fields, we may now further study how the topology of networks in neuroscience and machine learning systems may correspond to one another and share functional characteristics, such as how the activity of pairwise Transformer models has shown similarities to activity in auditory cortex (Millet et al., 2022). Could setwise Transformer models correspond more closely? Or to a more diverse range of cell types? These and related questions are now open for exploration, and may lead to improved performance in applications (Clift et al., 2020).

Convolution operations and higher-order neural networks.
From the perspective of modern deep learning, considering higher-order correlations between downstream inputs to a neuron is quite classical. For example, convolutional neural networks have incorporated specialised setwise operations since their inception (Fukushima, 1980; Lecun et al., 1998), and more general setwise relationships have also been introduced in higher-order neural networks (Pineda, 1987; Reid et al., 1989; Ghosh & Shun, 1992; Zhang, 2012). Although our setwise connections are not explicitly convolutional, they are in one notable sense conceptually similar: they collect information from a particular subset of neurons and only become active when those particular neurons are active in the right way. One of the main differences, however, is that unlike typical convolution operations we don't restrict the connection locations to some particular locations or arrangements within the input space. Our results therefore suggest that replacing regular feedforward connections with random convolutions may, in some circumstances, offer improved performance.

Improvements and extensions. Our study focusses on random choices of weighted simplices. What if we choose more carefully? Indeed, it seems quite likely biological setwise connections are not random, and are almost certainly not randomly chosen to replace random pairwise connections. It now seems natural to study how online weight modulations (e.g., based on spectral theories) could generate new connections between Hopfield networks and, e.g., geometric deep learning. Such modulations may have novel biological interpretations, e.g., spatial and anti-Hebbian memory structures may be modelled by strategically inserting inhibitory interactions (Haga & Fukai, 2019; Burns et al., 2022) between higher simplices (and may also model disinhibition).

Further analytic studies. Our numerical results suggest diluted mixed networks have larger memory capacities than pairwise networks. In a fairly intuitive sense, this is not particularly surprising: we are adding degrees of freedom to the energy landscape, within which we may build more stable and nicely-behaved attractors. However, we have not yet proven this increased capacity analytically for the diluted case, only given some theoretical indications as to why this occurs and proven the undiluted case. We hypothesise it is possible to do so using generalised methods from replica-symmetric theory (Amit et al., 1985) or self-consistent signal-to-noise analysis (Shiino & Fukai, 1993), in combination with methods from structural Ramsey theory. The capacity for modern simplicial networks may be on the order of a double-exponential in the number of neurons (since, in the limit of $N \to \infty$, there is an exponential relationship in the number of multispin interactions on top of an exponential relationship in the number of intra-multispin interactions, i.e., both pair-spins and multi-spins can have higher degrees of attraction). This capacity, however, will likely scale nonlinearly with the choice of (random) dilution, e.g., there may be a steep drop in performance around a critical dilution range, likely where some important dynamical guarantees are lost due to an intolerably small number of connections of a particular order. Even higher orders and diluted mixtures of setwise connections may also be studied. Such networks, per Lemma 2.3, will likely improve their performance as higher-degree connections are added (as shown in Appendix A.8).
However, and as implied in Section 2.3, the number and distribution of these connections may need to be carefully chosen in highly diluted settings.

REPRODUCIBILITY STATEMENT

To reproduce our results in the main text and appendices, we provide our Python code as supplementary material at https://github.com/tfburns/simplicial-hopfield-networks. We have also provided a small worked example in Appendix A.4 to help clarify computational steps in the model construction. Assumptions made in our theoretical results are stated in Section 2.3 and Appendix A.6.

ACKNOWLEDGEMENTS

The first author thanks Milena Menezes Carvalho for graphic design assistance with Figure 1, as well as Robert Tang, Tom George, and members of the Neural Coding and Brain Computing Unit at OIST for helpful discussions. We thank anonymous reviewers for their feedback and suggestions. The second author acknowledges support from KAKENHI grants JP19H04994 and JP18H05213.

REFERENCES

Elena Agliari, Adriano Barra, Andrea De Antoni, and Andrea Galluzzi. Parallel retrieval of correlated patterns: From Hopfield networks to Boltzmann machines. Neural Networks, 38:52–63, 2013. ISSN 0893-6080. doi: https://doi.org/10.1016/j.neunet.2012.11.010. URL https://www.sciencedirect.com/science/article/pii/S0893608012003073.

S Amari, K Kurata, and H Nagaoka. Information geometry of Boltzmann machines. IEEE Trans Neural Netw, 3(2):260–271, 1992.

S.-I. Amari. Learning patterns and pattern sequences by self-organizing nets of threshold elements. IEEE Transactions on Computers, C-21(11):1197–1206, 1972. doi: 10.1109/T-C.1972.223477.

Daniel J. Amit, Hanoch Gutfreund, and H. Sompolinsky. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys. Rev. Lett., 55:1530–1533, Sep 1985. doi: 10.1103/PhysRevLett.55.1530. URL https://link.aps.org/doi/10.1103/PhysRevLett.55.1530.

Alfonso Araque, Vladimir Parpura, Rita P Sanzgiri, and Philip G Haydon. Tripartite synapses: glia, the unacknowledged partner. Trends in Neurosciences, 22(5):208–215, 1999.

Gordon W. Arbuthnott and Jeff Wickens. Space, time and dopamine. Trends in Neurosciences, 30(2):62–69, 2007. ISSN 0166-2236. doi: https://doi.org/10.1016/j.tins.2006.12.003. URL https://www.sciencedirect.com/science/article/pii/S0166223606002748.

Dmitriy Aronov, Rhino Nevers, and David W. Tank. Mapping of a non-spatial dimension by the hippocampal entorhinal circuit. Nature, 543(7647):719–722, Mar 2017. ISSN 1476-4687. doi: 10.1038/nature21692.

Pierre Baldi and Santosh S. Venkatesh. Number of stable points for spin-glasses and neural networks of higher orders. Phys. Rev. Lett., 58:913–916, Mar 1987. doi: 10.1103/PhysRevLett.58.913. URL https://link.aps.org/doi/10.1103/PhysRevLett.58.913.

Xiaojun Bao, Eva Gjorgieva, Laura K. Shanahan, James D. Howard, Thorsten Kahnt, and Jay A. Gottfried. Grid-like neural representations support olfactory navigation of a two-dimensional odor space. Neuron, 102(5):1066–1075.e5, Jun 2019. ISSN 0896-6273. doi: 10.1016/j.neuron.2019.03.034. URL https://doi.org/10.1016/j.neuron.2019.03.034.

Alison L. Barth and James F A Poulet. Experimental evidence for sparse firing in the neocortex. Trends in Neurosciences, 35(6):345–355, 2012. ISSN 01662236. doi: 10.1016/j.tins.2012.03.008.

Jacob L. S. Bellmund, Peter Gärdenfors, Edvard I. Moser, and Christian F. Doeller. Navigating cognition: Spatial codes for human thinking. Science, 362(6415):eaat6766, 2018. doi: 10.1126/science.aat6766.
URL https://www.science.org/doi/abs/10.1126/ science.aat6766. Published as a conference paper at ICLR 2023 Alex Bellos. He ate all the pi : Japanese man memorises π to 111,700 digits. The Guardian, 2015. URL https://www.theguardian. com/science/alexs-adventures-in-numberland/2015/mar/13/ pi-day-2015-memory-memorisation-world-record-japanese-akira-haraguchi. Mats Bengtsson. Higher Order Artificial Neural Networks. Diane Publishing Company, 1990. ISBN 9780941375924. URL https://books.google.co.jp/books?id=FTl Fe Fg8HH4C. T. V P Bliss and A. R. Gardner-Medwin. Long-lasting potentiation of synaptic transmission in the dentate area of the unanaesthetized rabbit following stimulation of the perforant path. The Journal of Physiology, 232(2):357 374, 1973. ISSN 14697793. doi: 10.1113/jphysiol.1973.sp010274. Erik B Bloss, Mark S Cembrowski, Bill Karsh, Jennifer Colonell, Richard D Fetter, and Nelson Spruston. Single excitatory axons form clustered synapses onto CA1 pyramidal cell dendrites. Nature Neuroscience, 21(3):353 363, March 2018. Cristian Bodnar, Fabrizio Frasca, Nina Otter, Yuguang Wang, Pietro Liò, Guido F Montufar, and Michael Bronstein. Weisfeiler and Lehman Go Cellular: CW Networks. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (eds.), Advances in Neural Information Processing Systems, volume 34, pp. 2625 2640. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper/2021/file/ 157792e4abb490f99dbd738483e0d2d4-Paper.pdf. Melina Paula Bordone, Mootaz M. Salman, Haley E. Titus, Elham Amini, Jens V. Andersen, Barnali Chakraborti, Artem V. Diuba, Tatsiana G. Dubouskaya, Eric Ehrke, Andiara Espindola de Freitas, Guilherme Braga de Freitas, Rafaella A. Gonçalves, Deepali Gupta, Richa Gupta, Sharon R. Ha, Isabel A. Hemming, Minal Jaggar, Emil Jakobsen, Punita Kumari, Navya Lakkappa, Ashley P. L. Marsh, Jessica Mitlöhner, Yuki Ogawa, Ramesh Kumar Paidi, Felipe C. Ribeiro, Ahmad Salamian, Suraiya Saleem, Sorabh Sharma, Joana M. Silva, Shripriya Singh, Kunjbihari Sulakhiya, Tesfaye Wolde Tefera, Behnam Vafadari, Anuradha Yadav, Reiji Yamazaki, and Constanze I. Seidenbecher. The energetic brain a review from students to students. Journal of Neurochemistry, 151(2):139 165, 2019. doi: https://doi.org/10.1111/jnc.14829. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/jnc.14829. Anton Bovier and Véronique Gayrard. Lower bounds on the memory capacity of the dilute Hopfield model. In Nino Boccara, Eric Goles, Servet Martinez, and Pierre Picco (eds.), Cellular Automata and Cooperative Systems, pp. 55 66. Springer Netherlands, Dordrecht, 1993a. ISBN 97894-011-1691-6. doi: 10.1007/978-94-011-1691-6_6. URL https://doi.org/10.1007/ 978-94-011-1691-6_6. Anton Bovier and Véronique Gayrard. Rigorous results on the thermodynamics of the dilute Hopfield model. Journal of Statistical Physics, 72(1):79 112, Jul 1993b. ISSN 1572-9613. doi: 10.1007/BF01048041. URL https://doi.org/10.1007/BF01048041. Timothy F Brady, Talia Konkle, George A Alvarez, and Aude Oliva. Visual long-term memory has a massive storage capacity for object details. Proceedings of the National Academy of Sciences, 105 (38):14325 14329, 2008. A. Brombas, S. Kalita-de Croft, E. J. Cooper-Williams, and S. R. Williams. Dendro-dendritic cholinergic excitation controls dendritic spike initiation in retinal ganglion cells. Nature Communications, 8(1):15683, Jun 2017. ISSN 2041-1723. doi: 10.1038/ncomms15683. URL https://doi.org/10.1038/ncomms15683. Jehoshua Bruck and Vwani P. Roychowdhury. 
On the Number of Spurious Memories in the Hopfield Model. IEEE Transactions on Information Theory, 36(2):393 397, 1990. ISSN 15579654. doi: 10.1109/18.52486. Stephen G. Brush. History of the lenz-ising model. Rev. Mod. Phys., 39:883 893, Oct 1967. doi: 10.1103/Rev Mod Phys.39.883. URL https://link.aps.org/doi/10.1103/ Rev Mod Phys.39.883. Published as a conference paper at ICLR 2023 Marc Brysbaert, Michaël Stevens, Paweł Mandera, and Emmanuel Keuleers. How many words do we know? Practical estimates of vocabulary size dependent on word definition, the degree of language input and the participant s age. Frontiers in Psychology, 7, 2016. ISSN 1664-1078. doi: 10.3389/fpsyg.2016.01116. URL https://www.frontiersin.org/articles/10. 3389/fpsyg.2016.01116. Thomas F Burns and Robert Tang. Detecting danger in gridworlds using Gromov s link condition. ar Xiv, 2022. doi: 10.48550/ARXIV.2201.06274. URL https://arxiv.org/abs/2201. 06274. Thomas F Burns, Tatsuya Haga, and Tomoki Fukai. Multiscale and extended retrieval of associative memory structures in a cortical model of local-global inhibition balance. e Neuro, 9(3), 2022. doi: 10.1523/ENEURO.0023-22.2022. URL https://www.eneuro.org/content/9/3/ ENEURO.0023-22.2022. György Buzsáki. Neural syntax: Cell assemblies, synapsembles, and readers. Neuron, 68(3):362 385, 2010. ISSN 0896-6273. doi: https://doi.org/10.1016/j.neuron.2010.09.023. URL https: //www.sciencedirect.com/science/article/pii/S0896627310007658. Rishidev Chaudhuri and Ila Fiete. Computational principles of memory. Nature Neuroscience, 19(3): 394 403, March 2016. Rishidev Chaudhuri and Ila Fiete. Bipartite expander Hopfield networks as self-decoding highcapacity error correcting codes. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper/2019/ file/97008ea27052082be055447be9e85612-Paper.pdf. Spyridon Chavlis and Panayiota Poirazi. Drawing inspiration from biological dendrites to empower artificial neural networks. Current Opinion in Neurobiology, 70:1 10, 2021. ISSN 0959-4388. doi: https://doi.org/10.1016/j.conb.2021.04.007. URL https://www.sciencedirect.com/ science/article/pii/S0959438821000544. Computational Neuroscience. Yuhan Chen, Shengjun Wang, Claus C Hilgetag, and Changsong Zhou. Trade-off between multiple constraints enables simultaneous formation of modules and hubs in neural systems. PLo S Comput. Biol., 9(3):e1002937, March 2013. Peter H Chipman, Chi Chung Alan Fung, Alejandra Pazo Fernandez, Abhilash Sawant, Angelo Tedoldi, Atsushi Kawai, Sunita Ghimire Gautam, Mizuki Kurosawa, Manabu Abe, Kenji Sakimura, Tomoki Fukai, and Yukiko Goda. Astrocyte Glu N2C NMDA receptors control basal synaptic strengths of hippocampal CA1 pyramidal neurons in the stratum radiatum. Elife, 10, October 2021. Kimberly M. Christian and Richard F. Thompson. Neural Substrates of Eyeblink Conditioning: Acquisition and Retention. Learning and Memory, 10(6):427 455, 2003. ISSN 10720502. doi: 10.1101/lm.59603. James Clift, Dmitry Doryn, Daniel Murfet, and James Wallbridge. Logic and the 2-simplicial transformer. In International Conference on Learning Representations, 2020. URL https: //openreview.net/forum?id=rkec J6VFvr. Claudia Clopath, Tobias Bonhoeffer, Mark Hübener, and Tobias Rose. Variance and invariance of neuronal long-term representations. Philos Trans R Soc Lond B Biol Sci, 372(1715), March 2017. Alexandra O. 
Constantinescu, Jill X. O Reilly, and Timothy E. J. Behrens. Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292):1464 1468, 2016. doi: 10.1126/science.aaf0941. URL https://www.science.org/doi/abs/10.1126/ science.aaf0941. Ana Covelo and Alfonso Araque. Neuronal activity determines distinct gliotransmitter release from a single astrocyte. e Life, 7:e32237, Jan 2018. ISSN 2050-084X. doi: 10.7554/e Life.32237. URL https://doi.org/10.7554/e Life.32237. Published as a conference paper at ICLR 2023 Christopher J. Darwin, Michael T. Turvey, and Robert G. Crowder. An auditory analogue of the Sperling partial report procedure: Evidence for brief auditory storage. Cognitive Psychology, 3(2): 255 267, 1972. ISSN 00100285. doi: 10.1016/0010-0285(72)90007-2. Nathaniel D. Daw and Kenji Doya. The computational neurobiology of learning and reward. Current Opinion in Neurobiology, 16(2):199 204, 2006. ISSN 09594388. doi: 10.1016/j.conb.2006.03.006. Mete Demircigil, Judith Heusel, Matthias Löwe, Sven Upgang, and Franck Vermet. On a model of associative memory with huge storage capacity. Journal of Statistical Physics, 168(2):288 299, Jul 2017. ISSN 1572-9613. doi: 10.1007/s10955-017-1806-y. URL https://doi.org/10. 1007/s10955-017-1806-y. Anne Didier, Alan Carleton, Jan G. Bjaalie, Jean-Didier Vincent, Ole Petter Ottersen, Jon Storm Mathisen, and Pierre-Marie Lledo. A dendrodendritic reciprocal synapse provides a recurrent excitatory connection in the olfactory bulb. Proceedings of the National Academy of Sciences, 98 (11):6441 6446, 2001. doi: 10.1073/pnas.101126398. URL https://www.pnas.org/doi/ abs/10.1073/pnas.101126398. Adi Doron, Alon Rubin, Aviya Benmelech-Chovav, Netai Benaim, Tom Carmi, Ron Refaeli, Nechama Novick, Tirzah Kreisel, Yaniv Ziv, and Inbal Goshen. Hippocampal astrocytes encode reward location. Nature, Aug 2022. ISSN 1476-4687. doi: 10.1038/s41586-022-05146-6. URL https://doi.org/10.1038/s41586-022-05146-6. Gérard Dreyfus, Isabelle Guyon, Jean-Pierre Nadal, and Léon Personnaz. High order neural networks for efficient associative memory design. In D. Anderson (ed.), Neural Information Processing Systems. American Institute of Physics, 1987. URL https://proceedings.neurips.cc/ paper/1987/file/c7e1249ffc03eb9ded908c236bd1996d-Paper.pdf. Stefania Ebli, Michaël Defferrard, and Gard Spreemann. Simplicial neural networks. In Neur IPS 2020 Workshop on Topological Data Analysis and Beyond, 2020. URL https://openreview. net/forum?id=n PCt39DVIfk. Mária Ercsey-Ravasz, Nikola T. Markov, Camille Lamy, David C. Van Essen, Kenneth Knoblauch, Zoltán Toroczkai, and Henry Kennedy. A predictive network model of cerebral cortical connectivity based on a distance rule. Neuron, 80(1):184 197, 2013. ISSN 0896-6273. doi: https://doi. org/10.1016/j.neuron.2013.07.036. URL https://www.sciencedirect.com/science/ article/pii/S0896627313006600. Sarah J Etherington, Susan E Atkinson, Greg J Stuart, and Stephen R Williams. Synaptic Integration, chapter 1, pp. 1 15. Wiley, 2010. ISBN 9780470015902. doi: https://doi.org/ 10.1002/9780470015902.a0000208.pub2. URL https://onlinelibrary.wiley.com/ doi/abs/10.1002/9780470015902.a0000208.pub2. Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 3558 3565, 2019. Viola Folli, Marco Leonetti, and Giancarlo Ruocco. On the maximum storage capacity of the Hopfield model. Frontiers in Computational Neuroscience, 10, 2017. 
ISSN 1662-5188. doi: 10.3389/fncom.2016.00144. URL https://www.frontiersin.org/articles/ 10.3389/fncom.2016.00144. Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193 202, Apr 1980. ISSN 1432-0770. doi: 10.1007/BF00344251. URL https://doi.org/10.1007/ BF00344251. Yuri Geinisman, Robert W. Berry, John F. Disterhoft, John M. Power, and Eddy A. Van der Zee. Associative learning elicits the formation of multiple-synapse boutons. Journal of Neuroscience, 21(15):5568 5573, 2001. ISSN 0270-6474. doi: 10.1523/JNEUROSCI.21-15-05568.2001. URL https://www.jneurosci.org/content/21/15/5568. Joydeep Ghosh and Yoan Shun. Efficient higher-order neural networks for classification and function approximation. International Journal of Neural Systems, 3(4):323 350, 1992. doi: 10.1142/ S0129065792000255. URL https://doi.org/10.1142/S0129065792000255. Published as a conference paper at ICLR 2023 Paul E. Gilbert and Raymond P. Kesner. Memory for objects and their locations: The role of the hippocampus in retention of object-place associations. Neurobiology of Learning and Memory, 81 (1):39 45, 2004. ISSN 10747427. doi: 10.1016/S1074-7427(03)00069-8. Nace L. Golding, Nathan P. Staff, and Nelson Spruston. Dendritic spikes as a mechanism for cooperative long-term potentiation. Nature, 418(6895):326 331, Jul 2002. ISSN 1476-4687. doi: 10.1038/nature00854. URL https://doi.org/10.1038/nature00854. A. M. Gordon, G. Westling, K. J. Cole, and R. S. Johansson. Memory representations underlying motor commands used during manipulation of common and novel objects. Journal of Neurophysiology, 69(6):1789 1797, 1993. ISSN 00223077. doi: 10.1152/jn.1993.69.6.1789. Giorgio Gosti, Viola Folli, Marco Leonetti, and Giancarlo Ruocco. Beyond the maximum storage capacity limit in Hopfield recurrent neural networks. Entropy, 21(8), 2019. ISSN 1099-4300. doi: 10.3390/e21080726. URL https://www.mdpi.com/1099-4300/21/8/726. I. S. Gradshteyn and I. M. Ryzhik. Table of integrals, series, and products. Elsevier/Academic Press, Amsterdam, seventh edition, 2007. ISBN 978-0-12-373637-6; 0-12-373637-4. Translated from the Russian. Translation edited and with a preface by Alan Jeffrey and Daniel Zwillinger. Eva-Maria Griesbauer, Ed Manley, Jan M. Wiener, and Hugo J. Spiers. London taxi drivers: A review of neurocognitive studies and an exploration of how they build their cognitive map of London. Hippocampus, 32(1):3 20, 2022. doi: https://doi.org/10.1002/hipo.23395. URL https: //onlinelibrary.wiley.com/doi/abs/10.1002/hipo.23395. M. Griniasty, M. V. Tsodyks, and Daniel J. Amit. Conversion of Temporal Correlations Between Stimuli to Spatial Correlations Between Attractors. Neural Computation, 5(1):1 17, 01 1993. ISSN 0899-7667. doi: 10.1162/neco.1993.5.1.1. URL https://doi.org/10.1162/neco. 1993.5.1.1. Vincent Gripon and Claude Berrou. Sparse neural networks with large learning diversity. IEEE Transactions on Neural Networks, 22(7):1087 1096, 2011. ISSN 10459227. doi: 10.1109/TNN. 2011.2146789. Lukas N. Groschner, Jonatan G. Malis, Birte Zuidinga, and Alexander Borst. A biophysical account of multiplication by a single neuron. Nature, 603(7899):119 123, Mar 2022. ISSN 1476-4687. doi: 10. 1038/s41586-022-04428-3. URL https://doi.org/10.1038/s41586-022-04428-3. Zengcai V. Guo, Hidehiko K. Inagaki, Kayvon Daie, Shaul Druckmann, Charles R. Gerfen, and Karel Svoboda. 
Maintenance of persistent activity in a frontal-thalamocortical loop. Nature, 545(7653):181 186, May 2017. ISSN 1476-4687. doi: 10.1038/nature22324. URL https: //doi.org/10.1038/nature22324. Bengt Gustafsson and Holger Wigström. Physiological mechanisms underlying long-term potentiation. Trends in Neurosciences, 11(4):156 162, 1988. ISSN 01662236. doi: 10.1016/0166-2236(88) 90142-7. Tatsuya Haga and Tomoki Fukai. Extended temporal association memory by modulations of inhibitory circuits. Phys. Rev. Lett., 123:078101, Aug 2019. doi: 10.1103/Phys Rev Lett.123.078101. URL https://link.aps.org/doi/10.1103/Phys Rev Lett.123.078101. Mustafa Hajij, Kyle Istvan, and Ghada Zamzmi. Cell complex neural networks. In Neur IPS 2020 Workshop on Topological Data Analysis and Beyond, 2020. URL https://openreview. net/forum?id=6Tq18y SFp GU. Eric Hart and Alexander C. Huk. Recurrent circuit dynamics underlie persistent activity in the macaque frontoparietal network. e Life, 9:e52460, May 2020. ISSN 2050-084X. doi: 10.7554/ e Life.52460. URL https://doi.org/10.7554/e Life.52460. 32379044[pmid]. Nathan G. Hedrick, Zhongmin Lu, Eric Bushong, Surbhi Singhi, Peter Nguyen, Yessenia Magaña, Sayyed Jilani, Byung Kook Lim, Mark Ellisman, and Takaki Komiyama. Learning binds new inputs into functional synaptic clusters via spinogenesis. Nature Neuroscience, 25(6):726 737, Jun 2022. ISSN 1546-1726. doi: 10.1038/s41593-022-01086-6. URL https://doi.org/10. 1038/s41593-022-01086-6. Published as a conference paper at ICLR 2023 Alex Henry, Prudhvi Raj Dachapally, Shubham Shantaram Pawar, and Yuxuan Chen. Query-Key normalization for Transformers. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 4246 4253, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.379. URL https://aclanthology.org/2020. findings-emnlp.379. Suzana Herculano-Houzel. The human brain in numbers: a linearly scaled-up primate brain. Frontiers in Human Neuroscience, 3, 2009. ISSN 1662-5161. doi: 10.3389/neuro.09.031.2009. URL https://www.frontiersin.org/articles/10.3389/neuro.09.031.2009. Christopher J. Hillar and Ngoc M. Tran. Robust exponential memory in Hopfield networks. The Journal of Mathematical Neuroscience, 8(1):1, Jan 2018. ISSN 2190-8567. doi: 10.1186/ s13408-017-0056-2. URL https://doi.org/10.1186/s13408-017-0056-2. Heiko Hoffmann. Sparse associative memory. Neural Computation, 31(5):998 1014, 2019. ISSN 1530888X. doi: 10.1162/neco_a_01181. Benjamin Hoover, Duen Horng Chau, Hendrik Strobelt, and Dmitry Krotov. A universal abstraction for hierarchical Hopfield networks. In The Symbiosis of Deep Learning and Differential Equations II, 2022. URL https://openreview.net/forum?id=SAv3nhz NWhw. J J Hopfieid. Searching for memories, Sudoku, implicit check bits, and the iterative use of not-alwayscorrect rapid neural computation. Neural Computation, 20(5):1119 1164, 2008. ISSN 08997667. doi: 10.1162/neco.2008.09-06-345. J J Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554 2558, 1982. doi: 10.1073/pnas.79. 8.2554. URL https://www.pnas.org/doi/abs/10.1073/pnas.79.8.2554. J J Hopfield. Neurons with graded response have collective computational properties like those of twostate neurons. Proceedings of the National Academy of Sciences, 81(10):3088 3092, 1984. doi: 10. 1073/pnas.81.10.3088. URL https://www.pnas.org/doi/abs/10.1073/pnas.81. 10.3088. Horn, D. and Usher, M. 
Capacities of multiconnected memory models. J. Phys. France, 49(3):389 395, 1988. doi: 10.1051/jphys:01988004903038900. URL https://doi.org/10.1051/ jphys:01988004903038900. Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, and Razvan Pascanu. Multiplicative interactions and where to find them. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=ryln K6Vt DH. R Jenkins, AJ Dowsett, and AM Burton. How many faces do people know? Proceedings of the Royal Society B, 285(1888):20181319, 2018. EG Jones and TPS Powell. Morphological variations in the dendritic spines of the neocortex. Journal of cell science, 5(2):509 529, 1969. Marcus Kaiser and Claus C Hilgetag. Nonoptimal component placement, but short processing paths, due to long-distance projections in neural systems. PLo S Comput. Biol., 2(7):e95, July 2006. I. Kanter. Asymmetric neural networks with multispin interactions. Phys. Rev. A, 38:5972 5975, Dec 1988. doi: 10.1103/Phys Rev A.38.5972. URL https://link.aps.org/doi/10.1103/ Phys Rev A.38.5972. Ayaka Kato, Kazumi Ohta, Kazuo Okanoya, and Hokto Kazama. Dopaminergic neurons dynamically update sensory values during navigation. bio Rxiv, 2022. doi: 10.1101/2022.08. 17.504092. URL https://www.biorxiv.org/content/early/2022/08/17/2022. 08.17.504092. Published as a conference paper at ICLR 2023 Ege T. Kavalali, Jürgen Klingauf, and Richard W. Tsien. Activity-dependent regulation of synaptic clustering in a hippocampal culture system. Proceedings of the National Academy of Sciences, 96 (22):12893 12900, 1999. doi: 10.1073/pnas.96.22.12893. URL https://www.pnas.org/ doi/abs/10.1073/pnas.96.22.12893. Do Hyun Kim, Jinha Park, and Byungnam Kahng. Enhanced storage capacity with errors in scale-free Hopfield neural networks: An analytical study. PLo S ONE, 12(10), 2017. ISSN 19326203. doi: 10.1371/journal.pone.0184683. Leo Kozachkov, Ksenia V. Kastanenka, and Dmitry Krotov. Building Transformers from neurons and astrocytes. bio Rxiv, 2022. doi: 10.1101/2022.10.12.511910. URL https://www.biorxiv. org/content/early/2022/10/15/2022.10.12.511910. Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, Toronto, Ontario, 2009. Dmitry Krotov and John J. Hopfield. Dense associative memory for pattern recognition. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS 16, pp. 1180 1188, Red Hook, NY, USA, 2016. Curran Associates Inc. ISBN 9781510838819. Dmitry Krotov and John J. Hopfield. Large associative memory problem in neurobiology and machine learning. In International Conference on Learning Representations, 2021. URL https: //openreview.net/forum?id=X4y_10OX-h X. Anders Lansner. Associative memory models: from the cell-assembly theory to biophysically detailed cortex simulations. Trends in Neurosciences, 32(3):178 186, 2009. ISSN 01662236. doi: 10.1016/j.tins.2008.12.002. Ya Le and Xuan Yang. Tiny Image Net visual recognition challenge. CS 231N, 7(7):3, 2015. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278 2324, 1998. doi: 10.1109/5.726791. Yann Le Cun, Corinna Cortes, and Christopher Burges. MNIST handwritten digit database, 2010. URL http://yann.lecun.com/exdb/mnist/. S. J. Lederman and R. L. Klatzky. Haptic perception: A tutorial. 
Attention, Perception, and Psychophysics, 71(7):1439 1459, 2009. ISSN 19433921. doi: 10.3758/APP.71.7.1439. Kea Joo Lee, In Sung Park, Hyun Kim, William T. Greenough, Daniel T. S. Pak, and Im Joo Rhyu. Motor skill training induces coordinated strengthening and weakening between neighboring synapses. Journal of Neuroscience, 33(23):9794 9799, 2013. ISSN 0270-6474. doi: 10.1523/ JNEUROSCI.0848-12.2013. URL https://www.jneurosci.org/content/33/23/ 9794. Y. C. Lee, Gary D. Doolen, H. H. Chen, G. Z. Sun, T. Maxwell, H. Y. Lee, and C. Lee Giles. Machine learning using higher order correlation networks. In Proceedings of the Evolution, Games, and Learning Conference, 1986. Martijn A. R. Leisink and Hilbert J. Kappen. Learning in higher order Boltzmann machines using linear response. Neural Networks, 13:2000, 1999. Francesca Elisa Leonelli, Elena Agliari, Linda Albanese, and Adriano Barra. On the effective initialisation for restricted Boltzmann machines via duality with Hopfield model. Neural Networks, 143: 314 326, 2021. ISSN 0893-6080. doi: https://doi.org/10.1016/j.neunet.2021.06.017. URL https: //www.sciencedirect.com/science/article/pii/S0893608021002495. M. Leonetti, E. Hörmann, L. Leuzzi, G. Parisi, and G. Ruocco. Optical computation of a spin glass dynamics with tunable complexity. Proceedings of the National Academy of Sciences, 118(21): e2015207118, 2021. doi: 10.1073/pnas.2015207118. URL https://www.pnas.org/doi/ abs/10.1073/pnas.2015207118. Lek-Heng Lim. Hodge Laplacians on graphs. SIAM Review, 62(3):685 715, 2020. doi: 10.1137/ 18M1223101. URL https://doi.org/10.1137/18M1223101. Published as a conference paper at ICLR 2023 W.A. Little. The existence of persistent states in the brain. Mathematical Biosciences, 19(1):101 120, 1974. ISSN 0025-5564. doi: https://doi.org/10.1016/0025-5564(74)90031-5. URL https: //www.sciencedirect.com/science/article/pii/0025556474900315. Matthias Löwe and Franck Vermet. The Hopfield model on a sparse Erdös-Renyi graph. Journal of Statistical Physics, 143(1):205 214, Apr 2011. ISSN 1572-9613. doi: 10.1007/s10955-011-0167-1. URL https://doi.org/10.1007/s10955-011-0167-1. Matthias Löwe. On the storage capacity of Hopfield models with correlated patterns. The Annals of Applied Probability, 8(4):1216 1250, 1998. doi: 10.1214/aoap/1028903378. URL https: //doi.org/10.1214/aoap/1028903378. N. J. Mackintosh. Conditioning and associative learning. Oxford University Press, Oxford, 1983. Stephen Maren. Neurobiology of Pavlovian Fear Conditioning. Annual Review of Neuroscience, 24 (1):897 931, 2001. ISSN 0147-006X. doi: 10.1146/annurev.neuro.24.1.897. D Marr. Simple memory: a theory for archicortex. Philos Trans R Soc Lond B Biol Sci, 262(841): 23 81, July 1971. Chiara Marullo and Elena Agliari. Boltzmann machines as generalized Hopfield networks: A review of recent results and outlooks. Entropy, 23(1), 2021. ISSN 1099-4300. doi: 10.3390/e23010034. URL https://www.mdpi.com/1099-4300/23/1/34. William Mau, Michael E Hasselmo, and Denise J Cai. The brain in motion: How ensemble fluidity drives memory-updating and flexibility. e Life, 9:e63550, Dec 2020. ISSN 2050-084X. doi: 10.7554/e Life.63550. URL https://doi.org/10.7554/e Life.63550. R. Mc Eliece, E. Posner, E. Rodemich, and S. Venkatesh. The capacity of the Hopfield associative memory. IEEE Transactions on Information Theory, 33(4):461 482, 1987. doi: 10.1109/TIT.1987. 1057328. Frances K. Mc Sweeney and Eric S. Murphy (eds.). The Wiley Blackwell Handbook of Operant and Classical Conditioning. 
John Wiley & Sons, Ltd, Oxford, UK, may 2014. ISBN 9781118468135. doi: 10.1002/9781118468135. URL http://doi.wiley.com/10.1002/ 9781118468135. Roland Memisevic and Geoffrey E. Hinton. Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines. Neural Computation, 22(6):1473 1492, 06 2010. ISSN 0899-7667. doi: 10.1162/neco.2010.01-09-953. URL https://doi.org/10.1162/ neco.2010.01-09-953. Juliette Millet, Charlotte Caucheteux, Pierre Orhan, Yves Boubenec, Alexandre Gramfort, Ewan Dunbar, Christophe Pallier, and Jean-Remi King. Toward a realistic model of speech processing in the brain with self-supervised learning, 2022. URL https://arxiv.org/abs/2206. 01685. Beren Millidge, Tommaso Salvatori, Yuhang Song, Thomas Lukasiewicz, and Rafal Bogacz. Universal Hopfield networks: A general framework for single-shot associative memory models, 2022. URL https://arxiv.org/abs/2202.04557. B. Milner. Amnesia following operation on the temporal lobes. In C. Whitty and O. Zangwill (eds.), Amnesia, pp. 109 133. Butterworth, London, 1966. G.A. Milner. The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review, 101(2):343 352, 1955. URL http://spider. apa.org/ftdocs/rev/1994/april/rev1012343.html. Rogier Min, Mirko Santello, and Thomas Nevian. The computational power of astrocyte mediated synaptic plasticity. Frontiers in Computational Neuroscience, 6, 2012. ISSN 1662-5188. doi: 10.3389/fncom.2012.00093. URL https://www.frontiersin.org/articles/ 10.3389/fncom.2012.00093. Published as a conference paper at ICLR 2023 Ali A. Minai and William B. Levy. The dynamics of sparse random networks. Biological Cybernetics, 70(2):177 187, 1993. ISSN 03401200. doi: 10.1007/BF00200831. A. A. Mofrad and M.G Parker. Nested-clique network model of neural associative memory. Neural Computation, 29:1681 1695, 2017. Gianluigi Mongillo, Omri Barak, and Misha Tsodyks. Synaptic Theory of Working Memory. Science, 319(5869):1543 1546, 2008. ISSN 10959203. doi: 10.1126/science.1150769. Michael G Müller, Christos H Papadimitriou, Wolfgang Maass, and Robert Legenstein. A model for structured information representation in neural networks of the brain. e Neuro, 7(3):ENEURO.0533 19.2020, May 2020. Kaoru Nakano. Associatron-a model of associative memory. IEEE Transactions on Systems, Man, and Cybernetics, SMC-2(3):380 388, 1972. doi: 10.1109/TSMC.1972.4309133. Charles M Newman. Memory capacity in neural network models: Rigorous lower bounds. Neural Networks, 1(3):223 238, 1988. ISSN 0893-6080. doi: https://doi.org/ 10.1016/0893-6080(88)90028-7. URL https://www.sciencedirect.com/science/ article/pii/0893608088900287. Christos H. Papadimitriou, Santosh S. Vempala, Daniel Mitropolsky, Michael Collins, and Wolfgang Maass. Brain computation by assemblies of neurons. Proceedings of the National Academy of Sciences, 117(25):14464 14472, 2020. doi: 10.1073/pnas.2001893117. URL https://www. pnas.org/doi/abs/10.1073/pnas.2001893117. Seongmin A. Park, Douglas S. Miller, and Erie D. Boorman. Inferences on a multidimensional social hierarchy use a grid-like code. Nature Neuroscience, 24(9):1292 1301, Sep 2021. ISSN 1546-1726. doi: 10.1038/s41593-021-00916-3. URL https://doi.org/10.1038/ s41593-021-00916-3. Gertrudis Perea, Marta Navarrete, and Alfonso Araque. Tripartite synapses: astrocytes process and control synaptic information. Trends in Neurosciences, 32(8):421 431, 2009. P Peretto and J J Niez. 
Long term memory storage capacity of multiconnected neural networks. Biol. Cybern., 54(1):53 64, may 1986. ISSN 0340-1200. doi: 10.1007/BF00337115. URL https://doi.org/10.1007/BF00337115. Giovanni Petri and Alain Barrat. Simplicial activity driven model. Phys. Rev. Lett., 121:228301, Nov 2018. doi: 10.1103/Phys Rev Lett.121.228301. URL https://link.aps.org/doi/ 10.1103/Phys Rev Lett.121.228301. Brad E. Pfeiffer and David J. Foster. Autoassociative dynamics in the generation of sequences of hippocampal place cells. Science, 349(6244):180 183, 2015. ISSN 10959203. doi: 10.1126/ science.aaa9633. Didier Pinault, Yoland Smith, and Martin Deschênes. Dendrodendritic and axoaxonic synapses in the thalamic reticular nucleus of the adult rat. Journal of Neuroscience, 17(9):3215 3233, 1997. ISSN 0270-6474. doi: 10.1523/JNEUROSCI.17-09-03215.1997. URL https://www.jneurosci. org/content/17/9/3215. Fernando Pineda. Generalization of back propagation to recurrent and higher order neural networks. In D. Anderson (ed.), Neural Information Processing Systems. American Institute of Physics, 1987. URL https://proceedings.neurips.cc/paper/1987/file/ 735b90b4568125ed6c3f678819b6e058-Paper.pdf. Panayiota Poirazi and Athanasia Papoutsi. Illuminating dendritic function with computational models. Nature Reviews Neuroscience, 21(6):303 321, Jun 2020. ISSN 1471-0048. doi: 10.1038/ s41583-020-0301-7. URL https://doi.org/10.1038/s41583-020-0301-7. Panayiota Poirazi, Terrence Brannon, and Bartlett W. Mel. Pyramidal neuron as two-layer neural network. Neuron, 37(6):989 999, 2003. ISSN 0896-6273. doi: https://doi.org/10. 1016/S0896-6273(03)00149-1. URL https://www.sciencedirect.com/science/ article/pii/S0896627303001491. Published as a conference paper at ICLR 2023 Alon Poleg-Polsky and Jeffrey S Diamond. NMDA receptors multiplicatively scale visual signals and enhance directional motion discrimination in retinal ganglion cells. Neuron, 89(6):1277 1290, 2016. Melissa A. Preziosi and Jennifer H. Coane. Remembering that big things sound big: Sound symbolism and associative memory. Cognitive Research: Principles and Implications, 2(1), 2017. ISSN 2365-7464. doi: 10.1186/s41235-016-0047-y. Vinu Varghese Pulikkottil, Bhanu Priya Somashekar, and Upinder S. Bhalla. Computation, wiring, and plasticity in synaptic clusters. Current Opinion in Neurobiology, 70:101 112, 2021. ISSN 0959-4388. doi: https://doi.org/10.1016/j.conb.2021.08.001. URL https:// www.sciencedirect.com/science/article/pii/S0959438821000921. Computational Neuroscience. Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, Thomas Adler, David Kreil, Michael K Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. Hopfield networks is all you need. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id= t L89Rnz Ii Cd. Nelson Rebola, Mario Carta, and Christophe Mulle. Operation and plasticity of hippocampal CA3 circuits: Implications for memory encoding. Nature Reviews Neuroscience, 18(4):209 221, 2017. ISSN 14710048. doi: 10.1038/nrn.2017.10. Max B. Reid, Lilly Spirkovska, and Ellen Ochoa. Rapid training of higher-order neural networks for invariant pattern recognition. In In Proc. International Joint Conference on Neural Nets, pp. 689 692, 1989. R.A. Rescorla and A.R. Wagner. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A.H. Black and W.F. 
Prokasy (eds.), Classical Conditioning II: Current Research and Theory, pp. 64 99. Appleton Century Crofts, New York, 1972. Iris Reuveni, Sourav Ghosh, and Edi Barkai. Real time multiplicative memory amplification mediated by whole-cell scaling of synaptic response in key neurons. PLOS Computational Biology, 13 (1):1 31, 01 2017. doi: 10.1371/journal.pcbi.1005306. URL https://doi.org/10.1371/ journal.pcbi.1005306. Mark Rigby, Federico Grillo, Benjamin Compans, Guilherme Neves, Julia Gallinaro, Sophie Nashashibi, Gema Vizcay-Barrena, Florian Levet, Jean-Baptiste Sibarita, Angus Kirkland, Roland A. Fleck, Claudia Clopath, and Juan Burrone. Multi-synaptic boutons are a feature of ca1 hippocampal connections that may underlie network synchrony. bio Rxiv, 2022. doi: 10.1101/2022.05.30.493836. URL https://www.biorxiv.org/content/early/ 2022/05/30/2022.05.30.493836. Daniel S. Rizzuto and Michael J. Kahana. An autoassociative neural network model of pairedassociate learning. Neural Computation, 13(9):2075 2092, 2001. ISSN 08997667. doi: 10.1162/ 089976601750399317. Jacopo Rocchi, David Saad, and Daniele Tantari. High storage capacity in the Hopfield model with auto-interactions stability analysis. Journal of Physics A: Mathematical and Theoretical, 50(46): 465001, 2017. T. Mitchell Roddenberry, Nicholas Glaze, and Santiago Segarra. Principled simplicial neural networks for trajectory prediction. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 9020 9029. PMLR, 18 24 Jul 2021. URL https://proceedings.mlr. press/v139/roddenberry21a.html. Fernando E. Rosas, Pedro A. M. Mediano, Andrea I. Luppi, Thomas F. Varley, Joseph T. Lizier, Sebastiano Stramaglia, Henrik J. Jensen, and Daniele Marinazzo. Disentangling high-order mechanisms and high-order behaviours in complex systems. Nature Physics, Mar 2022. ISSN 1745-2481. doi: 10.1038/s41567-022-01548-5. URL https://doi.org/10.1038/ s41567-022-01548-5. Published as a conference paper at ICLR 2023 Dmitri A. Rusakov and Dimitri M. Kullmann. Extrasynaptic glutamate diffusion in the hippocampus: Ultrastructural constraints, uptake, and receptor activation. Journal of Neuroscience, 18(9): 3158 3170, 1998. ISSN 0270-6474. doi: 10.1523/JNEUROSCI.18-09-03158.1998. URL https://www.jneurosci.org/content/18/9/3158. Andrea Santoro, Federico Battiston, Giovanni Petri, and Enrico Amico. Unveiling the higherorder organization of multivariate time series, 2022. URL https://arxiv.org/abs/2203. 10702. Saratha Sathasivam and Wan Ahmad Tajuddin Wan Abdullah. Logic Learning in Hopfield Networks. Modern Applied Science, 2(3), 2008. ISSN 1913-1844. doi: 10.5539/mas.v2n3p57. Michael T Schaub, Austin R Benson, Paul Horn, Gabor Lippner, and Ali Jadbabaie. Random walks on simplicial complexes and the normalized Hodge 1-Laplacian. SIAM Review, 62(2):353 391, 2020. Elad Schneidman, Michael J. Berry, Ronen Segev, and William Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087):1007 1012, Apr 2006. ISSN 1476-4687. doi: 10.1038/nature04701. URL https://doi.org/10.1038/ nature04701. W. B. Scoville and B. Milner. Loss of recent memory after bilateral hippocampal lesions. Journal of Neurolgy, Neurosurgery, and Psychiatry, 20(1):11 21, 1957. Philipp Seidl, Philipp Renz, Natalia Dyubankova, Paulo Neves, Jonas Verhoeven, Jörg K. Wegner, Marwin Segler, Sepp Hochreiter, and Günter Klambauer. 
Improving fewand zero-shot reaction template prediction using modern Hopfield networks. Journal of Chemical Information and Modeling, 62(9):2111 2120, 2022. doi: 10.1021/acs.jcim.1c01065. URL https://doi.org/ 10.1021/acs.jcim.1c01065. PMID: 35034452. Terrence J. Sejnowski. Higher-order Boltzmann machines. AIP Conference Proceedings, 151(1): 398 403, 1986. doi: 10.1063/1.36246. URL https://aip.scitation.org/doi/abs/ 10.1063/1.36246. Sugandha Sharma, Sarthak Chandra, and Ila Fiete. Content addressable memory without catastrophic forgetting by heteroassociation with a fixed scaffold. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp. 19658 19682. PMLR, 17 23 Jul 2022. URL https://proceedings.mlr. press/v162/sharma22b.html. Masatoshi Shiino and Tomoki Fukai. Self-consistent signal-to-noise analysis of the statistical behavior of analog neural networks and enhancement of the storage capacity. Phys. Rev. E, 48:867 897, Aug 1993. doi: 10.1103/Phys Rev E.48.867. URL https://link.aps.org/doi/10.1103/ Phys Rev E.48.867. Matthew Smart and Anton Zilman. On the mapping between Hopfield networks and restricted Boltzmann machines. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=RGJberg VIo O. KE Sorra and KM Harris. Occurrence and three-dimensional structure of multiple synapses between individual radiatum axons and their target pyramidal cells in hippocampal area ca1. Journal of Neuroscience, 13(9):3736 3748, 1993. ISSN 0270-6474. doi: 10.1523/JNEUROSCI.13-09-03736. 1993. URL https://www.jneurosci.org/content/13/9/3736. George Sperling. A Model for Visual Memory Tasks. Human Factors: The Journal of Human Factors and Ergonomics Society, 5(1):19 31, 1963. ISSN 15478181. doi: 10.1177/001872086300500103. Reisa Sperling, Elizabeth Chua, Andrew Cocchiarella, Erin Rand-Giovannetti, Russell Poldrack, Daniel L. Schacter, and Marilyn Albert. Putting names to faces: Successful encoding of associative memories activates the anterior hippocampal formation. Neuro Image, 20(2):1400 1410, 2003. ISSN 10538119. doi: 10.1016/S1053-8119(03)00391-4. Published as a conference paper at ICLR 2023 David I Spivak. Higher-dimensional models of networks. ar Xiv:0909.4314, 2009. Larry R. Squire. Memory and the Hippocampuss: A Synthesis From Findings With Rats, Monkeys, and Humans. Psychological Review, 99(2):195 231, 1992. ISSN 0033295X. doi: 10.1037/ 0033-295X.99.2.195. Larry R. Squire, Arthur P. Shimamura, and David G. Amaral. Memory and the Hippocampus. Neural Models of Plasticity, pp. 208 239, 1989. doi: 10.1016/b978-0-12-148956-4.50016-4. Vivek H. Sridhar, Liang Li, Dan Gorbonos, Máté Nagy, Bianca R. Schell, Timothy Sorochkin, Nir S. Gov, and Iain D. Couzin. The geometry of decision-making in individuals and collectives. Proceedings of the National Academy of Sciences, 118(50):e2102157118, 2021. doi: 10.1073/pnas.2102157118. URL https://www.pnas.org/doi/abs/10.1073/pnas. 2102157118. Lionel Standing. Learning 10000 pictures. The Quarterly journal of experimental psychology, 25(2): 207 222, 1973. J. C. Stanley. Simulation studies of a temporal sequence memory model. Biological Cybernetics, 24(3):121 137, Sep 1976. ISSN 1432-0770. doi: 10.1007/BF00364115. URL https://doi. org/10.1007/BF00364115. Amos Storkey. Increasing the capacity of a Hopfield network without sacrificing functionality. 
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1327:451 456, 1997. ISSN 16113349. doi: 10.1007/bfb0020196. R. J. Sutherland and J. W. Rudy. Configural association theory: The role of the hippocampal formation in learning, memory, and amnesia. Psychobiology, 17(2):129 144, 1989. ISSN 08896313. doi: 10.3758/BF03337828. A Treves and D J Amit. Metastable states in asymmetrically diluted Hopfield networks. Journal of Physics A: Mathematical and General, 21(14):3155 3169, jul 1988. doi: 10.1088/0305-4470/21/ 14/016. URL https://doi.org/10.1088/0305-4470/21/14/016. Danil Tyulmankov, Ching Fang, Annapurna Vadaparty, and Guangyu Robert Yang. Biological learning in key-value memory networks. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (eds.), Advances in Neural Information Processing Systems, volume 34, pp. 22247 22258. Curran Associates, Inc., 2021. URL https://proceedings.neurips. cc/paper/2021/file/bacadc62d6e67d7897cef027fa2d416c-Paper.pdf. Michael T. Ullman. Contributions of memory circuits to language: The declarative/procedural model. Cognition, 92(1-2):231 270, 2004. ISSN 00100277. doi: 10.1016/j.cognition.2003.10.008. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper/2017/file/ 3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf. Edward K. Vogel, Geoffrey F. Woodman, and Steven J. Luck. Storage of features, conjunctions, and objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance, 27(1):92 114, 2001. ISSN 00961523. doi: 10.1037/0096-1523.27.1.92. Richard A. Watson, C. L. Buckley, and Rob Mills. Optimization in "self-modeling" complex adaptive systems. Complexity, 16(5):17 26, 2011a. ISSN 10990526. doi: 10.1002/cplx.20346. Richard A. Watson, Rob Mills, and C. L. Buckley. Global adaptation in networks of selfish components: Emergent associative memory at the system scale. Artificial Life, 17(3):147 166, 2011b. ISSN 10645462. doi: 10.1162/artl_a_00029. Melanie Weber, Pedro D. Maia, and J. Nathan Kutz. Estimating memory deterioration rates following neurodegeneration and traumatic brain injuries in a Hopfield network model. Frontiers in Neuroscience, 11(NOV), 2017. ISSN 1662453X. doi: 10.3389/fnins.2017.00623. Published as a conference paper at ICLR 2023 Michael Widrich, Bernhard Schäfl, Milena Pavlovi c, Hubert Ramsauer, Lukas Gruber, Markus Holzleitner, Johannes Brandstetter, Geir Kjetil Sandve, Victor Greiff, Sepp Hochreiter, et al. Modern Hopfield networks and attention for immune repertoire classification. Advances in Neural Information Processing Systems, 33:18832 18845, 2020. Tom J. Wills, Colin Lever, Francesca Cacucci, Neil Burgess, and John O Keefe. Attractor dynamics in the hippocampal representation of the local environment. Science, 308(5723):873 876, 2005. ISSN 00368075. doi: 10.1126/science.1108905. István Winkler and Nelson Cowan. From sensory to long-term memory: Evidence from auditory memory reactivation studies. Experimental Psychology, 52(1):3 20, 2005. ISSN 16183169. doi: 10.1027/1618-3169.52.1.3. Alexander Woodward, Tom Froese, and Takashi Ikegami. 
Neural coordination can be enhanced by occasional interruption of normal firing patterns: A self-optimizing spiking neural network model. Neural Networks, 62:39 46, 2015. ISSN 18792782. doi: 10.1016/j.neunet.2014.08.011. Chenxin Xu, Maosen Li, Zhenyang Ni, Ya Zhang, and Siheng Chen. Groupnet: Multiscale hypergraph neural networks for trajectory prediction with relational reasoning, 2022. URL https://arxiv. org/abs/2204.08770. Ming Zhang. Artificial higher order neural networks for modeling and simulation. IGI Global, 2012. ISBN 9781466621756. doi: 10.4018/978-1-4666-2175-6. Published as a conference paper at ICLR 2023 A.1 PURPOSE OF STUDYING AND EXTENDING HOPFIELD NETWORKS In the context of memory more broadly, associative memory is a form of long-term memory, memory which can potentially last a lifetime. However, memory also exists on at least two shorter and functionally distinct time-scales: short-term memory and sensory memory. Short-term or working memory stores information for on the order of tens of seconds, and performs as a limited, passive temporary memory reservoir with which to manipulate and use information (Milner, 1955; Mongillo et al., 2008). Sensory memory operates on even shorter timescales than short-term memory, typically on the order of only seconds. It is where visual (Sperling, 1963; Vogel et al., 2001), auditory (Darwin et al., 1972; Winkler & Cowan, 2005), and other sensory information (Gordon et al., 1993; Lederman & Klatzky, 2009) is first actively remembered all other information is either immediate sensory information or recalled information. A general schema for forming a long-term memory is therefore to: (1) receive external sensory information; (2) store the features of that sensory information in sensory memory; (3) manage and store one or more features of the sensory information (and/or combine it with other sensory information) in short-term memory; and then (4) consolidate this information into long-term memory. Understanding these processes from a theoretical and computational perspective has several real-world implications, including: (i) it may allow us to create more intelligent machines, by gaining inspiration and insight from biological strategies to store, retrieve, and use long-term associative memories; and (ii) it may help us understand the neurobiological mechanisms (and their implications) for biological memory systems, helping to not only understand related psychological and biological phenomena, but to potentially help identify therapeutic targets for related dysfunction. Psychological abilities attributed to associative memory in humans and non-human animals are typically said to be any form of long-term memorisation which involves pairing or associating distinct stimuli such that when presented with one stimuli, the subject can recall the other stimuli. Classical examples of this type of associative memory include pairings: name-face pairs (Sperling et al., 2003), object-sound pairs (Preziosi & Coane, 2017), and object-place pairs (Gilbert & Kesner, 2004). These types of associative memories are part of explicit or declarative memory (Ullman, 2004), i.e., long-term memory that can be explicitly or voluntarily stated or declared. This is in contrast to associative memories which are part of implicit or non-declarative memory, i.e., long-term memory recalled or used unconsciously or unintentionally. 
Examples of this type of associative memory are generated by classical conditioning (Maren, 2001; Christian & Thompson, 2003) and operant conditioning (Mackintosh, 1983; Mc Sweeney & Murphy, 2014). Computational accounts of implicit associative memory have a long and successful history starting from examples like the Rescorla Wagner model (Rescorla & Wagner, 1972). Today, this research has grown into the computational field of reinforcement learning (Daw & Doya, 2006). In comparison to implicit associative memory, explicit associative memory seems more computationally sophisticated, as suggested by its complex biological bases (Chaudhuri & Fiete, 2016; Clopath et al., 2017; Mau et al., 2020). Soon after the proposal of Marr (1971), one of the first and most influential computational models of (explicit) associative memory was the Hopfield model (Hopfield, 1982), which we study and extend here. A nice feature of this model is that in the basic case of embedding a single memory, it is easy to see there exists a choice of threshold for every neuron whereby any partial activation of the memory will lead to activation of all its other members. This therefore generates an attractor dynamic around the fixed point of the stored memory, which is comparable to the neuronal assembly attractor dynamics seen in the hippocampus (Wills et al., 2005; Pfeiffer & Foster, 2015; Rebola et al., 2017). Hippocampus is probably the foremost brain structure involved in memory. It, together with the surrounding entorhinal, perirhinal, and parahippocampal cortices, is especially important for explicit memory (Scoville & Milner, 1957; Milner, 1966; Squire, 1992). It is a necessary structure for the initial formation and learning of explicit memory, acting as a short-term memory for later long-term consolidation, thought to occur in cortex (Squire et al., 1989; Sutherland & Rudy, 1989). Classically, we think these capacities are mainly achieved via Hebbian learning and long-term potentiation (Bliss & Gardner-Medwin, 1973; Gustafsson & Wigström, 1988). However, there is now an increasing literature which shows how other mechanisms may help to achieve these memory functions (see Appendix A.2 for examples). Published as a conference paper at ICLR 2023 As a complete computational account of long-term memory storage, the classical Hopfield model encounters challenges. As discussed in Section 1 of the main text, the memory capacity of the classical Hopfield network is linear in the number of neurons N, specifically: approximately 0.14N patterns may be stored before spurious attractors overwhelm workable levels of memory recall (Amit et al., 1985; Mc Eliece et al., 1987; Bruck & Roychowdhury, 1990), and this capacity diminishes further when the patterns are statistically or spatially correlated (Löwe, 1998), in sparse connectivity regimes (Treves & Amit, 1988; Löwe & Vermet, 2011), and in combination (Burns et al., 2022). Biological networks typically have very sparse connectivity (Minai & Levy, 1993; Lansner, 2009; Barth & Poulet, 2012) and everyday memory items typical share many statistical features, may have sophisticated inter-relations, and are spatially or semantically correlated in non-trivial structures (Constantinescu et al., 2016; Aronov et al., 2017; Bellmund et al., 2018; Bao et al., 2019; Park et al., 2021; Griesbauer et al., 2022). 
Despite this, humans can remember very high-fidelity information of thousands of statistically similar images (Standing, 1973; Brady et al., 2008) and human faces (Jenkins et al., 2018), tens of thousands of linguistic items (Brysbaert et al., 2016), and more than 100,000 digits of the number π (Bellos, 2015) all, seemingly, without dramatically sacrificing or over-writing other memories. Although modern Hopfield networks have substantially increased theoretical memory capacity (Krotov & Hopfield, 2016; Demircigil et al., 2017), the combined biological and psychological evidence mentioned above, along with the finite (if large) number of brain cells (Herculano-Houzel, 2009) and energetic demands of maintaining them and their inter-connections (Bordone et al., 2019), suggest there may be more to the neurophysiological and computational story. Furthermore, even if we find that the theoretical memory capacity should still be high according to a Hopfield interpretation, capacity can be considered as a measure of undesired interferences between memories, and thus may be maximised for cognitive convenience or speed and accuracy of memory recall. Nevertheless, problems and criticisms don t detract from the usefulness and importance of the Hopfield model or its modern variations in the study of memory systems (Sathasivam & Wan Abdullah, 2008; Rizzuto & Kahana, 2001; Weber et al., 2017), usefulness in machine learning applications (Widrich et al., 2020; Seidl et al., 2022), contribution to more general machine learning (Sharma et al., 2022; Hoover et al., 2022), or connection between biology and machine learning (Chaudhuri & Fiete, 2019; Tyulmankov et al., 2021; Kozachkov et al., 2022). There are even more opportunities to build upon this substantial foundation to create more sophisticated computational models of associative memory. Notably, much work has improved the efficiency and capacity of the Hopfield network (Storkey, 1997; Hopfieid, 2008; Krotov & Hopfield, 2016; Gripon & Berrou, 2011; Mofrad & Parker, 2017). Other work has focused on achieving sparse representations (Kim et al., 2017; Hoffmann, 2019) or including other forms of biological realism (Watson et al., 2011a;b; Woodward et al., 2015; Burns et al., 2022). The current work contributes to developments in sparsity, biological realism, and memory capacity. A.2 SETWISE CONNECTIONS AND MODULATIONS ARE BOUNTIFUL IN BIOLOGY Setwise connections are not limited to the case, as one might expect, of multiple synaptic contacts between pairs or sets of cells (Jones & Powell, 1969; Sorra & Harris, 1993; Geinisman et al., 2001; Lee et al., 2013; Rigby et al., 2022), which may result in multiplicative interactions (Poleg-Polsky & Diamond, 2016; Reuveni et al., 2017; Groschner et al., 2022) or form of functional synaptic clusters (Kavalali et al., 1999; Bloss et al., 2018; Pulikkottil et al., 2021; Hedrick et al., 2022). 
Other examples (some of which are illustrated in Figure 3) include the wide spatial dispersion of certain neurotransmitters (Rusakov & Kullmann, 1998; Arbuthnott & Wickens, 2007; Kato et al., 2022), dendro-dendritic synapses (Pinault et al., 1997; Didier et al., 2001; Brombas et al., 2017), persistently-connected neuronal assembly structures or synapsembles (Buzsáki, 2010; Papadimitriou et al., 2020), distributed persistent activity during activities such as motor planning and working memory (Guo et al., 2017; Hart & Huk, 2020), neuroglial modulations of neurotransmitter release probabilities across multiple neurons or synapses (Min et al., 2012; Covelo & Araque, 2018; Chipman et al., 2021), tripartite astrocyte-neuron synapses (Araque et al., 1999; Perea et al., 2009), astrocytic coding (Doron et al., 2022), and dendritic integration at the level of individual neurons (Golding et al., 2002; Etherington et al., 2010). Modulations of and interactions between such connections are illustrated in Figure 4.

It is also possible to functionally construct setwise connections through only pairwise synapses, as shown in Krotov & Hopfield (2021). In one sense, this kind of pairwise-based reconstruction of setwise connections could also be thought of as similar to results from Poirazi et al. (2003), who showed how multi-layer artificial neural networks can approximate more biologically-sophisticated model neurons with dendrites. Indeed, many of the known and suggested computational features of dendritic integration (Poirazi & Papoutsi, 2020; Chavlis & Poirazi, 2021) may be considered as highly specialised and sophisticated forms of convolutions or setwise interactions.

Figure 3: A. Multi-synaptic bouton. B. Dendritic integration. C. Extra-synaptic neurotransmitter diffusion. D. Functional connections between neural assemblies.

Figure 4: A. Neurotransmitter depletion at a synapse. B. NMDA-spikes and unequal synaptic strengths in dendritic integration. C. Transmission speed plasticity using myelination (top) and axon diameter (bottom) to affect temporal/functional setwise influence in a post-synaptic cell. D. Astrocytic messaging, between both neurons and astrocytes.

A.3 OPTIONS FOR MODELLING SETWISE CONNECTIVITY IN NEURAL NETWORKS AND WHY WE CHOOSE SIMPLICIAL COMPLEXES

In geometric and topological artificial intelligence and machine learning, recent advances have been realised by utilising higher dimensional analogues of graphs such as simplicial complexes (Ebli et al., 2020; Roddenberry et al., 2021), cube complexes (Burns & Tang, 2022), cell complexes (Hajij et al., 2020; Bodnar et al., 2021), and hypergraphs (Feng et al., 2019; Xu et al., 2022). Unlike graphs, these structures can naturally represent higher-degree, setwise relationships. However, not all structures are appropriate for all systems (Spivak, 2009; Rosas et al., 2022). Why do we choose to model our collections of setwise connections as weighted simplicial complexes and not use general cell complexes or hypergraphs? There are three main reasons:

1. Simplicial complexes allow all possible setwise relations to exist. Simplices, by construction, may span any number of vertices. This means any possible combination of neurons may share a common, setwise weight.
This is also possible in hypergraphs, but not possible in all complexes, e.g., cube complexes may not include triangles. The natural complex for modelling all possible setwise relationships is therefore a simplicial one. Since we also wish for our setwise weights to be symmetrical (i.e., have the same value when updating the spin with respect to each constituent neuron), it is unnecessary to include any more than one unique object per setwise relationship. This also makes the choice of a simplex suitable, since there can only be one unique simplex for a given set of vertices, which is also the case for edges in undirected hypergraphs (but we find these are less suitable).

2. The sub-edge problem of hypergraphs makes them less suitable. Hypergraphs are graphs where edges may contain any number of unique vertices from the vertex set. In a sense, these are a more general structure than simplicial complexes and lack downward closure, e.g., if the edge {1, 2, 3} exists, edges such as {2, 3} or {1} do not necessarily exist, whereas if {1, 2, 3} were a simplex in a simplicial complex, the simplices {2, 3} and {1} would necessarily exist. However, hypergraphs do not have well-defined sub-edges (described as the sub-edge problem in Remark 3.5 of Spivak (2009)). This makes defining interactions between levels of hyperedges (setwise relationships) in hypergraphs slightly awkward. In contrast, simplicial complexes have a well-defined hierarchy of setwise relationships, partly due to the downward closure condition.

3. Downward closure of setwise connections is biologically plausible. Another benefit of downward closure in simplicial complexes is that it currently seems better supported from the perspective of biological plausibility (also see Appendix A.2). For example, although it can happen that a setwise connection (anatomical or functional) between neurons could exist without any underlying pairwise connections, the typical machinery used to create such setwise connections is sufficiently local to assume that, because of the functional local modularity of connections in the brain (Kaiser & Hilgetag, 2006; Chen et al., 2013; Müller et al., 2020; Ercsey-Ravasz et al., 2013), there is a high probability of these neurons having a pairwise connection simply due to proximity.

As an additional practical benefit, simplicial complexes are currently better studied than structures such as hypergraphs (at least in some areas, e.g., spectral theories or (co)homology, which are of natural interest here), meaning that we can also take advantage of the relative maturity of the field in those areas. Admittedly, we use very few advanced methods or properties in an essential way in this study, although we hope to do so in future studies, having now introduced an initial interpretation of simplicial Hopfield networks and begun exploring some of their potential benefits. However, it will also be interesting to see what differences can be found between hypergraphic and simplicial Hopfield networks, and perhaps which provides a closer approximation to biology or which shows improved performance on certain tasks. Among other possibilities, the weight of a simplex w(σ) could be modulated by the local energy of its spins S_σ, its cofaces' spins, or of those of simplices in the same dimension as σ which are neighbouring (in the Hodge Laplacian sense (Lim, 2020; Schaub et al., 2020)). These interactions could take many different mathematical forms (Petri & Barrat, 2018; Ebli et al., 2020; Roddenberry et al., 2021; Rosas et al., 2022; Santoro et al., 2022). Neurobiologically, these interactions could represent neural-glia interactions, glia-glia interactions, nonlinear dendritic integration (especially dendritic spikes and shunting), neurotransmitter-neuromodulator interactions, or hierarchical assembly operations and dynamics, to name a few (see Appendix A.2 for illustrations and further information).
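Because the downward-closure condition recurs throughout this appendix, a minimal sketch of what it requires computationally may be helpful. This is our own illustrative Python, not the authors' implementation; the function name and the set-of-frozensets representation are our choices. It expands a chosen collection of top-level setwise connections into the full set of faces a simplicial complex must contain.

```python
from itertools import combinations

def downward_closure(top_simplices):
    """Return every non-empty face of the given simplices, i.e. the
    smallest simplicial complex containing them."""
    complex_ = set()
    for simplex in top_simplices:
        vertices = tuple(sorted(simplex))
        for size in range(1, len(vertices) + 1):
            for face in combinations(vertices, size):
                complex_.add(frozenset(face))
    return complex_

# The hyperedge {1, 2, 3} on its own is not a simplicial complex; its
# closure must also contain {1,2}, {1,3}, {2,3}, {1}, {2}, and {3}.
K = downward_closure([{1, 2, 3}])
print(sorted(sorted(face) for face in K))
```

A hypergraph, by contrast, is free to keep only the top-level edge, which is exactly the downward-closure distinction drawn in reason 2 above.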
The downward closure of simplicial complexes could be seen as a disadvantage. For example, when including simplices of high dimension, we are also forced to limit our choices of simplices if we wish to maintain the simplicial structure. Again, whenever a k-simplex exists in K, all its faces must also exist, e.g., if a triangle exists, so must its surrounding edges and their surrounding vertices. If any constituent simplex is missing, the structure of the simplicial complex is broken and in our case would become an undirected, weighted hypergraph. Instead, in our simulations, we prefer to interpret missing simplices as merely functionally missing-in-action by setting their weights to zero if we do not wish to include them in the model. This has the consequence of having no mathematical effect on our update rules while retaining the convenience of a simplicial structure.

A.4 A SMALL WORKED EXAMPLE

Consider a small example of just P = 3 memory patterns embedded in a simplicial Hopfield network on N = 6 neurons. First, let's consider a 3-skeleton, i.e., a network without any dilution or missing weights up to D = 3. Such a network will have functional connections totalling

$\binom{6}{2} + \binom{6}{3} + \binom{6}{4} = 15 + 20 + 15 = 50.$ (10)

We typically do not include functional self-connections (autapses), although Hopfield networks with such connections have been studied (Folli et al., 2017; Rocchi et al., 2017; Gosti et al., 2019). In simplicial Hopfield networks, such self-connections correspond to 0-simplices, i.e., vertices. While these vertices do exist in the underlying simplicial complex K, we set their associated weights to 0. Given N = 6, the 3-skeleton in this example is

K_0 = {{1}, {2}, {3}, {4}, {5}, {6}},
K_1 = {{1, 2}, {1, 3}, {1, 4}, {1, 5}, {1, 6}, {2, 3}, {2, 4}, {2, 5}, {2, 6}, {3, 4}, {3, 5}, {3, 6}, {4, 5}, {4, 6}, {5, 6}},
K_2 = {{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 2, 6}, {1, 3, 4}, {1, 3, 5}, {1, 3, 6}, {1, 4, 5}, {1, 4, 6}, {1, 5, 6}, {2, 3, 4}, {2, 3, 5}, {2, 3, 6}, {2, 4, 5}, {2, 4, 6}, {2, 5, 6}, {3, 4, 5}, {3, 4, 6}, {3, 5, 6}, {4, 5, 6}},
K_3 = {{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}, {1, 2, 4, 5}, {1, 2, 4, 6}, {1, 2, 5, 6}, {1, 3, 4, 5}, {1, 3, 4, 6}, {1, 3, 5, 6}, {1, 4, 5, 6}, {2, 3, 4, 5}, {2, 3, 4, 6}, {2, 3, 5, 6}, {2, 4, 5, 6}, {3, 4, 5, 6}},
K = K_0 ∪ K_1 ∪ K_2 ∪ K_3. (11)

Set three patterns as

ξ^1 = (−1, +1, −1, +1, −1, +1),
ξ^2 = (+1, −1, +1, −1, −1, +1),
ξ^3 = (−1, −1, −1, +1, +1, +1).

For all σ ∈ K_0, we set w(σ) = 0. For all higher dimensions, we set

$w(\sigma) = \frac{1}{N} \sum_{\mu=1}^{P} \xi^{\mu}_{\sigma}, \quad \text{where } \xi^{\mu}_{\sigma} := \prod_{i \in \sigma} \xi^{\mu}_{i}.$ (13)

For example,

w({1, 3}) = 1/6 ((−1 · +1) + (+1 · +1) + (−1 · −1)) = 1/6,
w({3, 5, 6}) = 1/6 ((−1 · −1 · +1) + (+1 · −1 · +1) + (−1 · +1 · +1)) = −1/6,
w({2, 4, 5, 6}) = 1/6 ((+1 · +1 · −1 · +1) + (−1 · −1 · −1 · +1) + (−1 · +1 · +1 · +1)) = −1/2.

Given a set of spins S^(t) at a time-step t, the network will evolve according to Equation 2, minimising the energy shown in Equation 1.
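To make the weight rule above easy to check, here is a short NumPy sketch; it is our own illustration built on our reading of (the reconstructed) Equation 13 as a pattern-product average, not the authors' code.

```python
import numpy as np
from itertools import combinations

N, D = 6, 3
# The three worked-example patterns, one row per pattern.
patterns = np.array([
    [-1, +1, -1, +1, -1, +1],
    [+1, -1, +1, -1, -1, +1],
    [-1, -1, -1, +1, +1, +1],
])

def weight(simplex):
    """w(sigma) = (1/N) * sum over patterns of the product of the
    pattern entries on the simplex's vertices (cf. Equation 13)."""
    idx = [v - 1 for v in simplex]   # vertices are 1-indexed in the text
    return patterns[:, idx].prod(axis=1).sum() / N

# Weights for every simplex of the 3-skeleton (1-, 2-, and 3-simplices).
weights = {
    simplex: weight(simplex)
    for d in range(2, D + 2)         # subsets of 2, 3, and 4 neurons
    for simplex in combinations(range(1, N + 1), d)
}
print(weights[(3, 5, 6)], weights[(2, 4, 5, 6)])  # -1/6 and -1/2, as above
```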
The energy function consists of a sum of products of the weights with the product of their respective spins, e.g., if S^(t) = (+1, +1, −1, +1, −1, −1),

E = −( ⋯ + w({1, 3}) S^(t)_{1,3} + ⋯ + w({3, 5, 6}) S^(t)_{3,5,6} + ⋯ + w({2, 4, 5, 6}) S^(t)_{2,4,5,6} + ⋯ )
  = −( ⋯ + (1/6)(−1) + ⋯ + (−1/6)(−1) + ⋯ + (−1/2)(+1) + ⋯ ), (15)

where S^(t)_σ denotes the product of the spins of the neurons in σ. In this 3-skeleton case, then, the network's energy function can also be written (substituting Equation 13) as

$E = -\frac{1}{N} \left[ \sum_{\{i,j\} \in K_1} \sum_{\mu=1}^{P} \xi^{\mu}_{i} \xi^{\mu}_{j} S^{(t)}_{i} S^{(t)}_{j} + \sum_{\{i,j,k\} \in K_2} \sum_{\mu=1}^{P} \xi^{\mu}_{i} \xi^{\mu}_{j} \xi^{\mu}_{k} S^{(t)}_{i} S^{(t)}_{j} S^{(t)}_{k} + \sum_{\{i,j,k,l\} \in K_3} \sum_{\mu=1}^{P} \xi^{\mu}_{i} \xi^{\mu}_{j} \xi^{\mu}_{k} \xi^{\mu}_{l} S^{(t)}_{i} S^{(t)}_{j} S^{(t)}_{k} S^{(t)}_{l} \right].$

Essentially, the energy function is similar to a sum of the energy functions of Krotov & Hopfield (2016) with all possible k-neuron connections, but where the weights of those connections are independent of each other for each level of interaction, making each connection and each level more controllable. The memory capacity of this type of simplicial Hopfield network is discussed in Section 2.3 and Appendix A.6.

However, one of our main contributions is the findings related to the diluted case, i.e., where more than just the 0-simplices have their weights set to 0. Indeed, these are the cases we mainly evaluate in Section 2.4. Following on with the same example as above, we can create a diluted simplicial Hopfield network based on K. For example, if we chose to limit ourselves to $\binom{6}{2} = 15$ parameters, we could choose to apportion one third of these parameters to each dimension, i.e., set weights only for a subset K′ ⊂ K, e.g.,

K′_1 = {{1, 2}, {1, 6}, {2, 3}, {2, 4}, {5, 6}},
K′_2 = {{1, 2, 3}, {1, 2, 6}, {1, 3, 4}, {3, 4, 5}, {3, 4, 6}},
K′_3 = {{1, 3, 5, 6}, {1, 4, 5, 6}, {2, 3, 4, 5}, {2, 3, 4, 6}, {2, 3, 5, 6}},
K′ = K′_1 ∪ K′_2 ∪ K′_3.

In our numerical simulations, the choice of which connections to keep is entirely random. By analogy, we can think of this dilution procedure as a naïve solution to the following (fairly contrived) communications problem: Imagine we are tasked with increasing the speed at which a deliberative body of people, e.g., a very large committee, comes to its decisions. Currently, each committee member has individual channels of communication with every other member. This is good for high-fidelity, accurate, and nuanced conversations between members, but not so good for efficiency or speed of decision-making. For example, if a certain block of members consistently vote similarly, it would perhaps be quicker for those members to communicate as a group to check what their majority opinion is rather than all members individually communicating with every other member one at a time. Conversely, when two members consistently vote differently or are active members within distinct voting blocks (and especially if their votes are often tie-breakers), perhaps those two members ought to regularly discuss matters privately and in detail. Our naïve solution is to randomly replace some individual channels of communication with small group communication channels. Possibly by performing a survey of members or observing patterns in their voting or communications, we could come up with a better strategy. However, in this analogy, it appears the naïve solution works reasonably well (see results in Section 3 of the main text). We think a deserving next step will be to determine better strategies, perhaps based on or accounting for the overall memory structure and correlations between memory items.
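As a concrete illustration of the random dilution described in this section, the following sketch (our own illustrative code, not the authors' implementation; the function and variable names are ours, and the seed is arbitrary) draws a uniformly random subset of simplices at each dimension under a total parameter budget equal to the number of pairwise weights.

```python
import random
from itertools import combinations
from math import comb

def random_dilution(n_neurons, max_dim=3, seed=0):
    """Randomly choose which simplices keep a learnable weight,
    using a total budget equal to the number of pairwise weights."""
    rng = random.Random(seed)
    budget = comb(n_neurons, 2)          # e.g. C(6, 2) = 15
    per_dim = budget // max_dim          # one third per dimension
    kept = {}
    for d in range(1, max_dim + 1):      # d-simplices have d + 1 vertices
        simplices = list(combinations(range(1, n_neurons + 1), d + 1))
        kept[d] = rng.sample(simplices, per_dim)
    return kept

K_prime = random_dilution(6)
for d, simplices in K_prime.items():
    print(f"K'_{d}: {simplices}")
```

Draws like this produce a diluted K′ of the same parameter count as the all-pairwise network for any N, which is the setting compared against pairwise networks in the main text.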
A.5 SIMPLICIAL HOMOLOGY

Simplicial homology allows us to precisely count the number of holes in each dimension of a simplicial complex by calculating the k-dimensional Betti number, β_k. A related topological property which is particularly useful when studying low-dimensional objects (e.g., classification of surfaces) is the Euler characteristic, which for a simplicial complex can be calculated by χ(K) = Σ_{k≥0} (−1)^k |K_k|, i.e., it is an alternating sum which balances out the number of holes in odd and even dimensions. It is related to the Betti numbers insofar as the Euler characteristic is also given by χ(K) = Σ_{k≥0} (−1)^k β_k. Note, however, that |K_k| ≠ β_k. As such, although the Euler characteristic can be used for comparing the topologies of two simplicial complexes, it is not as informative (with respect to holes) as homology (although the latter is more costly to compute, as we will now see).

k-chains and boundaries. The group of k-chains is a free Abelian group with the basis of K_k,

$C_k = C_k(K) := \mathbb{Z}^{K_k} := \left\{ \sum_{\sigma \in K_k} \alpha_\sigma \sigma \;\middle|\; \alpha_\sigma \in \mathbb{Z} \right\}.$

The boundary (the 'difference between the end points') of a face σ in dimension k is

$\partial_k(\sigma) = \sum_{j \in \sigma} \mathrm{sign}(j, \sigma)\,(\sigma \backslash j),$

where sign(j, σ) = (−1)^{i−1}, where j is the i-th element of σ (ordered) and σ\j := σ\{j}.

Example 1. Consider the simplicial complex

K = {{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {1}, {2}, {3}, ∅}. (18)

For k = 2, we have K_2 = {{1, 2, 3}} and C_2 = {α {1, 2, 3} | α ∈ ℤ}. Let us calculate the boundary of σ = {1, 2, 3}. First, we calculate the respective sign functions:

sign(1, σ) = (−1)^{i−1} = (−1)^{1−1} = (−1)^0 = 1
sign(2, σ) = (−1)^{i−1} = (−1)^{2−1} = (−1)^1 = −1
sign(3, σ) = (−1)^{i−1} = (−1)^{3−1} = (−1)^2 = 1.

Using these values, we may calculate the boundary by

∂_2(σ) = sign(1, σ)(σ\1) + sign(2, σ)(σ\2) + sign(3, σ)(σ\3)
       = (1)(σ\1) + (−1)(σ\2) + (1)(σ\3)
       = (1)({2, 3}) + (−1)({1, 3}) + (1)({1, 2})
       = {2, 3} − {1, 3} + {1, 2}.

The boundary of {1, 2, 3} is {2, 3} − {1, 3} + {1, 2}. Notice, this is a cycle.

Figure 5: Geometric realisation of the simplicial complex in equation 18 and the boundary of {1, 2, 3}.

Chain complex. The k-th boundary mapping, ∂_k, is the map C_k(K) → C_{k−1}(K). If k > m − 1 or k < −1, then C_k(K) := 0 and ∂_k := 0, where m is the number of 0-simplices. Based on this, the chain complex of K is

$0 \to C_{n-1}(K) \xrightarrow{\partial_{n-1}} \cdots \xrightarrow{\partial_2} C_1(K) \xrightarrow{\partial_1} C_0(K) \xrightarrow{\partial_0} C_{-1}(K) \to 0.$

We define ∂^2 := ∂ ∘ ∂ = 0. For example, ∂_{i−1} ∘ ∂_i = 0. This has the consequence of making the boundary of, say, a solid tetrahedron (3-simplex) a set of oriented triangles (2-simplices) with a net flow of 0 (similar to Stokes' curl theorem in calculus).

Example 2. Consider the following simplicial complex and its chain complex:

K = {{1, 2}, {1}, {2}, {3}, {4}, ∅},
$0 \to C_1(K) \xrightarrow{\partial_1} C_0(K) \xrightarrow{\partial_0} C_{-1}(K) \to 0.$

The boundary map ∂_1 is {1, 2} ↦ {2} − {1} and all faces in K_0 are mapped to the empty set by ∂_0.

Homology. We can now see that our k-cycles are Z_k = ker ∂_k (k-chains α where ∂_k(α) = 0), whereas the k-boundaries are B_k = im ∂_{k+1} (k-chains in the image of ∂_{k+1}). Notice, B_k ⊆ Z_k. The (reduced) k-homology of K is the Abelian group H̃_k(K) := Z_k / B_k, and we define H̃_{m−1}(K) := ker ∂_{m−1} for k > m − 1 and H̃_k(K) := 0 for k < 0. The k-th Betti number (number of topological holes) is

$\beta_k = \dim \widetilde{H}_k = \dim(Z_k) - \dim(B_k) = \mathrm{nullity}(\partial_k) - \mathrm{rank}(\partial_{k+1}).$

Note, nullity(∂_0) = dim(C_0).

Example 3. Consider the simplicial complex

K = {{1, 2}, {1, 3}, {2, 3}, {1}, {2}, {3}, ∅},

which is the same as that depicted in Figure 5 but without the filled-in triangle. The boundary map ∂_1, with columns indexed by the edges {1, 2}, {1, 3}, {2, 3} and rows indexed by the vertices {1}, {2}, {3}, is

      {1,2}  {1,3}  {2,3}
{1}    −1     −1      0
{2}     1      0     −1
{3}     0      1      1

and ∂_0 is the 0 map on C_0 ≅ ℤ^3. We may compute its Betti numbers by

β_0 = nullity(∂_0) − rank(∂_1) = 3 − 2 = 1
β_1 = nullity(∂_1) − rank(∂_2) = 1 − 0 = 1,

and, by definition, β_k = 0 for k > 1 in this example.
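For readers who want to reproduce Example 3 numerically, here is a small NumPy sketch of the rank/nullity computation (our own illustration, using the convention above in which ∂_0 is treated as the zero map; it is not part of the paper).

```python
import numpy as np

# Boundary matrix d1 for K = {{1,2},{1,3},{2,3},{1},{2},{3}}:
# columns are the edges {1,2}, {1,3}, {2,3}; rows are the vertices {1}, {2}, {3}.
d1 = np.array([
    [-1, -1,  0],
    [ 1,  0, -1],
    [ 0,  1,  1],
])
d0 = np.zeros((1, 3))   # the 0 map on C_0
d2 = np.zeros((3, 1))   # there are no 2-simplices, so the 0 map into C_1

def betti(d_k, d_k_plus_1):
    """beta_k = nullity(d_k) - rank(d_{k+1})."""
    nullity = d_k.shape[1] - np.linalg.matrix_rank(d_k)
    return nullity - np.linalg.matrix_rank(d_k_plus_1)

print(betti(d0, d1))   # beta_0 = 3 - 2 = 1 (one connected component)
print(betti(d1, d2))   # beta_1 = 1 - 0 = 1 (one 1-dimensional hole)
```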
The following are our proofs for statements in the main text.

Corollary A.1 (Memory capacity is proportional to the number of network connections). If the connection weights in a Hopfield network are symmetric, then the order of the network's memory capacity is proportional to the number of its connections.

Proof. Let d be the degree of connections in a Hopfield network with N neurons. The explicit or implicit number of connections in such a network is $\binom{N}{d}$. By a simple counting argument, the number of repeated connections between any set of d neurons is interpreted as d!. By Newman (1988); Demircigil et al. (2017), the order of such a network's memory capacity is N^{d−1}. So the following relationship holds: $N^{d-1} \approx \frac{d!}{N}\binom{N}{d}$.

Lemma A.2 (Fully-connected mixed Hopfield networks). A fully-connected mixed Hopfield network based on a D-skeleton with N neurons and P patterns has, when N → ∞ and P is finite: fixed point attractors at the memory patterns and dynamic convergence towards the fixed point attractors within a finite Hamming distance δ. When P → ∞ with N → ∞, the network has capacity to store up to $(\sum_{d=1}^{D} \binom{N}{d})/(2 \ln N)$ memory patterns (with small retrieval errors) or $(\sum_{d=1}^{D} \binom{N}{d})/(4 \ln N)$ (without retrieval errors).

Proof. Let K be a D-skeleton. Let S^(t)_i be the spin of neuron i at time-step t, and let the spin correspond to a stored pattern, S^(t)_i = ξ^1_i. To be in a fixed point, the local field h_i applied to i must satisfy the inequality S^(t)_i h_i > 0, meaning the local field being applied to the neuron must be of the same sign as the present spin. In the case of the mixed network, based on the simplicial Hopfield Equations 1–2, the local field is

$h_i\{S^{(t)}\} = \frac{1}{|K_1|} \sum_{\sigma \in K_1:\, i \in \sigma} \sum_{\mu=1}^{P} \xi^{\mu}_{i} \xi^{\mu}_{\sigma \backslash i} S^{(t-1)}_{\sigma \backslash i} + \frac{1}{|K_2|} \sum_{\sigma \in K_2:\, i \in \sigma} \sum_{\mu=1}^{P} \xi^{\mu}_{i} \xi^{\mu}_{\sigma \backslash i} S^{(t-1)}_{\sigma \backslash i} + \cdots + \frac{1}{|K_D|} \sum_{\sigma \in K_D:\, i \in \sigma} \sum_{\mu=1}^{P} \xi^{\mu}_{i} \xi^{\mu}_{\sigma \backslash i} S^{(t-1)}_{\sigma \backslash i},$ (19)

where $\xi^{\mu}_{\sigma \backslash i}$ and $S^{(t-1)}_{\sigma \backslash i}$ denote products over the members of σ other than i. Because K is a D-skeleton, each dimension K_d will have $\binom{N}{d}$ elements. Using Stirling's Approximation for the binomial coefficient, $\binom{N}{d} \approx N^d / d!$ (for N ≫ d, which holds if dim(K) is small, which we argue it should normally be for both computational and biological reasons), we can simplify Equation 19 slightly to

$h_i\{S^{(t)}\} \approx \sum_{d=1}^{D} \frac{d!}{N^d} \sum_{\sigma \in K_d:\, i \in \sigma} \sum_{\mu=1}^{P} \xi^{\mu}_{i} \xi^{\mu}_{\sigma \backslash i} S^{(t-1)}_{\sigma \backslash i}.$ (20)

To analyse the stability of a pattern, we set S^(t)_i = ξ^1_i, where the choice of 1 is arbitrary (since the weights are symmetric). Substituting ξ^1 for S and Equation 20 for h, the inequality we must satisfy for i = 1 becomes

$\xi^{1}_{1} h_1 = \sum_{d=1}^{D} \frac{d!}{\prod_{m}(N-m)} \sum_{\sigma \in K_d:\, 1 \in \sigma} \left[ \xi^{1}_{1} \xi^{1}_{1}\, \xi^{1}_{\sigma \backslash 1} \xi^{1}_{\sigma \backslash 1} + \sum_{\mu=2}^{P} \xi^{1}_{1} \xi^{\mu}_{1}\, \xi^{\mu}_{\sigma \backslash 1} \xi^{1}_{\sigma \backslash 1} \right] > 0.$ (21)

Notice in Equation 21 we decomposed the summation over patterns into signal terms (for the pattern we are analysing) and noise terms (for the contribution of all other patterns). In the limit of N → ∞, the signal terms are fixed numbers (of order 1) and, by the Central Limit Theorem, since the noise terms are sums of random numbers (essentially, random walks), they will have means of 0 and standard deviations of

$\sqrt{\frac{P-1}{\prod_{m}(N-m)}},$ (22)

which we can approximate as $\sqrt{P/N^d}$. We can see that as d increases, the noise terms reduce in variance. However, if P remains fixed and N is sufficiently large, the noise terms become negligible compared to the signal terms. This therefore guarantees that every pattern will be a fixed point. Furthermore, these fixed points will remain highly stable against random noise.
Suppose we randomly flip a finite number δ of spins away from a pattern ξ = S (a fixed point). The signal terms strengths are reduced by 2δ but still of order 1, whereas the noise terms remain of order N 1/2. Therefore, states within Hamming distance δ away from ξ will converge to the fixed point. Now let P with N. The total variance of the noise terms is P/N d. (23) The probability of stability of neuron i in Equation 21 is the probability that the noise terms are larger than 1; at 1 they will overcome the signal terms. This probability is Pr(ξ1 1h1 > 0) = 1 1 dx exp x2 erf(x) = 2 π 0 dte t2. (25) For small v (which this is, especially as d increases and relative to the signal), the value of the error function in Equation 24 will be large and can therefore be approximated (Gradshteyn & Ryzhik, 2007) as erf(x) 1 1 πxe x2. (26) We can now approximate Equation 24 as Pr(ξ1 1h1 > 0) 1 r z d=1 P/N d. (28) We can now calculate the probability of a stable pattern, i.e., that the inequality ξ1 i hi > 0 is satisfied for all i, with Pr(stable pattern) 1 r z Since N , Equation 29 will be close to 1 as the second term will be negligible. This will always be true if Published as a conference paper at ICLR 2023 z = 1 2ln N . (30) Therefore, since we have N neurons with PD d=1 N d connections between them, the maximum number of patterns we may store (while accepting small errors) is pc = PD d=1 N d 2 ln N . (31) Or, if we cannot accept errors, pc = PD d=1 N d 4 ln N . (32) An additional informal perspective to consider regarding memory capacity is to notice that if P scales with N, the ratio between the signal and noise terms will be constant. And, since P N d, the theoretical memory capacity scales polynomially with N and linearly with D, and so the theoretical capacity is approximately PD d=1 cd N d, where cd is a constant which depends on d. A.7 NUMERICAL IMPLEMENTATION OF SIMILARITY MEASURES As in Millidge et al. (2022), in order to fairly compare similarity functions, we: (i) normalised similarity scores (separately for each similarity function) so their sum would be equal to 1 (since different measures had intrinsically different scales); and (ii) for distance measures, used the normalised reciprocal (since distance measures return low values for similar inputs, but the model relies on high values being returned for similar inputs, as in the dot product). A.8 SUPPLEMENTARY FIGURES AND TABLES Table 3: Pearson correlation coefficients (r) between overlap and β1 for mixed diluted networks from Table 2. Bolded values are significant at α = 0.05 (without multiple comparisons adjustment). With multiple comparisons adjustment, there are no significant correlations. Given their construction, all networks have β0 = 1, and although there is a small chance of 2 dimensional holes in some networks, we found that β 2 = 0 for all simulated networks. No. patterns 0.05N 0.1N 0.15N 0.2N 0.3N R12 0.02 0.09 0.21 0.15 0.01 R12 N/A 0.05 0.1 0.01 0.08 Table 4: Extended list of network condition keys (top row), their number of non-zero weights for 1 , 2 , and 3 simplices (second, third, and fourth rows). N is the number of neurons. For simulation, the number of simplices at each dimension are rounded to the nearest integer. R3 R123 R123 R123 R123 1 simplices 0 0.50 N 2 2 simplices 0 0.25 N 2 3 simplices Published as a conference paper at ICLR 2023 Figure 6: Box and whisker plots of final overlap distributions from traditional simplicial Hopfield networks with varying numbers of embedded patterns. 
Orange lines indicate the median. The red dashed line indicates an overlap of 0.5, chance.

Table 5: Mean ± standard deviation of overlap distributions (n = 100) from traditional simplicial Hopfield networks with varying numbers (top row) of random binary patterns. Keys per Table 4. At all pattern loadings, a one-way ANOVA showed significant variance between the networks (p < 10^−12, F > 13.25).

No. patterns  0.05N        0.1N         0.15N        0.2N         0.3N
R123          1 ± 0        0.99 ± 0.08  0.97 ± 0.17  0.93 ± 0.22  0.89 ± 0.15
R123          1 ± 0        1 ± 0        0.98 ± 0.05  0.95 ± 0.17  0.91 ± 0.18
R123          1 ± 0        1 ± 0        1 ± 0        0.96 ± 0.13  0.93 ± 0.13
R123          1 ± 0        1 ± 0        1 ± 0        1 ± 0        1 ± 0
R3            0.94 ± 0.06  0.78 ± 0.14  0.52 ± 0.15  0.51 ± 0.13  0.51 ± 0.14

Table 6: Mean ± standard deviation (n = 10) of the fraction of correctly recalled MNIST memory patterns in simplicial Hopfield networks at a memory loading of 1000 memories. Network performance varied significantly.

             K1           R12          R12          R123
Euclidean    1 ± 0        1 ± 0        1 ± 0        1 ± 0
Manhattan    1 ± 0        1 ± 0        1 ± 0        1 ± 0
Dot Product  0.93 ± 0.03  0.93 ± 0.02  0.94 ± 0.02  1 ± 0
ced          -            0.90 ± 0.02  0.91 ± 0.03  1 ± 0
cmd          -            0.95 ± 0.03  0.97 ± 0.03  1 ± 0

Figure 7: Relative energies of continuous modern networks in a 10 × 10 grid plane of the first two dimensions of PCA space, as computed using the memory patterns. Each network has N = 10 and the same P = 10 memory patterns are embedded. Network conditions vary by row and inverse temperatures vary by column. Black dots are projections of the 10 embedded patterns in the PCA space. The combined explained variance of the first two principal components is 59.3% of the memory patterns.

Table 7: Same as Table 6 but for CIFAR-10.

             K1           R12          R12          R123
Euclidean    0.31 ± 0.08  0.39 ± 0.08  0.51 ± 0.07  0.64 ± 0.08
Manhattan    0.70 ± 0.06  0.77 ± 0.06  0.90 ± 0.05  0.97 ± 0.04
Dot Product  0.50 ± 0.07  0.56 ± 0.07  0.68 ± 0.06  0.72 ± 0.08
ced          -            0.61 ± 0.06  0.74 ± 0.06  0.81 ± 0.07
cmd          -            0.65 ± 0.07  0.91 ± 0.06  0.99 ± 0.02

Table 8: Same as Table 6 but for Tiny ImageNet.

             K1           R12          R12          R123
Euclidean    0.31 ± 0.15  0.34 ± 0.14  0.55 ± 0.12  0.61 ± 0.15
Manhattan    0.63 ± 0.10  0.70 ± 0.08  0.84 ± 0.08  0.91 ± 0.09
Dot Product  0 ± 0        0 ± 0        0 ± 0        0 ± 0
ced          -            0.51 ± 0.14  0.65 ± 0.13  0.70 ± 0.11
cmd          -            0.71 ± 0.08  0.92 ± 0.06  0.95 ± 0.06