# Association Pattern-enhanced Molecular Representation Learning

Lingxiang Jia1,2, Yuchen Ying1,2, Tian Qiu1,2, Shaolun Yao1,2, Liang Xue3*, Jie Lei4, Jie Song1,2, Mingli Song1,2, Zunlei Feng1,2*

1State Key Laboratory of Blockchain and Data Security, Zhejiang University
2Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
3Computing Science and Artificial Intelligence College, Suzhou City University
4College of Computer Science, Zhejiang University of Technology

{lingxiangjia, yingyc, tqiu, yaoshaolun}@zju.edu.cn, xliang@szcu.edu.cn, jasonlei@zjut.edu.cn, {sjie, brooksong, zunleifeng}@zju.edu.cn

## Abstract

The applicability of drug molecules in various clinical scenarios is significantly influenced by a diverse range of molecular properties. By leveraging self-supervised conditions such as atom attributes and interatomic bonds, existing advanced molecular foundation models can generate expressive representations of these molecules. However, such models often overlook the fixed association patterns within molecules that influence physiological or chemical properties. In this paper, we introduce a novel association pattern-aware message passing method, which can serve as an effective yet general plug-and-play plugin, thereby enhancing the atom representations generated by molecular foundation models without requiring additional pretraining. Additionally, molecular property-specific pattern libraries are constructed to collect the generated interpretable common patterns that bind to these properties. Extensive experiments conducted on 11 benchmark molecular property prediction tasks across 8 advanced molecular foundation models demonstrate significant superiority of the proposed method, with performance improvements of up to approximately 20%.
Furthermore, a property-specific pattern library is tailored for blood-brain barrier penetration, which has undergone corresponding mechanistic validation. Code & Appendix: https://github.com/hry98kki/APMP

## Introduction

Molecular properties, encompassing aspects such as physiology, biophysics, and physical chemistry, play a critical role in determining the viability and usability of pharmaceutical compounds (Glaser 2012; Widmaier, Raff, and Strang 2022; Silbey et al. 2022). These properties often pose significant challenges during clinical trials. For instance, extensive toxicity testing is typically required, wherein medical and pharmaceutical experts evaluate the potential presence of toxic structural alerts in drug candidates. This process is followed by in vivo and in vitro experiments to further assess the drug's safety and feasibility. While such thorough evaluation is essential to ensure patient safety, it also substantially increases the time and resource consumption associated with drug development (DiMasi, Grabowski, and Hansen 2016). Thus, efficient strategies are required to promote drug discovery.

*Corresponding authors. Copyright 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: Illustration of graph-based and association pattern-aware message passing for atom representation updates.

The rapid development of deep learning has significantly advanced the automation of molecular property prediction for drug compounds. Given the inherent graph structure of drug molecules, the encoding process for molecular graphs is typically designed as message passing based on graph neural networks (Kipf and Welling 2016; Veličković et al. 2018; Xu et al. 2019; Yao et al.
2022), which aggregate information from neighboring nodes to update the representation of a central atom, focusing on local contexts, as illustrated in Figure 1. Currently, state-of-the-art methods in this field predominantly rely on molecular foundation models (MFMs) (Xia et al. 2023). These models utilize various self-supervised signals, such as atom attributes, interatomic bonds, and 3D structures, to capture complex relationships within molecular structures and learn expressive representations of molecular properties through pretraining graph-based encoders on vast datasets of unlabeled molecules. The learned knowledge can then be transferred to a wide range of downstream property prediction tasks, such as predicting the toxicity, solubility, and other physicochemical properties of drug candidates. However, the aforementioned architectures typically overlook the impact of path patterns on the learning of expressive atom representations, even though these patterns are directly relevant to target properties (Nelson 1982; Al-Hasani and Bruchas 2011; Kalgutkar 2019; Jia et al. 2022; Yang et al. 2022), limiting the ability to fully capture complex atomic interactions within molecular graphs. For instance, common patterns linking phenyl and amino groups, known as arylamines, typically determine liver toxicity in humans (Smith et al. 2003; Usui et al. 2009). Therefore, specific patterns associated with molecular properties should be considered during chemical message passing through atoms in the graph. By extracting such association path-aware information, atomic representations, and consequently the entire molecular graph, become more comprehensive and meaningful due to the interpretability of common patterns related to specific properties.
To enhance atom representations from the perspective of association patterns, we propose a novel association path-aware message passing method that explicitly integrates relevant patterns within molecular structures. The proposed method can be seamlessly integrated into existing MFMs as a plugin to enhance the representation learning process. Specifically, by exploring underlying path patterns, the proposed method generates expressive pattern-aware representations for molecular properties, forming an interpretable library of property-specific patterns. Experimental results on property prediction tasks demonstrate the effectiveness of the proposed association path-aware message passing. Our contributions are summarized as follows:

- We propose a novel association path-aware message passing method that samples high-confidence biological patterns associated with nodes and thus extracts common patterns within molecular graphs, which can effectively enhance the expressiveness of atom representations.
- A simple yet general association pattern-enhanced plugin is designed, which can be easily integrated into existing MFMs without requiring a complex and time-consuming pretraining process.
- By applying the pattern-enhanced plugin to 8 advanced MFMs, performance improvements of up to 20% are achieved across 11 benchmark property prediction tasks.
- An interpretable library of high-confidence patterns is constructed for blood-brain barrier penetration, offering valuable insights for drug discovery.

## Related Work

**Molecular Foundation Models.** Inspired by the success of large language models, pretrained MFMs have been developed to learn universal molecular representations from massive unlabeled molecules and are then finetuned on specific downstream tasks (Xia et al. 2023). N-GRAM (Liu, Demirel, and Liang 2019) assembled atom node embeddings from short walks and employed simple and classical methods to predict molecular properties. GROVER (Rong et al.
2020) integrated both atom-level and graph-level knowledge through pretext tasks. Building on atom-level information, MGSSL (Zhang et al. 2021) incorporated a motif generation supervised task. GEM (Fang et al. 2022) and GraphMVP (Liu et al. 2022) utilized 2D topological structures and 3D geometric views to generate comprehensive molecular representations, while Uni-Mol (Zhou et al. 2023) employed 3D molecular conformations to perform pretraining tasks involving 3D position recovery and masked atom prediction. Moreover, MolCLR (Wang et al. 2022) introduced a contrastive learning framework by applying general graph augmentation techniques, including atom masking, bond deletion, and subgraph removal, to approximately 10 million unique molecules. Despite their successful applications, these advanced methods assume that atom representations are learned through interactions among neighboring atoms or predefined molecular substructures, resulting in a lack of property specificity, which is insufficient for comprehensively learning the complex interactions underlying a wide range of molecular properties.

**Path-based Graph Mining.** Path-based graph mining techniques have been widely employed to extract local structural information from networks. For homogeneous networks, random walk-based strategies are usually utilized to mine graph information. Brin and Page (1998) proposed PageRank, a classic ranking algorithm based on the link structure between web pages that scores the importance of each page, and Jeh and Widom (2003) further proposed a personalized extension for each node. Jeh and Widom (2002) adopted a similarity measure based on pairwise random walks, which captures the structural similarity between nodes. Perozzi, Al-Rfou, and Skiena (2014) proposed a deep learning method that leverages local random walk information to learn latent vertex representations.
For heterogeneous networks, meta-path-based methods have been introduced to capture contextual information within the network. Sun et al. (2011) proposed a meta-path-based similarity framework that captures subtle semantic similarities among objects of the same type in the network. Dong, Chawla, and Swami (2017) proposed a deep learning-based heterogeneous network representation learning method that automatically learns hidden meta-path semantics, generating general node embeddings. Wang et al. (2023) addressed the issue of low homophily in graph-based fraud detection by integrating label information to generate distinguishable neighborhood information. For hypergraph networks specifically, Tu et al. (2018) proposed a deep hyper-network embedding model that preserves both local and global proximities in the embedding space. Zhang, Zou, and Ma (2020) developed a self-attention-based graph neural network applicable to both homogeneous and heterogeneous hypergraphs. Jia et al. (2024) introduced an interpretable rule-mining method to enhance node representations, although the mined rules are constrained to fixed lengths and lack a clear mechanism.

## The Proposed Method

To expand the expressive capability of atom representations, we propose a novel association pattern-aware message passing method. The method is further designed as a simple yet general plugin that can be easily integrated into existing MFMs. Additionally, molecular property-specific pattern libraries containing common biological pathways are constructed based on the pattern-enhanced module. See Figure 2 for the illustrations.
Figure 2: Illustrations of the proposed association pattern-enhanced method. (a) The association pattern-aware message passing process for a certain node in the molecular graph. (b) The pipeline of the APMP method integrated into MFMs as a plugin.

Hence, the three parts of this section address the following research questions, respectively:

- RQ1: How to sample and fuse association patterns in molecular graphs?
- RQ2: Where to integrate the association pattern-enhanced module into the MFM architectures?
- RQ3: What can be derived from mining association patterns for molecular properties?

### Association Pattern-aware Message Passing

To comprehensively address feature propagation and interaction of association patterns, the proposed Association Pattern-aware Message Passing (APMP) module comprises three core components, thus solving RQ1. First, the Pattern Sampling (PS) block samples potential patterns for each atom node within the molecular graph based on random walks. Next, the Confidence-based Pattern Filtering (CPF) mechanism filters out low-confidence patterns and retains the most representative ones. Finally, the Common Pattern Interaction (CPI) block updates node features through message interaction of the sampled association patterns and mines common patterns.

**Pattern Sampling.** Consider the molecular graph $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ denotes the set of atoms and $\mathcal{E}$ the set of bonds. Technically, $G$ is further formulated as an attributed graph with an adjacency matrix $H \in \{0, 1\}^{|\mathcal{V}| \times |\mathcal{V}|}$ and node attributes $X \in \mathbb{R}^{|\mathcal{V}| \times f}$, where $H$ is defined as:

$$
H[u, v] = \begin{cases} 1, & \text{if node } u \text{ links to } v, \\ 0, & \text{if node } u \text{ does not link to } v. \end{cases} \tag{1}
$$

To mine potential patterns for each atom node, the random walk (Xia et al. 2019) is adopted to sample connected paths of variable length.
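As a minimal pure-Python sketch of this sampling step (function names, the masking probability, and the dead-end handling are illustrative assumptions, not the authors' implementation):

```python
import random

def sample_paths(H, v, L, N, mask_prob=0.5, seed=0):
    """Sample N random-walk paths of length L starting from atom v.

    H is the |V| x |V| adjacency matrix as a list of 0/1 rows; masked or
    padded positions are marked with -1, mirroring the sequential masking
    described below. All names here are illustrative.
    """
    rng = random.Random(seed)
    n_nodes = len(H)
    paths = []
    for _ in range(N):
        path, cur = [v], v
        for _ in range(L - 1):
            neighbors = [u for u in range(n_nodes) if H[cur][u] == 1]
            if not neighbors:              # dead end: stop the walk early
                break
            cur = rng.choice(neighbors)
            path.append(cur)
        path += [-1] * (L - len(path))     # pad short walks to length L
        # randomized sequential masking: obscure the latter portion
        if rng.random() < mask_prob:
            cut = rng.randrange(1, L)
            path[cut:] = [-1] * (L - cut)
        paths.append(path)
    return paths
```

Each returned path starts at `v` and only traverses edges present in `H`, so the walks are valid subpaths of the molecular graph.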
Specifically, the random walk process starts from the target atom node $v$, constructing paths step by step by randomly selecting neighboring nodes. Each step follows the node connectivity relationships in $H$ to ensure that paths are validly generated within graph $G$. Path generation ends when a step limit of $L$ is reached. Formally, this process can be described as $p_v = \mathrm{RandomWalk}(G, v, L)$, where $p_v \in \mathbb{Z}^L$ and $p_v[i]$ denotes the $i$-th node starting from node $v$. Additionally, to improve the robustness and diversity of the generated paths, a randomized sequential masking mechanism is adopted to randomly obscure the latter portion of each path with a mask vector $m \in \{0, 1\}^L$, which is formulated as:

$$
\tilde{p}_v = p_v \odot (\mathbf{1}_L - m) + (-\mathbf{1}_L) \odot m, \tag{2}
$$

where $\odot$ denotes the element-wise multiplication operation. If $\tilde{p}_v[i] = -1$, the $i$-th node is masked. Following the above process, $N$ different random paths are generated for each starting node $v$. These paths share the same starting node but may visit different neighboring nodes during the walk, producing multiple distinct paths. Consequently, the entire path set is denoted as $P_v \in \mathbb{Z}^{N \times L}$, where $P_v[j]$ denotes the $j$-th path starting from node $v$.

**Confidence-based Pattern Filtering.** After generating the $N$ random paths $P_v$ for node $v$, they are evaluated and filtered to select the top $n$ representative paths as candidates for common pathway patterns.

**Definition 1** Given a sampled path $p_v$ starting from node $v$, the confidence of $p_v$ is defined as the cumulative sum of the structural attributes of the nodes along $p_v$, representing the property-specific pathway contribution for node $v$.

Following Definition 1, a centrality-based confidence scoring mechanism is introduced to assess the significance of each path by summing the centrality scores of all nodes along the path. Specifically, the degree centrality of node $v$ is the fraction of nodes it is connected to.
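A minimal sketch of this confidence scoring and top-$n$ filtering, assuming the normalized degree centrality $\deg(v)/(|\mathcal{V}|-1)$ and treating `-1` entries as masked positions (names are illustrative):

```python
def degree_centrality(H):
    """Normalized degree centrality: deg(v) / (|V| - 1) for each node v."""
    n = len(H)
    return [sum(H[u][v] for u in range(n)) / (n - 1) for v in range(n)]

def filter_paths(paths, centrality, top_n):
    """Keep the top-n paths ranked by cumulative node centrality.

    Masked/padded entries (-1) contribute nothing to a path's score.
    """
    def score(path):
        return sum(centrality[node] for node in path if node >= 0)
    return sorted(paths, key=score, reverse=True)[:top_n]
```

Paths passing through well-connected atoms accumulate higher scores and survive the filter, which matches the intuition that central atoms anchor the more representative pathway patterns.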
The degree centrality values are normalized by dividing by the maximum possible degree, i.e., $|\mathcal{V}| - 1$. The degree centrality $C_G(v)$ of node $v$ is computed as:

$$
C_G(v) = \frac{1}{|\mathcal{V}| - 1} \sum_{u \in \mathcal{V}} H[u, v]. \tag{3}
$$

Subsequently, the path confidence score $S_v$ is calculated based on the centrality of the nodes within each path. The generated paths are then sorted according to $S_v$, and the top $n$ paths with the highest scores are selected to form the final retained path set $P'_v \subset P_v$. This process can be formulated as follows:

$$
S_v = \sum_{v' \in p_v} C_G(v'), \qquad P'_v = \arg\max(S_v, P_v, n). \tag{4}
$$

Hence, the filtered path set $P'_v \in \mathbb{Z}^{n \times L}$ contains paths that are more effective and expressive for the subsequent mining of common patterns. Furthermore, the corresponding path feature $F'_v \in \mathbb{R}^{n \times L \times f}$ is obtained by concatenating the $f$-dimensional initial attributes of the atoms along each path. A linear projection layer then maps each path feature from $\mathbb{R}^{L \times f}$ to $\mathbb{R}^d$. Note that if there are empty nodes in a path, zero-padding is applied to fill the path feature.

**Common Pattern Interaction.** Given the feature matrix $F'_v$ of the $n$ high-confidence paths for node $v$, the goal is to mine the common pattern inside these paths. To this end, the CPI block is designed to aggregate the information of these paths and consists of a stack of Transformer layers (Vaswani et al. 2017). Each Transformer layer has two parts: a multi-head attention module (MHA) and a position-wise feed-forward network (FFN). Let $z_v = F'_v \in \mathbb{R}^{n \times d}$ denote the input of the MHA, which is projected by three matrices $W_Q \in \mathbb{R}^{d \times d_K}$, $W_K \in \mathbb{R}^{d \times d_K}$, and $W_V \in \mathbb{R}^{d \times d_V}$ to obtain the corresponding query, key, and value representations $Q_v$, $K_v$, and $V_v$ for node $v$:

$$
Q_v = z_v W_Q, \quad K_v = z_v W_K, \quad V_v = z_v W_V, \tag{5}
$$

$$
A_v = \frac{Q_v K_v^{\top}}{\sqrt{d_K}}, \qquad \mathrm{Attn}(z_v) = \mathrm{softmax}(A_v) V_v, \tag{6}
$$

where $A_v \in \mathbb{R}^{n \times n}$ is a matrix capturing the similarity between queries and keys, which quantifies the relations among the $n$ paths. For simplicity, we consider single-head attention and assume $d_K = d_V = d$.
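In this single-head setting, the attention of Equations 5 and 6 over the $n$ path features can be sketched in NumPy as follows (the weight matrices are passed in as arguments; the function name is illustrative):

```python
import numpy as np

def single_head_attention(z, Wq, Wk, Wv):
    """Scaled dot-product attention over n path features for one atom.

    z: (n, d) high-confidence path features; Wq/Wk/Wv: (d, d) projections.
    Returns the updated (n, d) features and the row-softmaxed (n, n)
    path-vs-path attention matrix.
    """
    Q, K, V = z @ Wq, z @ Wk, z @ Wv
    A = Q @ K.T / np.sqrt(K.shape[-1])            # path-vs-path similarity
    A = np.exp(A - A.max(axis=-1, keepdims=True)) # numerically stable softmax
    A = A / A.sum(axis=-1, keepdims=True)
    return A @ V, A
```

Here each row of the attention matrix weighs how much every other sampled path contributes when updating one path's feature, which is how shared structure across the $n$ paths gets amplified.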
The extension to multi-head attention is standard and straightforward. The output of the MHA module is $z'_v \in \mathbb{R}^{n \times d}$. To summarize the process of the CPI block for the entire graph $G$, let $Z^{(0)} = [z^{(0)}_1, \ldots, z^{(0)}_{|\mathcal{V}|}]$ denote the input embeddings of $G$; the feature aggregation in the Transformer layers is then computed as:

$$
Z'^{(t)} = \mathrm{MHA}(\mathrm{LN}(Z^{(t-1)})) + Z^{(t-1)}, \tag{7}
$$

$$
Z^{(t+1)} = \mathrm{FFN}(\mathrm{LN}(Z'^{(t)})) + Z'^{(t)}, \tag{8}
$$

where LN is layer normalization, and $Z^{(t+1)}$ is the output of the current Transformer layer and the input of the next layer. Thus, after $T$ layers, we obtain the encoding output $Z^{(T)} = [z^{(T)}_1, \ldots, z^{(T)}_{|\mathcal{V}|}] \in \mathbb{R}^{|\mathcal{V}| \times n \times d}$, and then apply the mean function to fuse the $n$ association patterns for node $v$, yielding the learned association pattern-aware embedding $z_v \in \mathbb{R}^d$. The same strategy is applied to obtain embeddings for all other nodes within the graph.

**Algorithm 1: Pattern-Enhanced Finetuning for MFMs**

```
Input: Molecular graph G = (V, E) with features X, target property label y
Parameters: MFM encoder Θ_MFM; APMP module Θ_APMP; Predictor Θ_Pred;
            learning rates α1, α2, α3 for the three modules
Output: Predicted property label ŷ
 1: while not converged do
 2:   L ← 0                               {Initialize loss for the batch}
 3:   for each molecular graph G_k in the batch do
 4:     for v_i ∈ V_k do
 5:       z_{v_i} ← APMP(v_i, X_k; Θ_APMP)
 6:     end for
 7:     M_k ← MFM-Encoder(X_k; Θ_MFM)
 8:     for v_i ∈ V_k do
 9:       h_{v_i} ← z_{v_i} + M_{v_i}     {Element-wise addition}
10:     end for
11:     g_k ← GlobalPooling({h_{v_i} | v_i ∈ V_k})
12:     ŷ_k ← Predictor(g_k; Θ_Pred)
13:     L ← L + Loss(ŷ_k, y_k)
14:   end for
15:   L ← L / BatchSize
16:   Θ_MFM ← Θ_MFM − α1 ∂L/∂Θ_MFM
17:   Θ_APMP ← Θ_APMP − α2 ∂L/∂Θ_APMP
18:   Θ_Pred ← Θ_Pred − α3 ∂L/∂Θ_Pred
19: end while
20: return ŷ
```

### Dual-branch Pattern-enhanced Representation

To effectively supplement expressive representations from the association pattern perspective, a dual-branch pattern-enhanced finetuning strategy is proposed to enhance the atom representations generated by MFMs, thus addressing RQ2.
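The combination step of this dual-branch strategy, element-wise addition of plugin and encoder features followed by global pooling and a prediction head (cf. Algorithm 1, lines 7-12), can be sketched as follows; the linear predictor is an illustrative stand-in for whatever task head is used:

```python
import numpy as np

def dual_branch_forward(pattern_feats, encoder_feats, W_pred, b_pred):
    """One forward pass of the dual-branch combination for a molecule.

    pattern_feats / encoder_feats: (|V|, d) atom features from the APMP
    plugin and the pretrained MFM encoder. The linear head (W_pred, b_pred)
    is a hypothetical stand-in for the downstream predictor.
    """
    h = pattern_feats + encoder_feats   # element-wise fusion per atom
    g = h.mean(axis=0)                  # global mean pooling over atoms
    return g @ W_pred + b_pred          # property prediction
```

Because the fusion is a plain addition on top of the encoder's output, the plugin branch can be trained alongside a finetuned (or even frozen) encoder without any architectural change to the MFM itself.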
Considering the variable designs of MFMs, the APMP module can serve as a dual-branch plugin independent of the main MFM encoder, allowing appropriate finetuning for downstream prediction tasks. This enhancement strategy effectively boosts the representation without requiring an in-depth model redesign, thus achieving generalizability and eliminating the need for extensive retraining. Specifically, during finetuning, the pattern representation $z_v$ of node $v$ obtained from the pattern-enhanced module is integrated into the atom representation $M_v$ obtained from the pretrained MFM encoder. A global pooling function is then applied to the enhanced atom representations to obtain the graph representation, which is fed into the predictor to predict the final property label. Figure 2b shows an illustration, and the whole finetuning process is formulated in Algorithm 1. Furthermore, an in-depth validation of this plugin is provided through Table 3 in the ablation experiments.

### Molecular Property-specific Pattern Library

To mine the intrinsic mechanism of the common patterns for a specific molecular property, a contribution-based pattern scoring method is proposed that utilizes the contribution of common patterns to atoms, based on the trained APMP module, and constructs the property-specific pattern libraries, thereby addressing RQ3. Specifically, the relationships among the high-confidence patterns are defined in Equation 6, i.e., $A_v \in \mathbb{R}^{n \times n}$, where $n$ denotes the number of high-confidence patterns retained after the CPF block. The contributions of these patterns to node $v$ are then calculated by a function termed the Common Pattern Coefficient (CPC), formulated as follows:

$$
\tilde{A}_v = \mathrm{softmax}(A_v), \tag{9}
$$

$$
\mathrm{CPC}(v) = \sum_{j=1}^{n} \tilde{A}_v[j, :]. \tag{10}
$$

Based on the CPC function, the contribution ranking of the $n$ patterns associated with node $v$ can be conducted.
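A minimal NumPy sketch of the CPC computation and the resulting contribution ranking, following Equations 9 and 10 as reconstructed above (the function name is illustrative):

```python
import numpy as np

def common_pattern_coefficient(A):
    """Score the contribution of each of the n patterns to a node.

    A: (n, n) attention matrix A_v from the CPI block. Rows are softmax-
    normalized (Eq. 9), then summed over rows (Eq. 10) so that entry k
    is the total attention received by pattern k across all paths.
    """
    A = np.exp(A - A.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)   # Eq. (9): row-wise softmax
    cpc = A.sum(axis=0)                     # Eq. (10): column-wise totals
    ranking = np.argsort(cpc)[::-1]         # most contributory pattern first
    return cpc, ranking
```

Patterns at the head of `ranking` are the candidates collected into the property-specific library; since each softmaxed row sums to one, the coefficients always sum to $n$ and are directly comparable across nodes.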
Patterns with relatively high contribution values are identified and compiled into a set of common pattern candidates. After all common patterns across the molecular graph are retrieved, a filtering process is applied to obtain a molecular property-specific pattern library. Furthermore, the pattern library is comprehensively validated against the biological mechanisms referenced in the corresponding literature.

### Complexity Analysis

Within APMP, the critical CPI module primarily comprises several MHA and FFN layers. For a certain atom node with $n$ high-confidence patterns obtained from the CPF block, the dimension of the input and hidden features for the MHA layer is assumed to be $d$, and the dimension of the hidden features for the FFN layer is $d'$. The query, key, and value matrices are derived from the same input sequence and share the same length $n$. In CPI, the primary operations include the scaled dot-product attention, the multiplication of attention weights and values, the MHA linear transformations, and the FFN linear projections. The time complexity is $O(n^2 d + n d^2 + n d d')$. Hence, for the entire molecular graph $G$, the total complexity is $O(|\mathcal{V}| \cdot (n^2 d + n d^2 + n d d'))$, where $|\mathcal{V}|$ is the number of atom nodes in $G$.

## Experiments

### Experimental Setting

**Datasets.** To comprehensively evaluate the proposed method on downstream tasks, we conduct experiments on 11 benchmark datasets from MoleculeNet (Wu et al. 2018), covering diverse targets, including physical chemistry (ESOL, FreeSolv, Lipophilicity), biophysics (BACE, MUV, HIV), and physiology (BBBP, Tox21, ToxCast, SIDER, ClinTox). Among these tasks, ESOL, FreeSolv, and Lipophilicity are formulated as regression tasks, while the others are treated as classification tasks. Details of these datasets can be found in Appendix C.1.
Table 1: Performance comparison on molecular property prediction regression tasks (RMSE, lower is better). All scores are reported as mean ± std over three different seeds. The Impr. rows denote the performance improvement with the APMP module.

| Datasets (# Molecules) | ESOL (1,128) | FreeSolv (642) | Lipophilicity (4,200) |
|---|---|---|---|
| N-GRAM | 0.8290 ± 0.0197 | 0.4609 ± 0.1251 | 0.8090 ± 0.0091 |
| + APMP | 0.8153 ± 0.0188 | 0.4524 ± 0.0897 | 0.7940 ± 0.0248 |
| Impr. | +1.65% | +1.84% | +1.85% |
| GROVER | 0.6239 ± 0.0471 | 1.4754 ± 0.2627 | 0.7421 ± 0.0341 |
| + APMP | 0.6073 ± 0.0309 | 1.1922 ± 0.2922 | 0.6804 ± 0.0733 |
| Impr. | +2.67% | +19.20% | +8.31% |
| MGSSL | 1.0792 ± 0.0216 | 2.4261 ± 0.0346 | 0.8726 ± 0.0079 |
| + APMP | 1.0701 ± 0.0110 | 2.3427 ± 0.0738 | 0.8673 ± 0.0095 |
| Impr. | +0.84% | +3.44% | +0.61% |
| GEM | 0.4668 ± 0.1277 | 0.8810 ± 0.3863 | 0.4685 ± 0.0671 |
| + APMP | 0.4606 ± 0.0346 | 0.8211 ± 0.3856 | 0.4461 ± 0.0579 |
| Impr. | +1.33% | +6.80% | +4.78% |
| GraphMVP | 0.6535 ± 0.0399 | 1.6031 ± 0.3622 | 0.6072 ± 0.0610 |
| + APMP | 0.6532 ± 0.0371 | 1.5688 ± 0.3991 | 0.6011 ± 0.0669 |
| Impr. | +0.05% | +2.14% | +1.01% |
| MolCLR-GCN | 1.1428 ± 0.1512 | 2.4033 ± 0.0983 | 0.7621 ± 0.0472 |
| + APMP | 1.1138 ± 0.1581 | 2.3037 ± 0.2192 | 0.7584 ± 0.0452 |
| Impr. | +2.54% | +4.14% | +0.49% |
| MolCLR-GIN | 0.8098 ± 0.0450 | 1.7434 ± 0.4078 | 0.6330 ± 0.0352 |
| + APMP | 0.8062 ± 0.0329 | 1.5701 ± 0.1459 | 0.6271 ± 0.0353 |
| Impr. | +0.45% | +9.94% | +0.93% |
| Uni-Mol | 0.6322 ± 0.0106 | 0.7350 ± 0.0670 | 0.5571 ± 0.0015 |
| + APMP | 0.5867 ± 0.0429 | 0.7160 ± 0.1135 | 0.5552 ± 0.0072 |
| Impr. | +7.20% | +2.59% | +0.34% |

**Baselines.** We comprehensively evaluate the proposed APMP plugin integrated into 8 advanced MFMs: N-GRAM (Liu, Demirel, and Liang 2019), GROVER (Rong et al. 2020), MGSSL (Zhang et al. 2021), GEM (Fang et al. 2022), GraphMVP (Liu et al. 2022), MolCLR (Wang et al. 2022), and Uni-Mol (Zhou et al. 2023). For MolCLR, both Graph Convolutional Network (GCN) (Kipf and Welling 2016) and Graph Isomorphism Network (GIN) (Xu et al. 2019) backbones are used.

**Implementation Details.**
To accurately evaluate model performance and prevent overfitting, three different seeds are used for each dataset, and the mean and standard deviation are reported as the prediction results. Moreover, we randomly split all the datasets into train/validation/test sets with an 8:1:1 ratio, and use the best validation result to select the model evaluated on the test set. More finetuning details can be found in Appendix C.2.

**Metrics.** Following the evaluation setting of MoleculeNet (Wu et al. 2018), we adopt the ROC-AUC score as the evaluation metric for the eight molecular property classification tasks and RMSE for the three molecular property regression tasks.

Table 2: Performance comparison on molecular property prediction classification tasks (ROC-AUC %, higher is better). All scores are reported as mean ± std over three different seeds. The Impr. rows denote the performance improvement with the APMP module. Certain tasks for N-GRAM are not reported (denoted by -) due to the excessive time required by its original vertex embedding training strategy. Similarly, some tasks for MolCLR are excluded due to severe validation/test imbalance, making evaluation infeasible.

| Datasets | BBBP | BACE | ClinTox | Tox21 | ToxCast | SIDER | HIV | MUV |
|---|---|---|---|---|---|---|---|---|
| # Molecules | 2,039 | 1,513 | 1,478 | 7,831 | 8,575 | 1,427 | 41,127 | 93,087 |
| # Tasks | 1 | 1 | 2 | 12 | 617 | 27 | 1 | 17 |
| N-GRAM | 90.24 ± 2.01 | 88.09 ± 3.75 | 84.44 ± 4.27 | 82.03 ± 0.83 | - | - | - | - |
| + APMP | 91.92 ± 1.75 | 88.65 ± 3.63 | 86.83 ± 3.22 | 82.91 ± 1.99 | - | - | - | - |
| Impr. | +1.68 | +0.56 | +2.39 | +0.88 | - | - | - | - |
| GROVER | 89.42 ± 1.70 | 83.13 ± 1.24 | 79.93 ± 3.07 | 78.40 ± 1.34 | 75.06 ± 1.17 | 65.93 ± 1.77 | 80.58 ± 2.25 | 78.02 ± 2.19 |
| + APMP | 90.52 ± 1.34 | 83.57 ± 3.30 | 82.94 ± 1.91 | 79.34 ± 0.23 | 75.51 ± 0.68 | 65.96 ± 1.74 | 80.90 ± 1.82 | 78.16 ± 1.73 |
| Impr. | +1.10 | +0.44 | +3.01 | +0.94 | +0.45 | +0.03 | +0.32 | +0.14 |
| MGSSL | 91.55 ± 0.18 | 87.88 ± 0.28 | 77.94 ± 1.12 | 80.06 ± 0.18 | 70.15 ± 0.07 | 62.70 ± 0.49 | 84.32 ± 0.40 | 82.07 ± 0.37 |
| + APMP | 91.74 ± 0.31 | 88.21 ± 0.34 | 77.99 ± 0.21 | 81.34 ± 0.18 | 70.54 ± 0.13 | 63.23 ± 1.62 | 84.61 ± 0.45 | 84.65 ± 1.62 |
| Impr. | +0.19 | +0.33 | +0.05 | +1.28 | +0.39 | +0.53 | +0.29 | +2.58 |
| GEM | 92.99 ± 1.06 | 88.77 ± 1.38 | 91.38 ± 0.89 | 85.58 ± 0.72 | 74.83 ± 1.04 | 67.04 ± 0.57 | 82.78 ± 2.85 | 77.46 ± 0.41 |
| + APMP | 93.76 ± 2.65 | 90.14 ± 0.76 | 93.19 ± 2.28 | 85.74 ± 0.91 | 75.19 ± 0.87 | 67.45 ± 2.55 | 83.03 ± 0.63 | 80.68 ± 2.27 |
| Impr. | +0.77 | +1.37 | +1.81 | +0.16 | +0.36 | +0.41 | +0.25 | +3.22 |
| GraphMVP | 90.43 ± 1.71 | 86.36 ± 3.10 | 67.68 ± 10.29 | 84.52 ± 1.33 | 74.14 ± 0.82 | 63.46 ± 1.81 | 81.47 ± 1.62 | 79.08 ± 3.27 |
| + APMP | 91.11 ± 1.71 | 86.76 ± 3.10 | 68.92 ± 13.98 | 84.59 ± 0.74 | 74.14 ± 0.55 | 63.50 ± 0.52 | 82.00 ± 1.15 | 79.62 ± 0.99 |
| Impr. | +0.68 | +0.40 | +1.24 | +0.07 | +0.00 | +0.04 | +0.53 | +0.54 |
| MolCLR-GCN | 90.64 ± 0.97 | 80.25 ± 2.34 | 79.90 ± 3.47 | 81.49 ± 0.62 | - | - | 82.19 ± 0.60 | - |
| + APMP | 90.78 ± 0.67 | 80.39 ± 1.54 | 81.16 ± 4.17 | 81.99 ± 0.88 | - | - | 82.21 ± 1.37 | - |
| Impr. | +0.14 | +0.14 | +1.26 | +0.50 | - | - | +0.02 | - |
| MolCLR-GIN | 92.71 ± 0.42 | 85.77 ± 2.59 | 87.07 ± 3.81 | 82.59 ± 0.60 | - | - | 83.22 ± 1.47 | - |
| + APMP | 93.53 ± 0.92 | 86.04 ± 2.99 | 87.68 ± 1.74 | 82.76 ± 0.50 | - | - | 83.74 ± 1.29 | - |
| Impr. | +0.82 | +0.27 | +0.61 | +0.19 | - | - | +0.52 | - |
| Uni-Mol | 89.22 ± 1.58 | 87.79 ± 3.09 | 89.35 ± 3.07 | 84.90 ± 0.30 | 71.43 ± 1.40 | 63.62 ± 3.22 | 82.95 ± 0.56 | 75.84 ± 2.02 |
| + APMP | 93.70 ± 0.23 | 89.62 ± 3.72 | 90.56 ± 1.99 | 85.51 ± 1.02 | 72.09 ± 1.77 | 64.74 ± 1.44 | 84.02 ± 0.48 | 76.37 ± 2.22 |
| Impr. | +4.48 | +1.83 | +1.21 | +0.61 | +0.66 | +1.12 | +1.07 | +0.53 |

### Quantitative Performance Comparison

For the downstream molecular property prediction tasks, Table 1 and Table 2 present the overall performance results on 3 regression tasks and 8 classification tasks across 8 advanced MFMs. The comparisons yield the following observations: (1) All MFMs with the proposed plugin consistently outperform the baselines on all datasets, with a large margin on most of them. The relative improvement across all datasets and MFMs reaches up to 4.48% for classification tasks and 19.20% for regression tasks. This remarkable boost validates the effectiveness and generalizability of the integrated plugin in enhancing the original pretrained atom representations. (2) For the regression tasks, the improvements brought by the proposed method are particularly pronounced.
Especially on the small dataset FreeSolv, which contains only 642 labeled molecules, the plugin achieves a 19.20% relative RMSE improvement over the baseline MFM. This confirms the efficacy of the proposed method, as it can significantly enhance performance even when label information is scarce. (3) For the classification tasks, the improvement on single-task datasets generally surpasses that on multi-task datasets. This is because each task may have its own specific patterns; in a multi-task framework, the confusion caused by shared common patterns can interfere with individual task performance, even though the overall performance still exceeds that of the baseline MFMs.

### Interpretability of the BBBP Pattern Library

Based on the CPC calculation, we construct the property-specific pattern library for blood-brain barrier (BBB) penetration. As illustrated in Figure 3, starting from a certain atom node (red star) in Zafuleptine, we obtain two groups of sampled paths with their δ scores. In the left column, the connection between the phenyl group and the fluorine group constitutes the common pattern (orange regions), and in the right column the significant impact of the amine group (yellow regions) on the atom representation is highlighted. In contrast, the last row shows paths with smaller δ values (gray regions), indicating minimal contribution to the atom representation.

Figure 3: Interpretability cases of the BBBP pattern library. The two columns denote the common patterns and negligible paths starting from a certain atom v after two rounds of path sampling, respectively. Here, δ = CPC(v)/n represents the relative contribution to the node representation.

The above observations are intuitive: (1) The conjugated system and non-polar nature of the benzene ring endow Zafuleptine with high lipophilicity (Abbott et al. 2010; Geldenhuys et al. 2015; Cornelissen et al.
2023), enhancing its interaction with the lipid components of cell membranes and promoting passive diffusion through lipophilic and hydrophilic phases across the BBB. (2) The fluorine group introduces hydrophobic properties, further increasing the molecule's lipophilicity and its capacity for passive membrane diffusion (Sun and Adejare 2006; Muller, Faeh, and Diederich 2007). Notably, the robust C-F bond resists metabolic degradation, reducing the likelihood of the molecule being broken down before crossing the BBB. (3) In the context of active transport across the BBB, the amine groups can act as hydrogen bond acceptors (Young et al. 1988; Pardridge 2012; Geldenhuys et al. 2015), which is crucial for binding to the influx or efflux transporters involved in BBB transport. Additional interpretability cases and the corresponding discussions can be found in Appendix C.3.

### Ablation Studies

To validate the impact of hyperparameters on the APMP module for RQ1, we conduct ablation studies on two key parameters, namely the filtering ratio of path candidates β = n/N and the length of sampled patterns L. Further, to investigate the mechanisms by which the plugin functions for RQ2, we design ablation studies regarding where the plugin is integrated into the MFMs and how it is trained.

**Filtering Ratio of Path Candidates β.** Figure 4a shows the prediction performance as the top β% of confidence-ranked sampled paths are retained as high-confidence patterns. The performance reaches its highest and most stable levels when approximately the top 30% to 45% of paths are selected. In contrast, the performance declines when fewer than 20% or more than 80% of paths are chosen, although it still exceeds the baseline without APMP. This finding underscores the need for balanced path selection to avoid information loss or redundancy, highlighting the effectiveness of the CPF block.

**Length of Sampled Path L.** Figure 4b shows the prediction performance with different lengths of sampled paths.
When L is relatively large (L ≥ 6), the prediction performance surpasses that of shorter lengths (L < 6), showing that sufficiently long paths can encompass a more diverse and expressive range of information. This facilitates the discovery of common patterns and subsequently enhances the overall performance.

Figure 4: Ablation results on the hyperparameters of APMP for the BBBP property task, including: (a) the filtering ratio of path candidates β, and (b) the length of sampled patterns L. Both panels report ROC-AUC (%) with and without APMP.

Table 3: Ablation results on where to integrate the APMP plugin into the baseline MFM Uni-Mol (before vs. after the encoder) and how it is trained (with vs. without finetuning) for the downstream BBBP property classification task. Reported ROC-AUC values: 89.22±1.58, 74.09±0.89, 89.24±1.57, 88.19±3.00, and 93.70±0.23.

Plugin Integration Workflow. Table 3 demonstrates that the best performance is achieved by finetuning the combined representations from the MFM encoder and the APMP plugin. Specifically, the pretrained MFM encoder generates the atom representations, which are enhanced by APMP: it generates representations from the common patterns, integrates them with the encoder outputs, and finetunes the combined representations via a predictor. In contrast, introducing plugin-generated features before the encoder input would require substantial additional training of the MFM encoder, potentially necessitating training from scratch and hindering rapid convergence. Additionally, if the plugin is added directly without finetuning, the resulting representations are largely random and therefore unlikely to improve performance.

Conclusion

In this paper, we introduce a novel association pattern-aware message passing method to uncover molecular property-specific common patterns and expand the ability to represent complex interactions within molecular graphs.
The method functions as a general plugin that can be integrated into MFMs with pretrained encoders, effectively enhancing atom representations and thereby increasing their expressiveness. Extensive tests on 11 benchmark molecular property tasks across 8 advanced MFMs demonstrate the effectiveness of the proposed plugin. Furthermore, the constructed property-specific pattern library exhibits strong interpretability for potential applications in real-world scenarios.

Acknowledgments

This work is supported by the Zhejiang Province High-Level Talents Special Support Program, Leading Talent of Technological Innovation of Ten-Thousands Talents Program (No. 2022R52046), and the National Natural Science Foundation of China (No. 62376248).

References

Abbott, N. J.; Patabendige, A. A.; Dolman, D. E.; Yusof, S. R.; and Begley, D. J. 2010. Structure and function of the blood-brain barrier. Neurobiology of Disease, 37(1): 13–25.
Al-Hasani, R.; and Bruchas, M. R. 2011. Molecular mechanisms of opioid receptor-dependent signaling and behavior. The Journal of the American Society of Anesthesiologists, 115(6): 1363–1381.
Brin, S.; and Page, L. 1998. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7): 107–117.
Cornelissen, F. M.; Markert, G.; Deutsch, G.; Antonara, M.; Faaij, N.; Bartelink, I.; Noske, D.; Vandertop, W. P.; Bender, A.; and Westerman, B. A. 2023. Explaining blood-brain barrier permeability of small molecules by integrated analysis of different transport mechanisms. Journal of Medicinal Chemistry, 66(11): 7253–7267.
Di Masi, J. A.; Grabowski, H. G.; and Hansen, R. W. 2016. Innovation in the pharmaceutical industry: new estimates of R&D costs. Journal of Health Economics, 47: 20–33.
Dong, Y.; Chawla, N. V.; and Swami, A. 2017. metapath2vec: Scalable representation learning for heterogeneous networks.
In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 135–144.
Fang, X.; Liu, L.; Lei, J.; He, D.; Zhang, S.; Zhou, J.; Wang, F.; Wu, H.; and Wang, H. 2022. Geometry-enhanced molecular representation learning for property prediction. Nature Machine Intelligence, 4(2): 127–134.
Geldenhuys, W. J.; Mohammad, A. S.; Adkins, C. E.; and Lockman, P. R. 2015. Molecular determinants of blood-brain barrier permeation. Therapeutic Delivery, 6(8): 961–971.
Glaser, R. 2012. Biophysics: An Introduction. Springer Science & Business Media.
Jeh, G.; and Widom, J. 2002. SimRank: a measure of structural-context similarity. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 538–543.
Jeh, G.; and Widom, J. 2003. Scaling personalized web search. In Proceedings of the 12th International Conference on World Wide Web (WWW), 271–279.
Jia, L.; Feng, Z.; Zhang, H.; Song, J.; Zhong, Z.; Yao, S.; and Song, M. 2022. Explainable Fragment-Based Molecular Property Attribution. Advanced Intelligent Systems, 4(10): 2200104.
Jia, L.; Ying, Y.; Feng, Z.; Zhong, Z.; Yao, S.; Hu, J.; Duan, M.; Wang, X.; Song, J.; and Song, M. 2024. Association Pattern-aware Fusion for Biological Entity Relationship Prediction. In The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS).
Kalgutkar, A. S. 2019. Designing around structural alerts in drug discovery. Journal of Medicinal Chemistry, 63(12): 6276–6302.
Kipf, T. N.; and Welling, M. 2016. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations (ICLR).
Liu, S.; Demirel, M. F.; and Liang, Y. 2019. N-gram graph: Simple unsupervised representation for graphs, with applications to molecules. Advances in Neural Information Processing Systems, 32.
Liu, S.; Wang, H.; Liu, W.; Lasenby, J.; Guo, H.; and Tang, J. 2022. Pre-training Molecular Graph Representation with 3D Geometry.
In International Conference on Learning Representations (ICLR).
Müller, K.; Faeh, C.; and Diederich, F. 2007. Fluorine in pharmaceuticals: looking beyond intuition. Science, 317(5846): 1881–1886.
Nelson, S. D. 1982. Metabolic activation and drug toxicity. Journal of Medicinal Chemistry, 25(7): 753–765.
Pardridge, W. M. 2012. Drug transport across the blood-brain barrier. Journal of Cerebral Blood Flow & Metabolism, 32(11): 1959–1972.
Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 701–710.
Rong, Y.; Bian, Y.; Xu, T.; Xie, W.; Wei, Y.; Huang, W.; and Huang, J. 2020. Self-supervised graph transformer on large-scale molecular data. Advances in Neural Information Processing Systems, 33: 12559–12571.
Silbey, R. J.; Alberty, R. A.; Papadantonakis, G. A.; and Bawendi, M. G. 2022. Physical Chemistry. John Wiley & Sons.
Smith, K. S.; Smith, P. L.; Heady, T. N.; Trugman, J. M.; Harman, W. D.; and Macdonald, T. L. 2003. In vitro metabolism of tolcapone to reactive intermediates: relevance to tolcapone liver toxicity. Chemical Research in Toxicology, 16(2): 123–128.
Sun, S.; and Adejare, A. 2006. Fluorinated molecules as drugs and imaging agents in the CNS. Current Topics in Medicinal Chemistry, 6(14): 1457–1464.
Sun, Y.; Han, J.; Yan, X.; Yu, P. S.; and Wu, T. 2011. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment, 4(11): 992–1003.
Tu, K.; Cui, P.; Wang, X.; Wang, F.; and Zhu, W. 2018. Structural deep embedding for hyper-networks. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 426–433.
Usui, T.; Mise, M.; Hashizume, T.; Yabuki, M.; and Komuro, S. 2009. Evaluation of the potential for drug-induced liver injury based on in vitro covalent binding to human liver proteins. Drug Metabolism and Disposition, 37(12): 2383–2392.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. Advances in Neural Information Processing Systems (NeurIPS), 30.
Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2018. Graph Attention Networks. In International Conference on Learning Representations (ICLR).
Wang, Y.; Wang, J.; Cao, Z.; and Barati Farimani, A. 2022. Molecular contrastive learning of representations via graph neural networks. Nature Machine Intelligence, 4(3): 279–287.
Wang, Y.; Zhang, J.; Huang, Z.; Li, W.; Feng, S.; Ma, Z.; Sun, Y.; Yu, D.; Dong, F.; Jin, J.; et al. 2023. Label information enhanced fraud detection against low homophily in graphs. In Proceedings of the ACM Web Conference 2023 (WWW), 406–416.
Widmaier, E.; Raff, H.; and Strang, K. T. 2022. Vander's Human Physiology. McGraw-Hill Higher Education.
Wu, Z.; Ramsundar, B.; Feinberg, E. N.; Gomes, J.; Geniesse, C.; Pappu, A. S.; Leswing, K.; and Pande, V. 2018. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 9(2): 513–530.
Xia, F.; Liu, J.; Nie, H.; Fu, Y.; Wan, L.; and Kong, X. 2019. Random walks: A review of algorithms and applications. IEEE Transactions on Emerging Topics in Computational Intelligence, 4(2): 95–107.
Xia, J.; Zhu, Y.; Du, Y.; and Li, S. Z. 2023. A systematic survey of chemical pre-trained models. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI), 6787–6795.
Xu, K.; Hu, W.; Leskovec, J.; and Jegelka, S. 2019. How powerful are graph neural networks? In International Conference on Learning Representations (ICLR).
Yang, Z.; Zhong, W.; Lv, Q.; and Chen, C. Y.-C. 2022. Learning size-adaptive molecular substructures for explainable drug-drug interaction prediction by substructure-aware graph neural network. Chemical Science, 13(29): 8693–8703.
Yao, S.; Feng, Z.; Song, J.; Jia, L.; Zhong, Z.; and Song, M. 2022.
Chemical property relation guided few-shot molecular property prediction. In 2022 International Joint Conference on Neural Networks (IJCNN), 1–8. IEEE.
Young, R. C.; Mitchell, R. C.; Brown, T. H.; Ganellin, C. R.; Griffiths, R.; Jones, M.; Rana, K. K.; Saunders, D.; and Smith, I. R. 1988. Development of a new physicochemical model for brain penetration and its application to the design of centrally acting H2 receptor histamine antagonists. Journal of Medicinal Chemistry, 31(3): 656–671.
Zhang, R.; Zou, Y.; and Ma, J. 2020. Hyper-SAGNN: a self-attention based graph neural network for hypergraphs. In International Conference on Learning Representations (ICLR).
Zhang, Z.; Liu, Q.; Wang, H.; Lu, C.; and Lee, C.-K. 2021. Motif-based graph self-supervised learning for molecular property prediction. Advances in Neural Information Processing Systems, 34: 15870–15882.
Zhou, G.; Gao, Z.; Ding, Q.; Zheng, H.; Xu, H.; Wei, Z.; Zhang, L.; and Ke, G. 2023. Uni-Mol: A Universal 3D Molecular Representation Learning Framework. In The Eleventh International Conference on Learning Representations (ICLR).