Researchers present a foundation model for graph data, adapting the Transformer architecture used in large language models. Each node is represented by multiple random walks, enabling the model to learn representations from diverse graphs and transfer them to downstream node-, link-, and graph-level prediction tasks.
The increasing prevalence of complex, interconnected data necessitates novel approaches to machine learning, moving beyond traditional tabular formats to embrace graph-structured information. Researchers are now investigating whether the principles underpinning large language models, such as those based on the Transformer architecture, can be extended to effectively process and learn from graphs. A new study, titled ‘Toward a Graph Foundation Model: Pre-Training Transformers With Random Walks’, details a method for pre-training a Transformer model using diverse graph datasets. Ziyuan Tang from the University of Minnesota and Jie Chen from IBM Research, alongside colleagues, propose representing nodes within a graph as multiple random walks, allowing the Transformer to extract meaningful representations from sequential data. This approach addresses the challenge of encoding graphs of varying sizes and domains, and is accompanied by a novel context prediction loss function designed to enhance the model’s ability to distinguish between different network neighbourhoods and overall graph structures.
The study's pre-training methodology pairs a decoder-only Transformer architecture with random walks that represent complex graph structures. The approach learns representations applicable to diverse tasks, including node-, link-, and graph-level prediction, potentially establishing a pathway toward a graph foundation model that mirrors recent advances in large language models. Among the pre-training objectives evaluated, context prediction consistently outperforms alternative loss functions such as token or position reconstruction, highlighting the importance of capturing local graph structure through sequential prediction.
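The paper's exact formulation of the context prediction loss is not reproduced in this summary. As a rough, hedged illustration of this family of objectives, the sketch below contrasts a walk-derived node representation against a context representation from the same neighbourhood (positive) and contexts drawn from other neighbourhoods or graphs (negatives), using an InfoNCE-style cross-entropy; all function names, tensor shapes, and the temperature parameter are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def context_prediction_loss(node_repr, pos_context, neg_contexts, temperature=0.1):
    """Illustrative InfoNCE-style context prediction objective (not the paper's exact loss).

    node_repr:    (B, d)    walk-derived representation of each anchor node
    pos_context:  (B, d)    a context representation from the same neighbourhood
    neg_contexts: (B, K, d) contexts sampled from other neighbourhoods or graphs
    """
    node_repr = F.normalize(node_repr, dim=-1)
    pos_context = F.normalize(pos_context, dim=-1)
    neg_contexts = F.normalize(neg_contexts, dim=-1)

    pos_score = (node_repr * pos_context).sum(dim=-1, keepdim=True)   # (B, 1)
    neg_score = torch.einsum("bd,bkd->bk", node_repr, neg_contexts)   # (B, K)

    # The positive context should score highest, i.e. the model should
    # recognise which context belongs to the anchor node's neighbourhood.
    logits = torch.cat([pos_score, neg_score], dim=-1) / temperature  # (B, 1 + K)
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```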
The research team developed a system that explores graph connectivity by traversing edges at random, creating sequential representations of the network’s structure. A Transformer architecture, originally designed for sequence modelling in natural language processing, then processes these sequences, allowing the model to capture intricate relationships within the graph data. The incorporation of edge features within each Transformer layer proves critical to performance, giving the model access to relational information at every processing step.
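As a minimal sketch of the walk-to-sequence idea (the sampling strategy, walk length, and tokenisation used in the paper may differ), the snippet below draws several uniform random walks per node from an adjacency list and concatenates them into a single token sequence; the separator token and helper names are illustrative assumptions.

```python
import random

def random_walk(adj, start, length, rng=None):
    """Sample a uniform random walk of `length` steps starting at `start`.

    adj: dict mapping each node to a list of neighbour nodes.
    """
    rng = rng or random
    walk = [start]
    for _ in range(length):
        neighbours = adj.get(walk[-1], [])
        if not neighbours:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk

def node_to_sequence(adj, node, num_walks=4, walk_length=8, sep_token=-1):
    """Represent one node as several concatenated walks, separated by a marker token."""
    seq = []
    for _ in range(num_walks):
        seq.extend(random_walk(adj, node, walk_length))
        seq.append(sep_token)
    return seq

# Tiny example: a 4-node cycle graph.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(node_to_sequence(adj, 0))
```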
Providing edge embeddings solely at the input layer, by contrast, yields inconsistent results, particularly hindering performance on graph-level tasks, which require a more holistic understanding of the network’s overall structure. Re-injecting edge features at every layer instead allows the model to learn more robust and generalisable representations, improving its ability to perform complex graph-based tasks.
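The paper’s precise architecture is not reproduced here. The sketch below only illustrates the general idea of re-injecting edge features at every layer rather than just the input, by adding a per-layer projection of the edge features to the hidden states before each block; standard PyTorch encoder layers stand in for the paper’s decoder-only blocks, and all dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class EdgeAwareTransformer(nn.Module):
    """Illustrative stack that re-injects edge features at every layer (not the paper's exact design)."""

    def __init__(self, d_model=128, n_heads=4, n_layers=4, edge_dim=16):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )
        # One edge-feature projection per layer, so relational information is
        # available at every processing step rather than only at the input.
        self.edge_proj = nn.ModuleList(
            [nn.Linear(edge_dim, d_model) for _ in range(n_layers)]
        )

    def forward(self, x, edge_feats, attn_mask=None):
        # x:          (B, L, d_model)  token embeddings of the walk sequence
        # edge_feats: (B, L, edge_dim) features of the edge traversed to reach each token
        for layer, proj in zip(self.layers, self.edge_proj):
            x = layer(x + proj(edge_feats), src_mask=attn_mask)
        return x
```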
Furthermore, a custom attention mask demonstrably improves performance by preventing spurious connections between unrelated tokens, a common issue with full attention. Attention allows the model to weight the most relevant parts of the input sequence, but without careful regulation it can relate walk tokens that share no structural connection. The custom mask filters out these irrelevant interactions, so the model attends only to the relationships that matter within the graph.
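The paper’s mask is not spelled out in this summary; as one hedged illustration of the idea, the helper below builds a block mask that lets tokens attend only within their own walk, blocking attention between walks that happen to be packed into the same sequence. The walk-id convention and boolean semantics (True means blocked, matching PyTorch’s `attn_mask`) are assumptions.

```python
import torch

def walk_block_mask(walk_ids):
    """Attention mask permitting attention only within the same walk.

    walk_ids: (L,) tensor giving the id of the walk each token belongs to.
    Returns a boolean (L, L) mask where True marks positions to be blocked,
    following the convention of PyTorch's `attn_mask`/`src_mask`.
    """
    same_walk = walk_ids.unsqueeze(0) == walk_ids.unsqueeze(1)  # (L, L)
    return ~same_walk  # block attention across different walks

# Example: two walks of length 3 packed into one 6-token sequence.
walk_ids = torch.tensor([0, 0, 0, 1, 1, 1])
print(walk_block_mask(walk_ids))
```

A mask like this could be passed as `attn_mask` to the layer stack sketched above, combining walk-level separation with any causal masking the decoder-only setup requires.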
Experimental results across diverse graph datasets, including WIKICS, ARXIV, WN18RR, FB15K237, HIV, and TOX21, confirm the efficacy of the proposed approach. Node-level prediction assigns properties such as a category or function to individual nodes, link-level prediction infers relationships between pairs of nodes, and graph-level prediction classifies entire graphs. Across all three task types, the model consistently achieves state-of-the-art results, validating its ability to generalise across different graph structures and domains and suggesting that the learned representations capture fundamental properties of graph data that transfer effectively to downstream applications.
The interpretability of the learned representations is crucial for building trust in the model and identifying potential biases. The researchers plan to develop techniques for visualizing and explaining the model’s internal representations, providing insights into its reasoning process.
Future research directions include exploring more sophisticated random walk strategies, potentially incorporating techniques such as biased random walks or personalized PageRank. The development of methods for efficiently handling dynamic graphs remains a significant challenge, requiring innovative approaches to model the evolving structure of the network. The expansion of downstream tasks to include more complex graph reasoning problems, such as graph editing and subgraph discovery, will further demonstrate the versatility and power of the proposed approach. Finally, the researchers plan to investigate the potential for multi-modal graph representation learning, incorporating both structural and attribute information. Combining these two types of information could unlock new capabilities and broaden the applicability of this approach.
The research team’s work represents a significant step forward in the field of graph neural networks, demonstrating the potential of Transformer-based architectures for learning representations of complex graph data. The ongoing research efforts promise to further enhance the capabilities of this approach, unlocking new possibilities for graph-based machine learning.
👉 More information
🗞 Toward a Graph Foundation Model: Pre-Training Transformers With Random Walks
🧠 DOI: https://doi.org/10.48550/arXiv.2506.14098
