Introduction to PyTorch Geometric and Weights & Biases

A guide to getting started on PyG with Weights & Biases
Created on April 2|Last edited on April 11

Introduction

Graphs are a versatile data representation that can capture complex relationships between objects in various domains, such as social networks, molecular structures, and recommendation systems. PyTorch Geometric (PyG) is a powerful and highly extensible library built on top of PyTorch for implementing deep learning models on graph-structured data.
The rapid growth of graph-based data has led to the development of graph neural networks (GNNs), a class of deep learning models specifically designed for learning representations of graphs. PyTorch Geometric streamlines the process of implementing GNNs by providing efficient data structures, preprocessing utilities, and a wide range of graph-based neural network layers and models. It enables researchers and practitioners to quickly prototype and experiment with GNNs while still enjoying the full flexibility of the underlying PyTorch framework.
Some key features of PyTorch Geometric include:
  • Efficient data handling and processing for graphs of varying sizes and structures.
  • A comprehensive collection of graph neural network layers and models, such as graph convolutional networks (GCNs), graph attention networks (GATs), and GraphSAGE.
  • Built-in support for common benchmark datasets, making it easy to evaluate and compare different GNN models.
  • Customizable data transforms and augmentations, allowing for seamless pre-processing and data augmentation within the PyG ecosystem.
  • Integration with popular machine learning tools and platforms, such as Weights & Biases, for experiment tracking, visualization, and collaboration.
In this article, we will dive into the core components and features of PyTorch Geometric, exploring how to handle graph data, apply data transforms, and build custom graph neural network models. We will also demonstrate how to integrate PyG with Weights & Biases for efficient experiment tracking and collaboration. By the end of this article, you will have a solid understanding of the PyTorch Geometric library and be well-equipped to start working on your own graph-based machine learning projects.

5 Reasons to Use PyTorch Geometric

PyTorch Geometric is a powerful library designed specifically for graph-based machine learning tasks, making it an excellent choice when you're working with graph-structured data or complex relational data. There are several reasons why you should consider using PyTorch Geometric for your projects:

1. Rich Graph Data Representation and Handling 📊

Graphs are a natural way to represent complex relationships between entities, and PyTorch Geometric excels at handling graph data. With its flexible data structure, PyG makes it easy to represent and work with graphs, including node features, edge features, and global features. This makes PyG a perfect choice for various applications, such as social network analysis, molecular property prediction, recommendation systems, and more.

2. Extensive Library of Graph Neural Network (GNN) Layers and Models 🧠

PyTorch Geometric comes with a wide range of pre-built GNN layers and models, making it easy to experiment with different graph-based architectures. From simple Graph Convolutional Networks (GCNs) to more advanced models like GraphSAGE, GAT, and ChebNet, PyG has you covered. This extensive library allows you to quickly prototype and experiment with various GNN architectures to find the best solution for your specific problem.

3. Efficient and Scalable 🚀

PyTorch Geometric is built with efficiency and scalability in mind. By leveraging sparse tensor operations and message-passing paradigms, PyG ensures that your models run efficiently even when dealing with large-scale graphs. This makes it suitable for real-world applications where the amount of graph data can be quite substantial. Furthermore, PyG supports parallelization over a mini-batch by creating sparse block diagonal adjacency matrices, which allows for efficient training of GNNs with varying graph sizes in a single batch.

4. Seamless Integration with PyTorch 🤝

As an extension of the popular PyTorch framework, PyTorch Geometric seamlessly integrates with the PyTorch ecosystem. This means that you can easily take advantage of existing PyTorch functionalities, such as automatic differentiation, GPU acceleration, and various optimization algorithms. This tight integration with PyTorch makes it easy to build end-to-end machine learning pipelines with PyG, from data processing to model evaluation and deployment.

5. Active and Growing Community 💡

PyTorch Geometric has an active and growing community of researchers and practitioners who contribute to the library's development and share their expertise. This ensures that PyG stays up to date with the latest research advances in the field of graph neural networks and provides a strong foundation for future improvements and extensions. By choosing PyTorch Geometric, you'll be joining a thriving community and have access to a wealth of knowledge and resources.


In summary, PyTorch Geometric is an excellent choice when working with graph-structured data, offering a rich set of tools and functionalities for data representation, graph neural network models, and scalability. Its seamless integration with PyTorch and active community support makes it an attractive option for both researchers and practitioners looking to leverage the power of graph-based machine learning.

Working with PyG

PyTorch Geometric (PyG) simplifies working with graph-structured data by providing useful abstractions and data structures. Because a solid understanding of its core components is essential, this section walks through data handling, graph neural network layers, and training and evaluation. It serves as a foundation for your PyTorch Geometric journey, enabling you to tackle graph-structured problems effectively and efficiently.
So, let's dive in and explore the various aspects of working with PyG!

The Data Class

The fundamental class for handling graph data in PyG is torch_geometric.data.Data. A Data object holds the following attributes by default:
  1. data.x: Node feature matrix with shape [num_nodes, num_node_features]
  2. data.edge_index: Graph connectivity in COO format with shape [2, num_edges] and type torch.long
  3. data.edge_attr: Edge feature matrix with shape [num_edges, num_edge_features]
  4. data.y: Target to train against (may have arbitrary shape), e.g., node-level targets of shape [num_nodes, *] or graph-level targets of shape [1, *]
  5. data.pos: Node position matrix with shape [num_nodes, num_dimensions]
These attributes can be extended or omitted depending on the specific use case.

Utility Functions

The Data class provides a number of useful properties and utility functions, such as:
  1. data.num_nodes: The number of nodes in the graph (a property, accessed without parentheses).
  2. data.num_edges: The number of edges in the graph (a property).
  3. data.num_features: The number of features per node (a property, alias of data.num_node_features).
  4. data.has_isolated_nodes(): Returns whether the graph contains isolated nodes (named contains_isolated_nodes() in older PyG releases).
  5. data.has_self_loops(): Returns whether the graph contains self-loops (named contains_self_loops() in older PyG releases).
  6. data.is_directed(): Returns whether the graph is directed.
  7. data.to(device): Transfers the data object to the specified device.
  8. data.apply(func, *args): Applies the function func to the attributes named in args, or to all attributes if none are given.
These utility functions provide an easy way to inspect and manipulate the graph data during the development process.

Data Transforms

PyG supports data transformations to preprocess, augment, or manipulate the graph data. Transforms are functions that take a Data object as input and return a new, transformed Data object. They can be chained together using torch_geometric.transforms.Compose and are applied before saving a processed dataset on disk (using the pre_transform attribute) or before accessing a graph in a dataset (using the transform attribute).
Some common transforms include:
  1. torch_geometric.transforms.KNNGraph(k): Generates a k-nearest neighbors (KNN) graph from the node positions in data.pos.
  2. torch_geometric.transforms.RandomJitter(translate) (named RandomTranslate in older PyG releases): Randomly translates the node positions by a value sampled from a uniform distribution within the range [-translate, translate].
  3. torch_geometric.transforms.RandomRotate(degrees, axis): Randomly rotates the node positions around a specified axis by a value sampled from a uniform distribution within the range [-degrees, degrees].

Learning Methods on Graphs

PyTorch Geometric provides various graph neural network layers and architectures that can be easily integrated with the PyTorch framework. Some popular layers include:
  1. torch_geometric.nn.GCNConv: Graph Convolutional Network (GCN) layer.
  2. torch_geometric.nn.GATConv: Graph Attention Network (GAT) layer.
  3. torch_geometric.nn.ChebConv: ChebNet layer based on Chebyshev polynomials.
  4. torch_geometric.nn.SAGEConv: GraphSAGE layer for inductive learning on large graphs.
These layers can be used to build custom graph neural network architectures. In the forward pass of the network, the non-linearity is not integrated into the convolutional layers and needs to be applied afterward. This design choice allows for greater flexibility and is consistent across all operators in PyG.

Training and Evaluation

Training and evaluation of graph neural networks in PyG follow the same procedure as in standard PyTorch. You can define a custom model by subclassing torch.nn.Module and implementing the forward method. During training, you can use the torch_geometric.loader.DataLoader to handle mini-batches and parallelize the computations.
Here's an example of a simple Graph Convolutional Network (GCN) model:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, num_node_features, hidden_channels, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(num_node_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)

Create an Instance of the GCN model

Once you have defined your model, you can train it using the standard PyTorch training loop. Don't forget to use the torch_geometric.loader.DataLoader for handling mini-batches during training:
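from torch_geometric.loader import DataLoader

# num_node_features, hidden_channels, num_classes, num_epochs, and
# train_dataset are placeholders to define for your own dataset.
model = GCN(num_node_features, hidden_channels, num_classes)
# The optimizer and learning rate are chosen here for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Create a DataLoader for your dataset
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Training loop
for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        out = model(batch)
        # Assumes every graph carries a boolean train_mask over its nodes
        loss = F.nll_loss(out[batch.train_mask], batch.y[batch.train_mask])
        loss.backward()
        optimizer.step()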
To evaluate the model, you can use the same DataLoader and loop through the test dataset:
test_loader = DataLoader(test_dataset, batch_size=32)

model.eval()
correct = 0
total = 0
# Gradients are not needed for evaluation
with torch.no_grad():
    for batch in test_loader:
        out = model(batch)
        pred = out.argmax(dim=1)
        correct += (pred == batch.y).sum().item()
        total += batch.num_nodes

print(f"Test accuracy: {correct / total:.4f}")


Utilizing Weights & Biases and PyTorch Geometric

As we delve deeper into the world of graph neural networks, it becomes increasingly important to manage and optimize our experiments effectively. In this section, we will discuss how to harness the combined power of PyTorch Geometric and Weights & Biases to streamline the process of building, training, and evaluating graph neural networks. We'll demonstrate how these two tools complement each other, providing invaluable insights into the training process, hyperparameter tuning, and visualization of your models.

Setting up the Environment

To begin, we install the necessary libraries: PyTorch Geometric and Weights & Biases.
!pip install torch-geometric
!pip install wandb
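We also import the required packages, log in, and initialize the W&B run.
import wandb
wandb.login()

import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# Assumed setup: the project name is a placeholder, and config exposes
# the run's hyperparameters to the code in the following sections.
run = wandb.init(project="pyg-intro")
config = run.config
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")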

Loading and Analyzing the Dataset

The Cora dataset is a citation network consisting of 2,708 research papers classified into one of seven categories. It is available through the PyTorch Geometric library. We load the dataset and visualize it using NetworkX and Plotly.
dataset_path = '/tmp/Cora'
dataset_name = 'Cora'
dataset = Planetoid(root=dataset_path, name=dataset_name)
data = dataset[0].to(device)
We then analyze the dataset and log the details to W&B.
# label_dict is assumed to map class indices to human-readable label names
data_details = {
    "num_nodes": data.num_nodes,
    "num_edges": data.num_edges,
    "has_isolated_nodes": data.has_isolated_nodes(),
    "has_self_loops": data.has_self_loops(),
    "is_undirected": data.is_undirected(),
    "avg_node_degree": data.num_edges / data.num_nodes,
    "num_node_features": data.num_node_features,
    "num_edge_features": data.num_edge_features,
    "num_classes": dataset.num_classes,
    "labels": label_dict,
}

run.log(data_details)


Building the Model

We create a simple Graph Convolutional Network (GCN) model to perform node classification on the Cora dataset. The model consists of two GCN layers with a ReLU activation function and dropout in between.
class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, config.get("latent_size", 16))
        self.conv2 = GCNConv(config.get("latent_size", 16), dataset.num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)

        return F.log_softmax(x, dim=1)

Training and Evaluation

We train the model using the Adam optimizer and negative log-likelihood (NLL) loss.
model = GCN().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=config.get("lr", 0.01), weight_decay=config.get("weight_decay", 5e-4))
criterion = F.nll_loss
We train the model for the specified number of epochs and log the training loss to W&B.
model.train()
for epoch in range(config.get("epochs", 200)):
    optimizer.zero_grad()
    out = model(data)
    loss = criterion(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    # Log a plain Python float rather than a tensor tied to the autograd graph
    run.log({"epoch": epoch, "train/loss": loss.item()})
After training, we evaluate the model on the test set, calculate the accuracy, and log the results to W&B.
model.eval()
pred = model(data).argmax(dim=1)

eval_predictions = pred[data.test_mask]
eval_ground_truth = data.y[data.test_mask]

correct = (eval_predictions == eval_ground_truth).sum()
acc = int(correct) / int(data.test_mask.sum())

run.log({"eval/acc": acc})


Saving the Model and Dataset

We save the trained model and dataset as artifacts in W&B, ensuring reproducibility and versioning.
model_path = "/tmp/saved_model.pt"
model_name = "cora_gcn_model"
torch.save(model.state_dict(), model_path)

model_artifact = wandb.Artifact(model_name, type="model")
model_artifact.add_file(model_path)
run.log_artifact(model_artifact)

dataset_artifact = wandb.Artifact(name=dataset_name, type="dataset", metadata=data_details)
dataset_artifact.add_dir(dataset_path)
run.log_artifact(dataset_artifact)

run.finish()
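Because only the state_dict is saved, restoring the model later means recreating the architecture first and then loading the weights into it. The sketch below shows that round trip with plain PyTorch; a torch.nn.Linear layer stands in for the trained GCN and a temporary directory stands in for the artifact location, so neither the model nor the path is the one used above. Downloading the artifact back from W&B is not shown.

```python
import os
import tempfile

import torch

# Stand-in for the trained model (the article's model is a GCN).
model = torch.nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "saved_model.pt")

    # Save only the learned parameters, as in the article.
    torch.save(model.state_dict(), path)

    # Later: rebuild the same architecture, then load the weights.
    restored = torch.nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))

restored.eval()  # switch to inference mode before making predictions
```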


Conclusion

In this article, we've explored PyTorch Geometric and Weights & Biases, two powerful tools that can greatly benefit machine learning practitioners working with graph-structured data.
PyTorch Geometric is a versatile framework designed specifically for graph-structured data, providing a wide range of tools and utilities for handling graphs, preprocessing and transforming data, constructing custom graph neural network architectures, and training and evaluating models using familiar PyTorch techniques. With PyG, you can efficiently develop and deploy cutting-edge graph-based models for diverse applications such as social network analysis, molecular property prediction, and 3D mesh analysis.
On the other hand, Weights & Biases serves as an exceptional platform for tracking experiments, visualizing results, and fostering collaboration when working with graph neural networks. It assists in monitoring the training process, comparing models and hyperparameters, and sharing your work with colleagues. Furthermore, W&B supports graph visualization, which can be invaluable for understanding and debugging graph-based models.
We've showcased the seamless integration of these tools by training a Graph Convolutional Network on the Cora dataset. By utilizing these powerful resources, you can build, train, and evaluate machine learning models more effectively, while maintaining comprehensive records of experiments, versioning, and collaboration.
We encourage you to delve into the provided example code and experiment with various datasets and model architectures to further your understanding of both PyTorch Geometric and Weights & Biases.