gReLU: A Python Library for Deep Learning on DNA Sequences
Created on May 31|Last edited on May 31
gReLU is a new Python library developed by the folks at Genentech to train, interpret, and apply deep learning models to DNA sequences. The library aims to streamline genomic data analysis with modern machine learning techniques. You can think of gReLU as a sort of 'HuggingFace' for DNA modeling!
Weights & Biases is proud to have been used in training the library's models and in hosting its curated model zoo.

Genomic Data Input
The first step in utilizing gReLU involves handling various types of genomic data. Users can input data in formats like FASTA, BED, BigWig, BAM, AnnData, and GWAS summary statistics. These formats cater to different types of genomic information, from sequence data to annotation data, ensuring a broad range of data can be processed.
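To make the input step concrete, here is a minimal sketch of reading FASTA text and one-hot encoding a sequence, which is the usual way DNA is handed to a neural network. It uses only the Python standard library and is not gReLU's own API; the helper names `read_fasta` and `one_hot` are hypothetical and chosen for illustration.

```python
from io import StringIO

BASES = "ACGT"

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)

def one_hot(seq):
    """Encode a DNA string as rows of [A, C, G, T]; any other letter (e.g. N) is all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]

# Toy FASTA content standing in for a real file.
fasta_text = ">seq1 toy example\nACGT\nNA\n>seq2\nggcc\n"
records = dict(read_fasta(StringIO(fasta_text)))
encoded = one_hot(records["seq1"])
```

Each base becomes a four-element row, so a sequence of length L turns into an L x 4 matrix that a convolutional layer can consume.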
Preprocessing
gReLU also supports many popular preprocessing operations. This includes filtering, resizing, removing blacklisted sequences, and generating matched negatives. These steps are crucial for cleaning and standardizing the data, which improves the accuracy and performance of subsequent model training.
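The filtering, resizing, and blacklist steps can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than gReLU's implementation: intervals are assumed to be half-open `(chrom, start, end)` tuples, and the function names are hypothetical.

```python
def resize(interval, width):
    """Return an interval of the given width centered on the original midpoint."""
    chrom, start, end = interval
    mid = (start + end) // 2
    new_start = mid - width // 2
    return (chrom, new_start, new_start + width)

def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def preprocess(intervals, width, keep_chroms, blacklist):
    """Filter by chromosome, resize to a fixed width, and drop blacklisted regions."""
    out = []
    for iv in intervals:
        if iv[0] not in keep_chroms:
            continue  # filtering: unwanted chromosome
        iv = resize(iv, width)
        if iv[1] < 0:
            continue  # resized interval runs off the start of the chromosome
        if any(overlaps(iv, bad) for bad in blacklist):
            continue  # removing blacklisted sequences
        out.append(iv)
    return out

peaks = [("chr1", 100, 300), ("chr1", 1000, 1100), ("chrM", 50, 250), ("chr2", 10, 30)]
blacklist = [("chr1", 1040, 1060)]
clean = preprocess(peaks, width=200, keep_chroms={"chr1", "chr2"}, blacklist=blacklist)
```

Resizing to a single width matters because most sequence models expect fixed-length inputs. A real pipeline would also check the right-hand chromosome boundary, which this sketch omits.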
Model Design and Training
gReLU supports the design of various neural network architectures, including convolutional layers, long-range layers (such as GRU and Transformer layers), and U-Nets. Models can be trained for single- or multi-task regression, binary classification, multi-class classification, segmentation, and other tasks. Hyperparameter sweeps are also supported for tuning the training process. Training is built on PyTorch and PyTorch Lightning, with Weights & Biases integration for experiment tracking and for hosting the model zoo.
Model Evaluation
After training, gReLU supports evaluation with metrics such as correlation, mean squared error (MSE), accuracy, area under the precision-recall curve (AUPRC), area under the receiver operating characteristic curve (AUROC), and F1 score. Regression tasks are typically scored with correlation and MSE, while classification tasks use accuracy, AUPRC, AUROC, and F1, so together these metrics cover the model types the library trains.
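For readers who want to see what three of these metrics compute, here is a small standard-library sketch of MSE, Pearson correlation, and AUROC. These are textbook definitions written for illustration, not the implementations gReLU uses, and the toy values are invented for the example.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy regression targets and predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]

# Toy binary labels and predicted scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(mse(y_true, y_pred), pearson(y_true, y_pred), auroc(labels, scores))
```

The pairwise AUROC above is quadratic in the number of examples, which is fine for a demonstration but too slow for genome-scale evaluation.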
Variant Effect Prediction
One key use of gReLU is predicting how genetic changes affect biological functions. The models can estimate the impact of changes like insertions (adding extra DNA), deletions (removing DNA), and single nucleotide variants (SNVs, which are single-letter changes in the DNA sequence). By shuffling sequences to create a baseline, the models provide estimates of how significant these changes are (effect size) and how confident we can be in these predictions (p-value). This helps prioritize which genetic variants might be most important for further study.
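The idea of comparing a variant's effect against a shuffled baseline can be sketched as below. Everything here is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a trained model (it just counts GATA motifs), and the empirical p-value recipe is one common choice, not necessarily the exact procedure gReLU follows.

```python
import random

def toy_model(seq):
    """Hypothetical stand-in for a trained model: counts GATA motif occurrences."""
    return sum(seq[i:i + 4] == "GATA" for i in range(len(seq) - 3))

def apply_snv(seq, pos, alt):
    """Return the sequence with the base at `pos` replaced by `alt`."""
    return seq[:pos] + alt + seq[pos + 1:]

def variant_effect(seq, pos, alt, model, n_shuffles=200, seed=0):
    """Effect size (alt minus ref prediction) and an empirical p-value from shuffled backgrounds."""
    rng = random.Random(seed)
    observed = model(apply_snv(seq, pos, alt)) - model(seq)
    null = []
    for _ in range(n_shuffles):
        bases = list(seq)
        rng.shuffle(bases)  # shuffled background keeps base composition
        background = "".join(bases)
        null.append(model(apply_snv(background, pos, alt)) - model(background))
    # Fraction of shuffled backgrounds with an effect at least as large as observed.
    extreme = sum(abs(e) >= abs(observed) for e in null)
    p_value = (1 + extreme) / (1 + n_shuffles)
    return observed, p_value

ref = "CCTTGATACCTTCCTTCCTT"
effect, p = variant_effect(ref, pos=6, alt="C", model=toy_model)
```

Here the T-to-C change at position 6 destroys the only GATA site, so the effect size is negative; the p-value reflects how rarely the same substitution matters in a sequence with no real motif structure.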
Sequence Design
gReLU also helps design new DNA sequences for specific purposes. It uses techniques like directed evolution (simulating natural selection to find the best sequences), base or motif substitution (changing specific parts of the sequence), and backpropagation. These designed sequences can be customized with certain constraints, like maintaining a specific GC content (the proportion of guanine and cytosine bases), to ensure they meet specific needs. This is useful for creating optimized DNA sequences for various research and industrial applications.
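As a rough illustration of directed evolution under a GC-content constraint, the sketch below greedily keeps the best single-base mutant in each round and rejects candidates whose GC content leaves an allowed range. It is a simplified stand-in, not gReLU's design code: `toy_score` is a hypothetical scoring function (it counts TATA occurrences) where a real workflow would use a trained model's prediction.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def toy_score(seq):
    """Hypothetical stand-in for a model prediction: counts TATA occurrences."""
    return sum(seq[i:i + 4] == "TATA" for i in range(len(seq) - 3))

def single_base_mutants(seq):
    """Yield every sequence that differs from `seq` by exactly one substitution."""
    for i, ref in enumerate(seq):
        for alt in "ACGT":
            if alt != ref:
                yield seq[:i] + alt + seq[i + 1:]

def directed_evolution(seq, score, n_rounds=5, gc_range=(0.3, 0.7)):
    """Greedy rounds of mutate, filter by GC content, and keep the top scorer."""
    lo, hi = gc_range
    best, best_score = seq, score(seq)
    for _ in range(n_rounds):
        candidates = [m for m in single_base_mutants(best) if lo <= gc_content(m) <= hi]
        if not candidates:
            break
        top = max(candidates, key=score)
        if score(top) <= best_score:
            break  # no mutant improves the score, so stop early
        best, best_score = top, score(top)
    return best, best_score

start = "GCGCATATGCGCATGC"
designed, final_score = directed_evolution(start, toy_score)
```

Applying the constraint as a filter before scoring means every accepted sequence satisfies it by construction, at the cost of possibly stalling when no valid mutant improves the score.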
Model Interpretation
Understanding how the models make their predictions is crucial. gReLU offers several methods to interpret the trained models:
Gradient-based Importance Scores: These show which parts of the DNA sequence most influence the model's predictions.
TF-MoDISco: This tool identifies recurring patterns (motifs) in the DNA that are important for the model, which often correspond to sites where proteins bind to regulate gene activity.
Attention Scores: Used in models like transformers, these scores highlight the most relevant parts of the sequence for the model's prediction.
Motif Scanning: This method looks for known patterns in the DNA sequence that are biologically significant.
Sequence Simulations: These help test how different changes in the DNA might affect the model's predictions, offering a dynamic way to explore genetic functions.
These tools help pinpoint critical regions in the DNA and understand the biological mechanisms behind the model's predictions!
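Of the methods listed above, motif scanning is the easiest to show in a few lines. The sketch below scores every window of a sequence against a log-odds position weight matrix and reports windows above a threshold. The matrix values, the uniform background, and the threshold are made up for illustration, and the code is not gReLU's scanning API; it also assumes the sequence contains only uppercase A, C, G, and T.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif.
# Rows are motif positions; columns are A, C, G, T.
PFM = [
    [0.05, 0.05, 0.85, 0.05],  # mostly G
    [0.85, 0.05, 0.05, 0.05],  # mostly A
    [0.05, 0.05, 0.05, 0.85],  # mostly T
    [0.85, 0.05, 0.05, 0.05],  # mostly A
]
BACKGROUND = 0.25  # assumed uniform base frequencies
INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

# Convert frequencies to log-odds scores against the background.
PWM = [[math.log2(p / BACKGROUND) for p in row] for row in PFM]

def scan(seq, pwm, threshold):
    """Return (start, window, score) for every window scoring at or above the threshold."""
    hits = []
    width = len(pwm)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(pwm[i][INDEX[base]] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits

hits = scan("CCGATACCTTGATTCC", PWM, threshold=4.0)
```

With this threshold the exact GATA site is reported while the near-match GATT is not, which shows how the cutoff trades sensitivity for specificity.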
gReLU
gReLU is a comprehensive tool for researchers looking to apply deep learning to genomic data. Its robust framework covers everything from data preprocessing to model interpretation, making it a valuable asset for advancing genomic research and applications.
Tags: ML News