How To Calculate Number of Model Parameters for PyTorch and TensorFlow Models
This article provides a short tutorial on calculating the number of parameters for TensorFlow and PyTorch deep learning models, with examples for you to follow.
Created on May 24|Last edited on July 9
We live in the age of readily accessible large models. Anyone can create a Kaggle Kernel with a pre-trained DeBERTa v3 model and fine-tune it on any arbitrary dataset. What many people don't realize is that they are using a 75-100M-parameter model that was pre-trained on more than 100GB of training data.
Sure, over-parameterization might lead to better performance, but it also comes with increased storage requirements and, as a consequence, longer inference times. That's why you might want to log the number of parameters your model has.
Wouldn't it be interesting to see models with 10x or 20x fewer parameters that perform about as well? Whether you're building a parameter-count-versus-performance graph or simply benchmarking, this is an essential number to know.
Let's walk through a couple of examples to see how you can calculate the number of parameters in your PyTorch and TensorFlow models.
The Code
PyTorch
PyTorch doesn't have a utility function (at least at the moment!) to count the number of model parameters, but every module exposes a parameters() method that you can use to do it yourself.
Use the following snippet to get all the model parameters:
total_params = sum(param.numel() for param in model.parameters())
Let's quickly walk through this snippet:
- model.parameters(): PyTorch modules have a method called parameters() which returns an iterator over all the parameter tensors, including those of any submodules.
- param.numel(): For each parameter tensor yielded by the iterator, .numel() returns the number of elements (scalars) it contains.
- sum(...): We add up the element counts across all parameter tensors to get the total.
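Putting the steps above together, here is a minimal, self-contained sketch using a small hypothetical model (the layer sizes are chosen only so the arithmetic is easy to check by hand):

```python
import torch.nn as nn

# A small illustrative model (hypothetical, for demonstration only)
model = nn.Sequential(
    nn.Linear(10, 20),  # weights: 10 * 20 = 200, biases: 20 -> 220 params
    nn.ReLU(),          # no parameters
    nn.Linear(20, 5),   # weights: 20 * 5 = 100, biases: 5 -> 105 params
)

# Sum the element counts of every parameter tensor in the model
total_params = sum(param.numel() for param in model.parameters())
print(total_params)  # 325
```

Note that parameters() recurses into submodules, so this works unchanged for arbitrarily nested models.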
NOTE: This snippet counts all the parameters in the Module, both trainable and non-trainable. If you want just the trainable parameters, use the following snippet instead.
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
The extra .requires_grad check filters for trainable parameters: if a tensor has requires_grad set to True, the autograd engine computes gradients for it during backpropagation and the optimizer can update it, i.e., it's "trainable".
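To see the difference between the two counts in practice, here is a small sketch (again with a hypothetical model) that freezes one layer by setting requires_grad to False on its parameters, as you might do when fine-tuning:

```python
import torch.nn as nn

# Hypothetical two-layer model: 220 + 105 = 325 parameters in total
model = nn.Sequential(nn.Linear(10, 20), nn.Linear(20, 5))

# Freeze the first layer: its parameters no longer require gradients
for param in model[0].parameters():
    param.requires_grad = False

total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(total_params, trainable_params)  # 325 105
```

Only the second Linear layer's 105 parameters remain trainable; the frozen layer still counts toward the total.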
TensorFlow
TensorFlow provides a utility function for calculating the number of parameters, called count_params, available in Keras utils (keras.utils.layer_utils).
Use the following snippet to count the trainable and non-trainable parameters of your TensorFlow models:
from keras.utils.layer_utils import count_params

model = ...

trainable_params = count_params(model.trainable_weights)
non_trainable_params = count_params(model.non_trainable_weights)

Note that count_params expects the whole list of weight tensors, so we pass model.trainable_weights (or model.non_trainable_weights) directly rather than calling it on each weight individually.
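Here is a minimal end-to-end sketch with a small hypothetical Keras model. The import location of count_params has moved between Keras versions, so the fallback import below is a hedge rather than a guarantee; the layer sizes mirror the PyTorch example so the totals match:

```python
from tensorflow import keras

try:
    from keras.utils.layer_utils import count_params
except ImportError:
    # Older TF versions keep it under the internal tf.keras path
    from tensorflow.python.keras.utils.layer_utils import count_params

# A small illustrative model (hypothetical, for demonstration only)
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(20),  # 10 * 20 weights + 20 biases = 220 params
    keras.layers.Dense(5),   # 20 * 5 weights + 5 biases  = 105 params
])

trainable_params = count_params(model.trainable_weights)
non_trainable_params = count_params(model.non_trainable_weights)
print(trainable_params, non_trainable_params)  # 325 0
```

If you only need the grand total, Keras models also expose model.count_params() directly.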
Now what do we do with this information, you ask? Well, with the help of Weights & Biases, you can log the number of parameters as a wandb.config parameter or as a summary to the W&B run to review and compare later on.
wandb.config.update({"Model Parameters": trainable_params})

# OR

wandb.run.summary["Model Parameters"] = trainable_params
Summary
In this article, you saw how to calculate the number of parameters for both TensorFlow and PyTorch models. To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and Saving Models.
Recommended Reading
Setting Up TensorFlow And PyTorch Using GPU On Docker
A short tutorial on setting up TensorFlow and PyTorch deep learning models on GPUs using Docker.
How to Compare Keras Optimizers in Tensorflow for Deep Learning
A short tutorial outlining how to compare Keras optimizers for your deep learning pipelines in TensorFlow, with a Colab to help you follow along.
Preventing The CUDA Out Of Memory Error In PyTorch
A short tutorial on how you can avoid the "RuntimeError: CUDA out of memory" error while using the PyTorch framework.
How to Initialize Weights in PyTorch
A short tutorial on how you can initialize weights in PyTorch with code and interactive visualizations.
Recurrent Neural Network Regularization With Keras
A short tutorial teaching how you can use regularization methods for Recurrent Neural Networks (RNNs) in Keras, with a Colab to help you follow along.
Tutorial: Regression and Classification on XGBoost
A short tutorial on how you can use XGBoost with code and interactive visualizations.