
How To Calculate Number of Model Parameters for PyTorch and TensorFlow Models

This article provides a short tutorial on calculating the number of parameters for TensorFlow and PyTorch deep learning models, with examples for you to follow.
We live in the age of readily accessible large models. Anyone can create a Kaggle Kernel with a pre-trained DeBERTa v3 model and fine-tune it on any arbitrary dataset. What many people don't realize is that they're using a model with 75-100 million parameters, pre-trained on more than 100 GB of text.
Sure, over-parameterization might lead to better performance, but it also comes with larger storage requirements and, as a consequence, slower inference. That's a good reason to log the number of parameters your model has.
Wouldn't it be interesting to see models with 10x or 20x fewer parameters that perform about as well? Whether it's for a parameter-count-versus-performance plot or simply for benchmarking, this is an essential number to know.
Let's walk through a couple of examples to see how you can calculate the number of parameters in your PyTorch and TensorFlow models.

Table of Contents

  • The Code
  • PyTorch
  • TensorFlow
  • Summary
The Code

PyTorch

PyTorch doesn't have a built-in utility function (at least at the moment!) to count the number of model parameters, but every nn.Module exposes a parameters() method you can use to compute the count yourself.
Use the following snippet to count all the model parameters:
total_params = sum(param.numel() for param in model.parameters())
Let's quickly walk through this snippet:
  • model.parameters(): PyTorch modules have a method called parameters() that returns an iterator over all the parameter tensors of the module, including those of its submodules.
  • param.numel(): for each parameter tensor yielded by that iterator, .numel() returns the number of elements (scalars) it contains.
  • sum(...): we add up the element counts of all the parameter tensors (a Module might contain submodules as layers) to get the total.
NOTE: This snippet counts all the parameters in the Module, both trainable and non-trainable. If you want just the trainable parameters, use the following snippet instead:
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
The extra .requires_grad check uses a property of Tensor to determine whether a parameter is trainable. If a tensor has requires_grad set to True, the autograd engine computes gradients for it and the optimizer can update it during training, i.e., it's "trainable".
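To see both counts in action, here is a minimal, self-contained sketch; the two-layer model and the frozen first layer are purely illustrative:

import torch.nn as nn

# A tiny model for illustration: two linear layers
model = nn.Sequential(
    nn.Linear(100, 50),  # 100*50 weights + 50 biases = 5,050 parameters
    nn.ReLU(),
    nn.Linear(50, 10),   # 50*10 weights + 10 biases = 510 parameters
)

# Freeze the first layer so the two counts differ
for param in model[0].parameters():
    param.requires_grad = False

total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Total parameters: {total_params}")          # 5560
print(f"Trainable parameters: {trainable_params}")  # 510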

TensorFlow

TensorFlow provides a utility function for counting parameters called count_params, available in Keras's layer utilities (keras.utils.layer_utils).
Use the following snippet to count the trainable and non-trainable parameters of your TensorFlow model:
from keras.utils.layer_utils import count_params

model = ...

trainable_params = count_params(model.trainable_weights)
non_trainable_params = count_params(model.non_trainable_weights)
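Note that count_params takes the whole list of weight tensors, not a single layer. Here is an end-to-end sketch; the layer sizes are arbitrary, and depending on your Keras version the count_params import path may differ:

from tensorflow import keras
from keras.utils.layer_utils import count_params

# A tiny model for illustration
model = keras.Sequential([
    keras.Input(shape=(100,)),
    keras.layers.Dense(50, activation="relu"),  # 100*50 + 50 = 5,050 parameters
    keras.layers.Dense(10),                     # 50*10 + 10 = 510 parameters
])

trainable_params = count_params(model.trainable_weights)          # 5560
non_trainable_params = count_params(model.non_trainable_weights)  # 0

print(trainable_params, non_trainable_params)
print(model.count_params())  # total (trainable + non-trainable) = 5560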


Now, what do we do with this information, you ask? Well, with the help of Weights & Biases, you can log the parameter count as a wandb.config entry or as a summary value on the W&B run, to review and compare later on.
wandb.config.update({"Model Parameters": trainable_params})
# OR
wandb.run.summary["Model Parameters"] = trainable_params
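Putting it all together, a minimal sketch; the project name below is just a placeholder:

import wandb

run = wandb.init(project="param-counting-demo")  # placeholder project name

# trainable_params as computed with either of the snippets above
run.config.update({"Model Parameters": trainable_params})  # log as config...
run.summary["Model Parameters"] = trainable_params         # ...or as a summary value

run.finish()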

Summary

In this article, you saw how you can calculate the number of parameters for both TensorFlow and PyTorch models. To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and Saving Models.
