Tracking CO2 Emissions of Your Deep Learning Models with CodeCarbon and Weights & Biases

AI can benefit society in many ways but, given the energy needed to support the computing behind AI, these benefits can come at a high environmental price. In this report, we showcase how to use CodeCarbon and W&B to track the CO₂ emissions of your computing resources.
Aman Arora
As most practitioners are well aware, training AI models comes with steep compute costs, and that compute brings with it significant environmental concerns. Recently, I came across something that really helps put this cost into perspective: the CodeCarbon project.
Essentially, CodeCarbon helps you estimate and track the carbon emissions from your compute, and quantify and analyze their impact. The aim of this report is simple: to track carbon emissions (in kg) for every training epoch on W&B. Along the way, we showcase a simple example of how to integrate CodeCarbon with Weights & Biases!

Introduction

When we talk about the carbon footprint of AI, what exactly are we talking about? Let's start with this, from the Quantifying the Carbon Emissions of Machine Learning paper:
While a decade ago, only a few ML pioneers were training neural networks on GPUs (Graphical Processing Units), in recent years powerful GPUs have become increasingly accessible and used by ML practitioners worldwide. Furthermore, new models often need to beat existing challenges, which entails training on more GPUs, with larger datasets, for a longer time. This expansion brings with it ever-growing costs in terms of the energy needed to fuel it. This trend has been the subject of recent studies aiming to evaluate the climate impact of AI, which have predominantly put the focus on the environmental cost of training large-scale models connected to grids powered by fossil fuels.
And, as per the AI Computing Emits CO₂ report on Medium:
Datacenters consume from 1% to 2% of all the energy generated each year around the world, with this amount increasing annually. Some tech companies boast that they are improving the efficiency of their datacenters. But with global compute instances rising by as much as 550% in the last ten years, the amount of energy it consumes — and the greenhouse gas emissions (GHG) it releases — will continue to grow.
One consequence of this increase in computing is the heavy environmental impact of training machine learning models. A recent research paper — Energy and Policy Considerations for Deep Learning in NLP — notes that an inefficiently trained NLP model using Neural Architecture Search can emit more than 626,000 pounds of CO₂. That's about five times the lifetime emissions of an average American car!
Before we dig in a bit more, one last point here: from an environmental standpoint, there are a few crucial aspects of training a neural network that have a major impact on the quantity of carbon it emits. As the Quantifying the Carbon Emissions of Machine Learning paper points out, these factors include the type of hardware the model is trained on (and its power draw), the location of the server and the carbon intensity of the energy grid that powers it, and the length of the training procedure.
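These factors combine in a straightforward way: the estimated emissions are the carbon intensity of the local grid multiplied by the energy your hardware consumed. Here is a back-of-the-envelope sketch of that arithmetic; every number below is hypothetical, chosen only to illustrate the formula:

# Illustrative emissions estimate (all numbers are hypothetical):
# CO2eq (kg) = carbon intensity of the grid (kg CO2eq/kWh) * energy consumed (kWh)
carbon_intensity = 0.475  # hypothetical grid average, kg CO2eq per kWh
gpu_power_kw = 0.3        # hypothetical GPU drawing ~300 W under load
training_hours = 24       # hypothetical one-day training run

energy_kwh = gpu_power_kw * training_hours    # 7.2 kWh
emissions_kg = carbon_intensity * energy_kwh  # ~3.42 kg CO2eq
print(f"Estimated emissions: {emissions_kg:.2f} kg CO2eq")

In practice, CodeCarbon measures the energy term for you and looks up the carbon intensity based on your location, which is exactly why location accuracy matters.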
Let's look at CodeCarbon now, shall we?

Using CodeCarbon with W&B

Below is the most straightforward usage of the package. Note that CodeCarbon uses your geographical location to look up the carbon intensity of the local energy grid, so accurate location information gives the best estimates.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
# GPU intensive code goes here
tracker.stop()
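If your machine can't reach the internet, or you'd rather pin the location explicitly instead of relying on geolocation, CodeCarbon also provides an offline tracker that takes a three-letter ISO country code. A minimal sketch:

from codecarbon import OfflineEmissionsTracker

# explicitly specify the country instead of relying on geolocation
tracker = OfflineEmissionsTracker(country_iso_code="USA")
tracker.start()
# GPU intensive code goes here
emissions = tracker.stop()  # emissions in kg CO2eq
print(f"{emissions:.4f} kg CO2eq")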
It's really that simple!
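If you prefer not to manage the tracker object yourself, the package also ships a track_emissions decorator that wraps a function in the same start/stop logic. A minimal sketch, with a placeholder function body:

from codecarbon import track_emissions

@track_emissions()
def train():
    # GPU intensive code goes here
    ...

train()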
To integrate with W&B, we update the training loop so that it now looks like this:
def train_fn(model, train_data_loader, optimizer, epoch):
    # create codecarbon tracker
    tracker = EmissionsTracker()
    tracker.start()

    model.train()
    fin_loss = 0.0
    tk = tqdm(train_data_loader, desc="Epoch" + " [TRAIN] " + str(epoch + 1))
    for t, data in enumerate(tk):
        data[0] = data[0].to(DEVICE)
        data[1] = data[1].to(DEVICE)

        optimizer.zero_grad()
        out = model(data[0])
        loss = nn.CrossEntropyLoss()(out, data[1])
        loss.backward()
        optimizer.step()

        fin_loss += loss.item()
        tk.set_postfix(
            {
                "loss": "%.6f" % float(fin_loss / (t + 1)),
                "LR": optimizer.param_groups[0]["lr"],
            }
        )

    # get co2 emissions from tracker
    emissions = tracker.stop()
    return fin_loss / len(train_data_loader), optimizer.param_groups[0]["lr"], emissions
We simply add CodeCarbon's EmissionsTracker to the training loop to track CO₂ emissions per epoch! Next, we log all metrics to W&B from the main function to create the dashboard:
def main():
    # train and eval datasets
    train_dataset = torchvision.datasets.ImageFolder(
        Config["TRAIN_DATA_DIR"], transform=Config["TRAIN_AUG"]
    )
    eval_dataset = torchvision.datasets.ImageFolder(
        Config["TEST_DATA_DIR"], transform=Config["TEST_AUG"]
    )

    # train and eval dataloaders
    train_dataloader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=Config["BS"],
        shuffle=True,
    )
    eval_dataloader = torch.utils.data.DataLoader(
        eval_dataset,
        batch_size=Config["BS"],
    )

    # model
    model = timm.create_model(Config["MODEL"], pretrained=Config["PRETRAINED"])
    model = model.cuda()

    # optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=Config["LR"])

    tot_co2_emission = 0
    for epoch in range(Config["EPOCHS"]):
        avg_loss_train, lr, co2_emission = train_fn(
            model, train_dataloader, optimizer, epoch
        )
        tot_co2_emission += co2_emission
        avg_loss_eval = eval_fn(model, eval_dataloader, epoch)

        # log metrics and CO2 emission per epoch to W&B dashboard
        wandb.run.log({
            "epoch": epoch,
            "learning rate": lr,
            "train loss": avg_loss_train,
            "evaluation loss": avg_loss_eval,
            "CO2 emission (in Kg)": co2_emission,
        })
    return tot_co2_emission
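Note that the code above assumes a Config dictionary, an eval_fn, and an initialised wandb run, none of which are shown in the report. The sketch below fills these in with hypothetical values so the example is self-contained; the project name, paths, and hyperparameters are placeholders, not the report's actual settings:

import timm
import torch
import torchvision
import wandb
from torch import nn
from tqdm import tqdm
from codecarbon import EmissionsTracker

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical configuration; adjust paths and hyperparameters to your setup.
Config = {
    "TRAIN_DATA_DIR": "./data/train",
    "TEST_DATA_DIR": "./data/test",
    "TRAIN_AUG": torchvision.transforms.Compose(
        [torchvision.transforms.Resize((224, 224)), torchvision.transforms.ToTensor()]
    ),
    "TEST_AUG": torchvision.transforms.Compose(
        [torchvision.transforms.Resize((224, 224)), torchvision.transforms.ToTensor()]
    ),
    "BS": 32,
    "MODEL": "resnet34",
    "PRETRAINED": True,
    "LR": 1e-4,
    "EPOCHS": 5,
}

def eval_fn(model, eval_data_loader, epoch):
    # minimal evaluation loop: average cross-entropy loss over the eval set
    model.eval()
    fin_loss = 0.0
    with torch.no_grad():
        for data in eval_data_loader:
            out = model(data[0].to(DEVICE))
            fin_loss += nn.CrossEntropyLoss()(out, data[1].to(DEVICE)).item()
    return fin_loss / len(eval_data_loader)

if __name__ == "__main__":
    wandb.init(project="codecarbon-wandb-demo")  # hypothetical project name
    total_emissions = main()
    print(f"Total CO2 emissions: {total_emissions:.4f} kg")

With wandb.init called before main(), the wandb.run.log call inside the loop streams each epoch's losses and CO₂ emissions to the dashboard.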

Call to Action!

The ability to track CO₂ emissions represents a significant step forward in data scientists’ ability to use energy resources wisely and, therefore, reduce the impact of their work on an increasingly fragile climate.
Knowledge is power, so now that you know how to measure your carbon footprint, how can you reduce it?

Conclusion

I hope this report has introduced deep learning researchers to CodeCarbon and shown how to integrate CodeCarbon's emissions tracker with W&B to create a dashboard like the one below:
Figure 1: An example dashboard that puts CO₂ emissions in relatable terms
Also, as part of this report, I've shared code that can be found here so that you can reproduce all the experiments shown in this report step by step and create a carbon-emission-aware dashboard for your own experiments!
If you have any questions, please feel free to drop a comment or reach out to me at @amaarora! For further reading on the CodeCarbon project, please refer to the references shared below.

References

Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, and Thomas Dandres. Quantifying the Carbon Emissions of Machine Learning. arXiv preprint arXiv:1910.09700, 2019.
Emma Strubell, Ananya Ganesh, and Andrew McCallum. Energy and Policy Considerations for Deep Learning in NLP. arXiv preprint arXiv:1906.02243, 2019.
Roy Schwartz, Jesse Dodge, Noah A. Smith, and Oren Etzioni. Green AI. arXiv preprint arXiv:1907.10597, 2019.
Simon Eggleston, Leandro Buendia, Kyoko Miwa, Todd Ngara, and Kiyoto Tanabe. 2006 IPCC Guidelines for National Greenhouse Gas Inventories, Volume 5. Institute for Global Environmental Strategies, Hayama, Japan, 2006.