
Federated Learning with Weights & Biases

This report details some ways W&B can be used to support Federated Learning.

What is Federated Learning and When is it Useful?

Federated learning is a machine learning technique in which multiple clients collaboratively train a model: each client trains a local model on its own local dataset and then shares the resulting weights to enable collaboration, typically over many rounds.
Federated learning is useful in scenarios where data may be stored in different locations and where it may be impractical (e.g., huge constantly changing datasets) or illegal (e.g., due to GDPR) to centralize the dataset for traditional machine learning.
For example, consider a bank with operations across the world. Due to the GDPR, it may not be possible for the bank to transfer its EU data to the USA for training models, and the inverse may also be true. As we consider more and more countries the bank operates in, it becomes clear that there are many siloed datasets that cannot be centralized, because no dataset can leave the silo in which it lives. And yet deep learning models benefit from larger datasets.
This is precisely where federated learning can be useful. In a federated learning setup, a model would be trained within each silo, and the weights of each model would be aggregated together. By doing so, each model is trained on data which never leaves the silo, but by aggregating the weights of the local model with other local models, a form of collaborative learning takes place.
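To make the aggregation step concrete, here is a minimal sketch of a federated-averaging style update over NumPy weight arrays. The client weights and sample counts are hypothetical placeholders, not part of any particular framework.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate per-client model weights into a single global model.

    client_weights: list of per-client weight lists (one np.ndarray per layer)
    client_sizes:   number of local training examples per client,
                    used to weight the average
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    global_weights = []
    for layer in range(num_layers):
        # Weighted sum of this layer's weights across all clients
        layer_avg = sum(
            (size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes)
        )
        global_weights.append(layer_avg)
    return global_weights

# Hypothetical round with three clients holding different amounts of data
clients = [[np.random.randn(4, 2), np.random.randn(2)] for _ in range(3)]
sizes = [1200, 450, 3300]
global_model = federated_average(clients, sizes)
```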
TL;DR: Federated learning permits learning from large, distributed datasets in a privacy-preserving manner when technical or legal constraints make it impossible to centralize the data for non-federated machine/deep learning.

An example of a federated learning scenario. Green lines represent bi-directional communication.

How W&B Helps Federated Learning

While federated learning offers a way to solve important problems where data is siloed but collaborative learning would produce better models, it combines the challenges of distributed computing with those of machine learning.

Challenge 1: Observability

Federated learning suffers from an observability problem. Models run on distributed, remote hardware against local datasets, where access may be difficult if not impossible, and a single federated learning experiment may involve anywhere from two to thousands, if not hundreds of thousands, of models.
How can the performance of thousands of models be effectively tracked, organized and analyzed? W&B provides a centralized system of record where each experiment running on each federated client is able to log relevant performance metrics back to W&B, where these can be interrogated individually or as part of the larger experiment.
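As a rough sketch of what this can look like in code (the project, group, and metric names below are hypothetical), each federated client can open its own W&B run and use the group argument so that all runs from one federated experiment roll up together in the UI:

```python
import wandb

def train_client(client_id: int, num_rounds: int = 5):
    # One W&B run per federated client; `group` ties all clients of this
    # experiment together so they can be compared side by side.
    run = wandb.init(
        project="federated-demo",        # hypothetical project name
        group="fed-experiment-001",      # one group per federated experiment
        job_type="client",
        name=f"client-{client_id}",
        config={"client_id": client_id, "local_epochs": 1, "lr": 0.01},
    )
    for rnd in range(num_rounds):
        local_loss = 1.0 / (rnd + 1)     # placeholder for real local training
        run.log({"round": rnd, "local_loss": local_loss})
    run.finish()

train_client(client_id=0)
```

The federated server can open its own run in the same group with job_type="server", so global metrics sit alongside the per-client runs.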



Hardware heterogeneity brings another challenge for observability. In federated learning experiments, clients may be using different hardware. W&B system metric logging can help diagnose, debug, and improve federated learning experiments to make better use of the heterogeneous hardware in your federation.



Challenge 2: Huge Number of Artifacts per Experiment

W&B Artifacts keeps track of your datasets, models, dependencies, and results through each step of your machine learning pipeline, providing a complete and auditable history of changes to your files.
A regular machine learning experiment produces a number of artifacts that are helpful to track with W&B. For example, a single non-federated experiment may have a single dataset used to train a single model, producing a single set of performance metrics. In a federated learning setting, however, these three artifacts multiply by the number of clients in the experiment: a simplified non-federated experiment may have 3 artifacts being tracked, while the equivalent federated experiment with 1,000 clients would have 3,000.
W&B provides different ways of storing your artifacts. If privacy and security matter, each client can connect to its own object store (e.g., S3, GCS) for tracking artifacts so as to remain in control of the data, or reference artifacts can be used. Reference artifacts prevent data from leaving your systems entirely and work with local filesystems too, while metadata about the artifacts, such as URIs, sizes, and checksums, can still be tracked. This provides many of the benefits of artifact tracking while maintaining privacy.
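For instance, a client can track its local dataset as a reference artifact so that only metadata (URI, size, checksums) is sent to W&B while the data itself stays in the client's own bucket or filesystem. The project name, artifact name, and bucket path below are hypothetical:

```python
import wandb

run = wandb.init(project="federated-demo", job_type="client")  # hypothetical project

# Reference artifact: W&B records the URI, size, and checksums,
# but the underlying files never leave the client's own storage.
dataset = wandb.Artifact("client_0_dataset", type="dataset")
dataset.add_reference("s3://my-bank-eu-bucket/client_0/train/")  # hypothetical bucket

run.log_artifact(dataset)
run.finish()
```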
In the artifact lineage view of the global federated model below (specifically version 21 of it), change the style in the dropdown menu to "Complete". From this view we can see how it fits into the wider federated learning pipeline: the federated learning server run (zany-dew-122) that produced it, the client runs (comic-bee-125, fresh-breeze-124) that trained with that server (connecting via the "connectivity" artifact conn_details:v97), each of the datasets they trained on (e.g., client_dataset:v267), and their resulting local models (e.g., client_model:v205)!

Direct lineage view: the server run (skilled-sweep-3), its connectivity artifact (conn_details:v24), three client runs (smart-star-21, autumn-resonance-20, soft-hill-19), the datasets they trained on (client_dataset:v73, v74, v75), and their resulting local models (client_model:v42, v43, v44).



Challenge 3: Hyperparameter Optimisation

A regular machine learning experiment typically has a single model that is tuned via hyperparameter optimization. Following the trends above, it should be unsurprising that this is more challenging in federated learning, given a single experiment will have many models to tune. Thankfully, however, W&B makes it possible to conduct hyperparameter sweeps easily in federated learning, where the W&B Sweep controller distributes the hyperparameters for each trial to the federated server, which in turn communicates them to each client involved in training.
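One hedged sketch of how this could be wired up: define a sweep over the hyperparameters you want tuned, let the W&B agent start the federated server run for each trial, and have the server read the chosen values from its run config before broadcasting them to the clients. The parameter names, project name, and the start_federated_round stub below are illustrative assumptions, not a real federated API.

```python
import wandb

def start_federated_round(lr, local_epochs):
    # Placeholder for the actual federated training loop: in a real setup the
    # server would send lr/local_epochs to each client, aggregate their models,
    # and return the global validation accuracy.
    return 0.5

sweep_config = {
    "method": "bayes",
    "metric": {"name": "global_val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-1},
        "local_epochs": {"values": [1, 2, 5]},
    },
}

def run_federated_server():
    run = wandb.init(job_type="server")
    # Hyperparameters chosen by the sweep controller for this trial
    lr = run.config.learning_rate
    local_epochs = run.config.local_epochs

    accuracy = start_federated_round(lr=lr, local_epochs=local_epochs)

    run.log({"global_val_accuracy": accuracy})
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="federated-demo")  # hypothetical project
wandb.agent(sweep_id, function=run_federated_server, project="federated-demo", count=20)
```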
With W&B Sweeps we can gain insight into parameter importance both globally and locally, and see how it varies across clients, a key value-add in federated learning where data is non-IID.





Challenge 4: Increased Manual Processes

As mentioned earlier in this report, federated learning inherits the challenges of two notoriously difficult fields, distributed computing and machine learning.
If we were to consider running a single federated learning experiment, this may involve starting a federated server on one node and then spinning up many clients to take part in training, each of which could be running on its own node. As the number of clients increases, so does the burden of coordinating training runs. While there are many ad hoc ways of approaching this problem, they tend to be hacky, unstable, and difficult to maintain and scale.
W&B provides two tools to solve this problem. With W&B Launch we are able to spin up jobs on different infrastructure, automatically building the required environment and executing the training runs, all controllable from the W&B UI or command line interface. Further, with W&B Automations, we can, for example, automatically spin up each client across different infrastructure whenever a federated learning server is launched. This reduces the manual burden of coordinating the launch of servers and clients across potentially heterogeneous hardware.
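One way to picture the trigger side of this: the server run publishes a connectivity artifact (as in the lineage view above), and an Automation configured in the W&B UI to fire on new versions of that artifact can enqueue the client Launch jobs. The sketch below shows only the server side; the Automation and Launch queue themselves are set up in the UI, and the project name, file name, and server address are hypothetical.

```python
import json
import wandb

run = wandb.init(project="federated-demo", job_type="server")  # hypothetical project

# Publish the server's connection details as an artifact. An Automation
# configured in the W&B UI to trigger on new versions of "conn_details"
# can then launch the client jobs on their respective infrastructure.
with open("conn_details.json", "w") as f:
    json.dump({"server_address": "0.0.0.0:8080"}, f)

conn = wandb.Artifact("conn_details", type="connectivity")
conn.add_file("conn_details.json")
run.log_artifact(conn)

# ... federated training rounds happen here ...
run.finish()
```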

This overview provides an example setup where W&B Launch can be used to deploy federated learning across different infrastructure effortlessly and, via W&B Automations, spin up clients to begin training (across infrastructure) automatically.

Conclusion

Thanks for reading our report about federated learning and how W&B can help make it a whole lot easier. If you're interested in checking out W&B, it's always free to try. And feel free to peruse our docs if you're looking for a little more information about anything in this post.






