Deep Learning on the M1 Pro with Apple Silicon

Let's take my new MacBook Pro for a spin and see how well it performs, shall we? Made by Thomas Capelle using Weights & Biases
If you've got your shiny new Mac 🍎 with the awesome Apple silicon, you may be wondering, "How exactly do I set up this machine to run Python and do some deep learning experiments?" If so, you're in luck. We've got you covered. (As an aside: if you're thinking of purchasing an M1 Pro or just want to know the results, jump to the Benchmarks section down below!)
In this report, we will show you:
✅ How to set up your new MacBook Pro for Python and a few common libraries
✅ How to install TensorFlow with a Metal backend so you can get full GPU acceleration
✅ Some deep learning benchmarks on the M1 Pro GPU! (Spoiler alert: things look really promising.)
Alright, let's get going.

Python 🐍 set up on your new Mac

First things first: you need to open a terminal and install the Apple developer tools. To do so, just enter:
git clone some_repo_of_your_choice
Running git for the first time prompts macOS to install the developer tools. You don't need anything else after this: not brew, not macports, not anything!
Note: On a Mac, you don't want to use your system Python for anything, as you don't want to mess with your system install. (Also, Apple systems ship a very old Python. My brand new 14-inch MacBook Pro came with Python 2.7.18!!)
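To see which interpreter a given `python` command actually resolves to, a minimal sketch like the one below can help. It uses only the standard library and is written to run on both the old system Python 2.7 and a modern Python 3. The helper name `describe_interpreter` is just an illustrative choice, not something from the original setup.

```python
from __future__ import print_function

import platform
import sys


def describe_interpreter():
    """Return basic facts about the Python that is running this script."""
    return {
        "executable": sys.executable,          # path of the running interpreter
        "version": platform.python_version(),  # e.g. "2.7.18" for the old system Python
        "machine": platform.machine(),         # CPU architecture reported by the OS
    }


info = describe_interpreter()
for key in ("executable", "version", "machine"):
    print("{}: {}".format(key, info[key]))
```

On Apple silicon, a native build reports `arm64` as the machine, while an Intel build running under Rosetta reports `x86_64`.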

Getting a brand new Python installed:

(Additionally, a cool trick is to use 🐍 mamba on top of your miniconda3/miniforge install to make everything even faster.)
Once you have your conda/mamba installation running, you will need to create an environment to work in. This is done with the command:
conda create --name=env_name "python<3.10" pandas numpy matplotlib jupyterlab
You can list whatever packages you need after the environment name, separated by spaces. You can also pin specific versions using == or <, >. Ideally, put quotes around those specs so the shell doesn't interpret < and > as redirections.
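If you generate environment commands from a script, the quoting can be left to Python's `shlex` module. The sketch below is only an illustration: the helper `conda_create_command` and the `numpy>=1.20` pin are hypothetical, chosen to show which arguments end up quoted.

```python
import shlex


def conda_create_command(env_name, specs):
    """Build a copy-pasteable `conda create` command line."""
    args = ["conda", "create", "--name={}".format(env_name)] + list(specs)
    # shlex.join quotes only the arguments that contain shell-special characters.
    return shlex.join(args)


# "numpy>=1.20" is a hypothetical pin, used only to show the quoting.
print(conda_create_command("env_name", ["python<3.10", "pandas", "numpy>=1.20"]))
# prints: conda create --name=env_name 'python<3.10' pandas 'numpy>=1.20'
```

Specs that contain `<` or `>` get quoted, while plain package names are left alone.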

I recommend looking at Jeff Heaton's YouTube channel for more info and updates on Apple silicon for ML. A lot of what I put in here comes straight from this video:

Install TensorFlow with Apple Metal backend 🤘

Since last year, installing Python and ML frameworks has become simpler. You can follow the official instructions on the Apple website, but I have also provided detailed instructions in this repo.
Note: You need the latest macOS Monterey (macOS 12).
Once you have conda/mamba installed, you will need to install TensorFlow. To make your life easier, you can get a conda environment file in this repo. And then you can run:
conda env create --file=tf_apple.yml
(As before, you will now need to activate the environment.)
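Before benchmarking, it can be worth checking that TensorFlow actually sees the GPU from inside the activated environment. The snippet below is a minimal sketch of such a check using `tf.config.list_physical_devices`. The helper name `visible_gpus` is an assumption for illustration, and the import is guarded so the script also runs (and tells you so) when TensorFlow is missing.

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print("TensorFlow sees {} GPU device(s): {}".format(len(gpus), gpus))
else:
    print("TensorFlow is installed, but no GPU device is visible.")
```

If the Metal backend is set up correctly, you would expect at least one GPU device in the list. An empty list suggests that training will silently fall back to the CPU.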
Now you're ready to do some benchmarks.

Benchmarks of the M1 Pro

Here, we will be looking at the performance of the M1 Pro with 16 GPU cores. I know some engineers on our team have ordered top-of-the-line M1 Max machines, and we'll update once we have one available! Also, if you haven't checked out last year's report from our co-founder CVP exploring the M1 (non-Pro), it's a great starting point.
The training script is the same as the one used in the M1 report, and can be found here.

Methodology 📓

For our benchmark, we used 4 different computers:
(You can also toggle a basic K80 Google Colab instance, but it's very slow.)
For our model, we trained a MobileNetV2 in two variants:
In each set of plots, you have both runs.
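The two training setups discussed in the results (fine-tuning only the model's head versus training the whole network) could be sketched in Keras roughly as follows. This is not the actual benchmark script: the helper `build_mobilenet`, the input size, and the number of classes are assumptions for illustration, and `weights=None` is used so that nothing is downloaded.

```python
def build_mobilenet(num_classes, train_backbone, input_shape=(128, 128, 3)):
    """Build a MobileNetV2 classifier with a frozen or a trainable backbone."""
    # Imported lazily so this file can be loaded without TensorFlow installed.
    import tensorflow as tf

    backbone = tf.keras.applications.MobileNetV2(
        input_shape=input_shape,
        include_top=False,  # drop the original classification head
        weights=None,       # assumption: random init, avoids a weights download
    )
    # False -> only the new head is trained; True -> the full network is trained.
    backbone.trainable = train_backbone

    inputs = tf.keras.Input(shape=input_shape)
    features = backbone(inputs)
    pooled = tf.keras.layers.GlobalAveragePooling2D()(features)
    outputs = tf.keras.layers.Dense(num_classes)(pooled)
    return tf.keras.Model(inputs, outputs)
```

Calling `build_mobilenet(10, train_backbone=False)` leaves only the final dense layer's kernel and bias trainable, while `train_backbone=True` makes every backbone weight trainable as well.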

Results

We see that the energy used by our laptops is very low compared to the GPU machines. This is normal, of course, and we will need to benchmark against mobile GPUs to get something meaningful with regard to energy used.
We can also look at samples/sec in the following graphs.
Note: On the first epoch, the NVIDIA card is slower, probably due to something happening with the cache, but afterwards it gets up to full speed. This only happens when we fine-tune the model's head.
For the fully trainable network, the difference is smaller: the RTX5000 is around 55% faster than the M1 Pro. We get speeds comparable to an RTX2060m, which is not bad considering the RTX2060 uses 4 times more power.
(Note: Runs inside NVIDIA's NGC TensorFlow container are considerably faster than conda environment runs. You can toggle them, as they are tagged as ngc. These containers are tuned and optimized to run as fast as possible and are maintained by Google and NVIDIA. We currently don't have an equivalent for Apple.)
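For readers who want to reproduce this kind of comparison, the arithmetic behind a samples/sec figure and a "percent faster" claim is simple. The sketch below uses made-up timings purely to illustrate it. The function names and numbers are hypothetical and are not taken from the benchmark runs.

```python
def samples_per_second(num_samples, elapsed_seconds):
    """Throughput of a training run: samples processed per wall-clock second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_samples / elapsed_seconds


def percent_faster(fast_rate, slow_rate):
    """How much faster fast_rate is than slow_rate, in percent."""
    return (fast_rate / slow_rate - 1.0) * 100.0


# Hypothetical timings, chosen only to illustrate the arithmetic.
slow = samples_per_second(num_samples=1000, elapsed_seconds=10.0)  # 100 samples/sec
fast = samples_per_second(num_samples=1550, elapsed_seconds=10.0)  # 155 samples/sec
print("{:.0f}% faster".format(percent_faster(fast, slow)))  # prints: 55% faster
```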

Bonus: ResNet50

What happens if we replace the backbone with a ResNet50? Here we train the full model, for a total of 23.5M parameters. You see the 🟩 green power here: a laptop GPU is no match for the workstation cards.

Conclusions

The M1 Pro with a 16-core GPU is an upgrade to the M1 chip. It has double the GPU cores and more than double the memory bandwidth. You have access to tons of memory, as the memory is shared by the CPU and GPU, which is optimal for deep learning pipelines because the tensors don't need to be moved from one device to another. Also, you can get a configuration with 64GB of RAM, and it is actually the largest mobile GPU on the market right now by a fair margin.
As expected, the M1 Max is twice as fast as the M1 Pro (double the GPU core count). You get almost the performance of an RTX5000 on a low-power laptop GPU.
The M1 will not replace your workstation video cards, but it can provide compute to fine-tune models on the go. I even ran some of the tests on battery power and didn't notice any performance impact. Beyond that, what's great about the M1 is that the computer stays warm-ish and silent compared to my XPS 15, which sounds like a jet engine ✈️🔊.
This may become a game changer.

And PyTorch 🔥??

The lead PyTorch developer, Soumith Chintala, dropped this bomb earlier!