Deep Learning on the M1 Pro with Apple Silicon
Let's take my new MacBook Pro for a spin and see how well it performs, shall we? Made by Thomas Capelle using Weights & Biases
If you've got your shiny new Mac 🍎 with the awesome Apple silicon, you may be wondering: "How exactly do I set up this machine to run Python and do some deep learning experiments?" If so, you're in luck. We've got you covered. (As an aside: if you're thinking of purchasing an M1 Pro or just want to see the results, jump to the Benchmarking section.)
In this report, we will show you:
✅ How to set up your new Macbook Pro for Python and a few common libraries
✅ How to install Tensorflow with a metal backend so you can get full GPU acceleration
✅ Some deep learning benchmarks on the M1Pro GPU! (Spoiler alert: things look really promising).
Alright, let's get going.
Python 🐍 set up on your new Mac
First things first: you need to open a terminal and install the Apple developer tools. To do so, just enter:
git clone some_repo_of_your_choice
This will trigger the developer tools installation. You don't need anything else after this: not brew, not MacPorts, not anything!
Note: On a Mac, you don't want to use the system Python for your projects, as you don't want to mess with your system install. (Also, Apple ships a very old Python: my brand new 14-inch MacBook Pro came with Python 2.7.18!!)
Getting a brand new Python installed:
- The simplest way is to install miniconda3, but for some reason the ARM binary is not available right now 💩...
- Option 2 is to use miniforge, the fork of miniconda3 that uses conda-forge. I am personally using this now and I adore it ❤️.
- (Additionally, a cool trick is to use 🐍 mamba on top of your miniconda3/miniforge install to make everything even faster.)
Once you have your conda/mamba installation running, you will need to create an environment to work on. This is done with the command:
conda create --name=env_name "python<3.10" pandas numpy matplotlib jupyterlab
You can list whatever packages you need after the env_name, separated by spaces. You can also pin specific versions using == or <, >; ideally, put quotes around version specifiers so your shell doesn't interpret them.
I recommend checking out Jeff Heaton's YouTube channel for more info and updates on Apple silicon for ML. A lot of what I put in here comes straight from this video:
Install Tensorflow with Apple Metal backend 🤘
Since last year, installing Python and ML frameworks has become simpler. You can follow the official instructions on the Apple website, but I have also provided detailed instructions in this repo.
Note: You need the latest Monterey (macOS 12).
Once you have conda/mamba installed, you will need to install TensorFlow. To make your life easier, you can grab the conda environment file in this repo, and then run:
conda env create --file=tf_apple.yml
(As before, you will now need to activate the environment.)
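Before benchmarking, it's worth confirming that TensorFlow can actually see the Metal GPU. Here's a minimal sketch (the helper name `metal_gpus` is mine, not from the repo); inside the environment created from tf_apple.yml on Apple silicon, the returned list should contain a GPU device:

```python
def metal_gpus():
    """Return the GPU devices TensorFlow can see, or None if
    TensorFlow itself is not installed (e.g. wrong conda env)."""
    try:
        import tensorflow as tf  # provided by the tf_apple.yml environment
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")

print(metal_gpus())
```

If this prints an empty list inside your environment, TensorFlow is installed but the Metal plugin isn't picking up the GPU.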
Now you're ready to do some benchmarks.
Benchmarks of the M1Pro
Here, we will be looking at the performance of the M1 Pro with 16 GPU cores. I know some engineers on our team have ordered top-of-the-line M1 Max machines, and we'll update this report once we have one available! Also, if you haven't checked out last year's report from our co-founder CVP exploring the M1 (non-Pro), it's a great starting point.
The training script is the same as the one used in the M1 report, and can be found here.
For our benchmark, we used several different computers:
An entry-level MacBook Air with a 7-core GPU: M1_7
A 14-inch MacBook Pro equipped with a 16-core GPU and 16GB of RAM: M1Pro
NEW: A 16-inch MacBook Pro equipped with a 32-core GPU and 64GB of RAM: M1Max (only for ResNet50 benchmarks)
A Linux workstation from Paperspace with an 8-core CPU and a 16GB RTX 5000: RTX5000
NEW: A Linux workstation with a 16-core CPU, an RTX 3090, and an RTX 3080
NEW: The old king of deep learning, the GTX1080Ti
(You can also toggle a K80 basic Google Colab instance, but it's very slow.)
For our model, we trained a MobileNetV2 in two variants:
The "fine-tune the model's head" runs, where we only fine-tune the head of the model, with roughly 20k trainable params.
The "full model train" runs, where we trained the full 2.5M-param MobileNetV2.
In each set of plots, you have both runs.
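To put the two variants in perspective, head-only fine-tuning updates a tiny fraction of the network's weights. A quick sketch using the approximate parameter counts quoted above:

```python
head_params = 20_000     # approx. trainable params when only the head is trained
full_params = 2_500_000  # approx. trainable params for the full MobileNetV2

fraction = head_params / full_params
print(f"Head-only training updates {fraction:.1%} of the weights")  # 0.8%
```

That gap is why the two sets of runs behave so differently in the plots that follow.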
We see that the energy used by our laptops is very low compared to the GPU machines. This is expected, of course, and we would need to benchmark against a mobile GPU to get a meaningful comparison of energy use.
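One handy way to reason about energy is energy per sample: average power divided by throughput. Here's a sketch with purely illustrative numbers (these are assumptions for the example, not measurements from this report):

```python
def joules_per_sample(avg_power_watts: float, samples_per_sec: float) -> float:
    """Energy spent per training sample: watts / (samples/s) = joules/sample."""
    return avg_power_watts / samples_per_sec

# Illustrative, assumed numbers: a laptop GPU drawing ~20 W at 100 samples/s
# vs a workstation card drawing ~230 W at 250 samples/s.
laptop = joules_per_sample(20, 100)        # 0.2 J per sample
workstation = joules_per_sample(230, 250)  # 0.92 J per sample
print(laptop, workstation)
```

Even when the workstation card is faster in wall-clock terms, the laptop can come out well ahead on joules per sample.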
We can also look at samples/sec in the following graphs.
Note: On the first epoch, the NVIDIA card is slower, probably due to something happening with the cache, but afterwards it gets up to full speed. This only happens when we fine-tune the model's head.
For the fully trainable network, the difference is smaller: the RTX5000 is around 55% faster than the M1 Pro. We get speeds comparable to a mobile RTX 2060, which is not bad considering the RTX 2060 uses about 4 times more power.
(Note: Runs inside NVIDIA's NGC TensorFlow container are considerably faster than the conda environment runs. You can toggle them, as they are tagged ngc. These containers are tuned and optimized to run as fast as possible and are maintained by NVIDIA. We currently don't have an equivalent for Apple.)
What happens if we replace the backbone with a ResNet50? Here we train the full model, for a total of 23.5M parameters. As the 🟩 green bars show, a laptop GPU is no match for the workstation cards.
The M1 Pro with the 16-core GPU is an upgrade over the M1 chip. It has double the GPU cores and more than double the memory bandwidth. You also have access to tons of memory: it is shared by the CPU and GPU, which is great for deep learning pipelines, as tensors don't need to be moved from one device to another. And since you can get a configuration with 64GB of RAM, it is effectively the largest mobile GPU on the market right now by a fair margin.
As expected, the M1 Max is twice as fast as the M1 Pro (it has double the GPU core count). You get almost the performance of an RTX5000 from a low-power laptop GPU.
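The "twice as fast" result matches the naive expectation from core counts alone, assuming throughput scales roughly linearly with GPU cores (which won't always hold, since memory bandwidth and thermals also matter):

```python
def expected_speedup(base_cores: int, new_cores: int) -> float:
    """Naive linear-scaling estimate of relative GPU throughput."""
    return new_cores / base_cores

print(expected_speedup(16, 32))  # 2.0: M1 Pro (16 cores) -> M1 Max (32 cores)
```

In this case the observed speedup lines up surprisingly well with the naive estimate.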
The M1 will not replace your workstation video cards, but it can provide enough compute to fine-tune models on the go. I even ran some of the tests on battery power and didn't notice any performance impact. Beyond that, what's great about the M1 is that the computer stays warm-ish and silent, compared to my XPS 15, which sounds like a jet engine ✈️🔊.
This may become a game changer.
PyTorch lead developer Soumith Chintala dropped this bomb earlier!