
Deep Learning on the M1 Pro with Apple Silicon

Let's take my new Macbook Pro for a spin and see how well it performs, shall we?
Created on November 19 | Last edited on February 23
If you've got your shiny new Mac 🍎 with the awesome Apple silicon, you may be wondering: "How exactly do I set up this machine to run Python and do some deep learning experiments?" If so, you're in luck. We've got you covered. (As an aside: if you're thinking of purchasing an M1 Pro or just want to know the results, jump to the Benchmarking section down below!)
In this report, we will show you:
✅ How to set up your new MacBook Pro for Python and a few common libraries
✅ How to install TensorFlow with a Metal backend so you can get full GPU acceleration
✅ Some deep learning benchmarks on the M1 Pro GPU! (Spoiler alert: things look really promising.)
Alright, let's get going.

Python 🐍 setup on your new Mac

First things first: you need to open a terminal and install the Apple developer tools. To do so, just enter any git command, for example:
git clone some_repo_of_your_choice
This will trigger the developer tools setup. You don't need anything else after this: not Homebrew, not MacPorts, nothing!
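(If you prefer, you can also trigger the same install directly with Apple's command line tools installer:)
xcode-select --install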
💡 Note: On a Mac, you don't want to use the system Python for your own work, as you don't want to mess with your system install. (Also, Apple ships a very old Python: my brand new 14-inch MacBook Pro came with Python 2.7.18!!)

Getting a brand new Python installed:

  • The simplest way would be to install miniconda3, but for some reason the ARM binary is not available right now 💩...
  • Option 2 is to use miniforge, a fork of miniconda3 that uses conda-forge. I am personally using this now and I adore it ❤️.
(Additionally, a cool trick is to use 🐍 mamba on top of your miniconda3/miniforge install to make everything even faster; see the example below.)
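For instance, you can install mamba into your base environment (this follows conda-forge's standard instructions):
conda install -n base -c conda-forge mamba
After that, you can substitute mamba for conda in most commands, e.g. mamba install numpy.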
Once you have your conda/mamba installation running, you will need to create an environment to work on. This is done with the command:
conda create --name=env_name "python<3.10" pandas numpy matplotlib jupyterlab
You can list whatever packages you need after the environment name, separated by spaces. You can also pin specific versions using == or < and >; ideally, put quotes around these so your shell doesn't misinterpret them.
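Once the environment is created, activate it before installing or running anything (env_name is whatever you called it above):
conda activate env_name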


I recommend looking at Jeff Heaton's YouTube channel for more info and updates on Apple silicon for ML. A lot of what I put in here comes straight from this video:


Install TensorFlow with Apple Metal backend 🤘

Since last year, installing Python and ML frameworks has become simpler. You can follow the official instructions on the Apple website, but I have also provided detailed instructions in this repo.
💡 Note: You need the latest macOS Monterey (macOS 12).
Once you have conda/mamba installed, you will need to install TensorFlow. To make your life easier, you can grab a conda environment file from this repo and then run:
conda env create --file=tf_apple.yml
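In case you're curious, such an environment file typically looks something like this. This is a minimal sketch following Apple's official instructions; the actual file in the repo may pin different versions:
name: tf_apple
channels:
  - apple
  - conda-forge
dependencies:
  - python=3.9
  - tensorflow-deps
  - pip
  - pip:
    - tensorflow-macos
    - tensorflow-metal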
(As before, you will now need to activate the environment.)
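Before running anything heavy, you can quickly check that TensorFlow actually sees the Metal GPU:
import tensorflow as tf

# On a working install this should list one GPU device, e.g. /physical_device:GPU:0
print(tf.config.list_physical_devices("GPU"))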
Now you're ready to do some benchmarks.

Benchmarks of the M1 Pro

Here, we will be looking at the performance of the M1 Pro with 16 GPU cores. I know some engineers on our team have ordered top-of-the-line M1 Max machines, and we'll update once we have one available! Also, if you haven't checked out last year's report from our co-founder CVP exploring the M1 (non-Pro), it's a great starting point.
The training script is the same as the one used in the M1 report, and can be found here.

Methodology 📓

For our benchmark, we used several different computers:
  • An entry-level MacBook Air with a 7-core GPU: M1_7
  • A 14-inch MacBook Pro equipped with a 16-core GPU and 16GB of RAM: M1Pro
  • NEW: A 16-inch MacBook Pro equipped with a 32-core GPU and 64GB of RAM: M1Max (ResNet50 benchmarks only)
  • A Linux workstation from Paperspace with an 8-core CPU and a 16GB RTX 5000: RTX5000
  • NEW: A Linux workstation with a 16-core CPU, an RTX 3090, and an RTX 3080
  • NEW: The old king of deep learning, the GTX1080Ti
(You can also toggle a basic Google Colab K80 instance, but it's very slow.)
For our model, we trained a MobileNetV2 in two variants:
  • The "fine-tune model's head" runs, where we only fine-tune the head of the model: roughly 20k trainable parameters.
  • The "full model train" runs, where we trained the whole 2.5M-parameter MobileNetV2.
Each set of plots below shows both variants (a sketch of the two setups follows this list).
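In Keras terms, the two variants look roughly like this. This is a minimal sketch, not the actual benchmark script; the input size and classifier head here are assumptions:
import tensorflow as tf

# Pretrained backbone without its classification head
base = tf.keras.applications.MobileNetV2(
    input_shape=(128, 128, 3), include_top=False, weights="imagenet", pooling="avg")

# Variant 1, "fine-tune model's head": freeze the backbone, train only a small dense head
base.trainable = False
model = tf.keras.Sequential([base, tf.keras.layers.Dense(10, activation="softmax")])

# Variant 2, "full model train": unfreeze everything so all ~2.5M parameters get gradients
# base.trainable = True

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])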

Results

We see that the energy used by our laptops is very low compared to the GPU machines. This is expected, of course, and we would need to benchmark against mobile GPUs to get a meaningful energy comparison.

[Plot: all runs]

We can also look at samples/sec in the following graphs.
💡 Note: On the first epoch, the NVIDIA card is slower, probably due to something happening with the cache, but afterwards it gets up to full speed. This only happens when we fine-tune the model's head.
For the fully trainable network, the difference is smaller: the RTX 5000 is around 55% faster than the M1 Pro. We get speeds comparable to a mobile RTX 2060, which is not bad considering the RTX 2060 uses about four times more power.
(Note: Runs inside NVIDIA's NGC TensorFlow container are considerably faster than conda environment runs. You can toggle them, as they are tagged ngc. These containers are tuned and optimized to run as fast as possible and are maintained by Google and NVIDIA. We currently don't have an equivalent for Apple.)

[Plot: fine-tune model's head]
[Plot: train full model]


Bonus: ResNet50

What happens if we replace the backbone with a ResNet50? Here we train the full model, for a total of 23.5M parameters. You can see the green 🟩 team's power here: a laptop GPU is no match for the workstation cards.
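Swapping the backbone is essentially a one-line change in Keras (again a sketch; the input size is an assumption):
# Same recipe as before, just swapping in a ResNet50 backbone and training it end to end
base = tf.keras.applications.ResNet50(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet", pooling="avg")
base.trainable = True  # all ~23.5M parameters are trainable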

[Plot: train full model]


Conclusions

The M1 Pro with the 16-core GPU is a solid upgrade over the M1 chip: it has double the GPU cores and more than double the memory bandwidth. You also have access to tons of memory, as it is shared between the CPU and GPU; this is great for deep learning pipelines, as tensors don't need to be moved from one device to another. And since you can get a configuration with 64GB of RAM, it is actually the largest mobile GPU on the market right now by a fair margin.
As expected, the M1 Max is twice as fast as the M1 Pro (double the GPU core count). You get almost the performance of an RTX 5000 from a low-power laptop GPU.
The M1 will not replace your workstation's video cards, but it can provide enough compute to fine-tune models on the go. I even ran some of the tests on battery power and didn't notice any performance impact. Beyond that, what's great about the M1 is that the computer stays warm-ish and silent, whereas my XPS 15 sounds like a jet engine ✈️🔊.
This may become a game changer.

And PyTorch 🔥??

PyTorch lead developer Soumith Chintala dropped this bomb earlier: we now have PyTorch running natively on Mac!
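If you want to try it, the new backend shows up in PyTorch 1.12+ as the "mps" device; a minimal check looks like this:
import torch

# The Metal backend is exposed as the "mps" device on Apple silicon
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.randn(4, 4, device=device)  # this tensor lives on the Apple GPU
print(x.device)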

Comments

pa antya: https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/

R J: Hi, firstly I just wanted to say thank you for this post. I am new to training models, primarily as I have never had the hardware, but this may no longer be the case as I have acquired an M1 Max 64GB. I ran keras_cvp.py at the link provided and got a samples_per_s value of 911, noting that the script is configured to use ResNet50. I'd love to know how to run the MobileNetV2 variants. Assuming that my results are correct, they appear to be significantly improved compared to M1_max in your benchmarks. What do you suppose the differences may be: me doing something wrong, the 64GB, further optimisations in the latest packages, something else?

Chris Van Pelt: This is such a great followup to the original post. Can't wait to see the M1 Max results with 64GB of RAM!