
True TinyML with Weights & Biases - Wake Word Detection

W&B as Tiny ML tooling


Abstract

The world of TinyML has its foundations in ubiquitous applications such as wake word detection ("Alexa", "OK Google", "Hey Siri"); however, it has recently taken off in the open source community thanks to cheap, widely accessible devices such as the Raspberry Pi Pico and similar microcontrollers. True TinyML applications run on microcontrollers with only a few hundred kilobytes of RAM and no operating system, and can perform continuous inference on extremely little power. TinyML is "tiny" in three senses:
  1. Tiny models that run inference on microwatts of power;
  2. Devices with limited RAM;
  3. Devices that are physically tiny.
Inherently more complex tasks such as language comprehension and face recognition are not suitable use cases; TinyML lives in the realm of switch-type tasks that initiate more complex, sophisticated, and resource-intensive processing, with constant listening and real-time inference. In this webinar, we explain and demonstrate how to train and deploy a model from scratch using TensorFlow Lite, and how to evaluate and tune a TinyML model using W&B, in a step-by-step, easy-to-follow, reproducible example.
We'll cover step by step:
  1. Data storage, acquisition, and versioning using Artifacts;
  2. Pre-processing and data visualization using wandb Tables;
  3. Hyperparameter tuning using Sweeps;
  4. Selecting and storing models using Model Registry;
  5. Deployment/Inference monitoring.

Device

Here we use transfer learning from a model trained on 50 classes of sound to train a wake-word detector for spoken 'yes' and 'no'. We have provided instructions on how to capture sound using the SparkFun Edge MicroMod with a Raspberry Pi Pico (an Arm Cortex-M microcontroller). The device has a built-in microphone and USB connectivity.
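The capture step itself is simple in outline: read raw audio bytes from the device over USB serial and write them out as a .wav file. Below is a minimal sketch of that idea; the port name, baud rate, sample rate, and framing are assumptions about the device firmware, not the exact script from the repo:
import serial  # pyserial
import wave

# Assumption: the firmware streams mono 16-bit PCM at 16 kHz over USB serial.
SAMPLE_RATE = 16000
SECONDS = 1
PORT = "/dev/ttyACM0"  # hypothetical port name; varies by OS and device

with serial.Serial(PORT, baudrate=115200) as port:
    pcm = port.read(SAMPLE_RATE * SECONDS * 2)  # 2 bytes per 16-bit sample

with wave.open("data/yes/yes_record 1.wav", "wb") as wav_file:
    wav_file.setnchannels(1)           # mono microphone
    wav_file.setsampwidth(2)           # 16-bit samples
    wav_file.setframerate(SAMPLE_RATE)
    wav_file.writeframes(pcm)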

How small is Tiny?

Enter the Microcontroller

Presently there is a growing number of edge ML applications, typically running on devices such as the Nvidia Jetson or mobile phones.
Weights & Biases can be installed on any edge device that runs an operating system; true TinyML microcontrollers, by contrast, have no operating system at all.

Setup

Clone this GitHub repo to get the files and dependencies needed to recreate this project.
The sound_classifier notebook can be run locally or in Colab.
You'll need to create a new venv, clone the git repository, and inside your Python virtual env run:
cd tiny-ml
pip install -r requirements.txt
If you're installing TensorFlow on a Mac with Apple silicon (M1 or M2), you can use the following to install TensorFlow with the Metal GPU plugin:
pip install tensorflow-macos tensorflow-metal
You'll also need tfio (tensorflow-io), which can be installed by building from source:
git clone https://github.com/tensorflow/io.git
cd io
python3 setup.py -q bdist_wheel
python3 -m pip install --no-deps dist/tensorflow_io-0.30.0-cp310-cp310-macosx_12_0_arm64.whl
Note that the .whl filename may change (it encodes the version, Python version, and platform), so adjust the command accordingly.
We also recommend using pyenv or another virtual environment manager to manage your Python environment.

Dataset

Recording and versioning the word data

Recording data to wandb

W&B makes it easy to log and version data, keeping it in sync with what you have stored on your device. This makes the process explainable, efficient, and, crucially, easy to manage as a team.

Data Versioning & Debugging

Here we have live recordings: the sound data captured by our script from the SparkFun Edge MicroMod is saved straight to our wandb project.
We thought all was good, but in reviewing the data we realized the classes had gotten mixed up because of a bug in our code. We had changed the code several times, and Python polymorphism can sometimes get the better of even the best of us. Once we realized that some of our categories were mixed up and that the issue lay in how we were logging, we named the files and structured the directory appropriately, so the mistake was easy to correct.
There are many different ways to structure an artifact, but here we use the artifact's type and name to distinguish between the class and the data we have recorded. This makes corrections easy and also captures the fact that we made a mistake, mixing up the yes and no recordings with the background class (an easy mistake to make in your code). Rather than start again, we can simply log new versions.
We're working to the following schema:
data
├── background
│   ├── background_record 1.wav
│   └── background_record 9.wav
├── no
│   ├── no_record 1.wav
│   └── no_record 9.wav
└── yes
    ├── yes_record 1.wav
    └── yes_record 93.wav
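A minimal sketch of logging this layout as versioned artifacts, one per class, with the class encoded in the artifact name (the project name and artifact type here are illustrative):
import wandb

# Log each class directory as its own artifact; re-running this after
# fixing the file layout simply creates new artifact versions.
with wandb.init(project="wake_word_detection", job_type="data-upload") as run:
    for label in ["background", "no", "yes"]:
        artifact = wandb.Artifact(name=label, type="recorded_audio")
        artifact.add_dir(f"data/{label}")
        run.log_artifact(artifact)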
Here is an example of how we corrected our mistake with absolutely no data loss. Using the artifact version comparison, you (or anyone else on your team) can easily see that on the left, 'no' recordings were included in the background-noise class, and how we corrected this in later versions by simply logging the files to the correct schema.

[Embedded panel: side-by-side comparison of artifact versions. The earlier version (left) of the background class mixes no_record files in with the background_record files; the corrected version contains only background_record files.]
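The same check can be scripted against the public API. A sketch, assuming artifact names matching the schema above and that v0 is the mislabelled upload and v1 the corrected one:
import wandb

api = wandb.Api()
v0 = api.artifact("tiny-ml/wake_word_detection/background:v0")
v1 = api.artifact("tiny-ml/wake_word_detection/background:v1")
print(sorted(f.name for f in v0.files()))  # contains stray no_record files
print(sorted(f.name for f in v1.files()))  # background_record files only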
You can play the audio files below to hear recorded samples for the "yes", "no", and "background" classes.

[Embedded panel: 'Wake Words' audio table with recorded samples for each class]


Training Data for model

The training data is logged to W&B as a Table, which lets us log and visualize rich media files, in this case audio files and spectrograms. Here's what the training data looks like:

[Embedded panel: training-data table from the run set]
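Under the hood, building such a table might look like the sketch below; the column names, sample rate, the samples iterable, and the make_spectrogram helper are illustrative assumptions:
import wandb

SAMPLE_RATE = 16000

run = wandb.init(project="wake_word_detection", job_type="log-training-data")
table = wandb.Table(columns=["label", "audio", "spectrogram"])

for label, waveform in samples:  # samples: iterable of (str, np.ndarray) pairs
    spectrogram = make_spectrogram(waveform)  # hypothetical helper -> 2D array
    table.add_data(
        label,
        wandb.Audio(waveform, sample_rate=SAMPLE_RATE),
        wandb.Image(spectrogram),
    )

run.log({"training_data": table})
run.finish()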



Training 🏃🏻‍♀️

Training Metrics


[Embedded panel: training metrics from the run set]


Validation Metrics


[Embedded panel: validation metrics from the run set]
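Both sets of metrics are cheap to capture: with the Keras integration, a callback streams per-epoch training and validation metrics to the run. A minimal sketch, where train_ds and val_ds are assumed dataset names alongside the baseline_model used elsewhere in this report:
import wandb
from wandb.keras import WandbCallback

# Stream per-epoch loss/accuracy for the training and validation sets to W&B.
run = wandb.init(project="wake_word_detection", config={"epochs": 30})
baseline_model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=run.config.epochs,
    callbacks=[WandbCallback(save_model=False)],
)
run.finish()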


Hyperparameter Tuning with a Weights & Biases Sweep 🧹

Follow this link to learn more about Sweeps.

[Embedded panel: sweep results from the run set]
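As a sketch of what driving such a sweep looks like (the hyperparameter names and ranges here are illustrative, not the exact search space we used; train is your training function):
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
        "dropout": {"values": [0.1, 0.25, 0.5]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="wake_word_detection")
# train() should call wandb.init() and read hyperparameters from wandb.config.
wandb.agent(sweep_id, function=train, count=20)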


Pull Models using W&B API

We have logged our checkpoints as a W&B Artifact and also linked them to the Model Registry.
The model (here, the latest version) can be downloaded and evaluated using W&B's public API with the code snippet below:
import wandb

api = wandb.Api()
artifact = api.artifact(f'tiny-ml/wake_word_detection/run_{run.id}_model:latest', type='model')
print(artifact.digest, artifact.aliases)
model_dir = artifact.download()  # local directory the checkpoint was downloaded to
baseline_model.load_weights(f'{model_dir}/cp.ckpt')
baseline_model.evaluate(test_ds, verbose=2)
The output will look something like this:
36/36 - 0s - loss: 2.4910 - accuracy: 0.3246 - 86ms/epoch - 2ms/step
[2.4910478591918945, 0.32463011145591736]
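To get the selected model onto the microcontroller, it still has to be converted to a TensorFlow Lite flatbuffer. A minimal conversion sketch, assuming the baseline_model from the snippet above (the quantization settings are illustrative):
import tensorflow as tf

# Convert the Keras model to a TFLite flatbuffer with default optimizations
# (weight quantization), shrinking it to fit microcontroller flash and RAM.
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("wake_word.tflite", "wb") as f:
    f.write(tflite_model)
On device, a model this small is typically embedded as a C byte array (for example via xxd -i wake_word.tflite) and executed with the TensorFlow Lite Micro interpreter.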




Join us and Our ML Community

Thank you for joining us for the TinyML webinar.
Here are links to our community forum and some getting-started tutorials:
Community Forum - wandb.me/and-you
Fully Connected - wandb.me/fc
YouTube - wandb.me/youtube
Twitter - wandb.me/twitter
