Create Edge Machine Learning Experiments with the Edge Impulse Python SDK and W&B
In this tutorial, we show you how to use the Edge Impulse Python SDK to create Weights & Biases experiments to help optimize your models for edge deployment.
Introduction
Edge Impulse is an online platform to help engineers and developers build and optimize edge machine learning (ML) applications for embedded and internet of things (IoT) devices.
At its core, the Edge Impulse Studio walks users through the process of collecting data from their edge devices, optimizing features, training ML models, and deploying the end-to-end feature extraction and model pipeline to various hardware.
This post will show you how to use the Weights & Biases Python package with the Edge Impulse Python SDK to conduct experiments to measure the predicted RAM, flash, and inference times for your models. If you would like to learn more about the Edge Impulse Python SDK, check out our getting started guides here.
ML on the Edge
Most modern deep learning algorithms are quite complex. Achieving high accuracy in object detection, natural language processing, and other tasks often requires models with millions of parameters and server farms or GPU clusters for training. Even inference can tax the most powerful computers. As a result, many ML applications are built around web interfaces for inference: using an endpoint to send data and receive inference results.
Sometimes, however, you need to run inference locally on small, resource-constrained devices. For example, you may need instantaneous wake word detection on a smartphone or home assistant. Fall or heat exhaustion detection can occur immediately on wearable devices to notify emergency services. Alternatively, you may require simple object detection on a robotic platform that operates in environments without an Internet connection (e.g. the Amazon, deep space).

The SlateSafety BAND V2 uses edge ML to monitor for signs of heat exhaustion in firefighters. Image courtesy of SlateSafety.
Optimizing machine learning for small, resource-constrained devices requires a combination of the following:
- Problem scoping: many large ML models are developed for generic use. Model complexity can often be reduced if the problem scope is also reduced. For example, a simpler model can be developed for just fall detection instead of generic pose or motion classification.
- Model optimizations: models can be optimized through a variety of techniques, such as pruning and quantization (see the quantization sketch after this list).
- Software optimizations: precious hardware resources can often be wasted on language and library overhead. As a result, using lower-level languages (e.g. C++) and carefully choosing efficient libraries can help create efficient inference code.
- Hardware optimizations: some hardware architectures include features to assist with specific computations, such as floating point units and single instruction, multiple data (SIMD) parallel processing. ML applications can often make use of these features to decrease inference times.
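To make the quantization idea concrete, below is a minimal sketch of post-training dynamic-range quantization using TensorFlow Lite. The file names are hypothetical placeholders, and this is just one of several quantization strategies:
import tensorflow as tf

# Load a trained Keras model (hypothetical file name; substitute your own)
model = tf.keras.models.load_model("my_model.h5")

# Convert to TensorFlow Lite with dynamic-range quantization enabled
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quantized = converter.convert()

# Save the quantized model, which typically shrinks weights to 8-bit integers
with open("my_model_quantized.tflite", "wb") as f:
    f.write(tflite_quantized)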
Edge Impulse Python SDK
In addition to the online graphical interface in the Studio, everything in Edge Impulse can be scripted using a web API, which allows ML practitioners to construct entire training and deployment pipelines programmatically. To make this process easier, Edge Impulse maintains a Python SDK that wraps many of these web API features into single function calls.
At launch, the Python SDK supports two main functions: profile and deploy. These functions accept a model from one of the popular ML frameworks, including TensorFlow, TensorFlow Lite, and ONNX. Because you can convert most model files from one framework to another (e.g. PyTorch to ONNX), the SDK essentially supports most ML model formats.
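For example, here is a minimal sketch of exporting a PyTorch model to ONNX so it can be handed to the SDK; TinyNet is a hypothetical stand-in for your own trained model:
import torch

# Hypothetical stand-in for your own trained PyTorch model
class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(784, 10)

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
dummy_input = torch.randn(1, 784)  # example input matching the model's expected shape

# Export to an ONNX file that can be passed to the SDK's profile and deploy functions
torch.onnx.export(model, dummy_input, "tiny_net.onnx")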

The profile function estimates the RAM and flash (e.g. ROM) utilization along with the predicted computation time for performing inference with that model on a given hardware architecture (e.g. ARM Cortex-M4F running at 80 MHz). We will focus on using the profile function in this tutorial to demonstrate how you can create experiments to measure model size and inference time while you adjust hyperparameters.
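As a quick preview before the full walkthrough below, a standalone profile call looks roughly like this; the API key and model file name are placeholders:
import edgeimpulse as ei

ei.API_KEY = "ei_..."  # placeholder; use your Edge Impulse project API key

# Estimate RAM, flash, and inference time for a saved model on a target device
try:
    profile = ei.model.profile(model="my_model.h5", device="cortex-m4f-80mhz")
    print(profile.summary())
except Exception as e:
    print(f"Could not profile: {e}")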
The deploy function converts your model to embedded code. By default, this is a generated C++ library that allows you to run inference with TensorFlow Lite Micro (TFLM) on nearly any device, including microcontrollers. Additionally, the library uses macros to automatically select the appropriate optimizations for the processor you are targeting, such as ARM’s CMSIS-NN, Himax WE-I, etc. For deep learning accelerators that do not support TFLM, you can also convert a TF model directly into a compatible package, for example BrainChip AKD1000, Ethos-U55 microNPU, and Syntiant NDP.
Experiment With Model Profiles
The ability to profile a model can help with architecture development. In many IoT and low-power devices, you must select hardware that will meet your application, cost, and power requirements. Knowing if a model will run on a particular device prior to deployment can help save many hours of work (and frustration).
For example, if you want to measure bee activity in your beehive, you might create an object detection model that can run on low-power hardware to count the number of bees throughout the day. By profiling your model, you can figure out if the RAM, flash, and inference speed will meet your requirements.
To begin, we will install the wandb, tensorflow, and edgeimpulse Python packages:
python -m pip install tensorflow==2.12.0 wandb edgeimpulse
From there, we can import the packages as follows:
from tensorflow import keras
import wandb
import edgeimpulse as ei
You will need to obtain an API key from an Edge Impulse project. Log in to edgeimpulse.com and create a new project. Open the project, navigate to Dashboard, and click on the Keys tab to view your API keys. Double-click the API key to highlight it, right-click, and select Copy.

In the following code, replace the ei.API_KEY value with your own API key.
# Settings
ei.API_KEY = "ei_dae2..."  # Change this to your Edge Impulse API key
labels = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
num_classes = len(labels)
num_epochs = 5
profile_device = 'cortex-m4f-80mhz'  # Run ei.model.list_profile_devices() to see available devices
deploy_filename = "my_model_cpp.zip"

# Define experiment hyperparameters - sweep across number of nodes
project_name = "nodes-sweep"
num_nodes_sweep = [8, 16, 32, 64, 128]
Log into your Weights & Biases account with the following:
# Log in to Weights & Biases
wandb.login()
We will use the MNIST dataset of handwritten digits for this example, normalizing the pixel values and one-hot encoding the labels:
# Load MNIST data
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = keras.utils.normalize(x_train, axis=1)
x_test = keras.utils.normalize(x_test, axis=1)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
input_shape = x_train[0].shape
We then define our experiment. Here, we vary the number of nodes in our hidden layer to see how it affects accuracy, RAM, flash, and inference time. For each experiment, we create a new dense neural network in Keras, train it, and profile the model using the Edge Impulse Python SDK.
Note that we chose the cortex-m4f-80mhz as our target hardware device. You can see a list of target hardware devices for the profile function by running ei.model.list_profile_devices().
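For example, a quick way to print the supported targets (assuming ei.API_KEY has already been set, as above):
# Print all hardware targets supported by the profile function
for device in ei.model.list_profile_devices():
    print(device)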
# Define experiment - train and test a model, log metrics
def do_experiment(num_nodes):

    # Create a W&B run for this experiment
    run = wandb.init(project=project_name, name=f"{num_nodes}-nodes")

    # Build the model (vary the number of nodes in the hidden layer)
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=input_shape),
        keras.layers.Dense(num_nodes, activation='relu'),
        keras.layers.Dense(num_classes, activation='softmax')
    ])

    # Compile the model
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Train the model
    model.fit(x_train, y_train, epochs=num_epochs)

    # Evaluate the model
    test_loss, test_accuracy = model.evaluate(x_test, y_test)

    # Profile the model on the target device
    profile = None
    try:
        profile = ei.model.profile(model=model, device=profile_device)
    except Exception as e:
        print(f"Could not profile: {e}")

    # Log metrics
    if profile and profile.success:
        print("Profiling successful. Logging.")
        wandb.log({
            'num_nodes': num_nodes,
            'test_loss': test_loss,
            'test_accuracy': test_accuracy,
            'profile_ram': profile.model.profile_info.float32.memory.tflite.ram,
            'profile_rom': profile.model.profile_info.float32.memory.tflite.rom,
            'inference_time_ms': profile.model.profile_info.float32.time_per_inference_ms
        })
    else:
        error = profile.error if profile else "exception raised during profiling"
        print(f"Profiling unsuccessful. Error: {error}")

    # Close the W&B run
    wandb.finish()
Finally, we can run the experiment:
# Perform the experiments - check your dashboard in W&B!
for num_nodes in num_nodes_sweep:
    do_experiment(num_nodes)
Head to your projects on wandb.ai and click on the nodes-sweep project. From here, you can visualize the results of your experiments using the various charts and graphs that W&B offers. For our simple example, here is a parallel coordinates plot that shows how the different hidden layer sizes affect the profile metrics.
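If you prefer to pull the results programmatically rather than browse the dashboard, here is a short sketch using the W&B public API; this assumes the runs live under your default entity:
import wandb

# Fetch the logged metrics for every run in the sweep project
api = wandb.Api()
for run in api.runs("nodes-sweep"):  # resolves against your default entity
    print(run.name,
          run.summary.get("test_accuracy"),
          run.summary.get("profile_ram"),
          run.summary.get("inference_time_ms"))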
Deploy Your Model
Once you are happy with the performance of your model, you can then deploy it to your target hardware. We will assume that 32 nodes in our hidden layer provided the best tradeoff of RAM, flash, inference time, and accuracy for our needs. To start, we will retrain the model:
# Build the model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=input_shape),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(num_classes, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5)
It's usually a good idea to evaluate your model on a holdout test set prior to deployment.
# Evaluate model on test set
score = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {score[0]}")
print(f"Test accuracy: {score[1]}")
From there, you can deploy to a C++ library using the Edge Impulse Python SDK. Just as we chose a target device for the profile function, we need to choose a deployment target. To view the list of available deployment targets, run ei.model.list_deployment_targets() in Python. We chose the zip target, which downloads a C++ library containing our trained model.
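For reference, listing the available targets is a one-liner (again assuming ei.API_KEY is set):
# Print all deployment targets available for your project
print(ei.model.list_deployment_targets())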
# Set model information, such as your list of labels
model_output_type = ei.model.output_type.Classification(labels=labels)

# Create C++ library with trained model
deploy_bytes = None
try:
    deploy_bytes = ei.model.deploy(model=model,
                                   model_output_type=model_output_type,
                                   deploy_target='zip')
except Exception as e:
    print(f"Could not deploy: {e}")

# Write the downloaded raw bytes to a file
if deploy_bytes:
    with open(deploy_filename, 'wb') as f:
        f.write(deploy_bytes)
At this point, you should see a file named my_model_cpp.zip in the same directory you are running your Python code. You are now ready to use your model in your embedded application. You can learn more about using the Edge Impulse C++ library here.
Conclusion
Keep in mind that not all operations found in more complex models can be converted to C++ code. Simple dense neural networks (DNNs) and convolutional neural networks (CNNs) are safe bets. You may run into trouble trying to convert recurrent neural networks (RNNs) or transformers, but support for these more complex architectures should be coming soon.
We hope this helps you develop efficient model architectures to create amazing edge ML applications! You can find this and more Edge Impulse Python SDK examples here.