How Captur pioneers LiveAI computer vision at scale with the help of Weights & Biases

“For our model training, having all the metrics in one place in W&B and comparing runs easily is so useful. Making sure we can do fire-and-forget and see all the metrics coming in has been super helpful. We’re tracking all the loss functions, evaluation metrics like precision, recall, and of course GPU usage metrics as well.”
Philip Botros
ML Engineer

When a delivery driver drops off your lunch or someone parks a shared e-bike, there’s a crucial moment of truth: the verification photo. For years, businesses relied on retrospective batch processing to verify these images, leading to delayed feedback, customer disputes, and missed hazards. Captur is changing this with transformational AI that works in real time, on any mobile device, providing instant verification that keeps businesses, drivers, and customers protected. Forget the large, compute-heavy cloud models of GenAI. Captur is pioneering the next generation of LiveAI: algorithms that run inference, generate responses, and interact with users entirely on-device.

The world’s leading enterprises in last-mile delivery and micromobility depend on Captur’s LiveAI image verification to prevent fraudulent claims, ensure quality control, and avoid safety hazards. With their SDK deployed to iOS and Android devices across hundreds of cities worldwide, Captur processes thousands of verifications daily, helping businesses scale operations while maintaining strict compliance standards.

But developing cutting-edge, tailored computer vision models that run on mobile phones in real time is no small feat. The ML team at Captur, led by Head of AI and CTO Sumanas Sarma, faced numerous challenges in setting up training and deployment infrastructure that could support the speed and accuracy their clients demanded.

The challenges of developing computer vision models for mobile

Building AI that works instantly on any mobile phone presents three core challenges that Captur’s ML team had to solve:

  • The power problem: “Mobile chips are much weaker than cloud servers,” explained ML Engineer Philip Botros. “While cloud-based AI can take seconds to process an image, our users need instant feedback.” The team solved this through sophisticated model compression techniques, ensuring their AI runs smoothly even on older devices.
  • The device diversity dilemma: From six-year-old Android phones to the latest iPhones, Captur’s AI needs to work flawlessly across thousands of device types. This requires extensive testing and optimization to ensure consistent performance regardless of the user’s device.
  • The size constraint: “We have to keep our models under 20MB,” noted MLOps Engineer Isao Makabe. “That’s tiny compared to typical AI models.” This size constraint ensures quick downloads and updates, even on slower mobile networks, while maintaining high accuracy (see the sketch just below).

To help manage and streamline their complex ML workflow, and to tackle these unique AI-for-mobile and computer vision challenges, the team turned to Weights & Biases.
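
The article doesn’t show Captur’s compression code, but as a minimal sketch, TensorFlow’s post-training quantization illustrates the kind of technique that shrinks a model toward a size budget like 20MB (the model path here is hypothetical):

```python
import os
import tensorflow as tf

# Hypothetical path to a trained TensorFlow SavedModel.
saved_model_dir = "models/parking_classifier"

# Post-training dynamic-range quantization: weights are stored as int8,
# which typically cuts model size to roughly a quarter of float32.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("parking_classifier.tflite", "wb") as f:
    f.write(tflite_model)

# Check the result against a mobile size budget such as 20MB.
size_mb = os.path.getsize("parking_classifier.tflite") / (1024 * 1024)
print(f"Quantized model size: {size_mb:.1f} MB")
```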

In addition to mobile challenges, the team also faces some unique obstacles in developing, training, and deploying computer vision models. Image data can be inherently ambiguous, which makes having multiple labelers per sample a critical part of their data labeling process. The team also has to deal with image quality issues, with some mobile cameras compressing images sent over an API, or clients compressing images they send to the Captur team.

“Because we are analyzing a stream of images, basically a video, that adds a layer of complexity,” said Botros. “That makes evaluation quite hard because we’re not allowed to send the whole video back to our server; that’s a lot of megabytes, and the client doesn’t want to send 20MB every time they park a bike or complete a delivery. Understanding and visualizing what’s going on in the set of predictions can also be quite hard.”

To help overcome these challenges, the team has developed a sophisticated and impressive model lifecycle and workflow, with Weights & Biases at the center of it all.

The Captur model lifecycle, with W&B Registry as a hub

Each project starts with defining success criteria and metrics goals for a specific model type. In this example, the team is trying to better identify when a bicycle is parked too close to tactile paving, which would be considered a pedestrian hazard and might lead to a fine. They’ve set a goal of a 15% improvement in this model’s precision-recall area under the curve (PR-AUC).
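
The piece doesn’t show how the team computes this metric; a minimal sketch using scikit-learn’s average precision (a standard estimate of PR-AUC), with hypothetical labels, scores, and baseline:

```python
from sklearn.metrics import average_precision_score

# Hypothetical ground-truth labels (1 = bike blocking tactile paving)
# and model confidence scores for a held-out evaluation set.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_scores = [0.92, 0.30, 0.74, 0.65, 0.41, 0.15, 0.88, 0.55]

# Average precision summarizes the precision-recall curve.
baseline_pr_auc = 0.60  # hypothetical current production figure
candidate_pr_auc = average_precision_score(y_true, y_scores)

improvement = (candidate_pr_auc - baseline_pr_auc) / baseline_pr_auc
print(f"PR-AUC: {candidate_pr_auc:.3f} ({improvement:+.1%} vs. baseline)")
```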

The process starts with the team gathering data from public datasets, proprietary data, or proprietary synthetic data pipelines built on Stable Diffusion. A unique challenge the team faces is that the better they get at solving specific use cases, particularly around identifying hazards, the less data they have to train on, which is where synthetic data generated with Stable Diffusion becomes valuable.
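
As an illustration of that approach (not Captur’s actual pipeline), a minimal sketch using Hugging Face’s diffusers library, with a public checkpoint and a hypothetical prompt for an under-represented hazard class:

```python
import os
import torch
from diffusers import StableDiffusionPipeline

# Public checkpoint as a stand-in; the article doesn't name Captur's model.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical prompt targeting a rare hazard class.
prompt = "a shared e-bike parked on tactile paving at a city crosswalk, photo"

os.makedirs("synthetic", exist_ok=True)
for i in range(8):
    image = pipe(prompt).images[0]
    image.save(f"synthetic/tactile_paving_{i:03d}.png")
```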

The team stores images in a Google Cloud Storage bucket and sends them out for labeling, with a consensus process that requires each image to be labeled by at least two labelers. Once the dataset is created in Google’s Dataflow, the team uses TensorFlow Extended (TFX) for training, running the TFX pipeline on Google Cloud’s Vertex AI. All of these steps, from dataset creation through training, evaluation, and deployment to the edge, are registered and recorded as Weights & Biases Artifacts, and the seamless integration between W&B and Google Cloud keeps the two platforms working well together.
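
The exact code isn’t shown in the piece; registering a labeled dataset as a W&B Artifact by reference might look like this sketch (project, artifact, and bucket names are hypothetical):

```python
import wandb

run = wandb.init(project="parking-verification", job_type="dataset-creation")

# add_reference points at the GCS bucket instead of copying the images,
# so W&B records lineage without duplicating storage.
dataset = wandb.Artifact("tactile-paving-dataset", type="dataset")
dataset.add_reference("gs://example-bucket/tactile-paving/v3")  # hypothetical path
run.log_artifact(dataset)
run.finish()
```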

“We track the full lineage of where the dataset came from and when training happened,” explained Makabe. “Since we are deploying models to mobile devices, we do knowledge distillation and quantization as well. We then evaluate both teacher and student models, and log all results to Weights & Biases.”
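
Captur’s distillation code isn’t published; a minimal Keras-style sketch of a standard distillation loss, with hypothetical hyperparameters, shows the idea of a large teacher guiding a student compact enough for the 20MB budget:

```python
import tensorflow as tf

def distillation_loss(y_true, student_logits, teacher_logits,
                      temperature=4.0, alpha=0.5):
    """Blend hard-label loss with soft teacher targets.

    temperature and alpha are hypothetical hyperparameters; the article
    doesn't disclose Captur's actual distillation setup.
    """
    # Hard loss: student predictions vs. ground-truth labels.
    hard = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
        y_true, student_logits, from_logits=True))
    # Soft loss: student matches the teacher's softened distribution.
    soft = tf.keras.losses.KLDivergence()(
        tf.nn.softmax(teacher_logits / temperature),
        tf.nn.softmax(student_logits / temperature)) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft
```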

W&B has become a critical component for Captur, not just as the system of record for all ML results and activities, but also with Registry serving as the hub for all models being prepared for release and deployment.

“One of the biggest changes that Weights & Biases helped us make was having this concept of a release candidate,” said Sarma. “Prior to this, a model was either live – and by that I mean client-facing – or it wasn’t. If it was live getting client traffic, there was always a rush to try and figure out if it was behaving as expected, which also meant that there was a lot of risk aversion when releasing new models.”

“We’ve changed that massively with the help of Weights & Biases by having this real release candidate concept. As soon as someone on the ML side signs off on the model, Product reviews it to confirm it meets their requirements. Then customer success can communicate these validated improvements to our clients. It also gives our clients the opportunity to test the models before they reach their customers and end-users too.”

The team has set up automated model evaluations that fetch the production model and the latest training set from the Registry, compare the model against that set, and surface precise changes to the team via Slack. They also have automations for model deployment: by adding a particular alias inside Registry, they can designate models as release candidates and deploy them to production.
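
The automation itself isn’t shown; as a sketch, linking a model version into the Registry with an alias is the kind of event such automations can watch (project, artifact, and registry names are hypothetical):

```python
import wandb

run = wandb.init(project="parking-verification", job_type="promotion")

# Adding an alias like "release-candidate" when linking a version into the
# Registry is what downstream deployment automations can key off.
artifact = run.use_artifact("tactile-paving-model:latest")
run.link_artifact(
    artifact,
    "wandb-registry-model/parking-verification",
    aliases=["release-candidate"],
)
run.finish()
```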

“Since we have a lot of models and we train a lot of models every week, we want to make the cycle as quick as possible,” said Botros. “We can just do a one-click release candidate deployment; if that happens, we automatically trigger the benchmark which gives us the results back. If we’re happy, we just go to prod with one click, and we can also roll back with one click if something isn’t as we want it to be.”

In addition to relying on W&B Registry to delineate release candidates and coordinate deployments to production, Weights & Biases has been instrumental to the team in a few other core areas.

“For our model training, having all the metrics in one place in W&B and comparing runs easily is so useful,” said Botros. “Making sure we can do fire-and-forget and see all the metrics coming in has been super helpful. We’re tracking all the loss functions, evaluation metrics like precision, recall, and curves, and of course GPU usage metrics as well.”
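
In practice, that fire-and-forget pattern is just per-step logging; a minimal sketch with hypothetical project and metric names (the values below are random stand-ins, not real training output):

```python
import random
import wandb

run = wandb.init(project="parking-verification", config={"learning_rate": 1e-3})

for step in range(100):
    run.log({
        "train/loss": random.random(),               # stand-in for the real loss
        "eval/precision": 0.90 + random.random() * 0.05,
        "eval/recall": 0.85 + random.random() * 0.05,
    }, step=step)

run.finish()
```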

The team also relies on the W&B + Slack integration to notify all stakeholders whenever a new dataset has been created, with metrics about the dataset included. And finally, for evaluations, the team has found it useful to have all images and predictions logged in an easy-to-analyze format in W&B. With tables and visualizations for analyzing missed predictions, the team can quickly pivot to prioritizing a new round of labeling or different types of training.
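
Logging predictions in that filterable form is typically done with W&B Tables; a minimal sketch with hypothetical image paths, classes, and scores:

```python
import wandb

run = wandb.init(project="parking-verification", job_type="evaluation")

# One row per image: prediction, label, and confidence, so misses can be
# filtered and inspected visually in the W&B UI.
table = wandb.Table(columns=["image", "prediction", "label", "confidence"])
for path, pred, label, conf in [
    ("imgs/park_001.jpg", "compliant", "compliant", 0.97),
    ("imgs/park_002.jpg", "compliant", "hazard", 0.61),  # a miss worth reviewing
]:
    table.add_data(wandb.Image(path), pred, label, conf)

run.log({"eval/predictions": table})
run.finish()
```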

Looking ahead, Captur continues to push the boundaries of what’s possible with mobile AI. The implications extend beyond parking and delivery verification. As mobile devices become more powerful and AI models more efficient, Captur’s innovations in real-time, on-device AI are paving the way for a new generation of intelligent mobile applications that can process, understand, and respond to the physical world instantly.