Nanit's Nimrod Shabtay on deployment and monitoring

A look at how Nimrod and the team at Nanit are building smart baby monitor systems, from data collection to model deployment and production monitoring.
Angelica Pan


Guest Bio

Nimrod Shabtay is a Senior Computer Vision Algorithm Developer at Nanit, a New York-based company that's developing better baby monitoring devices.


Show Notes

Topics Covered

0:00 Sneak peek, intro
0:50 The story and models behind Nanit
8:23 Deploying and evaluating models
14:15 The importance of good data collection
17:25 Production monitoring and preparing to deploy
22:48 On new ideas and research avenues
25:27 Insights into baby sleep
30:46 Building good processes for model deployment


Transcript

Note: Transcriptions are provided by a third-party service, and may contain some inaccuracies. Please submit any corrections to angelica@wandb.com. Thank you!
Nimrod :
The focus, as I see it in the industry, has shifted from just making the models to making them work well in the real world, and being flexible enough to adapt to changes. I guess I can say that, many times, maintaining a model and keeping it good and reliable out there is sometimes much harder than actually developing it.
Lukas :
You're listening to Gradient Dissent, a show about machine learning in the real world. And I'm your host, Lukas Biewald. Nimrod is a senior computer vision algorithms developer at Nanit and the father of two children. Nanit develops smart baby monitoring systems, and it's a product that I happen to use every day. So, I'm extra excited to talk to him. Nimrod, I'm super excited to talk to you about the article you wrote on ML in production, but I'd say I'm especially excited to talk to you because you make maybe the app that I use the most these days, the Nanit app. My daughter actually turned one today, and we've been using it for the last year. Basically, every morning, my mother-in-law and my wife discuss the stats from the previous night's sleep. I really, really love your app, I could say that honestly, and I was proud to discover that you are customers of Weights & Biases. But I was wondering if you could start by maybe talking about what your app does and what the history of the company is, and how you think about that.
Nimrod :
Yeah, sure. So, first, I'm happy to be here. The whole company started with an idea from one of the founders, who actually wanted to monitor his son's sleep during the night. He came from the world of process monitoring using cameras, and he wanted to bring that to his son. It started as a project when he was at Cornell University, and everything just rolled from there, actually. Since we had a camera and he came from the field of computer vision, we built the smart baby monitor using computer vision algorithms that can track sleep and breathing motion, and then let you celebrate your baby's milestones. For example, falling asleep for the first time on their own, or sleeping through the night without any visits from the parents, which is great for us parents, of course. And we give you specific sleep tips in order to improve your baby's sleep. Actually, the key, or what guides the company, is what value we can extract from the visual data that the camera collects. Sleep is kind of obvious, and of course breathing for young babies, but this is also the guideline for the next products and features: how to give value in terms of health and wellness to our customers. And it's also really unique, since this product wears two hats, basically. It has the hat of a consumer electronics product, as you use it, and it's also a research tool, which has started to be used more and more recently. Researchers are doing home sleep research with it. So, it's pretty cool that science and technology are working together, and we get to deliver a really interesting product.
Lukas :
That is really cool. And I think folks who are listening to this who haven't had children yet might not realize how essential sleep is for your sanity as a parent, and also, how important sleep is for the sanity of your child.
Nimrod :
Oh, for everyone, yeah.
Lukas :
I think we've thought more about sleep in the last year than I ever thought about it before.
Nimrod :
One of the key advantages of the product is that, as parents, you get up at night for your children, and you're drowsy, and you don't remember exactly: did I get up two times, was it at 3:00 AM, maybe it was 5:00, I don't remember. Nanit just collects the data for you and serves it to you clearly, to make a useful summary of the night, and you can also make data-driven decisions, if you want, and not decisions based on beliefs, because this whole field of baby sleep is full of beliefs. Some say that this method works better than the other. Here, you get the facts, you get the data: the baby slept well, the baby slept better, the baby didn't sleep that well this night. And we also see that, since parents are focusing more on the baby's sleep, babies with Nanit sleep better; they sleep longer, and their sleep quality is better, because everyone is in this process and focusing on it. So, it's really amazing, I must say.
Lukas :
That's really amazing. How do you know that babies that use Nanit sleep better?
Nimrod :
We have a large user base, and we often send surveys to our customers, and they actually respond to them. And we see in the statistics, and in what they tell us, that babies with Nanit sleep better because you're more aware of it. The tips are useful. So, you're in a mindset of improving and of how important sleep is, I guess that's-
Lukas :
Oh, that's very cool. Can you break down what the... You know, this is supposed to be an ML podcast, but parenting has been coming up an awful lot lately. Can you break down the pieces that are kind of ML problems, or computer vision problems, that you need to solve to make the app work?
Nimrod :
Yeah. We use all sorts of computer vision algorithms in order to get a good understanding of the scene. For example, knowing when the baby falls asleep on their own, and whether a parent comes to visit or not, those are all computer vision problems that we need to solve. And we actually serve multiple models during the night in order to get the whole scene understanding. On top of that, we take those outputs from the models and serve you the data much more clearly, so there's a lot going on during the night.
Lukas :
Do you run the models on the phone, or do you run them in the cloud? How does that work?
Nimrod :
Mostly in the cloud. We do have some algorithms that run on the camera as well. But mostly, on the cloud.
Lukas :
Can you give me some sense of what the scale of this is, like how much data your models are handling, or how many streams of video you get in a typical night?
Nimrod :
Yeah. Let's take a short example. We have more than 100,000 users, and we cover full nights, which basically means that if we serve a model, for example, every 10 minutes or so, we get to a few tens of millions of calls per model per night. It's a nice scale. I mean, we get to serve tens of millions of requests per night across all our users.
Lukas :
And these are pretty sensitive models. I've noticed that you've never gone down. I mean, at least, in my experience, like it seems like you do a really good job with reliability, and I would think you'd have maybe a higher reliability bar than some other applications of folks we've talked to.
Nimrod :
Yeah. Well, you're right. Since babies are actually the most important things to their parents, we try to be as reliable as possible, in terms of robustness of the models and accuracy of the models, and also in terms of runtime, and to reduce downtime as much as possible. Because, again, everyone expects our algorithms to work all the time and give them the data, especially when it comes to babies. So, we're putting a lot of effort into that as well.
Lukas :
And I guess the sleeping model's important, but the one that seems like it must be kind of anxiety-producing... I mean, just talking about it, it's giving me anxiety, but the breathing motion monitoring, is that also an ML model that checks for that?
Nimrod :
Well, we use multiple models there. There are some models that are more machine learning or deep learning based, and there are some classic computer vision models as well, all sorts of models.
Lukas :
Why do you use multiple models for a single application?
Nimrod :
Well, we have many tasks that we need to solve in order to get this product to be reliable and robust enough, especially when we're talking about breathing motion.
Lukas :
So, I guess when you look at handling millions of requests per night, what are some things that you do to make sure that this is reliable, and to make sure that your compute spend stays reasonable? How do you think about model architecture, how do you deploy your models, and what frameworks and tools do you use?
Nimrod :
That's pretty interesting. Our team is actually responsible for the whole flow, end to end. I mean, from developing and defining the task, all the research, selecting the model architecture, even conducting proofs of concept many times. We'll probably elaborate on that later, because I think it's really important nowadays for practitioners in the industry. Also the whole training process, of course, where you come into the picture with some great tools, helping us find which models and experiments are better. Then evaluating, which is actually pretty interesting, because we try to construct evaluation metrics that also hold the product objectives inside, because we're not building models in a vacuum; we're all tied to a product and to the value we give our customers. It's not always that straightforward. And all the way to deploying to production, including building monitoring systems, which should be our eyes out there eventually, and runtime optimization, as you said, so we don't spend so much on compute. It's a pretty complicated flow, but over the last few projects we actually formed a nice formula for it, which I posted as guidelines in a Medium blog post, and which has proven to be successful the past few times. It's actually a trend, at least as I see it now. Every time I read on Twitter or LinkedIn or wherever, people are talking about how to maintain and deploy and make good models in production, because there isn't any silver bullet there, and there are companies trying to solve the whole pipeline, or some part of it, so it's pretty interesting. I mean, the focus, as I see it in industry, has shifted from just making the models to making them work well in the real world, and being flexible enough to adapt to changes. So I guess I can say that many times, maintaining the model and keeping it good and reliable out there is sometimes much harder than actually developing it. Which is kind of amazing, if you think of it. I guess that wasn't exactly the focus a few years ago, but we kind of got there.
Lukas :
Tell me some stories about some stuff that you've run into, and if you could tell me specifically, like maybe pick a model and what it does and what were the issues that you ran into in the process of getting it deployed and running.
Nimrod :
Yeah. We can take object detectors as an example; we use them, of course, in our product. And-
Lukas :
And in this case, an object detector would be like a baby detector or like a parent detector, is that fair?
Nimrod :
For example, yeah, it can be... Let's say, for example, a baby detector. So, when you take a baby detector and you actually want to start building it, you must be aware of, for example, how the evaluation is going to be performed. That's a common pitfall. Choosing the right evaluation metrics is pretty tricky, and I can say for myself, I've had to recover from some bad decisions. It's really about how you look at the model and...
Lukas :
If you could break that down, I mean, what would be a bad evaluation metric for a baby detector? Because probably some people are listening to this and thinking, okay, accuracy sounds like a pretty good metric, but what would be a metric that might lead you astray with the baby detection model?
Nimrod :
Okay. Let's take just a tiny example. Let's say we have a baby detector and its accuracy, let's say, is pretty good, but eventually, in the product, we care more about the false positives than the false negatives, for example. And how you look at the evaluation metrics can really affect that. So, if I give a little bit more weight to the false positives, we might see, for example, a decrease on some metric that averages everything at once, but eventually this is the right metric and we get much higher performance. Or the other way around: we have a model that has very high accuracy, but since the product was aimed at decreasing false positives, the product metric was way lower. It's really how you look at it. And that's the tricky part, I think.
Lukas :
I guess what metric then could you move to, and then, what would you do to improve that metric?
Nimrod :
Once you define the metric, you can always try to see where the weak cases are, and maybe how you can strengthen them, whether it's with more data or with augmentations. But again, those things can fly under the radar if you don't give them enough weight. That's a common failure case that has actually happened in the past.
Lukas :
Wait, can you explain one more time what happens there in this failure case?
Nimrod :
Yeah. Let's say, for example, we took an overall accuracy measure for a baby detector, but we were detecting the baby when it wasn't there, and we had high recall, which compensated for that. Eventually, we got a very high accuracy. But for product purposes, the precision needed to be higher in order to give enough value to the product. Actually, another way of looking at it is treating the precision as the most important parameter for us. And once we changed to look at it that way, we could clearly see the problem and fix it.
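To make that concrete, here's a minimal sketch with made-up numbers (not Nanit's actual evaluation code) of how a detector with high recall and high overall accuracy can still have the low precision Nimrod describes, and how a metric that weights false positives more heavily surfaces it:

```python
# Hypothetical counts for illustration only: the detector rarely misses the target
# (high recall) but also fires on frames where the target isn't there (false positives).
def precision_recall_accuracy(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# 100 frames contain the target, 900 don't.
p, r, a = precision_recall_accuracy(tp=98, fp=90, fn=2, tn=810)
print(f"precision={p:.2f} recall={r:.2f} accuracy={a:.2f}")
# precision=0.52 recall=0.98 accuracy=0.91  -> accuracy looks fine, precision doesn't

# An F-beta score with beta < 1 weights precision (i.e., false positives) more
# heavily and exposes the problem that plain accuracy averages away.
beta = 0.5
f_beta = (1 + beta**2) * p * r / (beta**2 * p + r)
print(f"F{beta}={f_beta:.2f}")
```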
Lukas :
How do you fix a problem like that?
Nimrod :
So, collecting the data in a much more dedicated way for your problem. Maybe see whether you're actually collecting the right data and not just randomly sampling data at some point, but actually directing yourself to the places the model will see when it's in production. You want to try to imitate that and collect data from those parts, in order to train your model on what it's actually going to see, and not on what's easy to collect. It's probably one of the best solutions.
Lukas :
So, collecting data of the cases where you think your model is struggling and adding that, as opposed to random sampling?
Nimrod :
For example, or maybe collecting the right data for your problem. You can collect data in many ways, and collecting the data that suits your problem is actually the first thing you need to do, and you should put a lot of thought into it. It's actually my first bullet in the guidelines: start by defining what the right data is for you. Don't just collect data and start working on a model, because you're going to waste time.
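As a rough illustration of "collect what the model will actually see", here is a hedged sketch (hypothetical function and parameter names, not Nanit's pipeline) that prioritizes frames where the current model is least certain, rather than sampling frames at random:

```python
import random

def select_frames_for_labeling(frames, score_fn, budget=500, threshold=0.5, band=0.2):
    """frames: list of frame ids; score_fn: the current model's confidence in [0, 1]."""
    # Frames scored near the decision threshold are the ones the model struggles
    # with, and usually the ones that look most like hard production cases.
    uncertain = [f for f in frames if abs(score_fn(f) - threshold) < band]
    random.shuffle(uncertain)
    picked = uncertain[:budget]
    # Top up with random frames so the labeled set still covers easy cases too.
    if len(picked) < budget:
        picked_set = set(picked)
        rest = [f for f in frames if f not in picked_set]
        picked += random.sample(rest, min(budget - len(picked), len(rest)))
    return picked
```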
Lukas :
Do you have ways of explaining to a business person how to justify the cost of data collection in terms of some metric that they care about? Is that an important thing to you?
Nimrod :
We try, at Nanit, to keep a close connection between the product and the algorithm performance. Because data collection is very expensive, and our time and resources are very expensive, we try not to build perfect models that will have no effect on the product. Yeah, I guess this process is pretty easy for us, because this is one of the first priorities when we start a project.
Lukas :
And are you also, in parallel, experimenting with different kinds of algorithms, or doing hyperparameter searches? Is that important to you at all, or is it really just the data collection?
Nimrod :
No, no, no, no. I mean, data collection is good, but we're actually doing all sorts of hyperparameter tuning and model selection, and we have a really organized methodology about what to do first.
Lukas :
Can you tell me your methodology?
Nimrod :
Well, I mean, not in particular, but I guess a good thing to do is to start by trying to get the best model you can, to get an upper bound on performance, and ignore runtime, for example, just to see what your upper bound is for the problem. Because in many cases, the algorithms are benchmarked on public datasets: detectors work on MS COCO, and classification, for example, on ImageNet, but that's not always a good proxy for your problem. Medical images have their own datasets, but in some other areas, the data is not always natural-image style. So, you've got to try models and a lot of hyperparameter tuning. It's most of the work in the training phase; I mean, it's not hands-on work, but it takes a lot of time.
Lukas :
And then, once the model is deployed, do you stop there? I would imagine you'd have new problems that would come up. Do you see data drift as an issue for you? Like how do you think about production monitoring?
Nimrod :
We put a lot of effort into production monitoring. I think it's really important, and people sometimes underestimate it, because once you deploy a model, it's not the end, it's actually the beginning, because it's much harder. You need to invest in really good planning and in making your monitoring systems reliable enough to give you enough confidence, because once you deploy the model, that's the only thing you can see. The performance on the test set that you get before you deploy the model is just a single point in time. After that, you'll get many timeframes of performance, and you need your monitoring to be reliable enough to spot shifts and maybe sudden drops, and to try to understand what happened. I guess I can say that we never stop with the models. We always look at the monitoring and see where there are any problems, and what they're connected to.
Lukas :
I think one of the issues is you don't really have ground truth in production. So, how do you know if there's a problem?
Nimrod :
It's true, it's pretty complicated. So, we always look at prediction distributions and common stuff like that. We also use other routes as well, for example user satisfaction and the tickets they open, so we can spot problems there that we didn't catch in our monitors. We try to find the source whenever we can, and usually from other parts of the company as well.
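One common way to watch prediction distributions without ground truth, sketched here as an assumption rather than a description of Nanit's monitoring stack, is to compare each night's confidence scores against a reference window with a drift statistic such as the Population Stability Index:

```python
import numpy as np

def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a current score sample."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    ref_hist, _ = np.histogram(reference, bins=edges)
    cur_hist, _ = np.histogram(current, bins=edges)
    ref_pct = ref_hist / max(ref_hist.sum(), 1) + eps
    cur_pct = cur_hist / max(cur_hist.sum(), 1) + eps
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Hypothetical data: last month's detector confidences vs. tonight's.
rng = np.random.default_rng(0)
reference_scores = rng.beta(8, 2, size=50_000)   # mostly confident detections
tonight_scores = rng.beta(5, 3, size=5_000)      # the distribution has drifted

score = psi(reference_scores, tonight_scores)
if score > 0.2:   # a common rule of thumb; tune the threshold to your tolerance
    print(f"PSI={score:.2f}: prediction distribution shifted, worth investigating")
```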
Lukas :
Interesting. I always wonder how people do that, and I've heard different variants. So you actually file a ticket against the ML team if you find a bad prediction? Like, what do you do with a ticket like that?
Nimrod :
Well, they don't file it specifically against the ML team, but yeah, people file tickets for bad predictions, because everything is actually based on that. You can get wrong statistics and bad results, and as a parent, you want the data for your child, you pay for this product, and you want answers. It's actually quite a challenge, since we have so many users and we need to keep our models at a very high performance level so they don't generate so many tickets for us, and also to make the experience much better for our users. So, it's a challenge.
Lukas :
One thing you talked about in your paper, or your Medium post, was preparations before deploying a model to production. Can you talk about how that works?
Nimrod :
Yeah. We try to simulate as much as possible how everything will be in production. For example, we actually create a production-like environment, and we also get some of the users to use it. Of course, they are supportive, and they are aware that there are going to be changes. And we try to monitor everything we can there, in order to see that our model performs the way we expect and that we don't see any issues. In parallel, we also do end-to-end tests of all of our algorithms together, to see that the new model behaves as it should and doesn't raise any new problems, and maybe even improves things. That's most of the work that's done there.
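A minimal sketch of one piece of that kind of pre-deployment check, under the assumption that you can replay recorded sessions through both the current and the candidate model (the function and argument names here are hypothetical, not Nanit's release process):

```python
def compare_models(sessions, current_model, candidate_model, max_disagreement=0.05):
    """sessions: iterable of frame batches; each model maps a batch to per-frame labels."""
    disagreements, total = 0, 0
    for frames in sessions:
        old = current_model(frames)
        new = candidate_model(frames)
        disagreements += sum(o != n for o, n in zip(old, new))
        total += len(frames)
    rate = disagreements / max(total, 1)
    print(f"disagreement rate: {rate:.1%} over {total} frames")
    # Gate the rollout on the disagreement rate, alongside manual review of the diffs.
    return rate <= max_disagreement
```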
Lukas :
Got it. Got it. Could you tell me a little bit about how Weights & Biases fits into your workflow, and how you use the Weights & Biases tool?
Nimrod :
Yeah. With Weights & Biases, we manage all of our experiments, which is great. We also use your visualization tools in order to compare between experiments. Since you have everything so shiny and dynamic, we can also try different parameters and see what could have been, without running the old model over and over again, which saves time. I'm a pretty big fan of the reports you can make because, as I said before, we are really tied in with the product team on the algorithms we build, which gives us a way to show them what we do and visualize in real time how each parameter affects the results. Then we talk together, the product and algorithm teams, about what would be better for the product. So yeah, we use it a lot and...
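For readers who haven't used it, the experiment-tracking workflow Nimrod describes boils down to logging each run's config and metrics so runs can be compared and pulled into reports later. Here is a minimal sketch using the public wandb API; the project name, config values, and metrics are made up, and the training loop is a stub:

```python
import wandb

# Start a tracked run; the config shows up as searchable columns in the W&B UI.
run = wandb.init(
    project="baby-detector",                          # hypothetical project name
    config={"lr": 1e-4, "backbone": "small", "input_size": 512},
)

for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)                    # stand-in for a real training step
    val_precision = 0.60 + 0.03 * epoch               # stand-in for a real evaluation
    wandb.log({"epoch": epoch, "train/loss": train_loss, "val/precision": val_precision})

run.finish()
```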
Lukas :
So, you actually use reports to share results with the product team?
Nimrod :
Yeah. We also use reports to summarize and share with the product team, to show them some model weaknesses and discuss whether we want to deal with them now or later, or, for example, how changing parameters can help. It's better for mutual work and transparency, because sometimes you tend to be a little bit suspicious of things you don't understand, and once we understand their job and they understand ours, the joint work is much better. We've seen that once you talk about it and explain, and they can understand your world and you can understand theirs, we can make decisions that are much better for the company. So, it's actually pretty useful for us.
Lukas :
Do you often go down paths where there's like a product feature you might want to make, but you're not sure if you're going to be able to make the machine learning algorithm accurate enough or powerful enough to actually make the feature possible? Do you ever get in situations like that?
Nimrod :
All the time. This is one of the main challenges we have when working at this scale and on such sensitive data. We have so many cool ideas and papers and works, and it's really hard to get them into production. This gap is sometimes pretty big. I can name one example that pops into my head: GANs. GANs are an amazing example of that. They do marvelous things, but it's really hard to get them into production. They often tend not to converge, or they work well on this dataset but not on that dataset, and on our data they're not good enough. So, it's a pretty big challenge, how to be innovative and deliver good and valuable features that are also reliable and accurate, which is...
Lukas :
What might you do with GANs? I'm trying to picture that. Like, I don't want any deepfakes of my baby.
Nimrod :
No, no, no, not deepfakes, but there are many other uses of GANs, maybe to enhance images or to make nice fun features, where you can celebrate your baby with a different background and stuff like that. All sorts of things where GANs can be really useful. But again, there's a big gap between an experiment in a paper and actually getting it into production.
Lukas :
I mean, I know that in the last couple of years, there's been a lot of advances, almost like a tsunami of advances in computer vision. Have any of them been relevant to you? Do you take recent stuff and get them in production? Or is that stuff too kind of theoretical to really matter for the practical stuff you're doing?
Nimrod :
We always try to take the state of the art and adapt it to our domain, in the fields where it's easier. Mainly object detection, which we talked about, since those tasks are pretty much solved, let's say, or pretty comfortable to get into production. So yeah, it's much easier there. But there are other fields that we try. I'll honestly say we try all the time, sometimes really hard, to bridge this gap, but it's definitely something that keeps us motivated, and we try to do it all the time. I mean, if you stay behind in this field, you probably won't exist that long, that's how I see it.
Lukas :
Sure. Yeah. Is there, I guess, any paper or like line of research that you can talk about as being especially relevant to the work you're doing?
Nimrod :
I can talk about some nice research we did recently, and all of it is actually somehow related. It's all using the sleep metrics that we have, which have the algorithms behind them. For example, during the pandemic, during COVID, Nanit actually helped keep families together, for example when grandparents couldn't see their grandchildren, and Nanit allows that. We also checked, during COVID, what the effects on babies were. We tried to study the difference between children whose parents were essential workers and went to work as usual, and those whose parents stayed at home. And in the first few weeks, from the end of March, let's say, we saw that the sleep of the babies actually got worse.
Lukas :
Oh...
Nimrod :
Yeah. But it actually improved after a couple of months. We saw that the sleep of the babies whose parents stayed at home actually got back to normal, which is pretty amazing. It means that babies are resilient to the change and they adapt fast, which is kind of cool.
Lukas :
Can I ask you, so this is... I mean, this is like, I think, for a lot of parents, the most drama-filled topic is sleep training the baby where you leave the baby and let them cry at various lengths and teach them to go to sleep on their own, instead of with you holding them. Do you have an opinion on that?
Nimrod :
Well, since I'm not a sleep expert, I can only say, from my experience, it's important to let the baby fall asleep on their own, I guess, though not at any cost, but...
Lukas :
Do you have any data on that? I guess you do sort of track when the baby falls asleep on their own.
Nimrod :
Yeah. Yeah, we do. I'm not sure we have any relevant research in this particular area, but again, this is the beauty of Nanit. You can actually test your assumptions, I would say, because if you believe in something, and then the objective data tells you that it's right, that's good. And if not, you might really want to reconsider, but that's up to you. I mean, you've got the data, you can decide.
Lukas :
Do you publish like aggregate statistics like that on different things that help babies sleep?
Nimrod :
We do have research that we publish. I'm not sure about those specifics, what helps and what doesn't. We did publish research about screen time and how it affects babies and young children, and it's actually pretty amazing. We found out that, for example, touchscreens have a bigger effect on the sleep of babies than, for example, television. Television has less effect, which really amazed me. Touchscreen use causes fragmented sleep and less sleep time overall, which is really amazing. You can conduct research and see results quickly, because we have a large user base and engaged users who let us do it and answer questions. So it's also a good research tool, I guess.
Lukas :
That really is amazing. Yeah. And it seems like, I guess, from your app, I feel like your benchmarks of sleep are actually a little less sleep than I see in the parenting books that I read. Do you think that's because you're actually monitoring it, instead of getting self-reported data? Do you see systematic bias in the self-reported sleep data? Like, it'll tell me how my daughter is doing, and I can compare it to averages. And it's funny, because the app is kind of telling me she's doing pretty well, but then, when I compare it to books that I'm reading, it seems like she's sleeping a little less than average. So maybe you're just trying to be positive and helpful, but I also wonder, because we try to write down every time she wakes up and when she goes to sleep and when she gets up, and I always kind of feel like our written notes imply a little more sleep than the data actually shows us that she got. And so, I kind of wonder if previous studies, relying on parents' memories, end up making us think that babies are sleeping more than they actually are.
Nimrod :
What I can say about it is, I guess that's sometimes true. Also, getting sleep data, especially from babies, is really expensive. I'm not sure researchers can take thousands of babies and record their sleep, which is what Nanit actually can do. So maybe they use some small sample, and this is why you see some big variance between studies about sleep, I guess. I guess that would be the reason; this is my assumption.
Lukas :
I guess, are there any other takeaways, besides avoiding touchscreens, to help a baby sleep? Any conclusions you've come to with your large-scale data collection?
Nimrod :
So, most of the significant tips that we see are actually incorporated in the app. Helping the baby fall asleep on their own is, of course, a remarkable sign for that, because once they wake up during the night, they can get back to sleep on their own. So I guess what we see, we try to translate it and validate it, of course, and send it out as tips, if possible.
Lukas :
Cool. Well, I guess we always end with two questions, and I want to make sure we have a little time for that. The second to last question is what is one underrated aspect of machine learning that you think people should pay more attention to than they do?
Nimrod :
I would say building a good process for deploying the models. I mean, making something that works as a system, and doesn't just occasionally work, because sometimes people tend to go, yeah, okay, let's take the data, let's train it, okay, it's very good on accuracy, okay, we can deploy it. And then the performance is bad, and now the model is out there, and it's much harder to fix it. So, I'd say conducting this methodology, this pipeline of how to work better, is something that people should pay more attention to. And I think that's what we see, at least from what I read on Twitter and LinkedIn and such: people are paying more and more attention to that. And I think that's important for the industry.
Lukas :
And are there tools that you use to help with that?
Nimrod :
In building those pipelines? So, we use whatever works. For example, managing experiments and sharing reports and seeing everything really helps us understand exactly how it's done, and trying to simulate the production line, this is what works for us. But I know there are several companies and several products out there that can do many things. And this is why I wrote it as guidelines, because probably some of the tips there will be useful for many people and some of them won't.
Lukas :
Totally. And then, I guess maybe you answered my last question, but I'll ask it anyway. When you look at machine learning in general and making it work in production, what do you see as the biggest challenge in going from research to a deployed model working for customers?
Nimrod :
Yeah, as I said, I think this gap is sometimes really big, that's a fact. Maybe it's the ability to understand which paper is nice, but will it hold up in production? It's a pretty big problem; you need to foresee it. We've tried a lot of cool features that we saw in conferences and papers, but they didn't hold up for us, or maybe they weren't good enough, so we had to drop them.
Lukas :
Well, I really appreciate you being kind of public about your work and willing to do case studies and things like that. I think it really helps a lot of people learn best practices as they try to get models in production. So, we'll put some links to some of the work that you've put out, but I would say, please, keep doing it if you're open to it. It's super helpful for our community.
Nimrod :
Yeah, I totally agree. This is how we learned, and this is how we can share the knowledge. And I think the more people share knowledge, the better it will be, and everyone can be really productive, which I think is important.
Lukas :
Totally. Thanks, Nimrod. Really appreciate it.
Nimrod :
Thank you so much.