Mircea Neagovici — Robotic Process Automation (RPA) and ML
Mircea explains how machine learning unlocks the next level of potential for robotic process automation (RPA) and how ML teams differ from engineering teams.

About this episode
Mircea Neagovici is VP, AI and Research at UiPath, where his team works on task mining and other ways of combining robotic process automation (RPA) with machine learning for their B2B products.
Mircea and Lukas talk about the challenges of allowing customers to fine-tune their models, the trade-offs between traditional ML and more complex deep learning models, and how Mircea transitioned from a more traditional software engineering role to running a machine learning organization.
Transcript
Note: Transcriptions are provided by a third-party service, and may contain some inaccuracies. Please submit any corrections to angelica@wandb.com. Thank you!
Intro
Mircea:
The ML team has to take more chances. You cannot have the ML team work on a schedule and have clear times for when something is done. Something might never be done. It's also okay to fail. If someone starts a project today at UiPath and there is no result, but they do the right thing and you learn from that, that's a good project. Sometimes, you have to spend some time to learn that something doesn't work.
Lukas:
You're listening to Gradient Dissent, a show about machine learning in the real world. And I'm your host, Lukas Biewald. Today, I'm talking with Mircea Neagovici, who is the VP of AI and Research at UiPath. UiPath is a company you might not have heard of, but they're a leader in the space of RPA, which is essentially a way of automating a lot of the tasks that companies do. Mircea is an expert on real world machine learning, getting something working for tasks that actually matter to businesses. This is a very interesting, practical interview.
Robotic Process Automation (RPA)
Lukas:
I thought a good place to start would be your current company, UiPath, because I think the applications there are things that a lot of our audience might not know is an issue for businesses. I thought maybe you could describe what UiPath does and then get into how machine learning fits into that.
Mircea:
Yeah. UiPath is in the RPA business, robotic process automation. What it means, basically, is programs that can do repetitive tasks that humans don't want to do. Simple tasks. It's been a good business for the company since about 2015 or so. But if you make those robots smarter and put machine learning into them, then I think we can take this company to a new level. So I think RPA has a lot of potential, but RPA plus AI has a lot more potential.
Lukas:
I totally agree. But before we get into the AI, could you give a few examples of where RPA might affect someone in their day-to-day life?
Mircea:
I think in all areas, you see people doing repetitive tasks. Opening an email. Opening an attachment. Looking at some data. Copying a number into a form, then going back to that email. Taking another number. Putting it in the form. These kinds of very simple things. But they take time, and they can actually fail. Robots, once they get started, are more reliable.
Lukas:
How do you actually set up an RPA task today? Is it something a programmer does, or can anyone do it?
Mircea:
We have a concept of an RPA developer. The RPA developer is our target for most of our products. An RPA developer is not a software developer. It's maybe more like a basic developer from 20 years ago. They understand data, they understand processes, they know what has to be automated. And then they create the workflows. There is a separate question about "What do you want to automate?", and I think we'll probably cover that a bit later when we talk about the project. It's not always clear what to automate, especially in a big company. But once you know what to automate, the RPA developer's job is to take a process and make a workflow out of it.
Lukas:
Got it. So, UiPath has been one of those really phenomenally successful companies that a lot of people might not have heard of. What's the killer use case for UiPath that's made it successful so far?
Mircea:
I think it's the broad usage. I don't know if we have a killer scenario, but we are able to save cost and have those repetitive processes taken care of by a robot, which allows people to do other things. I think we all have experiences where we have to do something that we don't exactly like to do, like moving data from one place to another. From Excel to a form, filling the form. This is, I think, the power of RPA: being able to do a lot of those processes.
RPA and machine learning at UiPath
Lukas:
What do you think is the current level of use of ML in UiPath today? How much ML is actually working in the product right now as opposed to in the future?
Mircea:
We started putting ML into RPA about four years ago. Our first project was a computer vision project. Our robots usually work because they know the Windows APIs, and they know what's on the screen, where to click, and where to type. But this is not always the case. If you run in a remote desktop, there is no Windows API available. You only see a picture. If you are in an operating system other than Windows, the same thing. For us, falling back on the picture and making our robots work with the picture, without the APIs, was our first project. And it is a competitive advantage for us, as far as I can tell. Our competitors don't have this computer vision feature.
Lukas:
What exactly does the feature do? Does it find the button to click on based on the screen?
Mircea:
It finds all the controls from the screen, which are available to you if you are on Windows and we can actually use the Windows APIs. But if we have the picture, we can find everything in the picture. Then we have a design time and a run time. At design time, we detect all the controls, and people can design their workflows. And then at run time, we have a picture that's different. Different resolution; it's not the exact same picture, but it looks the same. Then we find the controls, and we know where to click or type, given what we have done at design time.
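To make the design-time/run-time split concrete, here is a minimal sketch in Python. The `Control` fields and the nearest-neighbor heuristic are illustrative assumptions; UiPath's actual feature relies on a trained detection model rather than hand-written matching.

```python
# A minimal sketch of matching design-time controls to run-time detections.
# The matching heuristic and field names are illustrative assumptions.
from dataclasses import dataclass
import math

@dataclass
class Control:
    kind: str   # e.g. "button", "textbox"
    x: float    # center coordinates, normalized to [0, 1]
    y: float

def match_controls(design: list[Control], runtime: list[Control]) -> dict[int, int]:
    """Map each design-time control to the nearest run-time control of the
    same kind. Real systems also use text, size, and layout context."""
    mapping = {}
    for i, d in enumerate(design):
        candidates = [(j, r) for j, r in enumerate(runtime) if r.kind == d.kind]
        if candidates:
            j, _ = min(candidates,
                       key=lambda jr: math.hypot(jr[1].x - d.x, jr[1].y - d.y))
            mapping[i] = j
    return mapping

design = [Control("button", 0.8, 0.9), Control("textbox", 0.5, 0.3)]
runtime = [Control("textbox", 0.52, 0.28), Control("button", 0.79, 0.91)]
print(match_controls(design, runtime))  # {0: 1, 1: 0}
```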
Lukas:
So you started with this vision task, which is actually a really interesting one. And then what were the follow-on tasks that you made work with ML at UiPath?
Mircea:
Then the next thing we realized is that we now have the controls from the screen and we want to do OCR. There are many cases where we want to do a screen OCR. At the time, we were using Google and others, but we thought nobody had really optimized OCR for screens. We thought we had an opportunity to do a better OCR for our use case. We implemented an OCR around three years ago. It was not different from others. I mean, it's still the same idea: you do detection first, you find where the text is, and then do recognition. We didn't invent a new OCR, but we did train on our own data and our own use case. We built a significantly better OCR in the process. The same thing for document OCR later. There we don't have such a big advantage in performance, but we have more flexibility. We can put it on device, we can put it in a service. We can ship it in any way we want. So OCR for screens and OCR for documents was another project. And then also during 2018, '19, we were hearing from customers that they wanted to do document processing. Emails, semi-structured content, unstructured content. There are very many scenarios from very many customers we've seen. I think it was quite clear that the number one thing in document processing is doing information extraction from semi-structured content. Invoices, receipts, purchase orders. And then we made some models that can actually read those documents and extract what we really care about, including, like I said, receipts, invoices, purchase orders. And now, we have 15 or 18 document types. W-2s, W-9s and so on. We do some classification for those documents. We have models that do information extraction from unstructured content, like legal contracts, lease contracts. We've put quite a lot of effort into document understanding.
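The detect-then-recognize structure Mircea describes can be sketched as a small pipeline. The `fake_detect` and `fake_recognize` functions below are stand-ins for the trained detection and recognition networks; only the pipeline shape follows the conversation.

```python
# A sketch of a two-stage OCR pipeline: a detector proposes text boxes,
# then a recognizer reads each crop. Model internals are stubbed out.
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class TextBox:
    x0: int
    y0: int
    x1: int
    y1: int

def run_ocr(image: np.ndarray,
            detect: Callable[[np.ndarray], list[TextBox]],
            recognize: Callable[[np.ndarray], str]) -> list[tuple[TextBox, str]]:
    results = []
    for box in detect(image):
        crop = image[box.y0:box.y1, box.x0:box.x1]
        results.append((box, recognize(crop)))
    return results

# Stand-in models; in practice both stages are neural networks trained on
# screen data, which is what gave the screen OCR its edge.
def fake_detect(img: np.ndarray) -> list[TextBox]:
    return [TextBox(0, 0, 8, 4)]

def fake_recognize(crop: np.ndarray) -> str:
    return "<text>"

print(run_ocr(np.zeros((16, 16)), fake_detect, fake_recognize))
```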
Fine-tuning & PyTorch vs TensorFlow
Lukas:
I mean, you have a lot of custom models running in production. I feel like compared to a lot of companies, you probably have a more advanced kind of operational setup than others. I'm kind of curious what the structure looks like.
Mircea:
We made the decision in 2018 to build a framework for hosting those models. The interesting thing is that we don't only want to do hosting. We also want to allow customers to fine-tune our models or to train our models. And at the time, we didn't see anyone to partner with on this. There were some solutions in the cloud, but nothing on-prem. A lot of our customers have trouble moving from on-prem to cloud for our scenarios. People have now accepted that email and documents are okay in the cloud, but when it comes to processes and invoices and such, there is more reluctance. So we put a lot of time and effort into this. It's a big engineering project. It's also an AutoML project, but a big engineering project to build a framework that can host and train the models on-prem and online. And there are multiple configurations online. We call it AI Center now, and everything we do is hosted or trained in AI Center. It is a very big project.
Lukas:
And all these models can be fine-tuned, so you have to have kind of separate instances?
Mircea:
Not all, but many of them. We don't allow the computer vision model to be trained by customers. Or the OCR. Although for the OCR, we have to get feedback from the customers and improve. But in document understanding, most of our models are retrainable. And this is why. We have a model for receipts, a model for invoices, and basically one big model with multiple tasks. But then, when customers start to use this out-of-the-box model, either they want to fine-tune or train on their data, which basically means overfitting on their data — but this is a good thing for them — or they have a bit of a different schema. Or they have a totally different schema. So for all those cases, they have to fine-tune our models.
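One way to picture "one big model, retrainable per customer schema" is a shared encoder with a swappable per-customer head, sketched below in PyTorch. The layer sizes, names, and field list are invented for illustration; this is not UiPath's actual architecture.

```python
# A hedged sketch of a shared document encoder with a per-customer
# field-extraction head. All dimensions and names are illustrative.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, vocab=30000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)

    def forward(self, tokens):
        return self.layer(self.embed(tokens))

class CustomerHead(nn.Module):
    """Per-token classifier over the customer's own field schema."""
    def __init__(self, dim, fields):
        super().__init__()
        self.out = nn.Linear(dim, len(fields))

    def forward(self, h):
        return self.out(h)

encoder = SharedEncoder()  # shipped pre-trained, shared across customers
head = CustomerHead(256, ["total", "date", "vendor", "O"])  # customer schema

# Fine-tune only the head (or optionally the whole stack) on their labels.
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
tokens = torch.randint(0, 30000, (2, 32))  # toy batch
labels = torch.randint(0, 4, (2, 32))
logits = head(encoder(tokens))
loss = nn.functional.cross_entropy(logits.reshape(-1, 4), labels.reshape(-1))
loss.backward()
opt.step()
```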
Lukas:
So I mean, 2018 isn't that long ago by the calendar. But I feel like in terms of ML frameworks, it's kind of ancient history. I mean, what did you end up choosing for your ML framework? What are these models training in and how do you actually deploy them?
Mircea:
We started with a mix of PyTorch and TensorFlow.
Lukas:
Oh, a mix of PyTorch and TensorFlow? Wow.
Mircea:
Well, we didn't mix them on purpose. We preferred PyTorch from the very beginning, but for our computer vision models, at that time, it was a lot easier to do this in TensorFlow. Google had a research repo implementing Faster R-CNN. It was exactly what we needed. We took the model and trained with our own data. So we used both. In document understanding, we used TensorFlow in the very beginning. And then later, it became easier with PyTorch. And actually, we also got a bit of a performance boost with PyTorch. I mean, quality performance. So at this point, everything we do is in PyTorch. We train the models and then we ship these models in AI Center. The customer only sees AI Center. They cannot do AutoML themselves; we don't expose that many hyperparameters. There are very few things that we expose. The other thing I want to mention that was very tricky is that people can fine-tune our models in two ways. One is if they label data. This is what they do before they deploy; they label 100, 200, 500 documents, and we give them our own tools to label. Or they can fine-tune on production data. Once we deploy, we have a human-in-the-loop concept and a validation station, and someone fixes our mistakes for the workflow to continue. And from those mistakes, we close the loop and we do learning. So those are the two types of data used for learning. So to come back: when we do a release, we make a branch, we train our models, and we basically ship containers with code and models. And then in AI Center, they are hosted and also trainable.
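The two feedback paths, pre-deployment labeling and human-in-the-loop corrections from the validation station, might look roughly like this in code. The threshold, record format, and `ask_human` hook are assumptions made for the sketch, not UiPath's actual interfaces.

```python
# A sketch of the validation-station loop: low-confidence predictions go
# to a human, and the corrected results are banked for future fine-tuning.
finetune_queue = []

def process(doc_id: str, prediction: dict, confidence: float,
            ask_human, threshold: float = 0.8) -> dict:
    if confidence >= threshold:
        return prediction                      # robot proceeds unattended
    corrected = ask_human(doc_id, prediction)  # human fixes the mistakes
    finetune_queue.append({"doc": doc_id, "label": corrected})
    return corrected

reviewed = process("inv-001", {"total": "104.50"}, 0.42,
                   ask_human=lambda d, p: {"total": "1045.00"})
print(reviewed, len(finetune_queue))
```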
Lukas:
I guess back in 2018, there was definitely a sense that PyTorch was kind of the framework for research, whereas TensorFlow was more for production and deployment. How did you think about that? What was the key feature that made you prefer PyTorch to TensorFlow and choose to standardize on it, even though there were some models in TensorFlow that felt like they solved your needs really well?
Mircea:
It was how debuggable PyTorch was. We optimized for developing faster. I mean, TensorFlow is a good framework, but it is really hard to use. It's always been hard to use, even after they did TensorFlow 2.0. We didn't have such a big issue with performance. Our computer vision model runs on GPU and it runs in under a second. At 0.5, 0.6 seconds, you cannot really see it. Humans can only notice things that take more than 0.7 or 0.8 seconds. So we did not have an issue with CV. But also, when we moved from TensorFlow to PyTorch, our PyTorch inference was a bit faster. We didn't exactly understand why, but it was definitely not slower. In any case, we did not have a big performance concern. And most of our document understanding models actually run on CPU. A request takes a second and a half, two seconds, something in that range. And for document processing by a robot, this is fine. Clearly, in some scenarios, TensorFlow was faster and PyTorch was too slow, but that didn't happen for us. We just didn't have a performance issue back then.
Monitoring models in production
Lukas:
Do you do any kind of performance monitoring? I guess you have this human-in-the-loop system to catch cases where the model is uncertain, but are things like concept drift and data drift something that you actually watch in production?
Mircea:
We have to do a lot more here. Drift is something we have to be concerned about. But for us, even before the drift, we have a hard time telling a customer whether their data is good enough for training. If we don't say anything, and they start a very expensive labeling and training process, and then we say, "Your model didn't work because of the data," they'll ask, "Why didn't you say something before?" We don't have a good visual way to tell people, "You have to label this much. You are now 50% done, 70% done," or "Label more of this, label more of that." This is an issue for us. And then of course, the drift. But we haven't solved the first problem.
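A common way to approximate the "you are 50% done" signal Mircea wants is a learning curve: fit on growing fractions of the labeled set and see whether validation accuracy is still climbing. This is a generic sketch on synthetic data with scikit-learn, not the (still unsolved, per the conversation) product feature.

```python
# Learning-curve sketch: if accuracy flattens as labels double, more
# labeling likely won't help; if it's still climbing, keep labeling.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=600) > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.33, random_state=0)

for frac in (0.1, 0.25, 0.5, 1.0):
    n = max(10, int(frac * len(X_tr)))
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
    print(f"{frac:>4.0%} of labels -> val acc {clf.score(X_val, y_val):.3f}")
```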
Lukas:
Do you do any kind of active learning in the labeling? Do you try to pick examples that are going to help the model the most? Or how do you think about that?
Mircea:
That's another thing that is kind of a debt for us; we have to do more active learning. We do mostly supervised learning. We now know how to also do unsupervised pre-training for document understanding and for CV. Active learning is something that we are now thinking about; it's not something that we've shipped. But clearly, it is our way forward.
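For reference, the simplest form of the active learning he mentions is uncertainty sampling: have the current model score the unlabeled pool and send the least-confident documents to the labelers. A toy sketch with scikit-learn, on invented data:

```python
# Uncertainty sampling: pick the pool examples the model is least sure of.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_labeled = rng.normal(size=(50, 10))
y_labeled = (X_labeled[:, 0] > 0).astype(int)
X_pool = rng.normal(size=(500, 10))      # unlabeled documents

clf = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
proba = clf.predict_proba(X_pool)
uncertainty = 1.0 - proba.max(axis=1)    # least-confident sampling
to_label = np.argsort(uncertainty)[-20:]  # send these 20 to the labelers
print(to_label)
```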
Task mining
Lukas:
Where do you think this goes? What are applications that you're really excited about building new models to do? And what would that help UiPath do that it can't do now?
Mircea:
We have a very interesting project called task mining. Task mining is a product that runs on people's desktops, records what people do, and then has a nice, interesting algorithm to find the most common processes. If you ask a CIO what to automate, they have a hard time saying exactly what people are doing, especially in a larger org. So we built this task mining product: instead of having analysts and a lot of people talking and figuring out what has to be automated, we try to discover this ourselves. It is a very interesting project. It has a lot of potential for us. Basically, we start with pictures. We have a recorder that knows when something relevant happens, like a click or a type. And we end up with two weeks of recording for, let's say, 10, 15 users. And then we have to find the processes. It's a very, very interesting product, and there is a lot less research from the big companies or from the universities. Nobody's really doing research on this. In CV, in DU, you can just read a paper and you know what's going on. Not so much here. So we have to do our own research. That's one project that we are very excited about.
Lukas:
So the idea here is you could look at what people do over and over, where you're confident they're going to click on something or type something?
Mircea:
No, not that. We think about that one too: recording people and predicting the most likely thing they are going to do, like a language model for actions. But task mining is actually different. We look at the recording after two weeks, let's say for 15 people, and then we find the processes.
Lukas:
I see.
Mircea:
We find that the best process to automate is, for example, invoice processing. Or the best process to automate is some lookup that starts in some browser, does a lookup, goes to Excel. These kinds of processes. We just find good candidates for automation. This is just a short summary. In reality, it's a little more complicated. And we don't exactly find the process. We build an explorer for the customer to find it. But still, we believe we can take a process that takes, on average, 50 days down to maybe 2 days or something like this.
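At its core, task mining is about finding frequent action sequences in event recordings. The toy sketch below counts repeated windows in a synthetic event stream; the real product, as Mircea notes, handles noise and interleaved work and exposes an explorer rather than raw counts.

```python
# Mine the most frequent short action sequences from a desktop recording.
from collections import Counter

def frequent_sequences(events: list[str], n: int = 3) -> Counter:
    """Count all length-n windows of the event stream."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

recording = ["open_email", "open_attachment", "copy_number", "paste_form",
             "open_email", "open_attachment", "copy_number", "paste_form",
             "browse_web",
             "open_email", "open_attachment", "copy_number", "paste_form"]
for seq, count in frequent_sequences(recording).most_common(2):
    print(count, "x", " -> ".join(seq))
# The top sequences are the automation candidates surfaced to the customer.
```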
Lukas:
Have you tried this on yourself? Imagining, I wonder what it would see me doing all day long.
Mircea:
Well, we developers and engineers are not really good subjects. If we record ourselves, we'll probably see a lot of random stuff and coding and more watching and debugging. I cannot see myself recording something that brings any value to the product.
Lukas:
I'd be just scared to...I mean, I'd be interested, maybe afraid, to know how much time I spend sort of moving around meetings or sending emails.
Mircea:
We don't do a Big Brother kind of thing that tells you what to do and how you waste your time. We don't make people feel bad about it. We're just trying to find the real processes and not all the overhead and the distractions.
Lukas:
I see. It does seem interesting though, to predict where somebody is going to click or what they're going to type. I can imagine you can make interesting UI changes to help somebody if you can sort of know what they're likely to do next.
Mircea:
This is, for us, one of the things we want to look at in the future. Can we tell what people are going to do? And assuming we can, what do we do with that information? Let's suppose you click in three edit boxes and now we know you are going to click in the fourth. What do we do? We cannot take the mouse from you and start without telling you. It's like autopilot; we don't know the experience. We don't know what a good experience is. But so far, we don't even know how to do that. The other thing we can do that's probably a bit easier is when you create a workflow. We can see you doing a few clicks and a few types, and we recognize that this is actually an action that we know. We have those simple activities when you create workflows, like "Click" and "Type" and those kinds of things. But we can also have more complicated activities like "Create user in Salesforce". After you do 3 or 4 things, we can maybe tell that you are going to do 10 more. And all those 15 steps in the end are just 1 activity, which is "Create user". This is the kind of thing that I think is a bit closer for us. But yeah, the ultimate goal is to just have the computer do the human work with minimal intervention from the human. And I don't think we are that close.
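The "we recognize these steps as one activity" idea could be sketched as prefix matching against known activity templates. The activity names follow the conversation; the matching rule and event names are illustrative assumptions.

```python
# Suggest a high-level activity once the user's low-level events match a
# known template's prefix. Templates and thresholds are invented here.
from typing import Optional

ACTIVITIES = {
    "Create user in Salesforce": ["click_new_user", "type_name", "type_email",
                                  "click_role", "click_save"],
    "Process invoice": ["open_attachment", "copy_total", "paste_form"],
}

def suggest(observed: list[str], min_prefix: int = 3) -> Optional[str]:
    for name, steps in ACTIVITIES.items():
        if len(observed) >= min_prefix and steps[:len(observed)] == observed:
            return name  # offer to complete the remaining steps as one activity
    return None

print(suggest(["click_new_user", "type_name", "type_email"]))
# -> "Create user in Salesforce"
```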
Lukas:
Interesting. Have advances in language models... I feel like since 2018, language models have gotten much, much bigger and much better at predicting words. Has that affected you at all? Do you use these modern, gigantic language models in your product?
Mircea:
We use BERT models. We use all the big models. We don't do a GPT-3 kind of thing, although we did some experiments with it. We don't do zero-shot learning just yet. So we don't use a language model for this kind of predicting-the-next-word thing. But we do use the large models trained with masked language modeling. We use them on unstructured documents and we use them on semi-structured documents. There is a model called LayoutLM built by Microsoft, and that's a Transformer in 2D. That one is useful for us, for the semi-structured content.
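LayoutLM is available in the Hugging Face transformers library, and a minimal token-classification call looks roughly like the sketch below. The words, boxes, and label count are toy values, and the boxes use LayoutLM's 0-1000 coordinate scale; this is a generic usage sketch, not UiPath's pipeline.

```python
# Minimal LayoutLM (v1) token-classification sketch; needs `transformers`
# and `torch` installed. Word boxes must be expanded to subword tokens.
import torch
from transformers import LayoutLMTokenizer, LayoutLMForTokenClassification

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=5)  # toy schema size

words = ["Invoice", "Total:", "$104.50"]
boxes = [[60, 40, 200, 60], [50, 500, 120, 520], [130, 500, 230, 520]]

tokens, token_boxes = ["[CLS]"], [[0, 0, 0, 0]]
for word, box in zip(words, boxes):
    word_tokens = tokenizer.tokenize(word)
    tokens += word_tokens
    token_boxes += [box] * len(word_tokens)  # repeat box per subtoken
tokens += ["[SEP]"]
token_boxes += [[1000, 1000, 1000, 1000]]

input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
bbox = torch.tensor([token_boxes])
outputs = model(input_ids=input_ids, bbox=bbox)
print(outputs.logits.shape)  # (1, seq_len, num_labels)
```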
Trade-offs in ML models
Lukas:
Cool. It's funny, going into this conversation, I was prepared to ask a lot of questions around the mix of traditional ML and deep learning, but you seem — very much more than I thought — to be using primarily deep learning models. Is that accurate, or do you do any kind of traditional machine learning as well?
Mircea:
We try to use the best tool we know for a task. We don't say, "If it's not a neural network, it's out". We have all sorts of smaller things, smaller classifiers that just use bag-of-words and trees and those kinds of things. We have reasons to use simpler models for classification because they are more explainable. Or easily explainable. We usually offer a choice. In computer vision and in OCR, we don't have a simple model. We have to use neural networks. But in document understanding and especially in classification, we have other methods as well.
Lukas:
Interesting. Can you give me an example of a case where a model gives you more explainability and you pick it for that reason? A lot of people talk about that, but it's hard to get real case studies.
Mircea:
We had a customer who wanted to classify documents. They wanted to do two things. After the model is trained, they want to see which words or which features define each class. But they also want — at inference time — to see which words were the main contributors to a prediction. It was a very interesting conversation we had with the customer. Before that, we were talking about explainability in more abstract terms, but this was a real use case. At prediction time, they wanted to see the words that actually contributed to a prediction in the evaluation phase. But I'm pretty sure they also wanted it when the model is deployed. Not everybody will look at those words, but they want to have the option — when the model is deployed — to see the weights on those words. You can do the same thing with a BERT model, but it's more complicated. You have to get the tokens. A bag-of-words model is definitely simpler. And also, the other thing I want to say is that we are not going to train for a customer a BERT model that takes eight hours to train or fine-tune when we can train a bag-of-words model in five seconds with similar or better performance.
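This customer request maps cleanly onto a linear bag-of-words model, where the learned coefficients are the explanation. A runnable sketch with scikit-learn, using invented toy documents:

```python
# Explainable classification: read off top words per class after training,
# and top contributing words per prediction at inference time.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["invoice total amount due payment", "invoice payment due total",
        "lease term tenant landlord rent", "tenant rent lease landlord"]
labels = ["invoice", "invoice", "lease", "lease"]

vec = CountVectorizer()
X = vec.fit_transform(docs)
clf = LogisticRegression().fit(X, labels)
vocab = np.array(vec.get_feature_names_out())

# 1) After training: which words define each class? Negative coefficients
#    pull toward "invoice", positive toward "lease" (alphabetical order).
top = np.argsort(clf.coef_[0])
print("invoice-ish words:", vocab[top[:3]], "| lease-ish:", vocab[top[-3:]])

# 2) At inference: which words drove this particular prediction?
x = vec.transform(["payment due for the invoice"])
contrib = x.toarray()[0] * clf.coef_[0]
drivers = vocab[np.argsort(np.abs(contrib))[-3:]]
print("prediction:", clf.predict(x)[0], "| driven by:", drivers)
```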
Lukas:
Where do you think the kind of cutoff is? At what point would you switch from a bag-of-words to a more complicated model?
Mircea:
I think this is really hard to say. In some cases, we have to try both to know. We have some guidance maybe, but we cannot really tell. I think it depends. It depends more on the content, I think, than the size. It's a mix of the number of documents. Most of our customers have very few documents, and they expect us to learn from a very, very, very small number of documents. For example, they believe that if they have two templates and they give us two forms for each, we should be able to do something. And that's a reasonable expectation. We have some more traditional models — no deep networks involved — that actually do just that. You give us a document, we look at it, and we remember it. You have a second document that we believe is the same, and then we are able to match them. We call this Forms AI; it's our newest feature. And this one doesn't use neural networks. It's just matching and searching and more traditional things. But I think what we are going to do is...when people have documents, we don't want to ask them to start with 1,000 documents or even 500. That's too much. There are cases when the documents are very much the same. And then, we should start document by document and use simple techniques internally. We should not even tell the customer what to do. If the documents are very much the same, then we can deal with them without neural networks. But if they keep giving us documents and we keep making mistakes after 5, 8, 10, 15 documents, there is a cutoff point where we say, "This template is just too complex for our simple tool." Our simple tool is more like a vehicle to get you started. Where we end up depends on the content.
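A toy version of that matching idea: remember each template as its set of anchor words, match new documents by overlap, and count misses so you know when to escalate past the simple tool. The Jaccard threshold and escalation rule are illustrative assumptions, not how Forms AI actually works.

```python
# Template matching with a miss counter as the cutoff signal.
class TemplateStore:
    def __init__(self, match_threshold: float = 0.8):
        self.templates: dict[str, set[str]] = {}
        self.threshold = match_threshold
        self.misses = 0

    def remember(self, name: str, words: set[str]) -> None:
        self.templates[name] = words

    def match(self, words: set[str]):
        best, score = None, 0.0
        for name, tmpl in self.templates.items():
            s = len(words & tmpl) / max(len(words | tmpl), 1)  # Jaccard overlap
            if s > score:
                best, score = name, s
        if score >= self.threshold:
            return best
        self.misses += 1  # after enough misses, escalate to an ML model
        return None

store = TemplateStore()
store.remember("acme-invoice", {"invoice", "acme", "total", "due"})
print(store.match({"invoice", "acme", "total", "due", "2022"}))
```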
Lukas:
Interesting. That kind of reminds me, do you do any kind of AutoML? Is hyper-parameter search something that you do all the time or in certain cases? How do you think about that?
Mircea:
We implicitly do AutoML. At this point in 2022, you cannot tell a customer, "We give you classification, but you have to change the learning rate, you have to change the batch size." You cannot do that. You have to find a way to do it automatically. Whether you like the term or not, you still do some sort of AutoML internally. There are models that are easier to generalize, where you don't have to change as many hyperparameters, and some that are harder. But the ideas from before, like if you remember the Azure ML product where you give people 50 choices, I think we are past that, and people expect you to just figure out what to do. But whether we internally want to train 1 model or 50 and choose the best one, I think that's up to us.
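The "train 1 model or 50 internally and ship the winner" idea is essentially a hidden hyperparameter sweep. A minimal sketch with scikit-learn's grid search on synthetic data; the grid and model are illustrative, not UiPath's actual search space.

```python
# Implicit AutoML: sweep a small internal grid, return only the winner.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # customer never sees this knob
    cv=3,
)
search.fit(X, y)
print("shipped model:", search.best_params_, f"cv acc {search.best_score_:.3f}")
```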
Lukas:
But it's interesting because it seems like from a lot of the examples you gave, sometimes your goal is not to make the most accurate model, but the model that will fine-tune the best on customers' data. Does that mean that you're optimizing something special? How do you know if the model's good in that kind of situation?
Mircea:
We have an evaluation framework. But you're right, we don't necessarily... Let me give you an example. If you train a model for too long, you might end up with a slightly better model, but the confidence scores are worse, because of the way overfitting works, the numbers get too close to 1. You get most predictions right, but for the ones you get wrong, you are very confident. And this is a thing that we have to figure out: what is the trade-off between overall model performance and other things? Fine-tuning is one aspect. One thing that our customers really care a lot about is our confidence scores. Everybody will take a model that's 3 points worse in terms of quality if the confidence scores are perfect, because the confidence scores tell them when to get a human involved. So yeah, it's not only about getting the absolute best model, like a paper kind of goal. The goal is to make the product work, not necessarily to have the highest score for the model.
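To see why calibrated confidence can beat a point of accuracy here: predictions are routed by a threshold, so an overconfident model sends its mistakes straight past the validation station. A small sketch with invented numbers:

```python
# Route anything below a confidence threshold to a human reviewer.
def route(predictions: list[tuple[str, float]], threshold: float = 0.9):
    auto, human = [], []
    for value, confidence in predictions:
        (auto if confidence >= threshold else human).append(value)
    return auto, human

well_calibrated = [("$104.50", 0.97), ("$88.00", 0.62), ("$13.20", 0.95)]
overconfident = [("$104.50", 0.99), ("WRONG", 0.99), ("$13.20", 0.99)]

print(route(well_calibrated))  # the shaky prediction goes to a human
print(route(overconfident))    # the wrong one sails through unattended
```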
Transitioning from software engineering to ML
Lukas:
I really appreciate that perspective. I guess, switching gears a little bit, but something I really wanted to cover is looking at your background, it looks like you've gone from more traditional software engineering to running a machine learning organization. And I know from talking with people that enjoy these interviews, that's the perspective of a lot of people watching this. So I'm kind of curious if that's actually true, if you kind of learned machine learning mid-career? And either way, if you have any advice for someone that's trying to do the same type of thing?
Mircea:
I was at Microsoft for a very long time doing software engineering. And then after 12, 13, 14 years, something like that, I wanted to do something new, and I didn't exactly know what. I was very lucky to talk to a few people at Microsoft who made me see there was this machine learning opportunity. Then I started to learn, and I was really fascinated to go back into learning mode. Now that I look back at the last 20 years, I had a gap. I kind of thought you joined Microsoft to learn on the job. And this is true to some extent, but I don't think it's enough. So then, I went back into more of a learning mode and did some math and some statistics that I had not done for the previous I-don't-know-how-many years. And then after about 18 months or so of doing this for maybe 4, 5 hours a day, nights and weekends and so on, I thought I was ready to change jobs. I moved from my previous engineering role to a Microsoft Research team. That was a very good move for me. I was just learning these things. They hired me to help them more with the engineering, but they also understood that I wanted to do more machine learning. But then, I thought I actually also wanted to go back to school, and I started a master's program in computer science at UW. So basically, what happened is that I spent about 4 years or so learning: online, Coursera, at the beginning, and then this master's. And then I was able to transition from a software engineer role to this ML thing.
Lukas:
Do you have any advice for your younger self or someone that wants to make this transition?
Mircea:
I think they have to be motivated. This is a long journey. If you believe you can do this in two months, that's not setting the right expectations. You have to be prepared for a longer transition. And I think you have to go back and do some math. It depends after how many years you want to transition; it's a lot easier to transition early. Younger people who now go to those good universities have good knowledge of math that's fresh in their minds, and they have good ML courses if they are interested. So I think what I can say is more for people who've actually spent 10, 15 years in software engineering. Just prepare for a longer journey and try to learn the fundamentals. If you rush into it, it is not enough to be able to say "model.fit" and put some parameters in. That's not going to do it. I strongly recommend those master's programs. I think they are good programs, and they force you to put in more time, and you have to do a lot of projects and homework. The other thing I thought was a good resource is Kaggle competitions. I was in three of them, and it was just a great experience; the second part of each was very intense. But overall, Kaggle is a great resource, I think.
ML teams vs engineering teams
Lukas:
I love that answer. Do you think that your background in software engineering makes you approach machine learning differently in any way?
Mircea:
I don't know what to say about that one. I think it's good to have some software engineering experience. A few things happen. If you don't do software engineering for a few years — I haven't done software engineering for 5 years now — you are not current anymore. Things happen that you don't exactly understand. I hear people talking, and more and more it happens that I don't understand the details of what they're talking about. I think it's very hard to do ML and engineering — basically both — at a good level. This is why, at UiPath, we have a separation between the more science/ML team and the engineering team. But I think it's good to have the background. It's good to understand memory and processors and threads.
Lukas:
Are there differences in the way that you think teams should approach an ML problem versus an engineering problem? Is even the cadence of shipping different?
Mircea:
The ML team has to take more chances. You cannot have the ML team work on a schedule and have clear times for when something is done. Something might never be done. It is also okay to fail. If someone starts a project today at UiPath and there is no result, but they do the right thing and you learn from it, that's a good project. Sometimes, you have to spend some time to learn that something doesn't work. It's harder to do this in engineering. In engineering, you have stricter schedules, more process, and all those hurdles and sprints and so on. So yeah, I think you have to organize somewhat differently. We are a more hacker kind of org than engineering. We're also more flexible; it's easier to move people from one project to another. For us now, our CV model, our DU model, and our task mining model have a lot of things in common.
Lukas:
It's funny. We were talking to Jeremy Howard, the fastai founder, and he was saying that he thinks that engineering software is kind of more fun because you make incremental progress that you can really see. And I was kind of reflecting on that. I think my background is more in ML, but actually adding features to the Weights & Biases product is definitely more satisfying for me than training ML models. I feel like ML models, mostly they don't work and the debugging cycles are way longer and harder. Is that consistent with your experience? Or there must be something about ML that you love.
Mircea:
Yes, but I mean, we do new features, although... it depends how we define engineering. The way I look at it is this. You have people who do research science; they write papers, they create new knowledge. We don't do much of that. We have one researcher and we want to hire a second one. But for the most part, we don't do research. We do applied science, though. Most of our team is an applied science team. So we do build new features, and our work is, I think, maybe 10% training the models. The rest is to just make something happen. Make them work somehow, put them together. But then, it's the engineering team who actually puts those things in production and creates the containers and deploys in our data centers and takes care of scale and availability and networking and all the other things. So that's why I'm saying it depends where we draw the line. We, in this team, don't just train models and then tell others, "Okay, take the models." We do the post-processing. In most cases, there is more post-processing than the model itself. We do pre-processing, we do data manipulation. So we build a feature, not just a model that doesn't do anything and is just nice and shiny. But I know what you're saying. We also like to build features.
Lukas:
Are there different ways that your team collaborates together? Is it a different kind of collaboration than an engineering team? Even though I know you're applied science, it's still kind of a different thing than software engineering, I think. So are there kind of different ways to do code reviews and things like that on your team?
Mircea:
We have less process than the engineering teams I'm aware of. I don't know in detail how an engineering team functions now, but I think many things are in common. We want people to write good code, we want people to write the simplest code possible and not complicate things to the point that nobody understands. So there are some things that are similar, but there are also some things that are different. When we merge a PR, we don't ask people questions like, "Have you seen this one in production? What is the impact? What is the latency difference? Where is the telemetry?" Although we do want to have telemetry. I think the coding part is quite similar to engineering, but the way we change our minds, the way we choose what to do and what not to do, and the flexibility are, I think, the main differences.
Lukas:
Interesting.
Mircea:
And then there's testing. I mean, testing is very important. You cannot ship a good product if you don't have unit tests, automated tests, and so on. Some of the end-to-end testing is owned by engineering, but we also do significant testing. So that's another thing that's similar between us and engineering. I think it's really the flexibility that's different. If we now believe a project is really important, we can more easily move people around. We now have a semantic automation project that tries to make the robots understand better what's going on, not just click and type. And this is a mix of CV and document understanding. And we can apply the same knowledge, or we can use the same graphs. Yeah, there are many, many things that are similar between our team and engineering.
Spending more time on data
Lukas:
Interesting. Well, we always end with two questions and I want to make sure that we give you some time to answer them. So, what's an underrated aspect of machine learning or deep learning that you think people should pay more attention to? Or maybe what's something that if you could go back to school or had more time to look into, you'd spend some time engaging with?
Mircea:
I think there are two ways to answer the question. People spend a lot of time on models, and I think people should spend more time on data. This has been changing in the last year or so; you see it more and more. If you want to improve the product, you look at the data more than you look at the models. So that's something people are talking about. I would also like to see more effort put into business kinds of data. All those nice models are trained on Wikipedia, but customers have very small datasets with all those semi-structured things; there are no paragraphs, no sentences. It's quite hard to take a good BERT model — or all those NLP models — and apply them to the documents that you see in the enterprise. There is a lot less context. The graphs are less connected and so on. So this is about datasets and about customer data and business data.
Lukas:
I totally agree with actually all the points you just made, but I want to ask about the data thing. People have been noticing that it's a better use of time to spend more time with data for 20 years at least — as long as I've been watching — and yet it seems so hard to get teams to look at the data as much as they should, by the teams' own admission. What do you think is going on there? Why is it so hard to orient more towards the data than the models?
Mircea:
It's not clear who is motivated by that job. I mean, people have been talking about it in theory, but not really doing anything about it. People really love to train models. Even before neural networks, people loved to train trees and so on. But not many people are passionate about the data in itself. All our good people do the data manipulation and the cleanup of the data just to build better models. There are now companies who help with the data, including what you guys do. But what the profile is of a person for us to hire to really focus on the data is unclear. Do you want software engineers, or data engineers? It's unclear. I think this job really belongs to the applied scientists, but they'd rather do something else with their time. So I think this is why everybody says we should make more progress, but actually, nobody really does.
The organizational machinery behind ML models
Lukas:
Right. Okay. That makes sense. My final question for you — and this is an interesting one, because you've put probably more models into production than most people in the world, most people on this show — what's the hardest part about getting a model from conception to running live in production?
Mircea:
When we build something, we start with the ML parts. We see if a project has legs. But to ship it, you need a big machinery in place. You need testing, you need engineering, you need product, you need alignment, and you need people to sell to customers. To sell the real thing, not to oversell or undersell. I think building this whole machinery is, to me, the biggest part. In the end, when you are done, you realize that the ML part that you love so much is just a small thing. Whether it's 10% or 15%, I'm not sure, but there is a lot more work on top of that. People in ML should give more credit to engineering and product managers and pre-sales, because without those people, there is no ML in production. And then having everybody aligned and going in the same direction, this is tricky. The other thing that's tricky is to have more experimentation in the product. We struggle with convincing our product managers and our engineering to do more experiments. Put more stuff into their code so we can experiment and maybe ship a better product. It is very hard to take time off the schedule for something that has the potential to give you nothing. On the other hand, if you only exploit and don't explore, it is not good. So this is another tricky thing: how do you convince the whole org to have the right mix between exploring and exploiting?
Outro
Lukas:
Awesome. Well, that's a great answer. Thank you very much.
Mircea:
Thank you.
Lukas:
This was super fun. I appreciate it.
Mircea:
Very nice talking to you, Lukas.
Lukas:
If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material, and a transcription that we work really hard to produce. So check it out.