Productionizing machine learning models, one thoughtful change at a time with Josh Tobin
Josh Tobin, former researcher at OpenAI and creator of Full Stack Deep Learning, talks about professionalizing ML workflows for the real world, his work with the Robotics team, and FSDL.
Gradient Dissent - a Machine Learning Podcast · Evolution of Reinforcement Learning and the Robot Hand

Josh Tobin is a researcher working at the intersection of machine learning and robotics. His research focuses on applying deep reinforcement learning, generative models, and synthetic data to problems in robotic perception and control.

Additionally, he co-organizes a machine learning training program for engineers to learn about production-ready deep learning called Full Stack Deep Learning. Josh did his PhD in Computer Science at UC Berkeley advised by Pieter Abbeel and was a research scientist at OpenAI for 3 years during his PhD. Finally, Josh created this amazing field guide on troubleshooting deep neural networks. Follow Josh on twitter and on his website.


Lukas: I think for a lot of people listening to this, just knowing our demographics, I think a lot of people would probably be most interested in learning about machine learning and they might even know you from some of the classes that you teach which I think, in my opinion, are some of the best classes out there. I've learned a lot from watching you teach and I'm curious how did you get the idea of teaching a class? How did that come up?

Josh: It sort of all started to happen around two years ago. I was working at OpenAI at the time and OpenAI was going through an interesting transition, I would say. When I first joined, it really felt like a very traditional academic lab. It felt like the lab that I was at at Berkeley, except with more resources. And, you know, at some point they figured out that there was a type of work that they were uniquely suited to do that a typical academic lab is not well suited to do, which is these sorts of larger projects that involve, instead of just a couple of researchers working together, maybe a team of twelve or fifteen folks, you know, a mix of engineers and researchers with bigger budgets, more ambitious goals, and really trying to push out these projects that would clearly mark a move forward in the state of the field. And so while this was happening, a big part of that change was we needed to figure out how we were going to professionalize our process of building machine learning models, right? And so on the robotics team, which I was working on at the time, we were figuring out stuff like, how do we write good tests for machine learning code, so that you don't lose the ability to train a model that you were able to train a couple of months ago? Which happened to us multiple times. How do you actually manage a team that has both folks that are doing speculative research stuff, who may not be able to really measure their progress in any given week, and also people who are doing very traditional engineering work, where you can easily say, this is the goal for this week, and have we met that goal? So, we were trying to sort out all these things.
And around that time I was talking to my PhD advisor at Berkeley, Pieter Abbeel, and a friend of ours, Sergey Karayev, who at the time was running machine learning for an education company called Gradescope, and we were swapping notes on how we were approaching these things and how Pieter had seen other companies approach these and how Sergey had approached some of this stuff at Gradescope. We realized that there is this whole emerging engineering discipline, I will call it. You can go online and learn the math and the algorithms behind machine learning; you can learn what a neural network is; you can even really learn how to use TensorFlow and how to code this stuff up in an effective way, but at the time, there was very little on everything else that you need in order to actually make this stuff work in the real world. So, you know, not only the things I described, but also, how do you troubleshoot models? How do you choose projects? How do you manage teams? How do you deploy things into production? How do you monitor them once they're in production? And so we realized that everyone that we knew was reinventing the wheel on all of these practices, and the number of people that are actually good at this is very small and they tend to be concentrated in a small handful of large technology companies in the Bay Area, let's say. So we just thought it would be really good for the field if we wrote down everything that we knew about this stuff and everything that our friends knew about it. And so that was the genesis of the Full Stack Deep Learning class.

Lukas: I guess what's amazing, I hadn't really thought about it this way, but I feel like I've spent my career, and I'm a little older than you, studying how to make machine learning models work in the real world. But watching your class, I'm learning a ton and I'm seeing you as the expert. How did you get up to speed on this stuff so fast? Was it just the experience at OpenAI? Because your classes are amazingly deep.

Josh: Yeah, it's a good question. I mean, I think I was at OpenAI at the most interesting point for this, because we were figuring this stuff out from first principles. And so there were tons of conversations around what tests we should have for machine learning models. And it was a really brilliant group of people there who would like to take a problem and break it apart and look at it from the ground up. And so I think I was able to look at things all the way down to the first-principles level through that, and then I think it was really just about trying to talk to a lot of the folks that are working in the field and seeing how they approached some of these things as well. A lot of the content we put together in that class came from about thirty or forty interviews that we did with practitioners, just trying to understand. We had a good sense of what the hard things are, and what questions you need to ask if you're putting together an applied machine learning team. So just getting a range of answers on those was also really helpful.

Lukas: You know, you have a unique background, having been a McKinsey consultant for a number of years. Do you think that informs your approach at all? Do you think about how that might affect the way you approach this stuff?

Josh: I think one of the things I learned from McKinsey was, how do you approach abstract problems? Like, what should our company do? Or, you know, what should your organization structure look like? So, like, these problems where it's like, okay, where do I even start thinking about this problem? I think the question of how to make machine learning work in the real world has this flavor, and figuring out how to break that down into parts and structure your thinking around it is definitely one of the essential things that you have to do as a management consultant. And so I think that definitely informed the way that I looked at this problem.

Lukas: So is there a piece of your curriculum that you feel particularly proud of?

Josh: I think the thing that I put the most emotional energy into was the troubleshooting guide.

Lukas: That is actually my favorite part too.

Josh: That was the piece that I would say I was writing for myself a few years ago more than anything else. My perspective when I was writing that was, "How could I have saved myself months of time if I had gone and started over in this field?"

Lukas: I would have to say that I got a chance to work for you briefly, for maybe a month or two, and I think my big takeaway from you that I always hear in my head is you saying, "You should always slow down and change one thing at a time." I mean, I feel like that actually applies to more than machine learning, obviously. But boy, does it apply to machine learning. Like, oh my God!

Josh: Yeah. It's so essential in machine learning, because fundamentally I think the thing that makes machine learning so hard is this: when you're writing software, like when you're writing code, we have this pretty mature ecosystem where if you make a mistake, usually the system that you're building or that you're interacting with will tell you. First of all, it will tell you that you made a mistake. And second of all, it might even give you a hint as to where that mistake is. But the insane thing about actually trying to make progress on machine learning projects is that most of the time when you have a bug, the only thing that happens is that the performance of your model doesn't get better as quickly as it should. And so there's no way of knowing that you actually made a mistake a lot of the time, unless you happen to have really strong intuition about what your learning curve should look like, right? And so I feel like that's why it's so essential to move slowly when you're building your machine learning models.

Lukas: Although it's kind of funny, because I wonder if programming web applications is the outlier here. When I think about just trying to make an advertising campaign work well or trying to get my old motorcycle running again, it's always better to change one thing, because it's so hard to tell what will happen otherwise. But I guess maybe we have this better telemetry or API, you know, programming Python or, I don't know...

Josh: I feel like that's right. I mean, I wouldn't consider myself a world class web developer, but when I've done it, it's also been helpful to still just change one thing at a time.

Lukas: I just feel like it's just good advice for all situations.

Josh: Yeah, it might be. Anytime you're building anything, I feel like as you get better at something you can increase the increment of what you can change at a time. For example, if I'm training an image classifier or something, I can pretty much just start with a newer architecture, like a ResNet or something like that, because I've done that enough times that I know what I can expect the result to look like if it works and what the common things that can go wrong are. So I feel like I can skip a couple of steps. But when I'm writing Kubernetes code, right, something I'm not very good at, I still have to move very, very slowly.

Lukas: Would you be down to walk me through your troubleshooting steps and how you think about them? I know a few people will be interested.

Josh: Yeah. You know, the core concept is what we've been talking about, right? Which is to start simple and then layer on complexity one step at a time. So the first question you might have is, what does it mean to start simple, right? I think that one of the things I've noticed with people that are getting into the field is that there's a tendency to get all this excitement around neural network architectures and the latest and greatest state-of-the-art model on ImageNet. And so I think people tend to overthink the question of architecture selection and selection of all the other pieces around that, like what optimizer you're using and things like that. But in reality, I think when you're starting on a new project, the goal is to just choose a reasonable default and start there, even if it's not state-of-the-art. And then once you've convinced yourself that everything around that is working, your data loading code and your training code and all that stuff, then you can gradually move closer to a state-of-the-art architecture.

Lukas: How do you convince yourself that this stuff is all working?

Josh: Yeah, it's a hard question. I think there are some tricks that you can use, right? So the first thing that I recommend people do when they're training a new neural net for the first time is, I mean, first of all, just get the thing to run.

Lukas: Right. Like, literally just output something.

Josh: Just output anything. Which is not always as easy as it should be. Let's say that you've done that. Then the next thing that I think you usually want to do is try to overfit a really small amount of data; like a single batch of data. It seems really simple and a lot of people skip over that stuff because of that. And, you know, 80% of the time, it's not really necessary, but 20% of the time you can catch some pretty nasty bugs early on.

Lukas: So, I often recommend this, citing you, and I'm sure that this is not obvious to most people, why do you want to overfit a small amount of data?

Josh: So with any reasonable model architecture, optimizer, training loop and data type, you know, you should be able to get your loss down to zero on a single batch of data, right? If you have enough parameters, the neural net should be able to just memorize the data. And so, you know, basically if it can't do that, then you know that you must have a pretty bad bug in one of those things.
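The single-batch check Josh describes could be sketched roughly as follows. This is a minimal illustration, not code from the class: it uses plain-Python logistic regression instead of a neural net, and `tiny_batch`, `overfit_single_batch`, and the learning rate and step count are all made-up names and values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def batch_loss(weights, bias, batch):
    """Mean binary cross-entropy of a logistic model over one batch."""
    total = 0.0
    for features, label in batch:
        p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total += -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return total / len(batch)

def overfit_single_batch(batch, steps=2000, lr=1.0):
    """Run plain gradient descent on the same batch over and over."""
    weights = [0.0] * len(batch[0][0])
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in batch:
            p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            err = p - label  # gradient of the loss w.r.t. the pre-sigmoid output
            for i, x in enumerate(features):
                grad_w[i] += err * x / len(batch)
            grad_b += err / len(batch)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

# Hypothetical batch of four (features, label) pairs that is easy to memorize.
tiny_batch = [([0.0, 1.0], 0), ([1.0, 0.0], 1), ([0.2, 0.9], 0), ([0.9, 0.1], 1)]

initial_loss = batch_loss([0.0, 0.0], 0.0, tiny_batch)
weights, bias = overfit_single_batch(tiny_batch)
final_loss = batch_loss(weights, bias, tiny_batch)

# If the loss does not collapse towards zero on one batch, suspect a bug.
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.4f}")
```

If `final_loss` stayed near `initial_loss`, or went up, that would point at the loss, the gradient, or the update step rather than at the data.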

Lukas: What kind of bug, for example?

Josh: Like you flip the sign on your loss function, right? And your loss actually goes up rather than going down. Or, another one I see all the time is, in a lot of these neural network libraries, the input to the loss function is supposed to be the logits, right? So it's something un-normalized. But, you know, maybe you took the softmax of that first. And so it's things like that, right? Where you just wrote the code the wrong way, and this is a quick sanity check for figuring out, "Is the code that you're running reasonable?"
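To make the logits bug concrete, here is a small sketch in plain Python. The function `cross_entropy_from_logits` is a hypothetical stand-in for a library loss that applies softmax internally; it is not any particular library's API.

```python
import math

def softmax(values):
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, target_index):
    """Hypothetical loss that expects raw logits and applies softmax itself."""
    return -math.log(softmax(logits)[target_index])

# A confident and correct prediction for class 0 (made-up numbers).
logits = [12.0, -3.0, 0.5]

correct_loss = cross_entropy_from_logits(logits, 0)

# The bug: softmax is applied before calling a loss that applies it again.
buggy_loss = cross_entropy_from_logits(softmax(logits), 0)

# Probabilities live in [0, 1], so after the second softmax the best achievable
# probability for 3 classes is e / (e + 2), which puts a floor under the loss.
loss_floor = -math.log(math.e / (math.e + 2))

print(correct_loss, buggy_loss, loss_floor)
```

With the double softmax the loss can never get close to zero, however confident the model is, which is exactly the kind of bug that overfitting a single batch tends to expose.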

Lukas: Okay. Sorry I cut you off. Then what do you do when you can overfit one tiny subset of your data?

Josh: Yeah. So when you can overfit a tiny subset of your data... Then I would say, one way to think about the process of making your neural net better and better over time is there's an outer loop and then there's an inner loop, right? In the outer loop you're generally trying to do one of two things: you're either trying to reduce the amount of underfitting that your neural net has or reduce the amount of overfitting that your neural net has. And there are a lot of strategies for doing both of those things, but the best strategy for reducing underfitting is to make your model bigger, and for reducing overfitting it is to add more data. And so if you think about what we just did with overfitting a single batch of data, or with driving loss down to zero on a single batch of data, we're basically saying, let's take the smallest possible dataset and let's overfit it, right? And so now the next question in your decision tree should be, "All right. Now we know that we're overfitting because we can drive loss down to zero. So the next thing that we should do is reduce overfitting." And the simplest way to do that is to add data; but you want to do this gradually, right? So typically, what I would do next is move from a single batch of data to a smaller or more simplified version of the dataset that I was working with. So maybe, I don't know, if it's a million images, you only take a thousand or ten thousand of them to start out with. Maybe you make a synthetic sort of toy version of the problem that you're working with. You know, if you're doing reinforcement learning, maybe you work with one of the standard simple benchmark problems like CartPole or something like that. And so you just make the problem one step more difficult than a single batch of data.
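One way to picture the outer loop is as a small decision function. The sketch below is only an illustration of the reasoning above; the function name, the error thresholds, and the `tolerance` value are assumptions, not something taken from the troubleshooting guide.

```python
def next_step(train_error, validation_error, target_error, tolerance=0.02):
    """Suggest what to change next, one thing at a time.

    All thresholds are hypothetical; in practice they depend on the problem.
    """
    if train_error > target_error + tolerance:
        # The model cannot even fit the training data yet.
        return "underfitting: make the model bigger"
    if validation_error > train_error + tolerance:
        # The model fits the training data but does not generalize.
        return "overfitting: add more data"
    return "solved at this difficulty: make the problem harder"

# After memorizing a single batch: zero training error, poor validation error.
print(next_step(train_error=0.0, validation_error=0.40, target_error=0.05))
# After moving to a larger subset that the model struggles to fit.
print(next_step(train_error=0.30, validation_error=0.32, target_error=0.05))
# Once both errors are at the target.
print(next_step(train_error=0.03, validation_error=0.04, target_error=0.05))
```

Each call changes exactly one thing about the setup, which matches the "solve your problem, make it harder" rhythm described in the conversation.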

Lukas: I see. So you add one piece of complexity?

Josh: Yeah, that's the way I think about it.

Lukas: Why wouldn't you just add all the data that you have? Because like your conclusions, I imagine that could change at different scales of data, for example.

Josh: Yeah, definitely. I think there are two core reasons, right? I guess maybe the simplest one to explain is that it just reduces your iteration time, right? So if you're working with a smaller dataset or a simpler dataset, then typically your model will train faster. It'll be cheaper. And so you can just try out things more quickly, which is super key. But I think the deeper and more interesting reason is that a lot of times in machine learning, you have some degree of confidence that this model should actually be able to solve the task that you're working on, but a lot of times you don't actually know that for a fact, right? Like, maybe you're doing image classification, but you are not doing it on ImageNet. You're doing it on some other dataset. Maybe you're classifying whether a person is wearing a hat in the image or not. And so intuitively you feel like it should be possible to solve this with a neural net, but you don't actually know that for sure. And so you want to try to isolate the sources of error in your problem, right? And so if one of the possible sources of error is that this dataset is just too hard, then it makes sense to start with a version of the dataset that your model should be able to do well on. And so smaller datasets, less complex datasets allow you to do that.

Lukas: But wouldn't a smaller dataset make the problem harder?

Josh: In what sense?

Lukas: Say, I'm trying to classify if someone has a hat on or not, if I have less training data, I would expect my accuracy to be lower, right?

Josh: Hmm. Yeah. That's certainly true. So I guess this comes back to the overall process that we're trying to follow, right? I think of it as iterating between eliminating underfitting and eliminating overfitting. And so if you're in a situation where your model is doing perfectly well on your training set, then it makes sense to increase the complexity of your training set. If you're in a situation where your model can't do well on your training set, then you need to figure out, is it that my training data is too hard? Is it that I need a bigger model? Is it that I need a different architecture? Is it that I need a different optimizer, different hyperparameters? And so, working with a dataset where it's easier to get to that point of your model overfitting reduces the number of things that could be wrong with your model.

Lukas: Interesting. Are there more steps to this?

Josh: I mean, that's the high-level flow, right? You know, solve your problem, make it harder, solve your problem, make it harder. And then there are details about how to make each of those things work well, right? Like, what are the steps you should actually try when you're underfitting and you need to make your model more expressive? That's the overall picture.

Lukas: We'll have to put a link to this so people can find it. Do you plan to teach more of these classes?

Josh: I think so, yeah. We don't have concrete plans to do another one. I mean, it's not a great time for in-person classes.

Lukas: Maybe a virtual one..

Josh: Yeah. Maybe a virtual one. That could be fun.

Lukas: Do you have any advice, I suppose, for folks wanting to get into machine learning? I'm sure you've probably watched a lot of students learn it or not learn it. Do you have any sense of what's required?

Josh: Some people look at something like machine learning and they say, "OK. This is a really deep field and there is a lot to learn here. There's a lot of complexity. So many papers, thousands of papers coming out every month, and so I want to just drink from the firehose and try to learn as much as possible." And then on the other extreme, there are folks that say, "Look, this field is so complex that I want to just pick a problem and solve that problem." And I think there are failure modes on both ends of that. I think I've worked with people who see the complexity of the field and react to that by just learning more and more and more, but never actually really getting their hands dirty and figuring out how to make this stuff work for the problems that they care about. I think that typically doesn't work. You know, I've seen probably just as many people who don't want to deal with the complexity, like don't want to learn the math, don't want to understand how a convnet works. I think that also limits your ability to make progress in the field, because ultimately, it's closer to a science than an engineering discipline right now, I would say. And you need to balance spending time on actually doing stuff and following tutorials and making things work with also going back and backfilling, like, "OK, I've trained a convnet on this image classification task, I know how to write the TensorFlow code. Now, let me actually go back and understand how a convnet works."

Lukas: The folks that you've seen that have been successful, like, they have learned this stuff and have started to get good careers as successful people. Do you think they spend more time on the theory on average or more time on the practical hands-on stuff, or is there some other third thing that they're doing more of that makes them successful?

Josh: I would say more time on the practical hands-on stuff. One of the interesting things about machine learning is that although there's a ton of complexity, there is a relatively small number of core ideas that you actually need to really deeply understand in order to be an expert in the field. Understanding attention in neural nets is really important. Understanding how backpropagation works is really important, but understanding all the different state-of-the-art architectures for doing object detection is not really very important unless you happen to be working full time on that problem. So I would say that the people that I know that have successfully learned the field have spent more time with a smaller number of ideas, and rather than trying to read five new papers every day, they've gone out and talked to people and figured out what the five most important papers are, and then have spent weeks with each of those to really deeply understand them. But then they have also spent the balance of their time actually trying things and implementing things.

Lukas: That makes sense. When you look at the papers that you've written, do you have a favorite?

Josh: I think my favorite is actually the first one I was the lead author on, which was the Domain Randomization Paper.

Lukas: Cool, Sim to Real?

Josh: Sim to Real, Yeah. Yeah.

Lukas: Can you tell me the real process of thinking of that idea and then trying it and how that all happened? Well, first describe the idea, because it seems like one of those rare papers that you can describe really succinctly.

Josh: When I was starting to work in this field, the intersection of deep learning and robotics, back in 2015, there was a lot of excitement around reinforcement learning being applied to robotics. So with reinforcement learning, you have an agent that interacts with an environment. It takes some observations of the environment, decides what action to take, and then gets a signal back from the environment, which is a reward that tells it, "Did I do a good job or a bad job?" And then over time, it iteratively learns how to interact with that environment and improve its performance on whatever task it is supposed to be doing. So it's a very natural abstraction for robotics. And, you know, back in 2015, deep reinforcement learning was starting to have a bit of a renaissance. It started to work really well on Atari games. I think 2015 or 2016 was when DeepMind beat the best human players at Go. And so people were looking at this and saying, "Wow, this could actually be the most important technology to come to robotics in a really long time." And so I was early on in my PhD at that point and the exciting thing to work on was coming up with the best new reinforcement learning algorithm; like, how can we improve our performance on all these tasks? But I was very new to the field and I felt like it would not be very smart for me to try to compete with people who had been studying this stuff for years and had a lot of insights into what made those algorithms work. And so what I tried to do was think about, "OK, what are the enabling pieces that we need in order for this story to actually come true?" The story being that deep reinforcement learning is having a big impact on robotics. And for me, the piece that was kind of missing for that story was that deep reinforcement learning is very powerful, but it's very data inefficient. Like, all these state-of-the-art results that you see happen in environments where you can simulate everything that's happening, because it takes hundreds of millions or more of interactions with the environment to actually get to the point where you have good model behavior. And so for me, looking into this field from the outside, that was sort of the big question mark: is there any way for us to get around the data inefficiency problem for robots? Because going out and collecting a hundred million examples of a robot interacting with an environment is not very cost effective, let's say.
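The observe, act, get-a-reward loop described above can be sketched in a few lines. This is a toy illustration only: it uses tabular Q-learning (not deep reinforcement learning) on a made-up five-cell corridor, and every name and hyperparameter in it is an assumption chosen for the example.

```python
import random

N_STATES = 5        # positions 0..4 in a hypothetical corridor; 4 is the goal
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # one value per (state, action)
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap the episode length
            # Decide what action to take: explore sometimes, otherwise be greedy
            # (ties are broken randomly so the untrained agent still moves around).
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: (q[state][i], rng.random()))
            # Act, then observe the next state and the reward signal.
            next_state, reward, done = step(state, ACTIONS[a])
            # Improve the estimate for this (state, action) pair.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q

q_table = train()
greedy_policy = [ACTIONS[max(range(2), key=lambda i: q_table[s][i])]
                 for s in range(N_STATES - 1)]
print(greedy_policy)  # the learned action for each non-goal position
```

Even this tiny problem takes hundreds of episodes of trial and error, which hints at why collecting the equivalent experience on a physical robot is so expensive.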

Lukas: Google did this, right? The arm farm?

Josh: So it's definitely possible, but do you really want to have to have dozens of robot arms running 24/7 for weeks every time you want to learn a new behavior?

Lukas: Sure. Yeah.

Josh: So coming back to this paper, the question that I got interested in was, is there any way to learn behaviors in a physics simulator, where you actually have access to hundreds of millions of labeled examples, but then somehow make that work when the robot is put out into the real world? I was kind of working on this back when I was an intern at OpenAI, and we had a really concrete problem that we were trying to solve. We were trying to set up a robot to make a stack of blocks. So it would pick up blocks from a table and then stack them on top of each other. And the robot behavior was trained assuming that you actually know where the blocks are in the real world. And so then we needed to go back and backfill. Like, how do we actually find out? How do we estimate the position of each of these blocks in the real world? It is something that seems like a really easy problem, but actually when you think about how you make this really work, it's more complicated than you'd expect.

Lukas: Honestly, it's so counterintuitive. I think even for me and probably for most people, that's hard. It's amazing. That's hard.

Josh: Yeah. And I think it's not like the hardest research problem in the world, but when you actually sit down and try to go and make it work really well, it's very tricky.

Lukas: Sure.

Josh: And so we were playing around with these different tags, you know, ArUco tags and methods like that, where you understand the intrinsics of a camera, and then it reads this tag off of an object, and then it can infer the position of the object given the position of the camera. And we just found those things to be really fragile and, honestly, not really accurate without investing in expensive setups and expensive camera equipment and stuff like that. We were mostly deep learning folks, right? And so the obvious question is, why don't you just train a neural net to do this? You know, train a neural net to take an image of a table and then say, "OK, here's the position of all of the cubes on the table." But the problem is, where do you get the labels for the dataset that you collect? You almost need to solve the problem. You almost need to know where the cubes are in order to actually get the labeled dataset that you use to train the neural net, right? So it's a bit of a chicken-and-egg problem. And so this was kind of the starting point for me working on this Sim to Real problem. It was like, "This feels like the simplest possible example of a problem where maybe synthetic data, data from a physics simulator, would actually help."

Lukas: So then describe what you did.

Josh: So the core idea is that if you just take data from a simulator naively and train a model on it, the problem is that there are quirks of your simulator, right? Your simulator doesn't perfectly match the real world, and so the neural net overfits to any difference between the data in the simulator and the data in the real world. So if you didn't perfectly model the lighting, or you didn't perfectly model the color of the cube, the neural net won't transfer. So the idea that we had was, what if instead of just taking a single best physics simulator, you massively randomize every aspect of the simulator that's not critically important to solving the task? So you randomize the colors of all the objects, you randomize their positions, you randomize the position of the camera, you randomize the background, and it produces images that are crazy and unrealistic looking, right? So they look like scenes from an animated disco or something. But what happens is that the neural net, in learning how to estimate the position of the cube in all of these massively different worlds, is forced to not rely on the parts of the simulator that are not essential for solving the task. So if the color of the cube changes in every single data point, then the neural net can't create a feature that depends on the color of the cube to solve the task, because that's just an unreliable piece of information. And so when we do this, it turns out that you can train neural nets on entirely simulated data, with no real-world data at all, and they actually work when they're deployed in the real world.
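The data-generation side of domain randomization might look something like the sketch below. It does not render images or call a real simulator; the scene fields, value ranges, and function names are all hypothetical, chosen only to show which quantities stay tied to the label and which are re-randomized for every sample.

```python
import random

def sample_scene(rng):
    """Sample the parameters of one hypothetical simulated scene.

    The cube position is the label the network should predict. Everything
    else is a nuisance factor that is re-randomized for every data point.
    """
    cube_position = (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))  # table coordinates
    return {
        "label": cube_position,
        "cube_color": tuple(rng.random() for _ in range(3)),        # random RGB
        "background_color": tuple(rng.random() for _ in range(3)),  # random RGB
        "light_intensity": rng.uniform(0.2, 2.0),
        "camera_offset": tuple(rng.uniform(-0.1, 0.1) for _ in range(3)),
    }

def make_dataset(n, seed=0):
    """Generate n randomized scene descriptions with their labels."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]

dataset = make_dataset(1000)
distinct_cube_colors = {scene["cube_color"] for scene in dataset}
print(len(dataset), len(distinct_cube_colors))
```

In a real pipeline each scene description would be handed to a renderer to produce an image, and the network would be trained to map that image to `label`; because the color, lighting, and camera vary from sample to sample, none of them can be used as a shortcut.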

Lukas: Because you just cycle through lots of colors and shadows and other...

Josh: Yeah, exactly. Exactly. You basically show the neural net every color and every shadow that it could possibly see. And so then in order to solve the task, it needs to learn that colors and shadows are not important; what's important is the position of this cube-looking thing on the table. And so it's not overfitting to all the details that are unimportant, and so the details that it is looking at are the ones that hopefully will transfer over when it's deployed in the real world.

Lukas: So how far can this generalize? Has this been applied to more than stacking blocks now?

Josh: Yeah, it's been applied to a pretty wide range of computer vision and robotics tasks at this point. My favorite random application was a paper about using domain randomization to train a robot to pick fish out of a barrel.

Lukas: Really?

Josh: Yeah.

Lukas: Wow.

Josh: Which is actually a really hard task because fish are very shiny and slippery. And in general, most object detection methods, computer vision stuff, have trouble with objects that have a lot of reflections and things like that. So that's my favorite random application. But you know, it's been applied to folding cloth. It's been applied to a pretty wide range of computer vision problems. And I think the furthest this idea has been pushed was at OpenAI when they used this technique to have a robot hand solve a Rubik's cube.

Lukas: Can you say a little bit about why that was such an impressive task? Well, I guess maybe there was some controversy about, is this sort of a stunt or is this a real deep task? Where do you land on that?

Josh: So maybe the different sides of this issue would be, on one hand, if you look at the types of tasks that people have been able to solve with robots over the years, tasks of this sort, using a high-dimensional dexterous robot hand to manipulate complicated objects, are very few and far between in the robotics world. And high-dimensional, contact-rich, dexterous manipulation is generally seen as one of the grand challenges of robotics. And so I think one point of view on this is that even just at a proof-of-concept level, showing that it's possible to even do this once is a big step for the field, because there are very few examples of projects that have pushed robotic manipulation as far as being able to solve a Rubik's cube. I think the other perspective would just be that, if you look at the details in the paper, the algorithm actually works about 20 percent of the time. And it was a pretty big effort to actually make it work for the first time. So, a pretty big team working on it for a long time. And so you might argue that obviously, if you put 10 or 12 really brilliant people together and have them work on one tiny sliver of a problem for a long time, then obviously they'll be able to make it work once. I would say that...

Lukas: That's not obvious to me. I don't know. [laughs]

Josh: I'm trying to play devil's advocate here. My bias is that it's an important result in robotics. And I think that the perspective that you have to have when you look at this is that it is very much a research result. I think a mistake that people make when looking at results like this, and I think this is true in AI in general, is that you look at computers being better than humans at some task and you say, "OK, this means that robots are going to take this job in two years." If you look at the details of how hard it was to actually make this work once, and then only about 20% of the time, there's a lot more research that needs to happen in order for this to become a thing that robots can do reliably. But I do think there's a lot of value in the proof of concept, just to show that this is a set of techniques that this team was able to push far enough to do a task that is objectively really difficult for robots to do. And then over time we backfill and go like, "I could actually do that in a more efficient way."

Lukas: I didn't realize it only worked 20% of the time. This is a 20% success rate, meaning completely manipulated the cube to be back in the correct state, is that right?

Josh: Yeah, I think the fine print is that this is for the hardest variant of the problem, which is the cube randomized as much as it can be. The robot was only able to get it back to fully solved 20 percent of the time. I think on average, it did it more than that. And maybe one of the other details people took issue with was the fact that the machine learning algorithm itself didn't say the sequence of actions that you need to solve the cube. It wasn't like a neural net that would say, "turn this face and then turn this face and then turn this face." There was a hardcoded solver that was saying the sequence of actions that you take, and then the neural net was just saying, "OK, here's how you move your fingers in order to achieve this action."

Lukas: So the point was the manipulation.

Josh: Exactly. Yeah.

Lukas: So people are mad because it was a fun demonstration [laughs]

Josh: I think people often take issue with the way that OpenAI communicates results like this more so than the results themselves.

Lukas: Because it seems like they're generating attention, is that right?

Josh: I think so, yeah. There's a bit of a tension in the field right now. On one hand, there are people who maybe have more traditional academic roots and who think that it's the quality of the scholarship that's important: whether it's truly novel, whether the results are really understandable and reproducible. And then on the other hand, there are folks who are typically from the more industrial research lab type places, where I think the viewpoint is more, "Our goal is to push the state of the art of the field forward. And if we have to do that in a way that's not totally a hundred percent reproducible just because the experiment is too expensive, that's OK, because we're moving the goalposts forward for the types of things that AI is able to do." And I think there's a fundamental tension there.

Lukas: That makes sense. So I guess you've left OpenAI. What are you working on now, Josh?

Josh: Yeah, it's a good question. One of the things that I learned through Full Stack Deep Learning, or maybe one of the beliefs that I have about this field, is that there's this narrative in the machine learning world that AI is going to be part of everything, and it's just going to be like software, where it's just sort of happening in the background as part of every little thing that we do, and it's going to enable all these amazing new applications like self-driving cars. But in general, it is just going to be there in the background, making the world 10 or 15 percent more efficient, or more, I don't know. But we're not there yet. And so, one of the core questions for me over the last six months or so since I left OpenAI has been, "Why is that? What's blocking us from having just a little bit of machine learning that's making every piece of software that we interact with smarter?" And that's the fundamental question that I am trying to answer with this company.

Lukas: It's so interesting. We actually always have been ending this podcast with two questions, and that question has been one of them. I mean, you've clearly spent a lot of time thinking about it. What are some of your conclusions? If you had to pick one thing, what would that be?

Josh: I mean, this comes back to our conversation about the robot hand, right? I think the field has gotten really good at doing really impressive things once. But one of the dirty secrets of machine learning is that taking something that is 90% accurate on one dataset and turning that into a reliable production system that is auditable, maintainable, and understandable, one that you can actually start to run your business on, is really hard, right? I think figuring out how to do that is the big question that the field needs to answer right now.

Lukas: Yeah, I guess it's kind of counterintuitive to see a computer do something 20% of the time. Really, I feel like most times that I see a computer do something, I know, like, OK, it's going to do that 100% of the time.

Josh: Yeah. No, definitely. For sure. I think maybe one of the other things I have seen through Full Stack Deep Learning, and through some of the other folks that I've talked to who are trying to implement machine learning in companies, is that oftentimes one of the hardest things to do is figure out how to get the executives in your company, let's say, the folks that are making the decisions but are not keeping up with the technology, to actually understand what we can really do with this stuff. I think that's one of the things that's really hard about machine learning: there's not always a clear connection between what you read about what it can do and what it can actually do. And communicating that, I think, is another big challenge for the field.

Lukas: Do you have any suggestions there? I mean, almost everybody we've talked to has brought that up.

Josh: So, you could say, make a really good class that's like AI for Everyone. Andrew Ng has a class. I haven't gone through it; maybe that's the answer. I think you can build intuition for this stuff, but I think that doesn't come from reading the New York Times headlines. It comes from actually sitting down and looking at examples of things that work and things that don't work. So I don't have a good suggestion, but I do think there's a big opportunity to make that.

Lukas: It's funny. I've heard that IBM Watson in its heyday would fly executives to a lab and blow their minds with really awesome demos and get them hyped out of their minds about the potential they have. And I've always had this fantasy of doing the opposite. I'd do like an hour with executives and make it really hard and boring and let them fight with the AI for a while. Even just trying to tune some hyperparameters and actually get the thing working, I think, would be a fun and maybe informative experience for a lot of people. Maybe it'll help the executives understand why their ML teams are not producing results as fast as they're hoping.

Josh: Oh, yeah, totally. I think that also, one thing that would help a lot is the methodology. This is what we tried to do at Full Stack Deep Learning, but maybe didn't really get all the way there. I think that the methodology of successfully building machine learning systems is still pretty immature. It shares a lot with software engineering, but it's really a different field. I think that if there were an agile equivalent for building machine learning systems, that would also go a long way, because it's really just the blocking and tackling. Like, how do you actually make this happen? So it doesn't feel as much like magic, like, "What are those crazy data scientists doing in their corner over there?" And it feels a little bit more like, "OK, I understand that this is the set of meetings that the team is having every week and this is how they're measuring their progress." I think something more operational like that could also go a long way.

Lukas: Seems like you could be the right guy to figure that out.

Josh: I don't know. Maybe.

Lukas: Here's the other question we always end with, and I'm really curious to know what you'll say, just off the top of your head. What's an under-appreciated topic in machine learning that you think people should talk about more? I mean, given all the hype about so many of the topics, what's a piece that people don't pay enough attention to?

Josh: I think that people don't pay enough attention to the quality of their training data.

Lukas: [laughs] I agree, I agree, Josh.

Josh: But it's so important.

Lukas: So important. It is.

Lukas: Nice. All right. Well, that was really fun. Thanks for that chat.

Josh: That was fun. Thank you for having me on.
