Peter Welinder — Deep Reinforcement Learning and Robotics
Peter Welinder, Robotics lead at OpenAI, talks about his love of robotics, the early days of reinforcement learning, and the evolution of the robot hand.
Created on May 5 | Last edited on January 25
Peter's Bio
Peter Welinder is a research scientist and roboticist at OpenAI. Before that, he was an engineer at Dropbox, where he ran the machine learning team, and before that, he co-founded Anchovi Labs, a startup using computer vision to organize photos, which was acquired by Dropbox in 2012.
Timestamps
0:00 Intro
1:01 How Peter got into deep learning
3:50 Working at Dropbox
12:23 Joining OpenAI's robotics team
16:51 Believing in reinforcement learning
24:29 Tackling the robotic hand problem
31:44 Building a research team
38:15 Switching from TensorFlow to PyTorch
41:43 Releasing internal OpenAI tools
44:55 How OpenAI uses Weights & Biases
49:57 Over-confident models and trying simple things
54:04 Outro
Transcript
Note: Transcriptions are provided by a third-party service, and may contain some inaccuracies. Please submit any corrections to angelica@wandb.com. Thank you!
Intro
Peter:
When you train a reinforcement learning algorithm from scratch, it has no respect for the delicacy of the hardware. It would just push all the motors to maximum speed in different directions. We would bring in the manufacturer of the Shadow Hand and show them what we were doing, and they would just watch in horror, like, "We only run it for five minutes and you have it running all day!"
Lukas:
You're listening to Gradient Dissent, a show where we learn about making machine learning models work in the real world. I'm your host, Lukas Biewald. Peter Welinder is a research scientist and robotics lead at OpenAI. Before that, he was an engineer at Dropbox and ran the machine learning team. Before that, he co-founded Anchovi Labs, a startup using computer vision to organize photos. It was acquired by Dropbox in 2012.
How Peter got into deep learning
Lukas:
I'm most excited to talk to you about the OpenAI stuff, but I think we should start with your career, which I think is pretty interesting. You've done a startup. You've run machine learning at Dropbox and then gone into OpenAI as a researcher. Could you tell us a little bit about how you first got into deep learning? I think it was at the startup, right, where you first used it?
Peter:
Yeah. I did machine learning in grad school. I didn't really know what I was doing when I went to grad school; I knew I wanted to learn about how intelligence worked, like AI. The place where I started was in neuroscience. I spent a fair amount of time just sitting in a basement and building these little micro(?) you implant into rats' brains.
Lukas:
Oh really? Wow, I didn't know that. Wow!
Peter:
Yeah. So I did that for probably half a year. And I realized how lonely that work was; you work twelve hours a day because you're a grad student and you have to work really hard to build this thing. The whole thing takes like three months to build, and then at some point, you need to perform surgery and implant it into the rat. And if something goes wrong at that point, you're all screwed. You have to go back to square one and build it from scratch. Yeah, if you go down this path, it would take a really, really long time to get through grad school. So I realized at that point that neuroscience wasn't really for me. Then I ended up wanting to focus more on robotics. But, you know, robotics has a similar problem, where everything takes a really long time to do, because you have to first build your robot and get the robot to work, and that's probably three quarters of your PhD. And then you have to do the experiments at the end. So instead I decided, "Let's pick one aspect of this, which goes towards something more useful. If it lies in robotics, that's awesome, but it's great if you can have it in other applications." So I ended up doing a lot of computer vision, and that's how I got started on my startup, which was doing image organization. Things like finding faces in photos, finding what photos are about. That's how I got into machine learning and computer vision and eventually ended up at Dropbox doing more of that stuff there.
Working at Dropbox
Lukas:
What were the problems at Dropbox that you worked on?
Peter:
Initially...the premise that we had when we started was really interesting, which was — this was back in 2012 — that half of the files in Dropbox were images. There were many, many billions of images, of photos, and it was sort of like the dark matter of Dropbox. It took up all the space, but nobody knew what was in it, and it was not useful at all for our users. The mission for me and my co-founder, after we joined, was to see if we could start making sense of all that data and actually make it useful for people. There were a lot of pretty mundane things that we had to clear up first, like just being able to sort photos by date, stuff like that. Extracting the metadata from the photos...we spent a fair amount of time on "How do we index these billions of photos and how do we just do simple things on them?" Like if you want to search by GPS or search by timestamp or something like that. The idea was that eventually we could actually start extracting useful information from images. So, one thing that we realized through that work — and one of the main features that I worked on while there, at least on Photos — was that a lot of people use their Dropbox for a lot more than, say, family photos and stuff like that. Over time, the use of Dropbox shifted towards more business users, and it turns out that if you're a business user, you take a lot of photos of documents. So the photos are really, really boring. They're just photos of documents. People are kind of too lazy to put stuff in a scanner, so they just take a photo with their phone and hope that they'll find it later. Obviously though, as soon as you've taken that photo, two days later it's lost in your photo library with baby photos, photos of food, photos of all the random crap you take photos of. You're just taking photos of everything, right?
So we built this thing, which first of all was just finding those photos that had documents in them — just bringing them up and showing them to the users — and then doing more useful things on that, like actually extracting the text that was in the documents or-
Lukas:
-oh wow, so you would OCR the photos? That must be such a magical user experience.
Peter:
Yeah. That was a really fun thing, actually. I honestly still use these features probably a few times a week. Probably not that many people know about it, but it's actually still there in the app. I'm very, very proud of that. You can also scan your documents using Dropbox — just take a photo in the Dropbox app and scan it. So that was the other part of it. Even building that OCR experience was really fun. My background in computer vision...this was pre-deep learning, old-school computer vision. There were HOG features, (?) features, bag-of-words kind of stuff, you know, all these weird things. People don't know about them anymore because they don't matter anymore.
Lukas:
Yeah.
Peter:
When I was at Dropbox, that's when this whole deep learning revolution really happened. It was kind of mind-blowing to work on computer vision problems before and after it. Before, it was like that XKCD comic: if you want to know whether you're in a national park, you just look at the GPS, but if you want to take a photo of a bird and recognize the species, you need five years and a research team. Anytime we would brainstorm about features, it would be like, "I don't know, maybe in a decade we can make that work." Once we started using deep learning for stuff, it was like...I remember with OCR, we took the best OCR systems that were out there and we created a benchmark using those — using Google's OCR; ABBYY was a big text recognition company — and we sat down and just started building our own OCR engine from scratch. We extracted the text, we did the text recognition, word recognition...and in three months we had beaten all of the public dataset benchmarks. That was just mind-blowing to me. That's the stuff that would have taken so much longer before.
Lukas:
Wow! That's amazing. What year was that?
Peter:
That must have been 2014, or something like that.
Lukas:
Wow. Were you using Caffe? What was even...
Peter:
Yeah. It was also one of those times...we started when Caffe was still a thing, and by the end of it, TensorFlow was a thing. There was just so much changing from month to month in terms of deep learning. Theano was probably what we started prototyping in. The libraries were just changing every month, you know. But yeah, the first version — the thing we shipped in production initially — some part of it was probably in Caffe. Probably not anymore.
Lukas:
That must have been a challenge to just run that on every document. I mean, that seems like a huge production challenge.
Peter:
The truth of deploying machine learning systems, in general, is that...we did that three-month stint where we just created the algorithm and got it all working. And then it was like a year to ship the feature, because of all of that other stuff. Actually putting it together in production, making sure that the errors are not disasters, but then also scaling the thing up and doing it in a way where...you take the cost of running it for phone photos and then you multiply it by a few billion. Those are very high numbers, and then you give that to some finance person, and they're like, "What's the actual value we get out of this?" There's a lot of fun optimization work and stuff like that, but we got it down to a place where people were happy with it. I don't know what the status is now, but I think it's probably one of those things where you still have to be a paid Dropbox user for it to run. I don't think we run it for our free users.
Lukas:
Were there some tricks to getting the size down and the cost down? I'm trying to remember what people did back then. Did you do quantization and stuff yet?
Peter:
I think it was in the very early stages of that, so I don't think we did stuff like that. At that point it was mostly...once we had gotten everything working, it was a very manual process. Can we get away with a smaller network? Can we have five layers instead of six layers? But also, that was pretty early in the state of these neural network libraries, so even doing optimizations on those libraries — little optimizations to make them run on the particular architectures that we had on our machines — those things mattered. All of that is done automatically for you now, but that was the good stuff. I think we had at least one or two people who worked full time on this for a few months, just to improve the speed and reduce the compute footprint of these things.
Joining OpenAI's robotics team
Lukas:
Then you left to go to OpenAI, and you work on the robotics team, right?
Peter:
Yeah, exactly. I had kind of always wanted to work on robotics since I'd been in grad school. But again, I abandoned it because I thought there was a little bit too much work in actually working on the robots. If you build a robot, you have a whole system, and all of the parts are kind of broken. Computer vision was the promising part, and the really cool thing that I had started noticing was that...that was when a lot of these results came out where deep learning was doing really well on simple computer games and so on. Deep reinforcement learning in particular was the thing that people had actually started to get to work. I started feeling at that point that deep learning solved a lot of these perception things in robotics. Before deep learning, it felt like you just didn't know if anything was ever going to work. After deep learning, it was like, "Yeah, it'll probably work if we have enough data." It's a very different feeling, a feeling that you kind of know how to get there if you just get enough data. Obviously getting that data is really hard, but it's more of a solvable engineering and product problem to figure out how to get data. But you still have the other thing with robotics, which is the control part. Control is also really, really hard. What was really promising about that early deep reinforcement learning work was that suddenly there was a learning-based approach to control that seemed to scale to more interesting action spaces. What I mean by that is you can just manipulate all the joints on a robot, for example. I knew some of the people who were working on that at OpenAI, and they were just starting up a robotics team, so it seemed like a really good time to get into deep reinforcement learning and see if we could actually get robots to do much more interesting things using deep learning.
Lukas:
So how has it evolved? Do you still feel like deep reinforcement learning is as promising as it felt in 2017 or whatever year you joined the robotics team?
Peter:
Yeah, I think at that point I felt like there was a chance that this could work. Now I feel like this is totally the path. This should work. There might be other ways to get there, but this should work; in the limit, this will work out. How long it will take to get there is really hard to say. I feel like it's always one or two years away. People have thought we're one or two years away for more than one or two years. But I feel like there's something fundamental with deep learning and with deep reinforcement learning where it really feels like this should be able to get us relatively far toward solving the problem. By the problem, I mean getting to more general-purpose robots. Robots that can do more of the things that humans do. Actually move around the home, not be locked into a factory, but actually deal with all the complexities of the real world. I very strongly feel that the way you need to tackle this — just because of the complexity of the world — is really through learning, and deep reinforcement learning is just such a simple paradigm, where it seems like most other things are going to be much more complex. I guess my bias is that complex things never really work; it's the simple things that really, really work. That's kind of what I saw at Dropbox. It was always the simplest approach that worked. If you tried to be a little bit clever with algorithms and stuff, usually you would end up being disappointed. The most important thing was really setting up the data. I think there's something very fundamental about deep reinforcement learning that makes me think we can push it really far. We're really just getting started.
Believing in reinforcement learning (pt 1)
Lukas:
When you say "work" or "push it really far", what are some of the things that you see so far that make you think that it works? And then what are some of the things that would make you feel like, "Wow, this is really successful"?
Peter:
That's a good question. First of all, I don't know if all the listeners would know what deep reinforcement learning is, so let me describe that a little bit. Reinforcement learning is really about learning from trial and error. A lot of machine learning is based on supervised learning, where you show examples and then you have a label. But reinforcement learning is trial and error: you do a series of actions and you get some kind of score at the end. We call this the reward. You do something and it gets rewarded or punished at the end — but usually people talk about reward; we're all more optimistic. That's the core algorithm. It's very, very simple. The reason I feel it's promising is that the biggest issue around...the biggest criticism that reinforcement learning gets is that you just need lots and lots of experience. You just need to do so many of these trials and errors in order to learn anything. So people usually don't like reinforcement learning for robots, because you cannot do that on a real robot. First of all, if you do anything on a real robot and you don't do it in a very controlled way, you're going to break the robot. You're going to break the things around the robot. It's kind of dangerous to do. The reason I think some of that criticism is misplaced is that we can just do a lot of that learning in simulation. Some of the things that we've shown at OpenAI over the past two years...we have really focused on this problem of seeing if we can solve robotics problems in simulation, take those agents that we have trained in the simulator, put them onto a real-world robot, and see if they can do the same thing on the real-world robot that we trained them on in the simulator. The hypothesis behind this — a somewhat controversial hypothesis — is that if you have any problem in a simulator, you can solve it using reinforcement learning, if you just have enough compute.
You can have really complex problems like Go or Dota, which is a computer game. These things require a lot of strategy and so on. You can take those and you can still solve them with enough compute. We trained an agent in the simulator to operate a humanoid robotic hand, and we got it to solve a Rubik's Cube in the simulator. Then, by setting up the environment in the right way in the simulator and throwing lots and lots of compute at it, we were able to train a robust enough algorithm to put it on a real-world robot and have it solve a real-world Rubik's Cube. I feel like this was a hard enough problem — a manipulation problem that is tricky even for humans to do. We had one hand that was fixed to a wall, so it couldn't move much, and it could still do this thing. So it's a hard manipulation problem, but still, we could solve it using reinforcement learning, solve it with a real-world robot. In some way that gave me enough confidence that I now feel like there must be more problems we can tackle using this approach — a lot of things would be easier than solving a Rubik's Cube.
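The trial-and-error loop Peter describes — take actions, collect a reward at the end, repeat — can be sketched in a few lines of Python. Everything here (the `ToyEnv` guessing game, the random policy) is illustrative and not OpenAI's code; a real algorithm would use the rewards to improve the policy instead of guessing randomly:

```python
import random

random.seed(0)  # deterministic runs for this sketch

class ToyEnv:
    """A tiny stand-in environment: guess a hidden digit in one step."""
    def reset(self):
        self.target = random.randint(0, 9)
        return 0  # initial observation (unused by this toy policy)

    def step(self, action):
        # Reward 1 for the right guess, 0 otherwise; the episode then ends.
        reward = 1 if action == self.target else 0
        return 0, reward, True

def run_episode(env, policy):
    """One trial: act until done, then report the total reward."""
    obs = env.reset()
    total, done = 0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total

env = ToyEnv()
random_policy = lambda obs: random.randint(0, 9)
rewards = [run_episode(env, random_policy) for _ in range(1000)]
avg = sum(rewards) / len(rewards)
print(avg)  # around 0.1 for random guessing; a learner would push this up
```

The point of the sketch is the shape of the loop — reset, act, observe a reward — which is exactly what gets run millions of times in simulation before anything touches real hardware.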
Ad
Believing in reinforcement learning (pt 2)
Lukas:
When the robotics team got started, what was its charter? Did you know that you were going to do simulation, did you know that you were going to do reinforcement learning?
Peter:
I think the short answer is going to be that we didn't know at all what we were doing. We had this goal. We wanted to build general-purpose robots, but I don't think we had a super clear idea of how to get there. I think one core belief we had was that deep learning would be a big part of it, and reinforcement learning would also probably be a big part of it. But exactly what flavors of reinforcement learning and so on, that we didn't really know yet. There was a philosophy around, "Can we take some of these approaches that are pretty simple and, by really pushing them super, super, super far, solve really hard problems with them?" I think that was our overall strategy. We kind of hoped that just taking really simple reinforcement learning algorithms and putting them on a really, really hard problem would be successful. I think we were somewhat scared for the first two years that maybe this wouldn't work out. It really felt like that a lot of the time. Every time the robotic hand broke, we had to send it off for repairs, and we'd have like a month sitting there thinking about our mistakes and thinking, "Will this ever work? I'm not sure. This is probably completely the wrong path." But in the end, now I think our belief in it is stronger than ever.
Tackling the robotic hand problem
Lukas:
Why did you choose to manipulate a hand? I feel like if I was trying to build a general-purpose robot, I might even leave out the hands. It seems like the hand has got to be the most complicated thing and I feel like in the movies robots don't even have hands. Maybe they don't even need them, I don't know.
Peter:
Yeah. You know, it's interesting how that started. For the first problem we tackled, we didn't use a robotic hand. We had one of these Fetch robots, which is basically a mobile robot with a robotic arm and a two-finger gripper. It's a super simple robot, and we would even just screw it into the floor so it couldn't move. So it was just a robot arm, basically. A very expensive robot arm. That's how we started. But what we realized when we were doing that...we started with the simplest of problems in robotics, which is block stacking. People have been stacking blocks for 60 years. I kid you not. There are movies from Stanford in the 1950s or '60s where they have robots stacking blocks. So, you know, we had to start simple. That was one of the first things we were doing. And we had this realization that even the simplest thing of just picking up the blocks and manipulating them was pretty hard. So then we were like, "Okay, we need to solve this problem of doing manipulation." And then we were like, "We need to be pretty ambitious about this. Let's do the hardest thing we can think of. Let's have a robotic hand." There was another thing that we did at the same time: we went to a robotics conference and asked people — you know, a bunch of roboticists from across the world — "What is the hardest thing you can imagine doing in robotics right now? If we picked a really hard problem and we could show that deep reinforcement learning works on it, where would you be impressed?" And all of them would answer, "Well, the problem I'm working on is really, really hard." But if you pushed them enough, two things became really clear. One was high degrees of freedom, like having lots of joints in your robot. That's hard because a lot of the control-theoretic approaches just don't scale very well with the number of joints on the robot. And a hand is like...if you have a robotic arm, a hand is like five robotic arms.
It's really complex; it doesn't really get more complex than that. The other thing people said was that doing things with contact is really hard, where you're actually manipulating objects. That's why we felt, well, if we really want to convince people that deep reinforcement learning can solve really complex robotics problems, let's just pick a really hard problem. If we solve the manipulation problem for that particular robot, we won't be afraid of manipulation problems anymore, in some sense. So the Rubik's Cube was...once you have the robotic hand, it's like, "What can you do with a robotic hand if it's just stuck to the wall? Well, you need to put something in the hand, and a ball or something is not very exciting. So what is the most complex object we can think of? A Rubik's Cube. It's pretty complex." So that's how we got started on the Rubik's Cube with the robotic hand. In hindsight, I don't know how smart that was, but it gave us a really tricky problem to work on.
Lukas:
Interesting. You would have done something else in hindsight?
Peter:
When we started out with this project, it was pretty crazy. We did this thing where we started solving it in a simulator, and we thought, "Okay, this is maybe going to take half a year to solve in the simulator. It's going to be tricky to come up with the right reinforcement learning algorithms. We'll probably have to iterate on the kinds of algorithms and stuff like that." Then we started on it, and within two or three weeks we had solved it in the simulator. So we went, "Holy shit, that was simple. We can probably solve it on the physical hand in another month or so." And those were famous last words — it took like two years from that point. I definitely feel like there were certain things we didn't know about these robotic hands, like just the fact that nobody had run reinforcement learning algorithms on these robotic hands before. When you train a reinforcement learning algorithm from scratch — also if you want to train it in the simulator and deploy it on a real hand — it has no respect for the delicacy of the hardware. It would just push all the motors at maximum speed in different directions. We would bring in the manufacturers of the Shadow Hand and show them what we were doing, and they would just watch in horror, like, "We only run it for five minutes and you have it running all day?" Within an hour, you'd have one of the fingers — the thumb or the little finger — loose or hanging off by a thread. We would completely destroy these robotic hands. The iteration time on this hand was really, really long. I definitely feel like it's one of those things...if we had picked a simpler problem, we'd have completed it faster, just because of the physical aspect of waiting for hands to be repaired and figuring out the dynamics of these really complex hands.
It's definitely easier to tackle a problem if you start from a simpler problem and make it more advanced than if you pick a really advanced problem and just go at it, because then you don't know where the issues are. It took us a long time to narrow down and shrink the complexity, to then be able to expand the complexity again as we were solving this task. I think this is the main thing. If you could buy super robust, industrial-grade robot hands, it might have been different. But basically, there were like two companies in the world that made these robot hands, because nobody knows how to use them.
Lukas:
I'm surprised it's even two because I've never seen a robot hand except for your robot hand.
Peter:
Right. Oh, my God. You know, they sell these to research institutes, and they tell us that when they visit those researchers two years later, the hand is in pristine condition. Nobody dares to touch these robots because they're so complicated.
Building a research team
Lukas:
Tell me about the team; how big is the team working on this? How do you divide up roles? How do you set goals? How do you break apart such a difficult goal that might actually be impossible into smaller pieces? What even does a performance review look like? Do you actually do that?
Peter:
Those are good questions. We've learned a lot about this because it's very different from a lot of other situations. When I was at Dropbox, it was all about "You want to ship a product and you do everything to ship the product." Here at OpenAI, we have a pretty ambitious goal of building more general AI algorithms and eventually general intelligence. So we want to set really ambitious goals for ourselves, where we can really feel like we can push the envelope on what we can do with AI. It's really tricky to make that into something concrete, especially when you have lots of people working on it. Because the other thing that we pretty strongly believe, especially on the robotics team, is this idea of having more of a team effort to achieve big things. There are just too many things in robotics that you need to solve; you can't just have one or two people working on it. You need a bigger team. And right now, we've found that the sweet spot has been somewhere between 10 and 20 people, in terms of the size of the team. If you get bigger than that, overhead starts slowing you down quite a bit. But if you're smaller than 10 people, it's relatively hard to make progress just because there are a little too many things to do. What we try to do is have pretty concrete goals. For example, we knew that we were working towards solving a Rubik's Cube for like two years. This was a very concrete goal. Once we can see this robot hand solving a real Rubik's Cube, then we've solved this problem. Having a very concrete goal like that makes it easier to focus and not digress too much. If you're doing research, it's like walking through a forest and you want to get to a mountain, and there are all these nice fruits and berries around. It's like you just want to go, "Oh, this looks really good. I want to taste this for a while and see what I can cook with it." It's very tempting at every point in time to just stop and explore for a really long time.
But if you want to solve a really big problem, you have to be much more focused than that. Keeping that clear goal in sight helps a lot. That's been one core component of how we do things: clear goals, and then having the whole team work towards them. It's more in the philosophy of a startup, but maybe with less short-term priorities...we kind of have to try out really ambitious things, things that will fail with very high likelihood. We want to leapfrog a lot of other approaches with the projects we take.
Lukas:
I guess my question is like...I'm imagining — and that makes total sense — but what is everyone doing?
Peter:
It's a good question. What is everybody doing? If you ask anybody at any point in time what they're doing, they're going to tell you, "Well, I have this bug. I'm trying to figure out this bug." It's an engineer's life. That's what you're doing almost all of the time: fixing a bug. But there are different levels of bugs. Usually the work is split between engineering towards building up tooling — to understand more of where we're going in terms of our experiments, and to run our experiments and so on — or engineering in terms of running our training — training our models and stuff like that — or a lot of research work. What I mean by research is more coming up with new algorithms and trying them out. Coming up with hypotheses, trying them out. Figuring out the best way to set up experiments. Sometimes that involves doing something that we've come up with ourselves based on where we are in our research. Sometimes a new paper has come out that might be promising — let's re-implement that and see how it compares against our baseline. There's just a lot of different things going on. Which is really interesting, because one thing that's very different from, say, working at a company is that there you're often working on a feature, and you're working on that feature for at least a quarter, often many quarters of a year. You're working on the same thing. Here, things switch very, very quickly. You're working on one thing for a week. Then you're working on another thing for maybe three weeks, and then another thing for a week. Each project is very different. One might be, "Let's make these things faster, let's dig really deep into CUDA optimization to train faster," and another day it's, "How do I control this new robot that we got?" and another day it's, "How do I render things really quickly in OpenGL or Unity?" or something like that. It's just highly varied work.
Lukas:
That sounds so fun, I wanna work with you.
Peter:
I can tell you, it's pretty fun. It's definitely one of those things where whenever you get bored, there's another project around the corner that you can jump onto. You learn a lot. It's really fun.
Switching from TensorFlow to PyTorch
Lukas:
I actually don't know if you have thoughts on this, but one really practical thing I was wondering: I think when I talked to you maybe a year or two ago, you were completely like, "Hey, TensorFlow is the best framework. It's clear that's the thing to use." And then you guys switched to PyTorch, and I was wondering why. What happened, and how did you even...switching a framework mid-project sounds unbelievably daunting. What prompted it? How did that come about?
Peter:
I think most of these things happen pretty bottom-up at OpenAI. Everybody has their main project and then one or more side projects — it's just a natural thing, right? And for your side projects, you always want to try some new tool so you can learn a little bit more. People started playing around with PyTorch for their side projects. You pretty quickly realize that your code is much, much shorter, much, much more pleasant to read, and much faster to iterate on. You can just get all the data out in the middle of your network without running it through your graph and extracting it from the graph, like you do in TensorFlow. It was just a much more pleasant tool for people to use for their own projects. So then what happens is that you have this for your side project. Then, when you start your next project...I think some teams at OpenAI are smaller, and they have a project that runs for a month or two, and then they try different projects. When they switch projects, that's a pretty easy point at which to switch to a new tool. So that's what started happening. Some teams started building their tools on top of PyTorch, and then other teams were like, "Oh, that tool looks really nice. Oh, it's in PyTorch." And then suddenly this FOMO starts growing within the teams, and eventually it's too much. I think we just realized that people had adopted this tool, and we should just go with the flow and have everybody adopt it. The other thing was that we started building more and more really good tooling, and we wanted the whole company to start using that tooling because it made everybody move faster. I think robotics was probably one of the biggest teams that had to face this switch, but we were pretty lucky in that when we released our results with the Rubik's Cube, we had some time where we could take a step back, do a little bit more refactoring of our tooling, and change the framework.
I wouldn't have wanted to do this in the middle of a project. As you said, that seems like a recipe for disaster. There's this thing: whenever someone reimplements a reinforcement learning algorithm, even if it's the same person who implemented it last time, it's still going to take them a month to get it right, because there are so many subtleties.
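The workflow difference Peter describes — pulling values out of the middle of a network without fetching them from a graph — is PyTorch's define-by-run ("eager") style. A minimal stdlib sketch of that style, with toy layer functions standing in for a real framework:

```python
# In define-by-run code, each step is an ordinary function call, so every
# intermediate value is a plain Python object you can print, assert on, or
# debug immediately -- no separate graph-execution step, as in old TensorFlow.
# (Toy numbers and hand-written "layers", not a real network.)

def linear(x, w, b):
    """A 1-D 'layer': w * x + b."""
    return w * x + b

def relu(x):
    return max(0.0, x)

x = -2.0
hidden = linear(x, w=3.0, b=1.0)           # 'hidden' is just a float here
print("intermediate activation:", hidden)  # inspect mid-network, no graph fetch
out = relu(hidden)
print("output:", out)
```

In graph-based TensorFlow 1.x, getting `hidden` out required declaring it as a fetch target and running it through a session; here it is available the moment the line executes, which is the iteration-speed win Peter mentions.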
Releasing internal OpenAI tools
Lukas:
What other internal tools are you really proud of? What stuff have you built? Do you have any plans to open source any of it to other people, or is it just for people at OpenAI?
Peter:
There are a few things that we have released that have been really useful for ourselves, and a lot of people have adopted them, so that's some recognition that they've been useful for other people. I think the biggest thing is OpenAI Gym; it's been there since the beginning of OpenAI, more or less. OpenAI was founded around the time when reinforcement learning started to work again, with deep reinforcement learning, and people would just reimplement all these very basic environments on which you would benchmark your algorithms. So OpenAI built this library called OpenAI Gym, which implements all those environments people benchmark on, so that people could just use it. It has a really simple abstraction layer and a very simple interface, and people would just build more environments on top of that API. So that became really popular. I think that's a really good one.

There are two others. Whenever we come up with new algorithms that we find we use ourselves a lot, we release them. For example, we have this baselines library, which has a lot of implementations of reinforcement learning algorithms. Getting those implementations right is really, really hard, so releasing that is good because we've seen that it saves people a lot of work. The robotics team, in particular, wants to separate out core components of our workflow as soon as we can. We did this with something called mujoco-py, which is a Python wrapper for a physics simulator called MuJoCo, which we use in all our work. We released it quite a long time ago, once it was stable enough. Similarly, we have also released a rendering pipeline we call ORRB. Usually, we try to open source things.
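The "very simple interface" Peter credits for Gym's popularity is essentially two methods, `reset()` and `step(action)`, and new environments just implement the same shape. A toy sketch of that interface, written without depending on the `gym` package itself (the environment and its dynamics here are invented for illustration):

```python
# Classic Gym-style interface: reset() returns an initial observation, and
# step(action) returns (observation, reward, done, info). Any code written
# against this shape works with any environment that implements it.
import random

class ToyReachEnv:
    """State is an integer position; actions -1/+1 try to reach target 5."""

    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos                    # initial observation

    def step(self, action):
        self.pos += action
        done = self.pos == 5               # episode ends at the target
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}  # obs, reward, done, info

# A rollout loop with a random policy -- the same loop would drive any
# environment exposing this interface.
env = ToyReachEnv()
obs = env.reset()
total_reward = 0.0
for _ in range(100):
    obs, reward, done, info = env.step(random.choice([-1, 1]))
    total_reward += reward
    if done:
        break
```

Benchmark algorithms against many environments becomes easy precisely because the rollout loop never needs to know what is behind `step()`.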
Now, the tricky thing is that we cannot open source everything we're working on, not because we don't want to, but because it would add a lot of overhead. Code in our repositories is not very long-lived; I would say 90% of it is not used after half a year to a year. There are all these hypotheses that we're trying out, and most of them fail, you know. You're left with a bunch of code that you basically have to delete because it doesn't matter. We don't want to release stuff just to release it if we don't really believe in it. It's really the stuff that survives that we want to release. That's the philosophy we have around it. But when we do have those components, we just try to release them.
How OpenAI uses Weights & Biases
Lukas:
Interesting. I also want to ask you...I mean, this is kind of a loaded question coming from me, I realize, but I feel really proud that you guys use our product, Weights & Biases, or wandb. I'm curious if you could say a little bit about how you use it. I'm not trying to turn this into an infomercial; I'm genuinely curious about what your workflow is around it, because I see you using Reports more and more. Yeah, I'm curious how you think about it.
Peter:
Yeah. I mean, we have been using it for a while now. We started using it in the robotics team because, as the team grew, we were sharing a lot of results; everybody was running their own TensorBoard graphs on their computers, pasting graphs in Slack, and sharing them with each other. It was really tricky to keep track of all that stuff. We ended up using it a lot for tracking our experiments, and I think that brought a certain level of sanity to all the chaos of the research we were doing as the team grew.

I guess the latest feature that we have now started using quite a lot is Reports. I feel like a pretty pro user of the feature in some ways, because it's not just plotting the graphs, but also putting them together in a nice report. It's one of those funny things where it adds a certain level of process and bureaucracy to how people create these reports, but we've found it to be super useful. When you're a small team, where you're two or three people, you're talking all the time about your progress, so everybody has this mental state of what is happening. But once you get bigger than maybe five or six people, giving each other feedback and understanding what other people are working on can be really hard. It's this n-squared problem where you need to talk to everybody. Figuring out a way to fan out the information from one person to all the others in an efficient way is really, really important.

The way we use these reports is that we're actually pretty strict about it now. If you're running anything, an experiment or some kind of research hypothesis that you're going after, and you think it's going to take more than a day or two, we very strongly push everybody towards writing a report. What goes into a report is: what are you doing?
What is the experiment that you're going to be running over the next few days? You're probably going to spend thousands of dollars in GPU time and lots of your own time on it, so it's good to spend at least a few minutes justifying to yourself and others what you are going to do. I think we never say, "No, you shouldn't do this"; it's more like we can say, "Oh, I don't know if I believe in that, but okay, that's fine. At least now I understand it." We try to make it pretty clear from a scientific standpoint: "Here are my hypotheses, and here's my plan for proving or disproving these hypotheses." The report is usually a number of graphs from our training runs, example photos that we've generated as part of our evaluation scripts, and so on.

We've just found it super, super useful because it's a little bit like rubber ducking. You're talking to yourself as you're doing this; it forces you to clarify your own thoughts. It gives other people a way to learn, both from the positive outcomes and the negative outcomes of the experiments you run. Another thing it does is reduce the stigma around things not working out. Ultimately, in order to do the stuff we're doing, most things are not going to work out. But if you sweep those things under the rug every time something doesn't work out, then it looks like somebody is only doing amazing work and things are just working all the time; you don't hear about the 90% of the time it didn't work. That's usually how it works out in papers. You see all these papers coming out and it's like everything is working, but you don't know about all the things they tried that didn't work out. What we do internally, then, is look at those things that didn't work out.
You see that other people did this experiment; they had this belief that it would work, but it didn't turn out to work. So you don't feel as afraid yourself to pursue experiments. As long as you have a good reason for why it would work, it's OK if it didn't work out, you know?
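The workflow Peter describes — every run logging named metrics per step to one shared place instead of screenshots in Slack — can be sketched with a toy stdlib tracker. (With wandb itself this is roughly `wandb.init(...)` followed by `wandb.log({...})` inside the training loop; the class and run name below are invented stand-ins so the pattern is visible end to end.)

```python
# Toy experiment tracker: one dict of metrics per training step, plus a
# summary of the last value of each metric -- the kind of data a dashboard
# or a Report would surface for the rest of the team.
import json

class RunTracker:
    def __init__(self, run_name):
        self.run_name = run_name
        self.history = []                  # one row of metrics per step

    def log(self, metrics, step):
        self.history.append({"step": step, **metrics})

    def summary(self):
        """Last logged value of each metric."""
        out = {}
        for row in self.history:
            out.update(row)
        return out

run = RunTracker("rubiks-cube-baseline")   # hypothetical run name
for step in range(3):                      # stand-in for a training loop
    fake_loss = 1.0 / (step + 1)
    run.log({"loss": fake_loss}, step=step)

print(json.dumps(run.summary()))
```

The point of centralizing this, per the conversation above, is that the "fan-out" of results to a growing team stops depending on whoever happened to paste a graph into chat.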
Lukas:
That makes me so proud to hear that. I'm so glad that it's useful for you.
Peter:
Yeah. It's awesome.
Over-confident models and trying simple things
Lukas:
We always end with two questions. I'm wondering how you'll answer these. The first one is, what's a topic in machine learning that you feel like people don't talk about as much as they should, like an underrated topic?
Peter:
I don't know if this is something people don't talk about, but it's definitely a thing we don't understand well enough: knowing when our algorithms are uncertain about what they're doing. For humans it's very natural. When you don't know what's going on, you slow down and become more perceptive; you think more, and so on. The algorithms we have today just make split-second decisions all the time. They don't think very much at all. They open their eyes, see something, react, and that's it. It's not like when we walk into a dark room we run around flailing our arms; no, we feel our way around, we take it easy. Our algorithms don't do that at all. So, giving them a sense of their own confidence is important. My Ph.D. advisor would sometimes, a bit meanly, comment on people being "high confidence, low competence". That's very much how our algorithms are a lot of the time. They should be a bit lower in confidence, a lot of the time, and not make as many of these split-second decisions.
Lukas:
I love it. All right. Here's my last question. When you look at the projects you've been involved in from conception to deployment, what's generally been the biggest bottleneck? Or the thing that makes you the most worried if you're doing another project, to get it deployed? It sounds like hardware was the biggest issue.
Peter:
For robotics, totally. It's definitely one of those things: if you can get hardware that's really reliable, then you should just pay all the money you can. I remember when we started at OpenAI, we were like, "We can get by on these $100 webcams." Now it's like, "How much is this camera? Is it $10,000? It's probably worth it, let's just pay $10,000 for this camera. It would save me half a year of misery." So that's a big thing.

One thing that always gets me a little bit worried is when you don't start with the simplest things. I really think this is one of the core things: for every project you start, you should start with a very strong but simple baseline. If people don't start in that direction, if you try out more complex methods first, those are often going to be exponentially harder, in terms of the parameters, to get to work. You can do all this work and you may make it work eventually, but then if you try the simpler approach and it works, you're just going to feel really embarrassed. That should teach you to always start with the simplest thing, and then try the more complicated things. If you cannot beat the simple thing, after a while your warmth for the simple thing increases and you're like, "Actually, maybe I should just use the simple thing," and you learn to appreciate the simple things. One of the core things I always look for is: are we trying the simplest thing possible? Because that's probably the thing that's going to work.
Outro
Lukas:
Well, what a great way to end. Thank you so much, Peter.
Peter:
Thank you so much. It was great being on your show. Thank you so much.