Pieter Abbeel — Robotics, Startups, and Robotics Startups
Pieter talks about the state of affairs and challenges of robotics in 2021, and shares the stories behind founding Gradescope and Covariant.
About this episode
Pieter is the Chief Scientist and Co-founder at Covariant, where his team is building universal AI for robotic manipulation. Pieter also hosts The Robot Brains Podcast, in which he explores how far humanity has come in its mission to create conscious computers, mindful machines, and rational robots.
Lukas and Pieter explore the state of affairs of robotics in 2021, the challenges of achieving consistency and reliability, and what it'll take to make robotics more ubiquitous. Pieter also shares some perspective on entrepreneurship, from how he knew it was time to commercialize Gradescope to what he looks for in co-founders to why he started Covariant.
1:15 The challenges of robotics
8:10 Progress in robotics
13:34 Imitation learning and reinforcement learning
21:37 Simulated data, real data, and reliability
27:53 The increasing capabilities of robotics
36:23 Entrepreneurship and co-founding Gradescope
44:35 The story behind Covariant
47:50 Pieter's communication tips
52:13 What Pieter's currently excited about
55:08 Focusing on good UI and high reliability
Note: Transcriptions are provided by a third-party service, and may contain some inaccuracies. Please submit any corrections to firstname.lastname@example.org. Thank you!
Kids don't get scored, there's no reward, no feedback most of the time. They just play. Can our agents or our robots do the same thing, just play around in the environment? Essentially that is the reinforcement learning equivalent of pre-training. We know in computer vision, natural language processing, unsupervised pre-training is what powers all the latest and greatest models, so can we do the same thing in RL? Well, that means it has to be reward-free.
You're listening to Gradient Dissent, a show about machine learning in the real world, and I'm your host, Lukas Biewald.
I was really looking forward to talking to Pieter Abbeel. He's one of the most well-known people in machine learning today and one of the most cited professors. He's the director of the Berkeley Robot Learning Lab and always has really interesting insights on the state of the art of reinforcement learning and robotics, two things that are obviously some of my favorite topics. He's also a very successful entrepreneur. He started several companies, including Gradescope, which sold to Turnitin a few years ago, and Covariant, which is a company trying to build a universal AI for robotic manipulation. So much to talk about. This is a really fun conversation.
The challenges of robotics
I guess the place I was interested in starting, if you're up for it, is you've been doing robotics for a long time and it feels like robotics is one of those fields that I think felt harder, or maybe it's unexpectedly hard it seems to me, to get robots to navigate our world. Has it been surprising to you, the challenge of making robots manipulate the world, or was it obvious to you when you started on it that it was going to be a huge challenge?
Yeah, I think you go through phases as you work on problems. At the beginning, maybe it's like, "Okay, we can do this," and a few months later, you're like, "Wow, this could be harder than expected," and you go back and forth for a while. But maybe the way I'm seeing it is that our world has so much variation in it. There's always something new, whether it's for driving or for robotic manipulation. For driving, it's new driving scenes, new things you encounter, especially in cities. For manipulation, it's just new objects or objects in different configurations surrounded by other objects. I think it's really intriguing how essentially sometimes once you measure the amount of variability you need to be able to deal with, you essentially know how hard a problem is going to be.
So when you look at robotics where it's successful...you can think of car manufacturing, it's very successful. I don't think we could buy cars at current prices if we didn't have robots helping out, and the same consistency, and so forth. You look at the assembly line, and you walk in there, and you're just stunned. You're like, "Wow, what's going on here? The robots are building the entire car." But then when you look carefully, you see, "Oh, these robots are amazing but they're doing a very precisely orchestrated motion over, and over, and over."
That's what they're so good at, but then when you want to take it beyond that, all of a sudden you need a whole lot of progress in AI to actually do it because the robot needs to see, understand what it's looking at, make sense in terms of what decisions to make, react to things quickly. And all of a sudden, it becomes so much harder. I think going from robots that do repeated motion to robots that actually understand the world and have to react to it, it's a very large gap. That's pretty clear at this point.
Sitting here in August 2021, how would you describe the state of the art in robotics? I feel like I hear such conflicting messages. People talk about how laundry folding is this impossible challenge, and yet you can see videos of robots folding laundry super well. I feel like OpenAI had the thing where they manipulated a Rubik's cube, but it was funny showing that to my wife, who was kind of like of all the different OpenAI things they've put out, in a way this is the least impressive. A child could do that too. She was reflecting back to me, "Why are you so excited about watching a robot manipulate a Rubik's cube?" What's your sense on what's possible and not possible today?
Yeah, I like what you're saying there, Lukas, because I think the reason it's so confusing of what's possible and not possible is because a lot of the challenges are in achieving consistency. It's like that in many parts of life. I mean imagine you like playing sports and you make one free throw, you might be like, "Okay, I can make a free throw." You make a video of that one free throw you made and that's the only video you ever play on repeat, now it looks like you're making every free throw, but actually being consistent is what's hard. Same for robotics applications, consistently being successful is really, really hard. Part of it is, of course, how we all communicate our early successes, but once we have an early success, of course we're excited.
First time a robot folds laundry — that example, of course, is from my lab and it's close to my heart because I love robots folding laundry — and you watch the video and it's very impressive, but then once you look at the details, it's like, "Okay, but it's always towels of a limited range and size, and it's in a very specific lab set up where there's a nice table for the robot to fold on, and it starts with a pile not too far away." And you start realizing there's just a lot of specifics in place that make it not as general as it might seem, and I think that's where it's always tricky.
Sometimes when you see a robot do something, it's impressive, like solving a Rubik's cube, folding a towel, maybe picking something from a bin — which is actually a really important problem — you see it and I think most people will generalize and will say, "Hey, if a person can do that, that means they can do all these other things also," and that's actually not the case.
We kinda know that. That's the funny thing. We know that because nobody takes a self-driving car to our DMV driving test and is like, "Oh wow, you solved the written perfectly. That's impressive. Oh, you did a two-minute drive around the block. That's impressive," because we know it doesn't generalize the same way. And this notion of generalization that's in our head often when we see a robot do something makes us think, "Oh wow, we're so far along," but then when you start putting into practice you want to commercialize it as, of course, we're doing at Covariant, you realize it's all about the number of nines of reliability you achieve.
I think that's where robotics is a bit more unique than most other AI applications because for most AI applications, you don't need that many nines to be helpful. If you have a spam filter and it removes even just 30% of your spam, that's already a help. Of course, you want it to be better, but it's already helpful. But if a robot only succeeds 30% of the time, that means 70% of the time it's probably making a mess that takes even more effort to clean up than it was helping out.
It's one of those things where the number of nines of reliability you need with a robot is just very high before they become actually valuable. I think that's where it's so hard to gauge that because you see this cool 30-second clip and you're like, "Wow, seems like things are good," and then you realize well, if you watch it for an hour uninterrupted, it's not consistently that good, and now all of a sudden it's not viable.
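The spam-filter versus robot contrast above can be put into a rough back-of-the-envelope calculation. This is only a sketch: the function name `net_labor_saved` and the assumption that cleaning up a failure costs twice the work of the original task (`fix_cost=2.0`) are illustrative choices, not figures from the conversation.

```python
def net_labor_saved(success_rate: float, fix_cost: float) -> float:
    """Expected units of human work saved per attempted task.

    A success saves 1 unit of work; a failure costs `fix_cost` units
    of cleanup (an assumed number, not a measured one).
    """
    return success_rate * 1.0 - (1.0 - success_rate) * fix_cost


# Compare a few reliability levels under the assumed cleanup cost.
for rate in (0.30, 0.90, 0.99, 0.999):
    saved = net_labor_saved(rate, fix_cost=2.0)
    print(f"success rate {rate:.3f}: net work saved per task {saved:+.3f}")
```

Under these assumptions a 30% success rate comes out negative (the robot creates more work than it removes), while each additional nine moves the result closer to a full unit of work saved per task.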
Progress in robotics
And so to get those higher levels of accuracy...maybe let's talk about Covariant, if you're up for it. I mean obviously very outwardly successful startup. Are there breakthroughs that need to happen...is it science to get those extra levels of accuracy or is it fixing lots of little details? Does it feel more like engineering?
I think it's actually both. I think you cannot get away with just doing one of them. So if we look at Covariant, what are we doing? Essentially, we're trying to build a brain for robots to be able to see what's around them, and react to it, and do interesting things in the world. We concluded that the first, most relevant space to go into is warehousing and logistics, it means a robot doing pick and place. You go online, you order something, something has to be retrieved, put into a box.
Well, there's a lot of essentially robots on wheels for many, many years doing the long distance travel of all the goods in these warehouses, but there isn't really manipulation robots helping out with the specific "Pick one object, place it there, pack it," and so forth. So that's the first thing we're focused on. When you try to chase the level of reliability that you need, and of course also speed of operation — if you're reliable at one item per day, I mean that's also not very useful. You want speed and reliability — what you realize is that well, first of all, even though we're an AI company, you can't ignore the hardware.
If your hardware cannot pick something up, it's just not going to happen. A lot of things can be picked with suction cups, and that's a lot of what we do, but then if it doesn't match that paradigm, let's say you have something that's netting of some kind, it has to be rerouted somewhere else. You also have to recognize that this is not for our robot, reroute to a place where it can be picked with something else than a suction cup. But then when you get to the difficulties, what you realize is that you encounter a lot of things that you encounter in every machine learning application, and that is you typically need more data than you initially anticipated to train on, and so you need a way to collect that data.
In robotics, collecting data means either going out in the physical world or having really good simulation. Simulation is getting better and better and definitely helps, but again, you actually need real world data. Now, you get this chicken and egg problem, because if you want actual real world data outside of your lab or office environment, you actually need to have something that kind of works. Otherwise who's going to let you collect the data?
It's a big challenge to bootstrap in, which of course we've been doing over the last couple of years. Once you're getting the data to come in, in some sense a whole lot of fun starts because a lot of people will say, "Oh, just bigger data, it'll just work," but that's actually not true. Once you start chasing many nines of reliability, you'll realize that you actually also need to re-architect the neural nets very often to be able to absorb that data, choose a new loss function, find a different way to annotate to provide more signal to actually get to the levels that you want to get to. It's really a mix of a lot of these things. How you collect your data, how you annotate it, what are the right losses to maximize signal you get, and then what are the neural net architectures that can actually absorb that signal?
If you had started Covariant five years ago or a decade ago, would it have worked as well as it does now? What's changed in the state of the art that allows this application to work?
The big reason I believed it would be possible to do this now, with the right effort, and we're also proving it possible, is essentially the combination of progress in computer vision and in imitation/reinforcement learning. Computer vision...naturally if the robot can't see, it's not going to pick and place reliably objects in a warehouse. I mean if you can't see them, how are you going to pick them? These are unstructured environments in many, many ways.
But vision, of course, is not enough. You need to understand how you're going to interact with these objects. Is this one that you can pick or will it actually fling others out of the bin? Is this a reliable way to pick it or will you rip the thing apart by picking it over there? And so forth. There's an imitation learning, reinforcement learning aspect to that that is also really important, and what we saw in 2015, '16, '17 was just a ton of progress on all fronts of vision, imitation, reinforcement learning that made us believe that with the right additional push that's really oriented towards bringing it into the real world, that this should work. Luckily, yes, we were right and it is working.
Imitation learning and reinforcement learning
That's fantastic. Can you talk a little bit about imitation learning? I think a lot of people talk about it but obviously you've done a lot of work in that field.
Yeah, so maybe the place I would start is the thing often people are most excited about is reinforcement learning because it's learning from scratch. Your robot or your agent is just trial-and-error learning, and gets scored periodically based on how well it's doing, and then gets better over time. And of course, famous results...learning to play Atari games out of DeepMind, and then out of my lab at Berkeley robots learning to run, to get up, and things like that.
Now, the tricky thing with reinforcement learning, despite all its beauty and I think being probably the most beautiful discipline to work in, is it can be very slow. Learning from scratch, it takes forever for most problems. In practice, I often think of imitation learning as a way to bootstrap reinforcement learning agents who can keep learning on their own, but by first getting some examples from humans of how something is done and imitating that, which actually at its core, in many situations, comes down to supervised learning.
So, a person might maybe joystick a robot, and in doing so, they are telling the robot every moment what action they should apply to each joint or to each motor, and then the robot will actually do supervised learning. Train a neural network from what it's seeing to the action the person commanded, and by doing so that supervised learning model will actually be pretty good at generating similar behaviors to what the person was doing. The beauty is once you have that, it turns out you have a really good starting point. And a lot of the challenges in reinforcement learning are about having a good starting point.
Reinforcement learning will be very effective at fine tuning and zoning in on the final details of your solution. But the exploration problems are really hard. And you kind of sidestep it by saying, "Hey, humans know what the solution roughly looks like, no need to explore everything. Let's just imitate humans." I think imitation learning is, for practical purposes, almost guaranteed the way to go in many applications, at least as a starting point, even though as you know in academic research, there's a lot more focus on reinforcement learning because it's in some sense the harder, more open-ended problem.
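The idea that imitation learning "comes down to supervised learning" can be sketched in a few lines. The example below is a minimal illustration, not Covariant's or the Berkeley lab's method: it swaps the neural network mentioned above for a linear least-squares policy, and the "demonstrations" are synthetic data generated from a made-up expert (`true_W`), so every name and number here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "expert": the operator's joystick command is a linear
# function of what the robot observes, plus a little human noise.
true_W = np.array([[0.5, -1.0],
                   [2.0, 0.3],
                   [-0.7, 0.8]])  # 3 observation features -> 2 action dims

# Demonstration data: (observation, commanded action) pairs.
observations = rng.normal(size=(500, 3))
actions = observations @ true_W + 0.01 * rng.normal(size=(500, 2))

# Supervised learning step: fit observation -> action by least squares.
W_hat, *_ = np.linalg.lstsq(observations, actions, rcond=None)


def policy(obs: np.ndarray) -> np.ndarray:
    """Imitation policy: predict the action the demonstrator would command."""
    return obs @ W_hat


max_weight_error = np.abs(W_hat - true_W).max()
print("largest deviation from the expert's weights:", max_weight_error)
```

A policy fitted this way would then serve as the starting point that reinforcement learning fine-tunes, which is the role described in the conversation.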
So for example, in a factory you'd have somebody control the robot, aim a suction, pick up a thing. The robot starts there by trying to imitate the actions that the human's taking and then you switch to a reinforcement learning strategy once it's close to something good. Is that a good summary?
That is one way to picture it. Then in practice, often you take further shortcuts because...Imagine you are joysticking a robot to pick something up. Well, could you be even faster by maybe not even using the robot, but just maybe in your hand, holding the suction cup that the robot would be using and going there. If you can track that motion as a human, you can demonstrate so much faster.
Or, if you are very confident about how things are done, you might say at times, "Hey, maybe for some situations, all I need is see the images and in the images I can already annotate directly where I would be doing what." There's a trade-off between very precisely demonstrating what the robot should be doing, which is very informative, but might be time consuming to get that informative signal.
The amount of information you get per time spent might still be low, even though each single demonstration has a lot of information versus high throughput data collection where you annotate fast or maybe a bit more noisy, but there's so much more data that it makes up for it. There's always a lot of choices to be made there and it's not a clear-cut, easy decision for any problem, but I think it is easy to know that you should definitely consider the spectrum of speeding up your data collection and maybe having a bit more noise on it because the neural net will be robust in that.
So then what happens? Because I picture reinforcement learning to beat every human at Go. It seems like such a powerful strategy, especially the way you've laid it out, it makes so much sense. The robot tries the thing and gets feedback and then improves. What does it run into when you're actually in a factory and you're actually trying to pick something up, what errors creep in or why isn't the problem just totally solved right now by that strategy?
When I look at the deployments we do at Covariant and what makes them successful and also hard at times, what I see is that even though some things might look very similar to humans, you might think for a person (let's say doing pick and place of pharmaceutical goods versus groceries versus apparel versus electrical supplies versus shoe boxes versus anything else) it's all the same for humans. One person can do one thing, they can also do the other thing. Same day, no problem.
But when I look at our today's learning systems, when you switch from one industry to another, actually you need a good amount of new data for that. What I see is that within industries that we've already collected a lot of data, my sense is that I would be surprised if it wouldn't work on the next one. But when you go to a new industry, there's a whole new data collection. Of course, at some point we'll have checked off all industries and it'll be all taken care of, but there's a way humans generalize that is still a bit different from the way I think our today's most viable AI robotic systems generalize.
I do think it actually ties back to where a lot of the academic research is these days, which is massive pre-training and then fine tuning on new tasks. But if you want to get to many nines of reliability, just massive pre-training on arbitrary data and then just some quick small data regime fine tuning, it's not clear that that's working today. It'll work for 90%, sure you can get to 90% quickly, but 99.9, it's harder to get to that.
I would say that's really...in robotic automation that's vision AI-powered, I would say getting to somewhere 99.5 to 99.9 tends to often be the sweet spot where things become commercially viable, where the amount of supervision required from a person becomes essentially negligible, and it's really a robot doing the work rather than the supervisor actually being just as busy as they were before when there was no robot.
Obviously self-driving cars, you need even more nines, and that makes it, I think, even harder. But get to 99.5, 99.9 that's where things become viable. I think that's just the challenge, ultimately. We call that a challenge of autonomy. It doesn't mean 100% always being exactly right, but getting that 99.5, 99.9 spot in the industry that the robot is operating in is really key.
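To see why the 99.5 to 99.9 range changes how busy the supervisor is, it helps to convert reliability into interventions per hour. The throughput of 600 picks per hour below is an assumed figure for illustration only; it does not come from the conversation.

```python
def interventions_per_hour(picks_per_hour: int, reliability: float) -> float:
    """Expected number of picks per hour that need a person to step in."""
    return picks_per_hour * (1.0 - reliability)


PICKS_PER_HOUR = 600  # assumed throughput, for illustration only

for reliability in (0.90, 0.995, 0.999):
    n = interventions_per_hour(PICKS_PER_HOUR, reliability)
    print(f"{reliability:.1%} reliable: about {n:.1f} interventions per hour")
```

At the assumed throughput, 90% reliability means a person stepping in about once a minute, while 99.5% and 99.9% mean roughly three interventions and under one intervention per hour.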
Simulated data, real data, and reliability
How much does simulation or simulated data or physical simulation matter here? I know a lot of people have been talking about it, but it's a little unclear to me if this is a theoretical thing or something that's really best practice in working in the real world right now. How do you think about that?
I think you probably remember from when you were spending some of your time at OpenAI, and Josh Tobin and I and you and Wojciech, we were working on these domain randomization approaches back in 2016, '17, right? I would say a couple lessons learned from that.
One is that simulation can be surprisingly helpful even when it's not super realistic. That was the domain randomization whole spiel and result was, "Wow, even when none of the renderings look realistic, training in these simulations does help a lot in the real world." I think that's still a very powerful thing to rely upon. Of course, some counterpart of that is doing large data augmentation on real data. But I think ultimately you do need real data, you can't just get away with today's simulated data to get to a high reliability for real world operation.
But again, that's part of what's so interesting and exciting about robotics is that you need these high reliabilities to be valuable, and this is going to be very different than other application domains that are pure software, where often there's a lot more room for error and already providing value.
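The domain randomization idea mentioned above, training on many deliberately varied and unrealistic renderings, can be sketched as sampling fresh scene parameters for every simulated episode. This is a hedged illustration only: `SceneParams`, its fields, and the ranges are hypothetical and do not correspond to any particular simulator or to the original 2016-17 work.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """Hypothetical rendering parameters a simulator might expose."""
    light_intensity: float
    camera_jitter_deg: float
    object_rgb: tuple
    texture_id: int


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene; the ranges are illustrative guesses."""
    return SceneParams(
        light_intensity=rng.uniform(0.2, 2.0),
        camera_jitter_deg=rng.uniform(-5.0, 5.0),
        object_rgb=(rng.random(), rng.random(), rng.random()),
        texture_id=rng.randrange(1000),
    )


rng = random.Random(0)
# One new random scene per training episode, so no single look dominates.
scenes = [sample_scene(rng) for _ in range(5)]
for scene in scenes:
    print(scene)
```

The design intent is that the real world ends up looking like just one more variation to a model trained across enough randomized scenes, which is why realism of any single rendering matters less.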
What about something like laundry folding? I feel like a robot that could fold 90% of my clothes, I'd be pretty happy with that. I would buy that for sure.
Yeah. I think that's a very good point. Maybe we should revisit laundry folding some time and knock at your door and see if you'll take one from us.
There are some subtleties though that maybe...I think the big subtlety here is the following (and why things like robotics and self-driving have these high reliability requirements): it's not just that it's a high reliability requirement, it's that when something goes wrong, often it causes a lot of work or a lot of damage.
It's not that...when you have 99.9%, let's say, typically it's not the case that that means 99.9 are going great and then one in a thousand is just the robot knows this one's too hard, come and help me out. It's more likely that the one in a thousand that goes wrong is something that the robot actually makes a mistake and maybe the wrong thing gets shipped off to somebody. It might say, "Okay, we can bound the cost on that, somebody gets the wrong item," or maybe somebody in a later station, that's more typical, would sanity check it and would fix it, and so it'd be some fixing work, but fixing work often is more work than doing the original work. And so you cause need for fixing work.
I think that's why it's so interesting. When you talk about laundry, if the threshold for you is if it folds at least a few things and the rest can still be a mess the way it comes out of the dryer, then I think it's a great case. But if you wanted everything folded and just one item, just sitting there on the side not folded, I don't think that's how it's going to be. It's going to maybe have folded almost everything and at the end it makes a mistake and everything's back on the floor and it can't reach the floor and you have to come in and put it back on the table.
I mean, I'm seeing it a bit more cynical than it would actually be, but it's just that the reliability requirements are there for a reason typically. And that is because when it doesn't succeed, there was a lot of work related to fixing.
Right. And it does seem the...collecting training data that really represents the real world might be harder than something like ImageNet, or it does seem like there's not quite the same sets of useful data sets that everyone can play with, or am I wrong? Am I not aware of the stuff going on in the space?
I think it's not so clear how to collect data that's actually in distribution for these problems, unless you are actively in the space helping solve these problems.
We all know that today's machine learning works way better in distribution than out of distribution. I'd say there is a reason that we're building this as a commercial solution. One way to think of it is, and that's part of it is, it's awesome to put something in the world that actually works and helps out. That's part of the goal and why we are excited. But part of it is exactly what you're getting to, which is...my belief is that if we want to solve robotics in the foreseeable future, meaning not by first building an AGI and then let the AGI take care of it, but actually a more direct approach to solving robotics problems, I think the only way to do it is to get the right data by going into the real world problems with robots and collecting the exact data that's needed and then training on that data and then letting the robot run, improve based on what you see happen, and iterate.
I think without that iteration process, I don't think you can do it. Maybe somebody can figure out how to do it, who knows? I mean, I don't want to exclude somebody doing something really amazing, surprising somewhere, but my money is on that the way these near term AI robotics problems will be solved is by being very focused on real world deployment data collection and on that loop.
The increasing capabilities of robotics
Do you imagine a world where this starts to work and we suddenly have robots doing lots and lots of tasks around us? I feel like voice recognition has snuck up on me where, as a kid, it was incredibly annoying and now, it seems like most people use it for various tasks on their phone or Alexa. Do you sort of picture the same trajectory for robotics?
What I see happen is...sneak up is maybe one way to think of it. But what I see happen is a gradual increased capability in terms of where robots are viable. Until recently it was only repeated motion-type settings, carefully pre-programmed motions.
Now what we're seeing, I would say roughly this year, is a transition into feasibility for robots doing interesting things in warehouses. Pick and place type tasks. I'd say that's really the first place where robots are really looking at things, reacting to it, interacting with the objects, and then achieving something. In that sense, it is interesting because it's a first. It's where it's happening first and there is no reason it couldn't expand from there.
I mean, I can see all the work involved in terms of iterating over all the architectures, data collection, loss functions, all the things we do to get to the reliability we need to get to. But I also see that that same process in principle should apply just as well to other domains. I think, maybe agriculture, maybe some construction problems, maybe some more difficult manufacturing problems where it's not just repeated motion that can do it. I think any of those semi-structured environments, I would say, where maybe it's not directly interacting with people (because I think that's always much harder because people are very unpredictable) but in a kind of semi-structured environment, the robot can kind of do its thing. Yeah. I think it could grow relatively quickly in the foreseeable future. Yeah, we'll see.
It seems to me the funny thing about, or the counterintuitive thing about, software, unlike hardware, it's like once it's working, copying it is free, right? It makes sense that you would start in this really high value task in warehouses, but then if you could really pick things up, maybe I could let the robot loose in my house and clean some of this clutter out of here.
It's very true. I mean, I could imagine essentially a Roomba with an arm on top of it.
Seems useful, right? I don't know.
Yeah. If anything's on the floor, it picks it up and puts it into a basket somewhere, or maybe even knows where to go deliver it in the house. That kind of pick and place should actually be easier than the warehouse situation, because you don't have a clutter of objects, typically, it's just maybe an isolated object on the floor here, on the floor there.
I think a big part of it is also when you think about the economics, it comes down to what you said, software is very cheap to copy. So the question is when you go to hardware, if this robot picks things up from the floor and maybe it costs you, I don't know, a certain amount of money, well, unless it's doing that multiple times a day, you might not feel it's worth it, right?
That's also part of why I see robots arrive first in these semi-structured environments where...warehouse, there's like, all day things need to flow through. There's always a next thing to be picked. It's never quiet. Once you have the physical robot, well, we know robots don't tire, they just keep running. You get to leverage that aspect. To me, always when I think about household robots is where the equation is harder than in, let's say, a warehouse or maybe on a farm and so forth. But I mean, who knows? I mean, prices of robots will go down, right? So at some point that will be less of an issue.
It also seems like a robot...I mean, how much does a robot cost, with like a suction arm? It doesn't seem like it intrinsically needs to cost thousands of dollars, does it?
That's a really good question. It might depend a lot on what your performance requirements are for these robots. If you look at these car manufacturing robots, some of them are very expensive. They can easily cost $100,000, but also they can pick up an entire car. So that's a very strong robot and it can do it for 10 years in a row, every minute pick up a new car. You get that kind of reliability and strength.
It's not going to be super cheap. But then if you need to pick up just a toy from the floor, that doesn't put a whole lot of strain on the joints of the motors of the robot. And if it's cheaper, you might be happy to replace it every year. Why not?
I think there's kind of a spectrum of robots to be built, and today's robots that are mostly out there are on the end of the spectrum where it's like, it should work for 10, 20 years and minimal maintenance. It should just work because it's part of an assembly line that should never come to a halt, because that'll cost so much if it comes to a halt. But you're absolutely right. Once you go into homes, the design space that you're working in is very different. If it doesn't work for a day, it might not be a problem. If it doesn't work in a car manufacturing line for a day, that might cost millions.
Well, it seems also, at least from my very limited work that I did with you at OpenAI, that there was this real sense that the robots that they were using had incredibly precise calibration. Like it could go to exactly an X, Y, Z coordinate. It sort of felt like, with machine learning and actually, if you can look at where you went, you could have a little less precision in the hardware and maybe actually even learn how to manipulate yourself and deal with maybe less perfect motors, right? I was surprised that there wasn't more work in that area.
You're absolutely right in that once you have a vision feedback loop, you can actually put a lot less strain in terms of repeatable motion. I mean, some of these robots have sub-millimeter precision, some of them even micrometer-precision repeated motion. When you can see...I mean, humans don't have that. You can't repeatedly reach the same point unless you have feedback. You feel you're making contact with something and based on that you adjust and you get the exact right spot that you want. It's a very good point, and I think it's a part of design space that hasn't been explored that much.
At Berkeley actually, for a little while, we were working on this project called the Blue Robot where we were doing exactly that. The thinking was, if we have better intelligence of the robot, we don't need the same kind of blind precision. We were actually able to bring price down for an arm quite a bit. I think it was maybe in parts even bought at small scale, maybe $2,000 or even a little less for a seven degree of freedom arm with a parallel jaw gripper.
You could imagine if you buy these parts at scale you can maybe cut the price in half, and now you're down to $1,000 for an arm. Would it work for 10 years straight like some of the industrial robots? Probably not. Does it move as fast with blind precision? Definitely not, but do we need that for a practical home application? Probably not either. So yeah, I think it's a very interesting direction. We've kind of paused that project for a little bit, but I think there's a lot of opportunity to keep pushing that direction.
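The trade-off described above, a vision feedback loop in place of blind repeatability, can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not from the conversation: a hypothetical one-dimensional actuator that overshoots every commanded move by 20% with up to 0.5 mm of jitter, and a camera that reports the true position.

```python
import random

def move(position, command, gain_error, rng, noise_mm=0.5):
    """Hypothetical low-cost actuator: overshoots by a fixed fraction, plus bounded jitter."""
    return position + command * (1.0 + gain_error) + rng.uniform(-noise_mm, noise_mm)

def reach_open_loop(start, target, gain_error, rng):
    """Blind motion: issue one command and trust the hardware."""
    return move(start, target - start, gain_error, rng)

def reach_with_feedback(start, target, gain_error, rng, steps=8):
    """Closed loop: look at where the arm ended up, then command the remaining error."""
    position = start
    for _ in range(steps):
        observed = position  # assumption: the camera reports the true position
        position = move(position, target - observed, gain_error, rng)
    return position

rng = random.Random(0)
target = 100.0  # millimetres
open_err = abs(reach_open_loop(0.0, target, 0.2, rng) - target)
closed_err = abs(reach_with_feedback(0.0, target, 0.2, rng) - target)
print(f"open-loop error:   {open_err:.2f} mm")
print(f"closed-loop error: {closed_err:.2f} mm")
```

In this toy model the blind move misses by roughly 20 mm, while each corrective step shrinks the remaining error by a factor of five until only the jitter is left, so the same imprecise actuator ends up within a millimetre of the target.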
Entrepreneurship and co-founding Gradescope
Awesome. Well, I want to also use some of this time for the questions that we got when we crowdsourced the question collection. We were asking folks in our community what should we ask you, and I was kind of surprised because to me, you're a researcher, but I realized to a lot of people, you're more of an entrepreneur than a researcher, and a lot of the questions are around how you think about starting companies. So I wonder if you could just sort of say...You've started all these different companies, if you could sort of say what your process is of thinking of what to start and how you get something off the ground. I think a lot of people would love to hear that.
Sure. Yeah. I'll start with some concrete examples because I think trying to be general is hard in this space.
How did we start Gradescope, right? Gradescope is a company that right now provides AI to help any kind of instructors, teaching assistants with grading of their student work, whether it's exams or homework or projects and so forth. The way it started is essentially out of a need. And a lot of entrepreneurs will say that's why they started their company, but it was our personal need as a professor at Berkeley and my teaching assistant at the time, Arjun Singh, other teaching assistants, Sergey Karayev.
We were looking at what we're doing with the grading work. And we feel like we have these stacks of paper and we're kind of passing these along and we all need to come together in the same room to be able to grade. Or we need to pass this along and it'll be a long delay because one person grades one day, the other person the next day. And so I'm like, well, if we just were to scan everything, we could just grade this on our laptops.
For me, one of the quirky things I was excited about...some TAs are kind of clever, not Arjun and Sergey, I mean, they're very smart, but I mean are clever in a selfish way I would say, and they will book a flight that leaves the university right after their last final exam. And they're like, "Oh, sorry, I can't help with grading. I'll already be at home. I can't access the stack of exams." And so I'm like, well, this'll be great because even the people who book their flights early will be able to help with grading.
The initial thinking was just, can we make the, in some sense, the user interface of grading better. It wasn't actually starting from, "Let's automate this." It was, "If we scan, people can grade from wherever they are." And on the first exam thereafter...one of our TAs was grading in-flight. He was super excited. He's like, "Hey, I'm grading in flight. This is so cool."
We found right away that it was really helpful to do everything digital rather than physical. At the time it was just a project we used in our own class. Because it went so well, all the TAs were so happy, we passed it on to a bunch of my friends, professors at Berkeley, and said, "Hey, do you want to try it for your class because the TAs really liked it?"
I think the next exam round there was maybe 5 to 10 professors using it in their class and they were really happy, but also, they gave us a lot of feedback about things they were not perfectly happy with. Of course, we'd go out and fix things. That was kind of the early days where we were, within Berkeley, just letting people use it and see what they think. We hadn't planned to make a company per se, but it was definitely on our minds. Like, if this goes well, if people really like it, or we can see a path to make people like it, we should make it into a company probably. But for now, let's just see if we can even build a product people want to use. To start off, for ourselves, but then we're probably representative of many others in the instructional space.
And then some interesting things started to happen that got us even more motivated because you start, as I'm sure you've experienced yourself, you build something, you build it for yourself. Other people use it the same way, but then people start using it in different ways. And that's when things get really exciting.
The chemistry professors came back to us and they said, "Actually..." They do quizzes pretty much every week for these pre-med students that need to be very calibrated by the end of their undergrad. They said, "Hey, thanks to your system we can grade this much faster, but actually, we're not interested in grading faster. We want to spend the same amount of time on grading and now we can ask much deeper questions and still grade them at the same time that we could grade before. We can move away from kind of canonical multiple choice, which is the only thing we could do before, and do all these other things."
And so that's where things get exciting, I think, is you see people pick up on what you're building and doing new things with it. Pretty soon thereafter, we actually started as a company. As a very explicit decision, let's make it a company. And we saw the path, of course, as we get more and more data to also build AI behind it. And so a very product-driven start.
Other things, if I generalize a little bit, one is product-driven. The other thing I would say is, for me at least, a lot of the fun in everything I do comes from working with great people. For every major project, especially a company I've started, that's pretty much the starting point. Like, am I going to do it with people that I really admire, that I know anything they do I can trust is of the highest quality? I never have to think twice about anything anybody else is doing on the team.
We're just going to have...we're going to move very fast, because I think that's a big part of it. If you're a startup, I think the only way to succeed is to move very fast. If you move slow, it's likely not going to be a success. I think being a team that's really motivated and really qualified to move fast on what you're doing is really key. Otherwise, it's probably a losing proposition.
I would imagine you have a lot of practice with trying to find great people, like just looking for RAs for your lab, but is there anything different that you look for in terms of someone to start a company with, versus someone that you want to do research with, versus the way you grade students? Are those different qualities?
Oh, they're very different. I mean, obviously there are some people who are great at everything. That's going to happen. But ultimately, I think what matters for a startup is...I mean, you know this as good as anybody else might know it, but I mean, whoever has never started a company before, it always takes longer than you think.
You think you're going to build this company and in a few months it'll be this massive thing because clearly, it should only take a few months to build this thing, but it's really always in terms of a few years, or even like...If you look at any company that is actually big, big, it tends to take 5 to 10 years to get to really big size, right? That's a long time. So you need to look for people that actually are really excited about what you're going to go for.
For example, at Gradescope, everybody was really excited about helping instructors and always listening to the instructors. Of course, really good builders of what we're trying to build, but also really excited to understand what they want, what they need, how they can be better served. Obviously, that doesn't mean implementing every feature request that comes your way. Some requests are very noisy expressions of what they actually want and need. You got to interpret those requests. But being really passionate about the long term is, I think, just critical because it takes a long time.
Same at Covariant, we're all super excited about what robots can do in the world, how they can be so helpful in so many places. I think if you're not, it's hard to stick with something, because if you just do it because, "Oh, this is going to be a quick success," it's essentially never a quick success. It tends to be a long, long fought success rather than a quick one typically.
The story behind Covariant
At Covariant, did you also start with a particular problem in mind or was that more like, we want to do robotic stuff and then we're looking for a problem that suits it?
It's actually very interesting what happened there. I mean, for the different founders, there's a different story of how they got to wanting to start Covariant. Four founders, so myself, then Rocky Duan, Peter Chen, Tianhao Zhang. All three of them were undergrads at Berkeley, then PhD students at Berkeley. Rocky and Peter spent time at OpenAI before starting Covariant. Everybody had a different view on things, but for me personally the reason I got to it is essentially back in 2016, 2017, it felt like that transition point where I'd been working on robotics for so long and how AI can make robotics more capable, and it just seemed that all of a sudden things were becoming possible.
It was this, in some sense, technology enabler that really got me excited. It's clear if you have capable robots that they can be very useful, but as long as they're not capable, well, they're not useful. It seemed right at that time, okay, it's not possible today in 2016, 2017, but with the right effort I think we can do some really amazing things and the path should be very feasible in the next several years to get to viable robots that are smart and do new things that weren't possible before.
For me, in some sense that was a career-long passion almost. Working for...PhD on AI robotics, then as a professor for many years, and then, "Hey, this might actually be practical now. I want to take that next step and build a company." But then also at the same time I was like, "I cannot build a company on my own. It's not going to be successful. I want to build it with other people."
Actually, I emailed around to my students. I don't know if it was just current students or current and former students, I forget. I remember sending an email and saying, "Hey, I think AI is at an interesting point where a lot of applications are becoming possible and I'm curious if anybody is thinking the same and maybe would be excited to take something into the real world, rather than stay focused on writing the next papers."
Rocky and Peter replied saying they had been thinking the same thing, that the time is now to do something like that. And Tianhao at the time, in my lab, had the most impressive and relevant project breakthrough. And so we also went to talk to Tianhao, we were like, "Tianhao, what do you think?"
He was a few months into his PhD. He was not planning, I think, to do that short a stint as a PhD student and already go do something else. But once we talked with him he got really excited and, yeah, the four of us took it from there.
Pieter's communication tips
That's cool. One more question that I really wanted to make sure I got to is from an interview that I found earlier where you talked about how Andrew Ng, your advisor, told you to take a class on communication or improve your communication skills. You talked about how you think you're more of a communicator than anything else.
I guess I was curious, do you feel...I have noticed that your communication skills are very good and do seem to be improved since I first knew you. Do you feel there's been little tricks that you've learned that have made you a better communicator, or is it really just practice, or do you have any advice to people wanting to become better communicators?
Yeah. There's different kinds of communication, of course. One of them is written and the other one is verbal, at least these are the two main ones for me.
For writing, actually, here's how the story goes. I'm a PhD student and I try to write my papers and I bring my copies of my drafts to Andrew Ng, he's my PhD advisor. He just looks at them and he says, "I'll take a look," and then he gets back to me later and Andrew says, "Yeah, no, really good draft. Great shape. I just left a few comments." I go get my copy back and the copy is more red than black. And I'm like, "Okay, that's what Andrew calls just a few comments, something that's already in great shape."
I look at all his comments and I'm just, "Okay, these comments are great. I mean, these are no-brainers, I should just incorporate all these comments. He just knows better. This is how I should be doing it." But I had a hard time seeing the pattern in terms of how to do it myself. I'm looking at him like, "Yes, Andrew's feedback is always making it better and I can easily tell it's making it better, but I don't know how to generate that."
And then actually I went to...I was at Stanford at the time and I go to Stanford bookstore and I browse all the writing classes, books that the professors for writing classes were recommending students to buy. I browse essentially all of them in the bookstore, not reading but quick browsing. And then I took three of them home and I read them quite thoroughly and some of them had exercises and I worked on it quite thoroughly. There's one by Williams called "Lessons in Clarity and Grace".
That one, when I started working through that one, it was just like everything made sense. Literally every comment Andrew had left on my paper drafts, it was just a thing they explained. Like, "This is something you want to pay attention to in writing, this is the way you want to structure your sentence or your paragraph, or the sequence of sentences, all that stuff." And I was just, all of a sudden like, "I think I can do this now." That book was really eye-opening.
Verbal communication, let's see. One part of it is just...I mean, the nature of my job is a lot of practice, that's for sure. Same for writing of course. I think maybe there is...even in verbal communication there's different things. There's one-on-one communication and there is group communication.
I think one-on-one is usually easier for most people just because, well, it's a conversation back and forth. In terms of group communication, I think that the main thing I've learned to pay attention to, and it's a very simple thing but it helps a lot, is noticing if anybody hasn't spoken up yet in a meeting and just checking in with them. Obviously not blatantly putting them on the spot and making them feel awkward if they have nothing they want to say, but finding ways to make sure people who are not speaking up, maybe wanted to speak up but just feel they didn't get the opportunity, I think that's just a really helpful thing to get many more ideas to surface in any meeting.
Oh, cool. Thank you. That's super useful.
What Pieter's currently excited about
We always end with two questions. The second-to-last one we always end with is what's a topic in machine learning that you think doesn't get the attention that it deserves? A topic you would work on if you had a little bit of extra time to explore something.
I would argue as a professor with that hat on at Berkeley, there is always opportunity to explore new things because new students come in all the time asking for projects. It's not like there is projects that are just sitting there waiting because there's always new students who want to work on things, but maybe I'll twist the question a little bit and I'll say some of the recent things I'm most excited about that we've started working on.
One of them is play, or formalizing how kids play in reinforcement learning. This notion that kids don't get scored, there's no reward, no feedback most of the time, they just play. Can our agents or our robots do the same thing, just play around in the environment?
Essentially that is the reinforcement learning equivalent of pre-training. And we know in computer vision, natural language processing, unsupervised pre-training is what powers all the latest and greatest models. Can we do the same thing in RL? Well, that means it has to be reward-free, some kind of play. I think that's a really exciting area.
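One simple way to make "no reward, just play" concrete is to let an agent prefer whatever it has seen least. The sketch below is only an illustration of that idea and is an assumption, not the method from Pieter's research: a count-based novelty seeker on a hypothetical 4x4 grid, compared with an aimless random walker. Neither agent ever receives a task reward.

```python
import random
from collections import Counter

SIZE = 4                                     # hypothetical 4x4 grid world
MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # right, down, left, up

def neighbors(cell):
    """Cells reachable in one step without leaving the grid."""
    row, col = cell
    return [(row + dr, col + dc) for dr, dc in MOVES
            if 0 <= row + dr < SIZE and 0 <= col + dc < SIZE]

def play(steps, choose):
    """Wander for `steps` moves with no task reward; return visit counts per cell."""
    cell = (0, 0)
    visits = Counter({cell: 1})
    for _ in range(steps):
        cell = choose(neighbors(cell), visits)
        visits[cell] += 1
    return visits

def seek_novelty(options, visits):
    """Intrinsic motivation: move to the least-visited neighboring cell."""
    return min(options, key=lambda c: visits[c])

rng = random.Random(0)

def wander(options, visits):
    """Baseline: pick a neighbor uniformly at random."""
    return rng.choice(options)

curious = play(15, seek_novelty)
aimless = play(15, wander)
print("cells covered by novelty seeking:", len(curious))
print("cells covered by random walk:   ", len(aimless))
```

With this tie-breaking order the novelty seeker spirals through all 16 cells in 15 moves. The coverage it builds up without any reward is the kind of experience a later, reward-driven task could start from.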
The other area I'm really excited about, and for me was sometimes the most surprising result this past year in my own research, was the "Pretrained Transformers as Universal Computation Engines" paper that was led by Kevin Lu. The idea there was that, "Hey, Transformers are so good at being pre-trained language models. What if we just take a pre-trained language model, we put one linear layer in front, one linear layer in the back, but now the input's going to be an image and the output's going to be a classification of an image? Or the input's going to be a protein sequence, the output's going to be some property of the protein sequence?"
And it actually kind of worked. Which is really surprising to me because it means that somehow all these pre-trained layers which were frozen for that new modality, somehow...well, we don't really understand it but what in my mind is happening is something where it has a general compute pattern, a general pattern recognition in it that generalizes across different sensory modalities, which is really, really cool. I mean, of course it's better to train the whole network on the specific modality but the fact that it already does quite well when it's a frozen pre-trained Transformer on a different modality really surprised me and is something that I'm excited to keep digging into.
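The setup described here, a frozen body with one trainable linear layer in front and one in the back, can be sketched in PyTorch. This is a rough illustration under stated assumptions rather than the paper's code: the body is a small, randomly initialized `nn.TransformerEncoder` standing in for a real pre-trained language model, and the class name, sizes, and patch layout are all hypothetical.

```python
import torch
from torch import nn

class FrozenBodyClassifier(nn.Module):
    """Trainable linear layers wrapped around a frozen Transformer body."""

    def __init__(self, input_dim, num_classes, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(input_dim, d_model)      # trainable "front" layer
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=128, batch_first=True)
        # Stand-in body: randomly initialized here, pre-trained in the work discussed.
        self.body = nn.TransformerEncoder(layer, num_layers=num_layers)
        for param in self.body.parameters():
            param.requires_grad_(False)                       # freeze the body
        self.output_proj = nn.Linear(d_model, num_classes)    # trainable "back" layer

    def forward(self, tokens):
        hidden = self.body(self.input_proj(tokens))           # (batch, seq, d_model)
        return self.output_proj(hidden.mean(dim=1))           # pool over the sequence

torch.manual_seed(0)
model = FrozenBodyClassifier(input_dim=16, num_classes=10)
patches = torch.randn(8, 64, 16)        # 8 fake images, each as 64 patches of 16 pixels
labels = torch.randint(0, 10, (8,))

logits = model(patches)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                         # gradients reach only the two linear layers

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"trainable parameters: {trainable}, frozen parameters: {frozen}")
```

Only the two projection layers would be handed to an optimizer. Gradients still flow through the frozen body so the input layer can learn, but the body's own weights never change.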
That really is amazing and evocative. I can see why you're excited about that.
Focusing on good UI and high reliability
Finally, what in your experience has been the hardest parts of getting machine learning models to actually work in the real world? You've done it now at several different companies. What are the surprising pitfalls when you take a model that you've trained that seems to be working, and then you try to build an actual useful thing around it?
Yeah, it varies a lot. The cases I know best are of course Gradescope and Covariant.
At Gradescope, essentially the way we did it is we built really good user interfaces around the model. We trained models to effectively automate grading, but we knew in the beginning they're never going to work that well, definitely not 99.99% performance. Much lower than that. And so we spent so much time on the UI to have it propose things to the grader, and then the grader can correct it and be really, really fast at getting through things. But this human interface to supervise all the decisions was where, I would say, at least as much effort went into that and making that really good as went into the machine learning models behind it.
I would say at Covariant it's in some sense very similar but also a bit different. You can't just say, "Oh, we're going to put a great UI on it," even when it's just in its beginning reliabilities. It already has to be very high reliability. There, I think what's been interesting is, for me, that I've never in any other capacity chased multiple nines of reliability on any problem and just that notion is just so different and it's been so interesting to go after that.
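To see why each extra nine matters, a little arithmetic helps. The sketch below assumes a hypothetical volume of 10,000 picks per day, a number chosen only for illustration and not one given in the conversation.

```python
def expected_failures(picks, nines):
    """Expected failed picks when reliability has `nines` nines (2 -> 99%, 4 -> 99.99%)."""
    return picks / 10 ** nines

PICKS_PER_DAY = 10_000  # hypothetical volume, chosen only for illustration

for nines in (2, 3, 4):
    reliability = 100 - 100 / 10 ** nines
    failures = expected_failures(PICKS_PER_DAY, nines)
    print(f"{reliability:g}% reliable: about {failures:g} failed picks per day")
```

At that assumed volume, 99% reliability means about 100 failures a day that someone has to handle, 99.9% means about 10, and 99.99% means about 1. Each additional nine removes nine tenths of the remaining failures.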
Awesome. Well, thank you very much. This is super fun.
Yeah. Same here, Lukas. Thanks for having me.
If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material, and a transcription that we work really hard to produce, so check it out.