Peter Norvig, Google’s Director of Research – singularity is in the eye of the beholder
We're thrilled to have Peter Norvig join us to talk about the evolution of deep learning, his industry-defining book, his work at Google, and what he thinks the future holds for machine learning research.
BIO

Peter Norvig is a Director of Research at Google Inc.; previously he directed Google's core search algorithms group. He is co-author of Artificial Intelligence: A Modern Approach, the leading textbook in the field, and co-teacher of an Artificial Intelligence class that signed up 160,000 students. Prior to his work at Google, Norvig was NASA's chief computer scientist.

Peter's website:
norvig.com/

TRANSCRIPT

Peter:

So one thing is singularity is in the eye of the beholder.

Lukas:

Sure.

Peter:

So if you're Kurzweil, all the curves are exponential and they're going up, and right now is a special time. But if you've got log paper, then all the lines are straight lines and there's nothing special about right now. It was a straight line yesterday and it'll be a straight line tomorrow. So I guess that's more my philosophical point of view.
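Peter's log-paper remark is the identity log y = log a + bt for y = a·e^(bt): on a logarithmic axis, steady exponential growth has a constant slope, so no single moment looks special. The minimal sketch below checks that numerically for a hypothetical series that doubles at every step (the series is invented for illustration, not data from the episode).

```python
import math

def log_slopes(values):
    """Differences between consecutive log10 values: constant iff growth is exponential."""
    logs = [math.log10(v) for v in values]
    return [b - a for a, b in zip(logs, logs[1:])]

# Hypothetical curve that doubles every step.
series = [100 * 2 ** t for t in range(10)]
print(log_slopes(series))  # every step adds log10(2), about 0.301
```

On a linear axis the same series looks like it is "taking off" near the end; on the log axis every step is identical.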

Lukas:

You're listening to Gradient Dissent, a show where we learn about making machine learning models work in the real world. I'm your host Lukas Biewald. I've admired Peter Norvig for a long time. He's a director of research at Google and before that, he directed Google's core search algorithms group. He also wrote Artificial Intelligence: A Modern Approach, which is probably the most famous textbook in artificial intelligence and the one that I used when I was first getting into the field. Prior to his work at Google, Norvig was NASA's chief computer scientist. I could not be more excited to talk to him today.

Lukas:

Peter, thanks so much for taking the time to do this. You keep coming up in my life. I love doing Project Euler, and you have an incredibly useful set of little Python libraries that really help with the Project Euler challenges. I was wondering how many of the Project Euler problems you've done, and whether you have a favorite one or one that was memorable to you?

Peter:

Yeah, I guess I lost count of how many I've done, because they once had a security breach and changed all the passwords or something. If you didn't get the message within a month or two, your account got locked out. That happened to me, so I lost my main account. Every now and then I did one or two, but I lost contact with it. So I've been doing other things like Advent of Code and so on, but my utilities are still out there. I might actually publish my answers sometime. Originally they said they didn't want anybody publishing answers, but I think they've given up on that because they realized the answers are all out there anyway. So maybe I'll go back, clean those up a little bit, and publish them someday.

I guess I had a lot of fun with the Monopoly simulation, just because I could get it down to half a page. At first it seems that the rules of Monopoly are really complicated, but the question was only asking for the probability distribution over the squares. If you're only worried about that part, you can simplify it a lot. So I remember that one as being fun.
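Peter's own half-page solution isn't reproduced here. The following is a hypothetical Monte Carlo sketch of the same kind of simplification, assuming only the dice, the "Go To Jail" square, and the three-doubles rule matter (Chance and Community Chest cards are ignored), and estimating how often a roll ends on each of the 40 squares.

```python
import random
from collections import Counter

N_SQUARES, JAIL, GO_TO_JAIL = 40, 10, 30

def monopoly_distribution(n_rolls=200_000, sides=6, seed=0):
    """Estimate the fraction of rolls that end on each square.

    Simplified rules (an assumption): no cards, and both the "Go To Jail"
    square and a third consecutive double send the player to Jail.
    """
    rng = random.Random(seed)
    counts = Counter()
    square = doubles = 0
    for _ in range(n_rolls):
        d1, d2 = rng.randint(1, sides), rng.randint(1, sides)
        doubles = doubles + 1 if d1 == d2 else 0
        if doubles == 3:              # third double in a row: straight to Jail
            square, doubles = JAIL, 0
        else:
            square = (square + d1 + d2) % N_SQUARES
            if square == GO_TO_JAIL:  # landing on "Go To Jail"
                square, doubles = JAIL, 0
        counts[square] += 1
    return [counts[s] / n_rolls for s in range(N_SQUARES)]

dist = monopoly_distribution()
top3 = sorted(range(N_SQUARES), key=dist.__getitem__, reverse=True)[:3]
print(top3)  # Jail (square 10) should lead the list
```

Dropping the cards is exactly the kind of simplification that keeps the program short; adding them back would mean a few more redirect rules in the same loop.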

Lukas:

I remember doing that one. In Advent of Code or your pytudes repository, do you have a favorite piece of work?

Peter:

I guess in the pytudes, I like a couple of different ones on probability. I think I have three different notebooks there. One of the things I think is really interesting is that I've changed the way I think about probability, or at least the way we teach it. What really struck me is that I went through and said, "Here's a bunch of problems, the typical kinds of things where you're dealing cards or picking colored marbles out of an urn or whatever." And then they came to one where the textbook said, "Okay, now this one's going to be completely different because this is a Bayesian one where you have to reason backwards." And they show the math, and there's like 10 lines of equations and you get the answer.

And I was going through and I said, "I don't have to do anything different. I can solve this in exactly the same way I solved all the other ones. And there's nothing special about it being Bayesian." And the reason they said it was special was look, you can solve it in only 10 lines of math using this formula. And if you didn't have that formula, you'd have to consider a million different cases. And that would be completely infeasible. But I looked at it and I said, "Well, this is so simple. There's only a million different cases. So why do anything different? Why not just think of it as exactly the same as all the other questions? And only this time the computer has to enumerate over a million values rather than over a hundred values or whatever."
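Here is a minimal sketch of that enumerate-and-count style, assuming every outcome in the sample space is equally likely. Conditioning on evidence is just filtering the space, so a "backwards" Bayesian question needs no special formula. The helper names `P` and `given` and the two-urn problem are illustrative choices, not taken from Peter's notebooks.

```python
from fractions import Fraction

def P(event, space):
    """Probability of an event in a finite space of equally likely outcomes."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))

def given(space, evidence):
    """Condition on evidence by discarding every outcome inconsistent with it."""
    return [outcome for outcome in space if evidence(outcome)]

# Hypothetical problem: choose one of two urns at random, then draw one marble.
# Urn A holds 3 red and 1 blue; urn B holds 1 red and 3 blue. Each (urn, marble)
# pair is equally likely because both urns hold the same number of marbles.
URNS = {"A": "RRRB", "B": "RBBB"}
space = [(urn, marble) for urn, marbles in URNS.items() for marble in marbles]

# "Reasoning backwards": the marble came out red; how likely is it we chose urn A?
red_draws = given(space, lambda o: o[1] == "R")
print(P(lambda o: o[0] == "A", red_draws))  # 3/4
```

The same two functions work unchanged whether the space has eight outcomes or a million; only the running time grows.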

So I thought that was interesting. It changed the way you think about how you solve these problems. And all of a sudden, problems that look like they were different are actually exactly the same. Just go ahead and enumerate them. As long as they're discrete problems, you can almost always do that.

Lukas:

Reminds me of that problem. Have you heard of the problem that apparently didn't fool John von Neumann, where a dog runs back and forth between two people walking toward each other? And he just summed the infinite series...
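Both ways of solving the puzzle fit in a few lines. With hypothetical numbers (walkers 10 miles apart, each walking 2 mph toward the other, a dog running 10 mph), the series version adds up the dog's legs one at a time, each leg shrinking the gap by a factor of (10 - 2)/(10 + 2) = 2/3, while the shortcut multiplies the dog's speed by the time until the walkers meet: 10 mph × 2.5 hours = 25 miles.

```python
def dog_distance_by_series(gap, walker_speed, dog_speed, tol=1e-12):
    """Sum the dog's back-and-forth legs one at a time (the 'hard way')."""
    total = 0.0
    while gap > tol:
        # The dog and the walker it runs toward close the gap at dog_speed + walker_speed.
        t = gap / (dog_speed + walker_speed)
        total += dog_speed * t
        gap -= 2 * walker_speed * t  # both walkers kept approaching during this leg
    return total

def dog_distance_shortcut(gap, walker_speed, dog_speed):
    """The walkers meet after gap / (2 * walker_speed) hours; the dog runs the whole time."""
    return dog_speed * gap / (2 * walker_speed)

# Hypothetical numbers: 10 miles apart, walkers at 2 mph, dog at 10 mph.
print(dog_distance_by_series(10, 2, 10), dog_distance_shortcut(10, 2, 10))
```

The series needs about 75 legs to shrink the gap below the tolerance; the shortcut needs one multiplication.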

Peter:

So he's got a gigahertz computer in his head, so he could solve it a different way than everybody else.

Lukas:

Maybe we should call that the problem that didn't fool Peter Norvig. So one thing I really was curious about is I saw the new version of your famous textbook came out this year, right?

Peter:

Yeah.

Lukas:

Artificial Intelligence: A Modern Approach. I feel like one thing everyone talks about is how hard it is to stay current on all the topics, given the vast amount of research. I'm wondering what the process was like for you, even just picking the topics that felt relevant for a textbook.

Peter:

Yeah. So it's really hard to stay up to date, because things are moving so quickly, both because it's just an exciting time and things are happening, and also because the process is a lot faster now. For previous editions, we only had to read the journal articles and the conference proceedings. And now you have to look at arXiv every day.

Lukas:

Right.

Peter:

So things move a lot faster. I remember I complained to David Silver of DeepMind and AlphaGo. I said, "This is the third time you've made me rewrite the chapter on games, and I'd better finish the book quickly or else you're going to do it a fourth time."

Lukas:

Were there any topics that you kind of regretted not being able to include in the book?

Peter:

I guess I regretted not having better answers. I think things are still in a state of flux. So we added a chapter on probabilistic programming, but I'd have liked to have better advice for how that fits in with other types of machine learning. When do you take a model-based approach, and when do you take a data-based approach? I think we don't quite understand that trade-off yet.

We kept a lot of the old material, even though we know people are skipping it. Professors aren't teaching resolution theorem proving anymore, but we kept that in. We cut that material down quite a bit, but it's still there. I think the idea of understanding how representation works is important, and I think that's really what deep learning does: it invents representations, and it's able to build on them because it is deep and has multiple layers. It's just that we don't quite understand what those representations are. I think that's important, and it's important to see what the possibilities are. So I think those chapters should be there, but I wish we had a better story of how they all fit together. Right now it feels like, here's a bunch of different technologies, you should know about them, but it's up to you to integrate them.

Lukas:

What percentage of the book at this point is what we would call machine learning?

Peter:

Probably a third of the book is sort of officially about machine learning concepts, but then all the application areas are very heavily into machine learning. So we have natural language. We have computer vision. We have robotics. And all of those are very much dominated by machine learning.

Lukas:

What parts changed the most since ... It was like 2009 that the last version came out. Where were the biggest changes?

Peter:

So all the deep learning stuff is new. And then I think the philosophy part is new. And also I guess the practical parts. I mean, we didn't really try to say this is everything you need to know to be an engineer, but we try to give some hints of what's practical. So all of these issues around privacy and security and fairness and how to do testing within the machine learning model, that was new. The philosophy part, we had a philosophy chapter before, but it was mostly like Searle's Chinese room and things like that. And now it's much more focused-

Lukas:

Yeah, I remember that.

Peter:

Yeah. Now it's much more focused on unfairness, autonomous weapons and these much more important issues.

Lukas:

Interesting. One thing I always wonder about with the AI ethics stuff is that a lot of it seems like it would apply to all technology. Any technology in a weapon seems scary, and fairness always seems to be a problem. How much do you feel like these issues are AI-specific?

Peter:

Yeah, I think that's exactly right. In fact, I was involved once with yet another one of these efforts to lay down an ultimatum and a set of AI principles. I did work on the Google set, and I think we did a pretty good job of that, but I was involved in another group doing this. At one point I just said, "I don't think we need this. I think these things have already been said elsewhere. And as you say, a lot of them are general principles for engineers, and some of them are general principles for anybody living in a society."

I think we need fewer AI- and machine-learning-specific principles, and more at either a higher or a lower level. Right? So we need them for engineers in general, and there are some codes like that for engineers. And then we need much more specific ones, because if you try to do it at the machine learning level, you end up with very vague principles about what's privacy and what isn't. I would rather say, "Okay, now we're going to work at the machine vision level, and we're going to have a set of principles for that." Then you can start asking not just whether privacy is good; you can say, "Well, what exactly can I do with face recognition? And what are the limits there?" It's easier to formulate principles at that much more specific level.

Lukas:

That makes sense. Another thing you wrote that I really loved was your paper "The Unreasonable Effectiveness of Data," quite a long time ago, which was one of the inspirations for me to start my last company. Then I saw that Google came out with work 10 years later that basically showed it still holds up: even more data still seems to be even more effective. I'm curious whether anything that's happened with data since then has surprised you, or whether there's anything from those observations that you would actually take back? Because when I reread it, it feels so completely true, almost shockingly true, I guess 15 years later.

Peter:

I don't know if I would take it back, but I think it promotes one point of view that I think was important at that time. But there's another point of view that says that data isn't everything. And I think that needs to be said as well. I mean, everything has to be balanced out. Some of that has to do with these things we were talking about with the privacy and security and fairness and so on. We still found places at Google where more data helps. We've also found places where data is a liability rather than, or maybe as well as, an asset.

Lukas:

Interesting. Where does data feel like a liability? Do you have any stories around that?

Peter:

We're putting a lot of effort into federated learning, where we say, "Well, maybe we don't want to hold on to your data. Maybe that's too sensitive." Particularly for things like speech recognition, where we do want to build a model that works better for you personally, but we don't want to have your private conversations in our data center, because that's a liability. If we screw up and reveal them, that's a bad thing.

So we'd rather build the models on your device and then have a way of sharing the parameters of that model without revealing any of the underlying data. And so a lot of work has gone into those kinds of approaches.
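The parameter-sharing idea can be sketched with federated averaging (often called FedAvg): each device runs a few steps of training on its own data, and the server only ever sees and averages the resulting parameters, weighted by how many examples each device holds. This is a toy sketch under stated assumptions (a one-variable linear model, three hypothetical devices, synthetic data from y = 3x + 1), not a description of Google's production system; real deployments add pieces such as secure aggregation that are omitted here.

```python
import random

def local_update(params, data, lr=0.05, epochs=5):
    """A few epochs of SGD on one device's private data for the model y = w*x + b."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def federated_averaging(clients, rounds=20):
    """Server loop: broadcast params, collect local updates, average them by data size.

    Only (w, b) leaves each client; the raw (x, y) pairs never do.
    """
    params = (0.0, 0.0)
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        updates = [(local_update(params, d), len(d)) for d in clients]
        w = sum(u[0] * n for u, n in updates) / total
        b = sum(u[1] * n for u, n in updates) / total
        params = (w, b)
    return params

# Hypothetical setup: three devices, each holding noisy private samples of y = 3x + 1.
rng = random.Random(0)

def make_client(n):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs]

clients = [make_client(n) for n in (20, 50, 30)]
w, b = federated_averaging(clients)
print(round(w, 2), round(b, 2))  # close to 3 and 1
```

Weighting by data size keeps a device with 50 examples from counting the same as one with 20.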

Lukas:

Although I guess that's still a case where data is useful. It's just trying to sort of use the data in a comfortable way.

Peter:

Yeah. It's still useful, but it's trying to figure out the best way to be safe and willing to compromise some in order to achieve that.

Lukas:

I guess, what else would you say under the theme of data isn't everything? What other points would you make to argue that side?

Peter:

Well, another thing people are concerned with is power, especially now on small devices. So you could say, "Well, the best thing is this huge model with billions of parameters, but now I want it to run on the phone. So I'm going to build something that throws away 99% of the data but hopefully performs 95% as well." A lot of work is going into that kind of approach.
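Peter doesn't say which compression method he has in mind. One common approach, assumed here purely for illustration, is magnitude pruning: keep the largest-magnitude weights and zero the rest. The sketch shows only the mechanics of discarding 99% of a hypothetical random weight vector; how much accuracy survives depends on the model and usually on retraining afterward, which this does not model.

```python
import random

def magnitude_prune(weights, keep_fraction=0.01):
    """Keep the largest-magnitude keep_fraction of weights and zero the rest.

    Ties at the threshold are all kept, so slightly more than k weights may survive.
    """
    k = max(1, int(len(weights) * keep_fraction))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

# Hypothetical weights standing in for one layer of a large model.
rng = random.Random(0)
weights = [rng.gauss(0, 1) for _ in range(10_000)]
pruned = magnitude_prune(weights)
print(sum(w != 0 for w in pruned), "of", len(weights), "weights kept")
```

A sparse format would then store only the surviving weights and their positions.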

Lukas:

It's funny. I guess I think of you as kind of a tinkerer. I don't know if that's right or wrong, but I guess, for me, as maybe a fellow tinkerer, it seems a little sad to me that a lot of the AI breakthroughs have really been through applying massive compute. I mean, maybe that's just a fact of life, but it does seem like it might really inhibit research if you need millions of dollars to build a breakthrough model.

Peter:

Yeah. I think that's definitely an issue. And certainly we've seen that. These GPT models take a long time to train and a lot of computational power. So both in terms of the expense that it takes to do that and just the availability of who can work on it, those are definitely issues. I think that as we get better at transfer learning, that some of those issues will go away, and that the typical way to do things will be to say, "We'll start with this model that's already been built, and then we'll modify it a bit." And so the expense for doing that should be a lot less.

And certainly there are possibilities now as a lot of the cloud providers are offering credits for researchers and so on. So some people can get through and get access to that kind of power. But of course, not everybody can, and you've got to have some way to prove yourself worthy. And I don't know if that selection process is always fair.

Lukas:

Right. I mean, I guess that leads me to another question down my list, which I'm sure a lot of our audience is going to wonder about, which is, just because you've been so successful in your career and your career has lasted so long, do you have advice for a young researcher maybe starting a PhD program or coming right out of undergrad? What would you guide someone to work on, or what would you work on if you were in that position today?

Peter:

I guess I'd probably try to understand biology better and work on that, because I think there's a lot of opportunity there. There's a lot of data. It's important for the current COVID situation; that's obviously one big application. All aspects of health are important. So for me personally, I'd probably do that. But my advice would be to find some area that you're interested in and concentrate on that.

Lukas:

We've had a couple of different biologists with different lenses come and talk on the show. Is there any new research in biology that you're seeing that's particularly exciting, or predictions you have about biology and machine learning?

Peter:

There are lots of different areas. Understanding human health and personalization, I think, is important, and I think we're just starting to do that. Understanding the genome, protein folding, drug discovery, and how neurons work, I think, is important. Recently we've seen a couple of cases of people publishing connectomes of various organisms. So we're just starting to be able to see maps of that, and we're starting to get better tools to understand them.

Lukas:

When you look at deep learning, it sort of feels like that came suddenly, but a lot of those techniques were around, in fact in your book, I remember quite far back. Do you think that the field missed something, or was it just not possible to run at the scale necessary to show that these neural network techniques were working better than people expected in the early aughts?

Peter:

Yeah. I mean, if you say suddenly, right, we got a sudden leap in computer vision and ImageNet after Hinton had been trying the same thing for 30 years, right?

Lukas:

Right.

Peter:

And then it finally worked. And I think the biggest difference was the computing power. Definitely there were advances in data. We could do ImageNet because Fei-Fei Li and others gathered this large database, and that was really important. There are certainly differences in the algorithms, right? We've got a slightly different squashing function. Instead of shaped like this, it's shaped like this. I don't know how big a deal that was, but we learned how to do stochastic gradient descent a little bit better. We figured out that dropout gave you a little bit better robustness.
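Peter's hand gesture doesn't survive in the transcript. Presumably (an assumption) he means the move from S-shaped sigmoid units to hinge-shaped rectified linear units (ReLUs). The sketch compares their gradients, which is the usual argument for the switch: the sigmoid's gradient shrinks toward zero for large inputs while the ReLU's stays at 1. It also shows inverted dropout, which randomly zeroes units during training and rescales the survivors so the expected activation is unchanged.

```python
import math
import random

def sigmoid(z):
    """S-shaped squashing function; saturates near 0 and 1 for large |z|."""
    return 1 / (1 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)  # at most 0.25, and vanishing as |z| grows

def relu(z):
    """Hinge-shaped alternative: zero for negative inputs, identity for positive ones."""
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # does not shrink for large positive z

def dropout(activations, p, rng):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]

for z in (0.5, 5.0, 10.0):
    print(z, sigmoid_grad(z), relu_grad(z))

rng = random.Random(0)
dropped = dropout([1.0] * 10_000, p=0.5, rng=rng)
print(sum(dropped) / len(dropped))  # averages out near 1.0
```

At test time dropout is simply switched off; the 1/(1-p) scaling during training is what keeps the two modes consistent.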

So there were small things, but I think probably the biggest was the computing power. And I mean, I certainly remember Geoff Hinton came to Berkeley when I was a grad student in 1981, I think, when he talked about these neural nets. And we fellow grad students thought that was so cool. So we said, "Let's go back into the lab and implement it."

And of course, there was absolutely nothing you could download, so we had to build it all from scratch. And we got it to do exclusive or, and then we got it to do something a little bit more complicated. And it was exciting. And then we gave it the first real problem, and it ran overnight, and it didn't converge, and we let it run one more day, and it still didn't converge. And then we gave up, and we went back to our sort of knowledge-based systems approach. But if we had the computing power of today, it probably would have converged after five seconds.
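The transcript doesn't say which training rule Peter and his fellow students used in 1981, so the following is a hypothetical modern reconstruction: a tiny sigmoid network trained on exclusive-or with plain backpropagation. Because such small networks sometimes stall in a poor local minimum, much like a run that never converges, the sketch retries from fresh random weights.

```python
import math
import random

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_xor(hidden=3, lr=0.5, epochs=10_000, seed=0):
    """Train a 2-input, hidden-unit, 1-output sigmoid net on XOR by online backprop."""
    rng = random.Random(seed)
    # W1[j] holds [weight from x0, weight from x1, bias] for hidden unit j.
    W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]  # last entry is the output bias

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in W1]
        out = sigmoid(sum(W2[j] * h[j] for j in range(hidden)) + W2[-1])
        return h, out

    for _ in range(epochs):
        for x, target in XOR:
            h, out = forward(x)
            d_out = (out - target) * out * (1 - out)  # squared-error gradient at the output
            for j in range(hidden):
                d_h = d_out * W2[j] * h[j] * (1 - h[j])  # uses W2[j] before it is updated
                W2[j] -= lr * d_out * h[j]
                W1[j][0] -= lr * d_h * x[0]
                W1[j][1] -= lr * d_h * x[1]
                W1[j][2] -= lr * d_h
            W2[-1] -= lr * d_out
    return lambda x: forward(x)[1]

def solve_xor(max_restarts=20):
    """Small nets can stall in a poor local minimum, so retry from fresh random weights."""
    for seed in range(max_restarts):
        net = train_xor(seed=seed)
        if all(round(net(x)) == y for x, y in XOR):
            return seed, net
    raise RuntimeError("no restart learned XOR")

seed, net = solve_xor()
print(seed, [round(net(x), 3) for x, _ in XOR])
```

Random restarts are a cheap remedy on a four-example problem; on the "first real problem" each run is expensive, which is why a non-converging run hurts so much more there.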

Lukas:

So I remember Daphne Koller telling me, maybe 2003, that the kind of state-of-the-art handwriting systems were neural nets, but that it was such an ad hoc kind of system that we shouldn't focus on it. And I wonder if maybe I should have paid more attention to that and tried harder to make neural nets work for the applications I was doing.

Peter:

Yeah, me too. And certainly Yann LeCun had success with the digit database, and I think that was over-engineered, in that they looked at exactly the features they needed for that set of digitizations of those digits. In fact, I remember researchers talking about, "Well, what change are we going to make for sample number 347?" Right?

Lukas:

Oh, really? Okay.

Peter:

There were individual data points that they would form theories about, so that was definitely overtuning to the data. But it should have been an indication that it was a good approach. It was better than other approaches at the time.

Lukas:

I guess so. Although that does sound like a damning level of overfitting to the data, I suppose.

Peter:

Right. There was only a couple thousand data points. I forget exactly how many. Maybe it was 10,000. Maybe it was even 100,000, but it wasn't many.

Lukas:

I guess more broadly, when you think about what you were thinking about at the beginning of your career and imagining into the future, what's been surprising in the development of artificial intelligence?

Peter:

I guess when I started, I did it because I thought it was really interesting, and it was an academic approach. And I guess I was surprised at how much it's had an impact on everybody's everyday life. That wasn't something I was expecting. I mean, I knew it was probably a more practical field than 13th-century Italian poetry, and I figured my salary is probably going to be higher going into this field. But I still thought of it as an academic challenge that was obscure and not as something that would touch everybody's life every day.

Lukas:

I guess, have there been any approaches that you thought wouldn't work but then worked better than you expected?

Peter:

Yeah, I guess, in general, people are surprised that these deep learning approaches work as well as they do, on as wide a variety of problems as they do. I grew up at a time when there was a real emphasis on saying, "We need to understand representation and inference," and focusing on that. And I think that's still true. That's still important.

But I think we learned a couple of things. One is that you can do more with just the pattern recognition. And maybe we were exhibiting some speciesism of saying, "We're humans, and we do a lot of this higher-level reasoning. So maybe that's the really important thing." But there's lots of other animals that live long lives and do a lot of cool stuff, but without having a lot of that higher-level reasoning and long-term planning, and they can do short-term plans.

Lukas:

That's a good point.

Peter:

And they're not thinking about that in the same way we are. So I think we kind of missed that, and my hope for the future is that we can bring those back together. I think it is a good idea to be able to do reasoning, to form representations, to simulate into the future and choose courses of action. I think where we went wrong is that we were so seduced by first-order logic, saying, "Oh, it's got such a cool theory of inference." But the problem is that once you get outside of mathematics, this idea of fixed predicates that are either true or false just doesn't hold up, right?

So yeah, we can define what a triangle is and say, "If your two sides are equal, then your opposite angles are equal." And we can reason through that. But once we get into, okay, now you're driving a car, and we say, "If there is a pedestrian on the sidewalk, then what?" Well, first of all, we don't know for sure it's a pedestrian. All we've got is a point cloud. And secondly, every sidewalk is different. All the predicates are vague, and all of the situations are unique enough that this kind of if-A-then-B reasoning falls down.

So I'd like to get back to something where we combine this, "I'm going to do pattern recognition, I've seen something like this before, what's similar?" But also some ability to say "Yes, and in addition to all these neuron weights that I'm seeing, I can also extract something that's abstract. And I can reason forward a little bit, as long as I don't take it too seriously." Right?

Lukas:

Do you think that the reasoning needs to be something that is kind of understandable by a human?

Peter:

I think that helps debugging, but I don't think it's necessary. So there are a couple of issues. One is that we trust it more if we can understand it. It enables us to debug it, and it enables us to take the advice more seriously if it's talking our language. But there's no reason it should, because computers have different powers, so they should think in different ways, right? I'm sure a bee has a different representation of the world than I do because its visual system is so different, and we shouldn't try to have the same approach.

On the other hand, it's certainly possible that if we don't understand what they're doing, they'll solve the problem in the wrong way, right? I saw something yesterday where they were trying to distinguish between husky dogs and wolves and figure out what the salient features were. They decoded it and said, "Well, one of the most salient features was whether there was snow or not."

Lukas:

Right.

Peter:

Right? And that's good if the only thing you're trying to do is maximize your results on this particular dataset, but it hasn't really helped you solve the real problem. So I think we have to be wary of those kinds of accidental coincidences that our machine learning systems are very good at picking up on. And I guess part of that is having a better theory of how the future is going to be different from the training data, right? We can easily imagine, okay, here are all these pet dogs that are inside houses. Could they be outside in the snow? Well, sure, of course they could. But our machine learning systems don't pick up on that.
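The study Peter mentions isn't named, so the following is a hypothetical toy, not a reproduction of it. A learner that picks whichever single feature best fits its training photos will happily choose the background if every wolf happened to be photographed in snow, and then fails once huskies show up in snow too. The features and counts are invented for illustration.

```python
def accuracy(feature, examples):
    """Fraction of examples where 'feature is present' matches 'is a wolf'."""
    return sum(ex[feature] == ex["wolf"] for ex in examples) / len(examples)

def best_stump(features, examples):
    """Learn a one-feature rule by picking whichever feature fits the training set best."""
    return max(features, key=lambda f: accuracy(f, examples))

def ex(snow, yellow_eyes, wolf):
    return {"snow": snow, "yellow_eyes": yellow_eyes, "wolf": wolf}

FEATURES = ["snow", "yellow_eyes"]
# Hypothetical training photos: every wolf happens to be in snow, every husky indoors.
train = ([ex(True, True, True)] * 4 + [ex(True, False, True)]
         + [ex(False, False, False)] * 4 + [ex(False, True, False)])
# A shifted test set: now the huskies are outside in the snow too.
shifted = [ex(True, False, False)] * 4 + [ex(True, True, True)] * 4

rule = best_stump(FEATURES, train)
print(rule, accuracy(rule, train), accuracy(rule, shifted))  # snow 1.0 0.5
```

On the training data the snow rule is perfect and the genuine cue only scores 0.8, so nothing in the training accuracy warns you; the failure shows up only when the distribution shifts.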

Lukas:

It does seem, though, when you think about even AlphaGo or Alpha Chess, that they're successful enough that they must be building some kind of higher-level representation within their models. Do you think it's possible to take the types of algorithms we have now, make them bigger, add more data, and have them build higher-level representations that make them functionally similar to human-like intelligence? Or do you think some real change or different methods are needed?

Peter:

So that's a good question. Part of it is that I don't think we really understand how powerful it is to have a perfect memory and gigahertz-level reasoning capabilities. I think you can do a lot with that. There are things we thought would take much more complex reasoning, but they don't, and I think that comes up a lot. I remember a very good Go player saying, "I can't beat AlphaGo even with the search turned off." So-

Lukas:

The search turned off?

Peter:

Yeah.

Lukas:

Not look ahead.

Peter:

It doesn't look ahead; it just has one network that says, out of all the possible moves, which is the best. And he can't beat it with only that turned on. So it's doing something there, right? It's not just that it's very good at searching and figuring out where to search; it's that it has some abstract representation of what a good move is or isn't. But nobody quite understands what that representation is, both when it's playing alone and when it's combined with the forward search. So there's definitely something going on there, and we're not quite sure what it is. I think there have been some interesting hybrid approaches. One area I think is interesting is theorem proving, which is one of the few places where logical reasoning actually works.

But if you talk to mathematicians, they use this combination of following the rules of inference and then some intuition. And there's some work on trying to combine the two. Christian Szegedy and Sarah Loos have a system where you take a regular theorem prover and give it a problem. Then you have a neural net decide which 100 of the million axioms you have are most relevant to this problem. Then you feed those axioms to the theorem prover, and now it's able to brute-force the proof. Whereas if you gave it all the axioms, it would get lost and wouldn't be able to find it. So I think that's a nice way of saying mathematicians have two things. They have the power to correctly follow rules, and then they also have the intuition of, I think this is the way it's going to go.
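The pipeline described here (select a few relevant axioms, then hand them to a brute-force prover) can be sketched on toy Horn clauses. The learned neural selector is replaced by a crude, hypothetical symbol-overlap score, and the "prover" is simple forward chaining; neither is the actual system. The point is only the division of labor: a cheap relevance ranking shrinks the axiom set, and the exhaustive prover does the rest.

```python
def forward_chain(facts, rules, goal):
    """Brute-force Horn-clause prover: fire rules until the goal appears or nothing changes.

    Returns (proved, rule_checks) so the cost of a large axiom set is visible.
    """
    known, checks, changed = set(facts), 0, True
    while changed and goal not in known:
        changed = False
        for premises, conclusion in rules:
            checks += 1
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return goal in known, checks

def select_premises(rules, facts, goal, k):
    """Hypothetical stand-in for the learned selector: rank rules by symbol overlap."""
    problem_symbols = set(facts) | {goal}

    def relevance(rule):
        premises, conclusion = rule
        return len((premises | {conclusion}) & problem_symbols)

    return sorted(rules, key=relevance, reverse=True)[:k]

# Hypothetical axiom set: 1,000 irrelevant rules plus the two that matter.
distractors = [(frozenset({f"p{i}"}), f"q{i}") for i in range(1000)]
relevant = [(frozenset({"a"}), "b"), (frozenset({"b"}), "c")]
axioms = distractors + relevant
facts, goal = {"a"}, "c"

print(forward_chain(facts, axioms, goal))                                      # (True, 1002)
print(forward_chain(facts, select_premises(axioms, facts, goal, k=2), goal))   # (True, 2)
```

In this toy the cost difference is only scanning time; in a real prover, irrelevant axioms also multiply the inferences it can draw, which is why it "gets lost."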

So I'd like to see more of that kind of approach, where you have these very powerful general techniques that you can call on, but then on top of that, you try to learn the patterns for how to use them. Another example I think about is mathematical induction: if it's true for one, and whenever it's true for N it's true for N plus one, then it's true for every N. That's great in math, but in the real world it doesn't work. We have paradoxes like, well, I have a mountain and I take away a grain of sand; is it still a mountain? Yes. Well, what if I do that an infinite number of times? Then it's no longer a mountain. When does it stop being a mountain, right? So we don't quite have answers to that.

The way I would approach problems like that is say, well, you got to learn two things. One, you got to learn this rule of, I can take away a single grain of sand and then you have to learn the applicability of saying, well, for sand, I can't do that too often, but for integers, it's fine to do it an infinite number of times. And we as people, we figured that out, but we don't have good ways of saying that to our computers. I think it'd be interesting if we could figure out how to say that or to teach them that.

Lukas:

Interesting. You imagine teaching computers sort of facts at large scale?

Peter:

So here, I guess, I'm talking more about control strategies, or the applicability of facts.

Lukas:

I guess you touched on this to some extent, but another question I have just because you've been doing this for longer than most, when you look at the applications, it seems like some applications have turned out to be much easier than others, and it's been pretty surprising. Do you have a sense of things that actually surprised you? Because you've actually, I think, been very good at predicting the difficulty of different problems to machine learning. Have there been any applications that have been surprisingly harder, surprisingly easy throughout your career?

Peter:

Well, we still haven't quite figured out the self-driving cars and I'm not sure how surprising that is. And certainly people made predictions that we'd have it by now. I guess I'm not that surprised just because I think it's so complicated and there's so many different possibilities and that the stakes for going wrong are so high. One thing that's surprising to me, slipping away a little bit from your question, if you had asked me 10 years ago, would it be a good idea to voluntarily give up the keyboard and the screen attached to your device and just have a speaker sitting on the shelf that you can talk to?

I would say, "No. That's dumb. Why would I want to give up all these good input and output devices just to have that? That's crazy." But some people like that for a lot of things. So I thought that's interesting. Why did I get that prediction wrong? Maybe it's because I'm too tied to the devices and would prefer not to be as tied to them. I still think we have a long way to go with these assistants, which are getting pretty good at recognizing your voice. I can tell one to play a song, and I can ask it for the weather report. Then there are 10 more things I can do; I can ask it for a recipe and so on. But after a dozen or so things, I'm stuck, and I'm not quite sure if my next query is going to work or not.

And I like sort of the security of you open a new desktop app that you haven't learned before, but you can poke through the menus and you can get a good idea of what you can do and what you can't do. But with these speech based assistants, you have no idea what's going to work and what's not going to work. And so I think that's interesting. And so either we have to have a better theory of how we teach people what they can do, or we have to fulfill this promise of, well, it's just like talking to a person and you can say anything. And we haven't fulfilled that promise yet and we haven't given people a good model of what works and what doesn't work. So I think that's a real challenge.

Lukas:

One fact that seems remarkable to me, even though it's so clear and we see it all the time, is how little machine learning seems to work in the real world with robotics, right? I think it's incredible that computers can beat the best person at Go, but might have trouble picking all the Go stones off the board every single time.

Peter:

I think we're getting better at robotics. That's a good example of something that was harder than we thought. I remember Marvin Minsky saying, "Oh, it's a waste of time working on robotics, because this trivial little stuff is so hard you'll never make any progress. If you want to get a PhD, do it in simulation rather than in robotics, or else you'll never graduate." And I think maybe that was good advice for people wanting to get a PhD, but it was bad advice for the field as a whole, because I think these problems were hard because they're important, not because they were some trivial thing off to the side.

Lukas:

And I guess if you applied that advice everywhere, you'd never do anything new, right? Synthetic data is something that seems really interesting and promising. Is that something that you covered in your new book?

Peter:

A little bit, yeah. I think that's important, and certainly very important in robotics. Computer vision in particular is the easiest place to come up with synthetic data, because we understand how optics work pretty well. If you want to take an image and rotate it or put it under different lighting conditions and so on, we know how to do that because we understand the physics. With other kinds of data, we don't necessarily know how to do that, right? So we make synthetic data by saying, let's make some random changes and hope they're not too bad. If you don't have a strict physics model, you can fool yourself to some degree. But I think it's important, and we've done that a lot. In robotics, I think the sim-to-real transition is really important, right? A lot of times you're limited to doing things in real time, but simulations can run much faster.

Then you have to make sure that the simulations are calibrated and carry over to real life, and I think for the most part we've had pretty good success with that. Some of that takes a long time, right? You've got to have the vision models, you've got to have the physics models. I'd say part of the problem with self-driving cars is that the first billion miles are always the hardest, because you can't build a good simulator until you've been on the road and seen a thousand really weird things that you never would have thought of putting into your simulator. Once you have that, progress is going to be 100 times faster if you can run in simulation rather than having to run on the real road.
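To make the vision case concrete, here is a minimal Python sketch of physics-motivated augmentation: rotating an image, and simulating different lighting by scaling pixel intensities. It assumes a grayscale image stored as a list of rows of integers in [0, 255]; the function names are hypothetical, and a real pipeline would use an image library rather than plain lists.

```python
def rotate_90(image):
    """Rotate a grayscale image (list of rows) 90 degrees clockwise."""
    # Reversing the row order and then transposing gives a clockwise rotation.
    return [list(row) for row in zip(*image[::-1])]


def adjust_brightness(image, factor):
    """Simulate different lighting by scaling intensities, clipped to [0, 255]."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment(image, brightness_factors=(0.5, 1.5)):
    """Return synthetic variants of one labeled image: 3 rotations plus relit copies."""
    variants = []
    rotated = image
    for _ in range(3):  # 90, 180, and 270 degrees
        rotated = rotate_90(rotated)
        variants.append(rotated)
    for factor in brightness_factors:
        variants.append(adjust_brightness(image, factor))
    return variants


image = [[10, 20],
         [30, 40]]
print(rotate_90(image))               # [[30, 10], [40, 20]]
print(adjust_brightness(image, 1.5))  # [[15, 30], [45, 60]]
print(len(augment(image)))            # 5
```

Every variant keeps the original label, so one labeled example becomes several. These transformations are safe because the physics of rotation and lighting is well understood, which is exactly what's missing for the other kinds of data Peter mentions.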

Lukas:

Right. Right. What about language, then? We were talking to Anthony from Kaggle on this same show a month or two ago. One of the things he told me that really surprised me was that the winning Kaggle strategy in some of the language tasks is to use Google Translate to take the sentence or document, translate it into some foreign language, and translate it back into English. You get some natural changes, but the underlying semantics are more or less preserved. And I guess that really works as a synthetic data generation strategy.
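The back-translation trick described here can be sketched as below. This is a minimal sketch, not Kaggle's or Google's actual method: `translate` is a hypothetical callable standing in for a real machine translation service, and the toy word-by-word dictionaries exist only so the example runs offline.

```python
from typing import Callable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, translate: Translator, pivot: str = "fr") -> str:
    """Round-trip English text through a pivot language to get a paraphrase."""
    foreign = translate(text, "en", pivot)
    return translate(foreign, pivot, "en")


def augment_dataset(examples, translate: Translator, pivot: str = "fr"):
    """Add a back-translated copy of each (text, label) pair, reusing the label."""
    augmented = list(examples)
    for text, label in examples:
        paraphrase = back_translate(text, translate, pivot)
        if paraphrase != text:  # keep it only if the round trip changed something
            augmented.append((paraphrase, label))
    return augmented


# Toy stand-in for a real MT system (made up for illustration).
TOY_DICTIONARIES = {
    ("en", "fr"): {"the": "le", "film": "film", "was": "était", "great": "génial"},
    ("fr", "en"): {"le": "the", "film": "movie", "était": "was", "génial": "great"},
}


def toy_translate(text: str, source: str, target: str) -> str:
    table = TOY_DICTIONARIES[(source, target)]
    return " ".join(table.get(word, word) for word in text.split())


print(back_translate("the film was great", toy_translate))  # the movie was great
```

Skipping paraphrases identical to the input avoids duplicating examples; a real setup would probably also want some check that the round trip hasn't changed the meaning enough to invalidate the label.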

Peter:

Yeah. Yeah. Yeah. So that's interesting. And I guess people also just break language up into pieces. I think language is an area where transfer learning has worked pretty well, probably even better than in vision. We've got lots and lots of language available, so it's easy to find stuff to train on. Often it's not in the topic area you're dealing with, but we've found that in a lot of cases it still helps a lot to have it. So that's really useful. And then we also found, and this was really surprising to me, that transfer across tasks worked really well. You can train on question answering, and that helps you do summarization, so it carries across different tasks.

Lukas:

Okay. I feel a little self-conscious asking this question, but I need to ask it. What are your thoughts on the singularity? Do you believe in some form of it, that AI is going to be better than humans at all tasks and then continue to improve? Is that something you imagine happening?

Peter:

Yeah, not really. So one thing is singularity is in the eye of the beholder.

Lukas:

Sure.

Peter:

So if you're Kurzweil, all the curves are exponential and they're going up, and right now is a special time. But if you've got log paper, then all the lines are straight lines and there's nothing special about right now. It was a straight line yesterday and it'll be a straight line tomorrow. And so I guess that's more of my philosophical point of view: things will get better, but... Actually, I did give a talk at one of the singularity conferences, and I tried to answer the question: is this a special time right now?
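The log-paper point can be checked numerically. In this sketch, with made-up growth numbers, a quantity that doubles every year has ever-larger steps on a linear axis, but on a log axis every step has the same slope, log10(2), so no year stands out.

```python
import math


def step_sizes(values):
    """Differences between consecutive values: the slopes on a linear axis."""
    return [b - a for a, b in zip(values, values[1:])]


def log_slopes(values):
    """Differences between consecutive log10 values: the slopes on log paper."""
    return step_sizes([math.log10(v) for v in values])


# Hypothetical quantity that doubles every year.
growth = [2 ** year for year in range(10)]

print(step_sizes(growth))  # [1, 2, 4, 8, 16, 32, 64, 128, 256]: looks like takeoff
print(all(math.isclose(s, math.log10(2)) for s in log_slopes(growth)))  # True: a straight line
```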

The way I did that was a kind of linguistic research. I searched over past machine learning papers, broke them up into decades, and searched for a couple of key phrases like, "Unlike other systems, our system ..." I found all of the sentences like that, and then, by hand, I made a histogram of what the breakthrough was. The answer was that there wasn't anything special about right now, and lots of the same breakthroughs were being made 20 years ago. Somebody said, "Unlike other systems, our system does X," back then, and today people are claiming some of the same things, the same X.
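A search like the one described here might be sketched as follows. This is an assumption about how such a study could be run, not Peter's actual code: the regular expression, the naive sentence splitting, and the toy corpus are all hypothetical, and the step of labeling what each claimed breakthrough was stays manual.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern for the kind of claim being searched for.
CLAIM = re.compile(r"\bunlike other systems,\s+our system\b", re.IGNORECASE)


def claims_by_decade(papers):
    """Collect 'unlike other systems, our system ...' sentences, grouped by decade.

    `papers` is an iterable of (year, text) pairs. Sentences are split naively
    on periods, which is a simplification.
    """
    by_decade = defaultdict(list)
    for year, text in papers:
        decade = year // 10 * 10
        for sentence in text.split("."):
            if CLAIM.search(sentence):
                by_decade[decade].append(sentence.strip())
    counts = Counter({decade: len(found) for decade, found in by_decade.items()})
    return counts, dict(by_decade)


# Toy corpus (made up for illustration).
papers = [
    (1987, "Unlike other systems, our system learns from examples. It is fast."),
    (1989, "We present a parser."),
    (2008, "Unlike other systems, our system learns from examples."),
]
counts, sentences = claims_by_decade(papers)
print(sorted(counts.items()))  # [(1980, 1), (2000, 1)]
```

The returned sentences per decade are what a person would then read and tally by claimed breakthrough to build the histogram.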

Lukas:

The same X, the same value of X. What was the common value of X?

Peter:

They were all over the map.

Lukas:

Interesting.

Peter:

So my answer was that we still don't know what we're doing. The other thing is, people talk about hard takeoffs and soft takeoffs and so on, and I think everything's going to be gradual and we're just going to get used to it. Look at the changes we've had already. Now everybody walks around with a device that has access to all the information in the world. That seems like it should be a huge thing that's really different, and yet mostly we say, "Yeah, well, what's the big deal? Of course everybody has that." I think that's going to happen in the future, too. There'll be robots you can talk to and have real conversations with, and they can do things for you, and people will just say, "Well, yeah. It's just another thing I have. I have my phone, now I have my robot. It doesn't really change my world that much."

Lukas:

So just to ask it more concretely, do you expect a world where that robot is smarter than you in every way?

Peter:

No, I don't think so, and -

Lukas:

Because even with a straight line on log paper, it gets there eventually.

Peter:

Yeah. Yeah. I mean, I could see that argument eventually, but I'm older than you, so I don't have to predict as many years into the future. I'm very confused about my predictions. On the one hand, I say I don't think there are going to be these big changes coming, so if I had to bet, I would bet against people like [inaudible 00:38:25]. On the other hand, I look at past predictions and I'd say, well, [inaudible 00:38:27] has probably done a better job than me, so I should probably bet for him. And I haven't quite been able to resolve that contradiction.

Lukas:

Interesting. All right. Well, here's another slightly off-the-wall topic, but it's been surprisingly interesting with various guests. Do you think Python will continue to be the main programming language of ML for the next few decades? Or do you think something else is going to come along and unseat it?

Peter:

Yeah, I don't know. I've been doing Python for a while, and I actually came to it because of the textbook. The textbook has pseudocode in it because we didn't want to tell professors what language to use. When we did the first edition in 1995, we implemented all the pseudocode in Lisp, because that was the style in AI at the time. Then over the years Lisp started to fall out of favor, and students would complain, "We don't understand how this code works. What are all these weird parentheses doing there?" So I said, okay, I've got to reimplement all the code from the book in some other language, and I said, "Well, what's the most popular language? Java. I'll do that." Then I said, "Oh, well, this is such a mess that I can't take the pseudocode and implement it directly. I can't just say, let's create an object X. First I need an X factory, and then ..."

Lukas:

Right, right.

Peter:

It just got complicated. It wasn't a good match to the pseudocode we had written.

Lukas:

Yeah.

Peter:

So instead of asking what's the most popular language, I asked, "What's a pretty popular language that's the best match to the pseudocode?" I didn't know that much about Python, but I looked at it and said I must have been cheating and channeling Guido when I came up with my pseudocode, because my pseudocode was almost exactly Python. So I said, "This is going to be the easiest thing to do, so that's what I'll do," and it turned out to be a good choice, because Python's popularity really started to grow, and I think that's important. I think there are some limitations, though. Looking at where we are today, I guess I would be happier if Julia were the main language.

Python's starting to have type declarations now, but it doesn't quite take them seriously. Julia does a much better job of that, and Julia was designed to be efficient from the start. So I think that's probably a better choice. I guess some people are using Swift. I don't know too much about that, and there are other languages, but I think popularity is going to matter more than the differences between them. If a language is popular, people will put in what's necessary to make it work. Look at JavaScript: it was a very rushed language design, and I don't really blame the designers for that, but there's a lot of weird stuff in it. And yet, because it was the only thing you could run in the browser, people ended up coming up with really good compilers for it because they had to. I think that hasn't quite happened with Python yet, and I'm a little bit surprised at that.

I guess probably the reason is that we didn't have to. Right now the Python compilers aren't the greatest, and I think that's not necessarily because of the language design; I don't see anything that's that much different between Python and JavaScript. It's just that a fast compiler was necessary for JavaScript and isn't for Python: in the browser you have no choice, but outside of the browser you could have used C++ or Rust or something else. So it's not as necessary for Python to become as fast. We may end up with splits, with things implemented in different languages. As long as they interface with each other, that's probably okay.
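The remark that Python doesn't take its type declarations seriously refers to the fact that annotations are not enforced at runtime. A minimal illustration: the interpreter runs the mistyped call without complaint, while a separate static checker such as mypy would report an incompatible argument type.

```python
def double(x: int) -> int:
    """Annotated as taking and returning an int."""
    return x * 2


print(double(21))    # 42
print(double("ab"))  # abab: no TypeError, because annotations are not enforced at runtime
```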

Lukas:

But you still write most of your code in Python, right?

Peter:

Yeah. Yeah, I do, in part because a lot of what I do is teaching-focused, and Python is good for that: first, because it's what's taught in a lot of schools, and second, as I was talking about before, there isn't a lot of [inaudible 00:06:26]. So if you're just trying to say, "Here, I'm trying to show an algorithm," Python is very good as a direct implementation of that algorithm, without a lot of other stuff you have to worry about. If I were worried about efficiency, I'd probably be using something else.

Lukas:

Interesting. That's a really interesting reason to pick it because it looked like your pseudocode. I can totally picture that.

Peter:

Yeah.

Lukas:

That is a great thing about Python. We always end with two somewhat open-ended questions. Feel free to take them where you want. The first is: is there a topic in AI or machine learning that you wish people would pay more attention to?

Peter:

I guess I would pay more attention to all these peripheral issues of fairness and equity and privacy and security and operations. So we have this term of MLOps now, and I think that's good, but I think people should pay more attention to the whole life cycle of the product rather than just say, "I'm trying to get the highest possible score on my test set."

Lukas:

Well that's an incredible segue into my final question, which is when you look at deploying machine learning in the real world ... and I guess this is MLOps, where do you see the biggest bottlenecks or challenges or problems?

Peter:

Yeah, so there are a lot of them. I guess one of the biggest ones I face continuously is drift. The data changes, the users' needs change, and you have to have some way of monitoring that and responding to it. We've had 50 years or so of making better tools for software engineering, so we're much better at that now. It's harder to insert a bug into a program than it used to be, but we don't have much of that for machine learning systems. We've only had a couple of years, and you've been trying to contribute to that, and that's great. We have some tool sets, but I think we're still far behind, and so we run into these problems. One of the things I see continuously at Google... we're a big place, so it's easy for a team to say, "I need some data that does X. Oh look, here's another team that's producing that data asset. Let's plug it in and try. Look, it works great. My problem is solved."

And then six months down the line, things are slowly getting a little bit worse every day and nobody knows why. Eventually someone figures out that for a while the two teams were on the same path, and the other team was producing data that worked for them and worked for us, but then they veered off in one direction and we veered off in another. They made small changes, a little bit at a time, to the data they were producing. For them, those changes were improvements; for us, they made things worse, and we found we don't have good ways of tracking all that. Sometimes the team didn't even know somebody else was using their data. So they didn't ask, "Hey, is it okay if we make this change?" They just said, "It's good for us. We're going to go ahead and do it," but it hurt someone else.

There are all sorts of ways in which the world changes and drifts. We built a software engineering approach where you make a change, you get it reviewed, you run all the unit tests, you check it in, and these changes are relatively big events: at one level, individual check-ins that have to get reviewed, and, for a major product, releases that only happen a few times a year and are big things. But with machine learning, everything's changing every day as you get new data, and you can't say, "Well, we're going to do a complete test of everything every time we get new data." You have to have some process that says, "What are we going to retest? At what level? What are we going to monitor for? And how do we know when the world has changed out from underneath us?" And I think we need better tooling to get that right.
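One common way to monitor for the kind of drift described here is to compare a feature's distribution on recent data against a reference window. The sketch below uses the population stability index (PSI), a standard drift statistic; Peter doesn't name a specific technique, so this is a swapped-in example. The bin edges, the feature values, and the 0.2 alert threshold are assumptions (0.2 is a common rule of thumb, not a standard).

```python
import math
from typing import Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> list[float]:
    """Fraction of values falling in each bin defined by sorted edges.

    With k edges there are k + 1 bins; values below the first edge or at or
    above the last edge land in the two open-ended end bins.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        index = sum(v >= e for e in edges)  # number of edges at or below v
        counts[index] += 1
    return [c / len(values) for c in counts]


def population_stability_index(reference, current, edges, eps=1e-6) -> float:
    """PSI = sum over bins of (p_cur - p_ref) * ln(p_cur / p_ref)."""
    psi = 0.0
    for r, c in zip(bin_fractions(reference, edges), bin_fractions(current, edges)):
        r, c = max(r, eps), max(c, eps)  # floor empty bins to avoid log(0)
        psi += (c - r) * math.log(c / r)
    return psi


def drifted(reference, current, edges, threshold=0.2) -> bool:
    """Flag drift when PSI exceeds the (assumed) alert threshold."""
    return population_stability_index(reference, current, edges) > threshold


# Hypothetical feature values: a reference window spread evenly over four bins,
# and a later window where an upstream change pushed everything into one bin.
edges = [1.0, 2.0, 3.0]
reference = [0.5, 1.5, 2.5, 3.5] * 25
shifted = [3.5] * 100

print(drifted(reference, reference, edges))  # False
print(drifted(reference, shifted, edges))    # True
```

Running a check like this per feature on a schedule is one partial answer to "what are we going to monitor for?" It catches shifts in a feature's distribution, but not, for example, an upstream change in what a column means that leaves its distribution intact.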

Lukas:

Awesome. Well thanks so much, Peter. It's a real pleasure to talk to you.

Peter:

Yeah. It was fun to talk to you, Lukas.

Lukas:

Thanks for your time.
