Hans-ramsl's workspace
Runs: 470 (374 visualized)
Columns: Name, Created, Runtime, End Time, ID, Notes, State, Updated, Tags, audio_duration, audio_path, modelsize, transcript, transcription_time, transcription_factor
Run 4vfqxr8c
  Runtime: 14s
  Created: Dec 28 '22 14:58
  Notes: -
  State: Finished
  Updated: Dec 28 '22 14:58
  audio_duration: 3121.032000
  audio_path: /content/stephan-fabel-efficient-supercomputing-with-nvidia-s-base-command-platform-swvoticj4je.mp3
  modelsize: tiny
  transcript:
Scheduling on a supercomputer typically is by Post-it. It's, Joe, it's your cluster this week, right? But I need it next week, right? And it doesn't work that way at scale anymore. You want to interact with something that is actually understanding the use of the cluster, optimizing its use, so that the overall output across all of the users is guaranteed at any given point in time. You're listening to Gradient Dissent, a show about machine learning in the real world, and I'm your host, Lucas Biewald. This is a conversation I had with Stephan Fabel, who is a senior director of product management at NVIDIA, where he works on the Base Command Platform, software that runs on top of NVIDIA's DGX machines, which are basically the most powerful computers that you can buy to train your machine learning models on. It's fun to talk about the challenges that customers face when they have access to basically unlimited compute power. This is a super fun conversation, and I hope you enjoy it. My first question, for those who haven't heard of NVIDIA Base Command: since you are the senior product manager, can you tell me what Base Command aspires to do? Yeah, so in a way, think of NVIDIA Base Command as your one-stop shop for all of your AI development. So it's a SaaS offering from NVIDIA, where you log on directly, or you log on via an integration partner, and you leverage the capabilities of Base Command to schedule jobs across a variety of infrastructures. And you do that in a secure manner. You gain access to your data and retain access to your data, and data sovereignty, across the infrastructure that you're scheduling the jobs on. And then it's really just a matter of optimizing that job run on NVIDIA infrastructure. So that's really what Base Command aims to do. And so these jobs are model training jobs exclusively, or is it broader than that? Yeah, model training jobs are generally the ones that we focus on, but we also do model validation, for example. So you could have single-shot inference runs as well. And are there other pain points, I guess, of model development that Base Command aspires to solve or tries to solve? Yeah, so I think that a lot of the issues that you have with AI infrastructure, that's really where it starts, right? Sort of the question is, where do you train your models, and how do you go about it? And so most people start in the cloud to train their models, right? And that's reasonable, because just about any development effort would start in the cloud today. And at some point you reach a certain amount of scale, where you say, well, you know, it may not deliver the performance I need, or it may not deliver the scale I need at the economics I'm comfortable with, et cetera. For those high-end runs, typically you look at infrastructure alternatives, right? So then the question becomes, okay, I'm already used to this whole SaaS interaction model with my AI development, so how do I maintain that developer motion going forward, where I don't have to teach them something new just because the infrastructure is different? And so what we have at NVIDIA is this, you know, DGX SuperPOD. And the idea is to say, well, how about we try this and develop Base Command as a way to access a SuperPOD just as, you know, a cloud API would behave? And so a DGX SuperPOD, is that something that I could put in my own infrastructure, or is that something that I could access in the cloud, or both? How does that work?
Yeah, so typically our customers for SuperPODs, I mean, maybe we should take a step back and understand what it is, right? So, you know, the easiest way, or the most straightforward way, to think about a DGX SuperPOD is to think of it as a supercomputer in a box. It's a packaged-up infrastructure solution from NVIDIA that you can purchase, and it'll be deployed on premises for you, in your own data center or in a colo facility. And actually, we found that a colo facility is the most likely place for you to put that, because it is a pretty intensive investment. Number one, not just in terms of the number of DGXs that are involved, for example, but also, of course, in terms of the power draw and cooling and just the requirements that you need to bring to even run the space, essentially, right? So, yeah, I mean, that's really what then dictates kind of where this thing usually is, right? So, what we did is we put it in a colo facility and make it available right now in kind of a directed-availability fashion. We have a couple of golden tickets for some customers who want to be on this thing, and then they get to select, you know, the size of the slice they want and access that through Base Command. I see. So, when you use Base Command, you're using DGX, but it's in NVIDIA's cloud and you get kind of a slice of it. Is that right? Yeah, that's right. Although, you know, we call it NVIDIA GPU Cloud, but really think of the whole Base Command proposition today as a SaaS portal that you access that is currently coupled to more like a rental program. So it's less like, you know, cloud-style elastic; think of it more like, okay, I have three DGX A100s today, and then maybe, you know, in the next couple of months, I know I need three more. So I'll call, you know, NVIDIA and say, hey, I need three more for the next month. And that's kind of how that works. So, maybe let's start with the DGX box. Like, I guess, what would a standard box look like? What's its power draw? How big is it? How much does it cost? Can you answer this question? Just, what are the magnitudes? Yeah, so, I mean, you're looking at about 300,000 dollars for a single DGX A100. It'll have eight GPUs and 140 gigabytes of memory that come along with that. Those are the A100 GPUs, so the latest and greatest that we have. And, you know, you're going to look at about 13 kilowatts per rack of standard deployment. 13 kilowatts? Yeah. No, when you fire these things up, these puppies, you know, they heat up quite a lot. So, yeah, I mean, they're pretty powerful, you know, and so the DGX SuperPOD consists of, at minimum, 20 of those. And if you think about that, right, that's what we call one scale unit. And, you know, we have customers that build, you know, 140 of those. Wow, and what kinds of things do they do with that? Well, just all the largest multi-node jobs that you could possibly imagine, right? Starting from climate change analysis, with huge datasets, right, that need to be worked on there. NLP is a big, big draw for some of these customers, right? Just natural language processing, and the analytics that comes with those models is pretty intensive, data intensive and transfer intensive. I think I should mention that, you know, we keep talking about the DGXs, right?
And, of course, we're very proud of them and all of that. But we also, you know, acquired a company called Mellanox a year ago, and so, of course, the networking plays a huge role in the infrastructure layout of such a SuperPOD. So, if you have multi-rail InfiniBand connections between all of those boxes and the storage, which typically uses a parallel file system in a SuperPOD, then what you'll get is, essentially, extreme performance even for multi-node jobs. So, any job that has to go above and beyond multiple GPUs, you know, a DGX SuperPOD architecture will get you there, essentially at, I would say, probably one of the best speed and performance characteristics that you could possibly have. I mean, the SuperPOD scored number five on the TOP500. So, it's nothing to sneeze at. Yeah, I guess, how does the experience of training on that compare to something that a listener would be more familiar with, like, you know, a 2080 or 3080, which feels pretty fast already? Like, how much faster is this? And do you need to use a special version of TensorFlow or PyTorch or something like this to even take advantage of the parallelism? So, I'd have to check exactly how to quantify, like, an A30 versus an A100, but think of it as this, right? Any other GPU that you might want to use for training in a traditional server, think of it as a subset of the capabilities of an A100, right? And so, if you use, for example, our MIG capability, you can really slice that GPU down to, you know, a T4-type performance profile, right? And say, well, I'm testing stuff out on a really small performance profile without having to occupy the entire GPU, right? And then, you know, you sort of have the same approach from a software perspective: if you do, you know, your sweeps, then you do essentially the same thing. Well, you could do those on MIG instances, right? And that's why you don't need that many DGXs when you do it. But I guess I should say that's the beauty of CUDA: if you write this once, it'll run on an A30, it'll run on an A100, it'll run on a T4. And in fact, we provide a whole lot of, sort of, base images that are free for people to use and to start with, and that sort of lifts the tide for everybody, right? So there are pre-optimized container images that people can build on. Hmm. I would think there'd be a lot of kind of networking issues and parallelization issues that would come up, maybe uniquely, at these scales. Is that something that NVIDIA is able to help with, and does CUDA actually help with that? I sort of think of CUDA as, like, compiling something to run on a single GPU. Yeah, absolutely. So, if you think of CUDA as the sort of very horizontal platform piece, right, in the software stack of your AI training stack, then components like NCCL, for example, provide you with pretty optimized communication paths for multi-GPU jobs, for example, but they also span multiple nodes, right? And this starts from selecting the right interface for a signal to exit, right? Because that means you're going to the right port on the top-of-rack switch, and that means you minimize the latency that your signal takes from, you know, point A to point B in such a data center.
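To make the NCCL point above a bit more concrete, here is a minimal sketch of the usual way NCCL is exercised from training code: PyTorch's DistributedDataParallel with the NCCL backend, which carries the GPU-to-GPU (and node-to-node) communication described here. This is a generic illustration under those assumptions, not NVIDIA's or Base Command's own example; the model and data are placeholders.

# Launch with: torchrun --nproc_per_node=8 train.py
# (add --nnodes and --rdzv_endpoint for a multi-node job)
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # NCCL carries the inter-GPU traffic
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(32, 1024, device=f"cuda:{local_rank}")
    loss = model(x).sum()
    loss.backward()                                # gradients are all-reduced over NCCL here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()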
So, when you, when you look at, at CUDA, uh, and especially at, at components like nickel and magna, my, oh, as a, as a whole, which is sort of our, our, our portfolio of communication libraries and storage acceleration libraries. It starts from the integration of the hardware and the understanding of the actual chip itself, right? And then it builds outwards from there. And the, the big shift that Nvidia that, that we're, we're, we're looking at, um, uh, sort of accelerating with use of base command. This is this understanding that, hey, you know, Nvidia is now thinking about the entire data center. It's not just about, you know, I got the newest GPU and now my game runs fast, right? Certainly, that's a, uh, a focus area of us as well, right? But really, if you, if you take the entire stack and, uh, and, and, and work inside out, essentially, then the value proposition just multiplies the, the, the further out you go, right? And so with base command, it's a sort of the last step in this whole journey to turn it into kind of a hybrid proposition. Um, so anyway, I know it's, it's very high level right now and, and, and, and, and upstract, but, uh, but, uh, it's, it's sort of, it's a super interesting problem to solve because if you, if you think about how, uh, data center infrastructure, uh, evolved over the last, let's say, 10 years or so, right? Um, then it was about introducing more homogeneity into the actual layout of the data center. So, you know, a certain type of server, a certain type of CPU, certain type of top or x-witch, and then a certain layout, right? So you have all these, uh, you know, non-blocking fabric, uh, uh, reference architectures that are out there and, et cetera, et cetera, et cetera, right? And ultimately, now that everything is, uh, homogeneous, you can now, uh, address, make it addressable using an API because everything is sort of, is at least intended to behave in this very standard and predictable way. And so, we've worked our way up there. This has never been the case for something like a super computer. Uh, a super computer was a two year research project with, uh, you know, a lot of finagling and parameters here and then set this thing to a magic value and that thing to a magic value and then run it on, you know, five minutes after midnight, but not on two states and then you get the performance, right? And so, this whole, uh, uh, a contribution that we're really making here is this that we're raising that bar to a predictable performance profile that is repeatable, not just inside the Nvidia data center where we know, you know, five minutes after midnight and so on, right? But also in your data center, in actual random data center, we provide you can afford to cooling in power, of course. But then, you know, once we got that out, the way we're pretty good, right? So, so that's, that's a real shift forward towards enabling enterprises real, you know, bonafide true, you know, blue chip companies to actually adopt AI at a larger scale. Is it interesting? One thing I was thinking of as, as you're talking is, most of the customers that we work with, we don't always know, but I think what we typically see with our customers that are doing training a lot of machine learning models is they use a lot of Nvidia hardware, but it's less powerful hardware than the DGX, it might be like, you know, p100 or, or, basically whatever is available to them through Amazon or Azure or Google Cloud. And I think they do that for convenience. 
I think, you know, people come out of school knowing how to train on those types of infrastructure. And then, then the computer costs do get high enough. I mean, we do see compute costs, you know, certainly well into the seven eight figures. And so, do you think that they're making a mistake by doing it that way? Like, should they be buying custom DGX hardware and putting that in a collab? But they actually save money or make their teams more productive if they did it that way? Oh, God, no. No. So, you know, just to be really clear, it's just like I said, you know, base command is not a cloud, right? We're not intending to go out there and say, you know, go here instead of, let's say Amazon or something like that. That's not what we were saying. I mean, first of all, you can get a 100 instances in all the major public clouds as well, right? So, you could have access to those instances and just the same way that you're used to consuming, you know, the P, the P100s or V100s or anything like that, right? So, whether it's Pascal or Volta or Ampere architecture, all of it is available in the public cloud. And like I said in the beginning, it's just a perfectly acceptable way to start. In fact, it's the recommended path to start in the cloud because it requires the least of front investment, I mean, zero. And you get to see, you know, how far you can put something and an idea. And then once you arrive at a certain point, I think then it's a question of economics. And then just everything starts, both start falling into place. What we found is that enterprise is typically arrive at a base load of GPUs. So, in other words, at any given moment in time, for whatever reason, there is a certain number of GPUs working. And, you know, once you identify that, you know, hey, every day, I keep at least 500 GPUs busy. Well, then typically the economics are better if you purchase, right? Typically, kind of a capx approach works out better. It's not always the case, but typically that might be the case. And so, to meet that need in the market is where we sort of come in. So, what base command right now, first is this, it's not the all the way, you know, purchase it, right? You don't have to now have that, you know, big capx investment upfront. But it is something in between, right? You do get to rent something, it's not entirely cloud, but, you know, you're moving from the Uber model to the, you know, national car rental type model, right? And then, you know, once you're, you're done renting, then, you know, maybe one of my car. But the point is that there is, there's room here and on that spectrum. And so, currently we're right smack in the middle of that one. So, that's that's typically what we say to customers just, actually yesterday's only said, well, how do you support bursting and how elastic are you? So, that's, that's not the point here. Right? You want to be in cloud when you want to be elastic and bursting. But typically that base load is done better in different ways. So, not like what breaks if I don't use base command? Like if I, if I just purchased one of these machines and I'm just kind of shelling into the machine and, you know, kicking off my jobs the way, you know, I'm typically used to or running something in the notebook. Like what, what, what, what starts to break where you know that you need something more sophisticated? So, well, on the face of it, nothing really breaks. It just takes a lot of expertise to put these things together. 
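As a purely illustrative aside on the base-load argument above: with made-up placeholder prices (none of these numbers come from NVIDIA, the interview, or any cloud provider), the break-even reasoning looks roughly like this.

# All figures are hypothetical, only to illustrate the capex-vs-cloud comparison.
base_load_gpus = 500               # GPUs kept busy around the clock
hours_per_year = 24 * 365

cloud_rate_per_gpu_hour = 3.00     # assumed on-demand price
cloud_cost_per_year = base_load_gpus * cloud_rate_per_gpu_hour * hours_per_year

purchase_price_per_gpu = 20_000    # assumed, amortized over three years
opex_per_gpu_per_year = 3_000      # assumed power, cooling, hosting
owned_cost_per_year = base_load_gpus * (purchase_price_per_gpu / 3 + opex_per_gpu_per_year)

print(f"cloud: ${cloud_cost_per_year:,.0f}/yr   owned: ${owned_cost_per_year:,.0f}/yr")
# With these placeholders, a steady base load favors owning; bursty or
# intermittent demand flips the comparison back toward the cloud.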
So, if you buy a single box, then there's probably very little value adding that to a SaaS platform per save, right? But as soon as you start thinking about a cluster of machines and, like I said, more and more of our enterprise customers are actually thinking about deploying many of those, not just a single machine. And then as soon as that comes into play, then you're faced with all the traditional skill challenges in your enterprise that you'd be used to from just rolling out private cloud infrastructure, right? It's the same exact journey. It's the same exact challenge, right? You need to have somebody who understands these machines and somebody who understands networking, somebody who understands storage, Kubernetes and, you know, and so on and so forth, right? And as soon as you build up the skill profile that you need to actually run this infrastructure at scale and at capacity, then you, I mean, you're good to go, right? You can build your own solution, but typically what you'd be lacking are things that, that then help you make the most of it. So, all the kinds of efficiency, you know, gains that you have by just having visibility into the use of the GPU. So, all the telemetry and the aggregates by job and by user and by team. So, this entire concept of charge back, et cetera, is a whole other hurdle that you then have to climb, right? And so, what we're looking at is, you know, people who want to build a cluster typically, they want to do that because they want to share that cluster. Like I said, it's a pretty big, you know, pretty big beast, you know, so if you build a big cluster, might as well, because you want to meet more efficient and you want to make the most of it. And so, now you need to have a broker, right? Who brokers access to the system, you know, super computers, I mean, as ridiculous as it sounds, I mean, they work scheduling on a super computer typically is by post it, right? It's Joe, it's your cluster this week, right? But I needed next week, right? And it's not, it doesn't work that way at scale anymore, right? You want to interact with something that is actually understanding the use of the cluster, optimizing its use so that the overall output across all of the users is guaranteed at any given point in time. I just sensed that like many years ago, like decades ago, like, you know, when I was a kid or maybe even before that, super computers felt like this really important resource that we use for lots of applications. And then maybe in the 90s or the odds, they became less popular. People started moving their compute jobs to sort of distributed commodity hardware and maybe they're kind of making it come back again. Do you think that's inaccurate impression? And do you have a sense of like what the sort of forces are that kind of makes super computers more or less interesting compared to just like, you know, making a huge stack of chips that you could, you know, buy in the store? Yeah, it is interesting, right? Because if you think about it, we've actually oscillated back and forth between this concept a little bit for years, right? I mean, you're exactly right, you know, you had the sort of the first wave of standardization was, let's just use 19-inch rack units, right? And start from there and then see, maybe that's a little bit better, right? 
And then sort of the same thing happened when we decided to use containers as an artifact to deliver software from point A to point B and, you know, just standardization and form factor really is what drove us there, right? And, you know, and certainly there's value in that, right? The thing, the interesting moment happens when all of that together becomes, when the complexity of running all of that together and lining it all just upright, because you just in beginning you had one, you know, one IBM S390, right? And you know, that's the one thing you have to line up, right? And now you have 200 OEM servers across X-Rax, and you know, that's a lot of ducks to line up, right? So, so the complexity and management of independent systems that you're sort of adding together, that sounds good on paper, but at some point you're sort of crossing that complexity line where it's just more complex to even manage the hardware. And this is not just from an effort perspective, this is also from a CPU load perspective. If more than 50% of your chorus goes towards just staying in sync with everybody else, well, how much are you really getting out of each individual component that makes up this cluster, right? So, so now you, of course, you're saying, well, how do I disrupt them? Well, you disrupt it by making assumption about how this infrastructure actually looks like rather than saying, well, you know, you're dropping the ocean, you first have to figure out where you're even at. And so if you eliminate that complexity, then fundamentally, you know, you can go straight into focusing more and kind of a data plane, type of focus rather than figuring out how to control plane looks like and how busy that one is, right? So it's got a little bit of that. And I think the DGX represents an optimization that shows, you know, rather than purchasing eight separate servers that, you know, that have potentially similar to views in them, right? Here's a way that it's, you know, not only has those HEPUs in them, but it also is interconnected in a way that that just makes optimal assumptions about, you know, what's going on, between those two GPUs and what could possibly run on them. And that combined with the software stack that's optimized for this, for this layout, just brings the value home, right? So, so that's really where we're coming from. It's interesting, you know, when I started doing machine learning, the harder I was pretty abstracted away, like we would kind of compete for computing resources, and so I got a little bit handy with, you know, like unix and like nice and processes, and just, you know, it's sort of like coordinating with other people in grad school. But, but I really had no sense of, you know, the underlying hardware. I don't even think I took any classes on networking or chip architecture. And now I really regret it. You know, I feel like I'm actually learning more and more about it. And the hardware is becoming less and less abstracted away every year. And I think, you know, Nvidia has a real role to play there. I mean, do you think that over time will go back to a more abstracted away hardware model? Kind of figure out the right APIs to this, or do you think that, you know, we're going to make more and more specialized hardware for the different things that people are likely going to want to do. And of course, skill of an ML practitioner is going to need to be understanding how the underlying hardware works. 
Yeah, I think what you said there is, I remember a minute of like ten years ago, you used to say, well, if you're a web front end developer and you don't know TCP IP, you're not really a web front end developer. But most web front end developers will never think about TCP IP, right? So, and I think this is very true here too, right? You have an ML ops practitioner and today, you know, well, yeah, you get to think about your models and, you know, tensors have a parameter searches and all of that kind of stuff. And yes, that's important. And well, not important, it's crucial, right? But out that you couldn't do your work, but increasingly, you also have to know where you actually are running, right? In order to get the performance that you need. So, today, it's a real competitive advantage for the companies out there to increase the training speed, right? Just not just being able to, obviously what we're solving is just getting started, right? I mean, we take all that pain away, you just log on to basically an off-you-go, right? But, but increasingly, it's a true competitive advantage, not to be in the cloud, but to be training faster than anybody else, right? Like, 2012, 2013, you know, if you weren't working on a cloud initiative as a CIO, you know, that was a problem, right? Now, increasingly, if you're not focusing on how to accelerate AI training, now you're putting your company as a disadvantage. So, that means that the necessity for each individual practitioner would interact with the hardware, actually understand what they run on and how to optimize for this, is going to increase. Now, having said that though, part of our job at Nvidia, I think, is to, you know, make optimal choices on behalf of the practitioner out of the gate. So, rather than, you know, requiring people to really understand, let's say, to clock rates of each individual bus or something like that, right? We'll abstract it away, and, you know, people will argue that Couda is already still pretty low level. So, but, you know, we're actually abstracting a whole lot, and, you know, even to even get to that point. And so, I would say, I would say, well, that's true. We're trying to shield the practitioner as much as possible, and we have a leg up because we can work with both the knowledge of how the GPU looks like. And, you know, most importantly, how the next GPU will look like, but also, you know, how to expose that optimally at the application layer. And then interact with the MLops providers in a just kind of a meaningful way that, that, yeah, that just as optimal throughout. Have there been any kind of cultural changes needed to build a SaaS customer facing product, like base commander, a company that kind of comes up through making really great semi-conductors and kind of very, well, I would call it, I would call it Couda low level from my vantage point. And obviously, it's an amazing piece of software, but it's a very, very low level software. Yeah. Has an ability to make adjustments to kind of in the product development process to make base command work for customers? Yeah, it's interesting because base command is actually not a new product. We've been using this thing internally for over five years. And it was kind of a natural sort of a natural situation for us because, you know, so five years ago, we launched a first DGX. And then of course, if you launch something like the DGX, then you say that's the best thing you could possibly, you know, purchase for the purposes of training. 
And you have 2,600 AI researchers in your in-house, then you can imagine the next sort of the obvious next question is like, okay, well, how do we use this thing to accelerate our own AI research, right? And so, this need for creating large scale AI infrastructure on the basis of the DGX was born, right? Out of this situation. And so, with that came all these issues. And as we solved them, we just kept adding to this portal, or to this, it's not just as much as the portal, right? I mean, see entire stacks, the infrastructure provisioning, you know, and then the exposure, the middleware, the scheduler, the, you know, the entire thing, right? So it became more and more obvious to us what should be done. And so these 2006 aren't researchers that I just mentioned, I mean, less, they're hard, right? They really had to go through a lot of iteration with us and be very patient with us, you know, until we got it to the point where they would, well, it's a not complain as much. But, you know, the point is that we really tried to get it right, and we, you know, acted in a very transparent manner with a pretty large community of AI researches and developers. And they told us what they needed and what they wanted and what their pain points were. And so, really, this, this going to market now with, based command as an externally facing product was simply turning that to the outside. Have there been any surprises in taking it to market because I know that sometimes when companies have an internal tool, like I think the TensorFlow team has talked about this, that, you know, kind of made for, especially for like a really, really advanced large team, and then you want to take it to, you know, someone who's newer or smaller team, and then you have new needs that are a little bit surprising to people that have been doing this for a long time. Have you encountered anything like that as you bring it to market? Yeah, it's, it's funny to ask. So, we encounter this in, it's just in many different aspects. So, one example is that most customers, sort of like I said, I mean, we make this available. So, the, the, the, sort of the internal example that we use is, oh, you get to drive the Lamborghini for a while, right? And so, you know, so the idea is this is a short term rental. I mean, how long are you renting the Lamborghini right? Maybe a day or two, right? But, or a weekend. And so, here with, we're saying, well, you know, short term rental, they're probably going to rent this for three months or, you know, something like that. Well, it turns out, most customers want to rent this for two years, three years, right? And so, what surprised this was that there's a real need for, well, not only for a long time rental, but, especially the immediacy of access to this. I think we had underestimated a little bit how desperate the market was to get started right away. I mean, we knew that people would want to get started, right? But we always figured, okay, well, you know, the cloud is there to get started right away. I mean, just sign up and swipe your credit card enough you go, right? But no, but it's also the need for large scale training and just the immediacy of that need that that personally wasn't surprised to me. I hadn't expected that. I thought that would be much more of a slower ramp than it was. So, yeah, so anyway, I mean, I thought I was going to be in different sales conversations than I actually was found myself in. And so, so that was the surprise. 
As surprises are just, you know, understanding just how much people still have to go, you know, they typically, we encounter folks who say, well, you know, I really my way to scale and accelerate my training is just to pick a large issue. And, you know, and there's a big, big portion of the market that certainly has been operating that way, right? But really helping them see that, you know, sometimes it's not the, you know, sort of the scale up model, but the scale out model that might be appropriate as kind of the next step. I think that was, it wasn't exactly surprising, but it was interesting to see just how, how widespread that scale up thinking was rather than the scale out thinking. Can you say more about scale up versus scale out? What do you mean, what's the difference? So, yeah, I mean, if you think about cloud infrastructure, then scale up can't approach would be, you know, you started with a medium instance and you go to an X large or something like that. So you just choose more powerful resources to power the same exact hardware, but you don't really think about adding a second server, for example, and now spread the load across multiple instances. So here would be something similar, right? So, like I said, if you, if you always think about saying, well, okay, I choose to run this on a, on a, a, a, a, a Vullter based system, right? And now we have a Vullter based GPU and now my way to make this faster is to go to an empirical based architecture to you, right? So that would be scaling up and certainly, you know, that's something that you want to do, but at some point, you're saying, your pace of, in your need of for accelerated training actually exceeds the, sort of the cadence at which we can provide you the next fastest GPU, right? So if you need to scale faster than that and if that curve exceeds the other, then you're essentially in a situation where you have to say, well, how about I take a second, a 100, right? And then I have a multi GPU scenario and I just deal with that, right? And so on and so forth. And so then the next whole conclusion of that is, well, how about multi node jobs where, you know, they're smack full of the latest and greatest GPUs. And then, you know, how many nodes can I spread my job across? And if you, if you do, you know, I don't know, five billion parameters. And yeah, you know, you're going to have to do that. And then then you're going to be pretty busy, right? Trying to organize again a job across across a multiple sense of nodes. Do you have any sense on how your customers sort of view the trade off of, you know, buying more GPUs, buying more hardware to make their models perform better? Do they, are they really doing like a clear ROI calculation? One of the things that we see at Ways and Biasis is that it seems like our customers sort of use of GPUs to spend to fit whatever capacity they, they actually have. Which I'm sure is like wonderful for Nvidia, but you know, you wonder if the day will come where people start to scrutinize that cost more carefully or even, you know, some people have pointed out that there's possibly even environmental impact from just monstrous training runs or even a kind of a effect where, you know, no one can replicate the, you know, the latest academic research if it's only going to be done at like, you know, multi-million dollar scale, yeah, compute. How do you think about that? In the end, I think it's, it's a pretty simple concept. 
I think that if the competitive advantage for companies today is derived from being able to train faster and larger and better models, you're not speaking to the CFO anymore, right? You're, you're speaking to the product teams. And so at that point, it, it just becomes a completely different conversation, right? I mean, the, the only interesting piece here is that traditionally, of course, data center infrastructure is a cost center, whereas now we're, we're talking about it, turning it into a value center. And so if you turn it into a value center, then you really don't have this problem. Now, yes, of course, we have extensive ROI conversations with our customers or, you know, we have TCO calculators and all that good stuff is, is definitely there. And it's really about helping customers choose, you know, should we do more cloud for, you know, for where we're at? And then, you know, from a GPU standpoint, we're happy either with either outcome, right? So we're, we're maintaining neutrality in that, in that aspect that we're saying, well, if, if more cloud usage is, turns out to be better for you than you should absolutely go and do that, right? And then if we figure out that the, the economic shift that in such a way that a mix of cloud and on prem or a cloud and hosted resources makes sense, then, you know, we'll, we'll propose that, right? So it's, it's really about finding the best solution there and definitely our customers are asking these questions and making pretty hard calculations on on that, right? But I mean, it's pretty obvious, right? I mean, if you think about it, what was it, a couple years ago, we, you know, talked to an autonomous driving lab team and they said, well, you know, company A put 300,000 miles a time and a silly under road last year and we put 70,000 miles on a road last year, autonomously, right? We got to change that, right? How do I, how do I at least match the 300,000 miles a year that I can put autonomously on the road, right? And so that's a direct function of how well does your model work, right? And, and so on and so forth, right? And so, we're pretty clear, tie in, right now. What about inference? A lot of the customers that we talk to inference is really the dominant compute cost that they have, so that the training is actually much smaller than the spent on, on, on inference. Do you offer solutions for inference, too? So, you can use these based command at inference timer as it entirely training and do people ever use these de-gex machines for inference or that just be a crazy waste of an incredibly expensive resource? Well, um, I mean, yes, it no, I mean, depends on how you use it. So, first of all, you can use, uh, based command from model validation purposes, right? So, you can have some single shot runs, but, uh, the customers want to set up a server that is dedicated to inference and then just take mixlises and and say, well, you know, I'll do my model validation at scale, basically, do my scoring there. And, and so now, if you share that infrastructure across a large number of data scientists, you know, you put your DGX to a good use. I mean, there's no issue with that, right? We do have a, uh, sort of a sister, uh, SaaS offering to base command called fleet command. And that is meant to take the output of base command in form of a container, of course. And then deploy that at scale and orchestrate it at scale and really manage the inference workloads at the edge, uh, for our customers. 
So, it's an end to end coverage there from a SaaS perspective. Cool. In your view, based on the problems that you're seeing in the market, what functionality are our customers asking for in their software layer for machine learning training that you're interested in providing. That's a, that's a really good question because it's sort of, uh, goes to the sort of the heart of the question, what role, what space is base command seeking to occupy, you know, in a, in a, in a, in a, in a radical stack of where, you know, the infrastructure set at the bottom and something like weights and biases at the top, right? I would see a base commands role as an, an arbiter and a pro broker and, um, almost like a bridge between, you know, a pure developer focused almost like an IDE perspective and, and bridge that into enterprise ready architecture. So let me give you a simple example. If you do, um, data set versioning, right? And then it's say that's, that's, you know, what you, what you want to do with the MLops platform, then, um, you know, then there's, there's many ways to, to version data, right? And you, you can try and be smart about this, but at the end of the day, right? It's a question of what infrastructure is available to you. If, if I have an optimized storage file or underneath, my data set versioning strategy might be looking entirely different. Then if I just have kind of a scale out open source storage back end, right? If, if I work with, uh, as three buckets, then my versioning looks different, uh, you know, then, then I do that with NFS shares, right? So the, the value that basement provides is that it distracts it away. If you do data set versioning, which basically meant then, you know, do snapshots. If you do it on an NF file or if, you know, it'll do other things, you do it with a different storage. And, and so, uh, but those are exactly the questions that an enterprise, you know, architect will be interested in, how do you deal with that? I'm going to have to, like, just because you, you figure you need 50 versions of your data set that's three terabytes, you know, large. Does that mean they need to plan for, like, almost infinite storage? No, it doesn't, right? We can help you translate that and make that consumable, um, uh, in the enterprise. And I think that's, that's a big, that's a big piece that that I think that basement can provide, um, as this arbiter between the infrastructure and the, and sort of the, the API, if you will. The second thing is is increasingly I've seen people being very concerned about data security and governance around this. So if you have us, you know, sufficiently large infrastructure to deal with, then almost always you have multiple geos to deal with. They have different laws about the data that's being allowed at any given point in time. And so just the ability to say, well, this data set can never leave France, right? Or that data set has to only be visible to these three people and nobody else, right? So, the first thing that I think is of extreme value to, um, to enterprises. So all those things come into play and I think that's where a base command can help. Are there other parts of base command that you, you've put a lot of work into the people might not realize the amount of effort that it took that might be invisible just to, to a customer even just me sort of even imagining what, what base command does. 
I think that we invested a lot in our scheduler and I think if you, if you look at the, um, the layout of, of DG access in a super pod arrangement. And the nature of the jobs that go, go into this, I think people underestimate just how optimized the scheduler is across, you know, it's just multiple nodes, but also within the node, right? To be able to say, um, I'm running a job on GPU configuration and then it's a slider and I say, well, I'm turning this into an H GPU job now. And that's literally a selection. What goes on in the background is it's just a lot more intricate than people typically realize, but it goes on automatically and, you know, I mean, you, you do have to be, you have to be ready for you have to program for it and people know that, right? But as soon as you do that at your layer, all the optimization underneath is just incredible. And what's tricky is it like you need to find a GPUs that are close to each other and, and not being used and all that is at the, yeah, Yeah, data locality, you know, caching strategies, all that kind of stuff, right, is going straight into into that selection. Cool. All right, well, you know, we always end with two questions, both on MLs. Let's, let's see how you answer them. So, um, one thing we always ask is, what's an underrated aspect of machine learning that you think people should pay more attention to or you would love to spend time on if you, if you had more free time. I think what's underrated is this aspect of crowdsourcing. I don't think anybody is, is looking at, at machine learning and the potential that, that just many small devices that can, that contributory to, the creation of a model would bring. I think that's we're at the cost of that, but we're not really doing that right now. I think, the, to the degree that it already happens, it's very hidden from us, right, it's, you know, we all know, Google will run some algorithms across data that was collected through the phones, right, like we understand that on a conceptual level, but just the ability to, to bring that together in a more natural sense that we might want to find recommendations, not on the basis of, of a single parameter about find recommendations of more meaningful parameters. I find five star reviews very meaningless, for example, right, I think that is a very simplified view of the world, and I find consequently also one star reviews very meaningless, right, but if you could actually have a more natural understanding based on on machine learning, I think that would be, that would be an interesting topic to explore because it, it would have to be based on just all kinds of inputs that would have to be taken into account. So I would like to see that, and I think that would be an interesting field of research, an interesting field of development, I think people, still assume that it's only a prerogative of the big companies to be able to do that, but I think there's, there's, I just know it, there's an open source project in there somewhere. Cool, I have somebody start that, and they should send it to us when they do. And our final question is when, when you look at your customers and, and their effort to take, you know, business problems and and turn them into machine learning problems and deploy them and solve those problems, where do you see the biggest bottleneck, where are they struggling the most right now? 
The biggest issue they have, at least as far as I can tell, is that they have a, just a getting started issue, in the sense of, how do I, how do I scale this beyond my initial POC? So I think that the prefab solutions that are out there are pretty good at walking you through, you know, getting started tutorial and then they probably get you really far, you know, if you're a serious practitioner and you devote some time to it, but I think that some point, you'll, you had problems that may not even have anything to do with, you know, ML, they may just have some to do with infrastructure that's available to you and things like that, right? So I think that anybody trying to use this for commercial and business strategic purpose is going to run into an issue of sooner or later, right? How do I, how do I go from from point A to point B here? People call it like, what was it, you know, something like AI DevOps or something like that, that floated around and I think as an industry, we should be aiming to make sure that that job never comes and sees the date of, you know, sees the light of day. To lay, I think, yeah, you know, I feel like we lost on that one already, but I really think, you know, we should do better. You should have to require super special skills to create kind of this whole DevOps approach around AI training. We should really know better by now how that whole approach works and then build products that, that, that, that drive that. Awesome. Well, thanks a lot to your time, I really appreciate it. That's one thing. Thank you. If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned supplement and material and a transcription that we work really hard to produce. So check it out.
  transcription_time: 88.35762
  transcription_factor: 35.32272
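For context, here is a minimal sketch of how run records like the one above could be produced and logged. It assumes the openai-whisper and wandb Python packages; the project name and the way audio_duration is measured are assumptions, not something recorded in this workspace. The transcription_factor appears to be audio_duration divided by transcription_time (3121.032 / 88.35762 is roughly 35.32, matching the values above).

import time
import whisper  # openai-whisper
import wandb

audio_path = "/content/stephan-fabel-efficient-supercomputing-with-nvidia-s-base-command-platform-swvoticj4je.mp3"
model_size = "tiny"
audio_duration = 3121.032  # seconds; assumed to be measured separately from the audio file

run = wandb.init(project="whisper-transcription")  # hypothetical project name
run.config.update({"modelsize": model_size, "audio_path": audio_path})

model = whisper.load_model(model_size)
start = time.time()
result = model.transcribe(audio_path)
transcription_time = time.time() - start

run.log({
    "audio_duration": audio_duration,
    "transcript": result["text"],
    "transcription_time": transcription_time,
    # factor = seconds of audio transcribed per second of compute
    "transcription_factor": audio_duration / transcription_time,
})
run.finish()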
Run 1mpru1uj
  Runtime: 4s
  Created: Dec 28 '22 14:56
  Notes: -
  State: Finished
  Updated: Dec 28 '22 14:56
  audio_duration: 2096.472000
  audio_path: /content/suzana-ili-cultivating-machine-learning-communities-ukjx-ijgkya.mp3
  modelsize: tiny
  transcript:
I think the most important thing is to do something that you are really interested in, because if you are starting, a lot of things will depend on you, and the key also to MLT is consistency, so we consistently just keep doing stuff that we think is exciting and interesting. You're listening to Gradient Dissent, a show where we learn about making machine learning models work in the real world. I'm your host, Lucas Biewald. Suzana Ilić is the founder of MLT, Machine Learning Tokyo, which is a huge community of people working on and learning about deep learning. She's hosted around a hundred machine learning related events in the last two and a half years and built an incredible community. I'm super excited to talk to her. Suzana, it's so nice to talk to you. I was really looking forward to this, because I see that we share at least two interests in common. One seems like the democratization of AI, and the other is edge computing, or deploying deep learning on hardware, so I'm super excited to hear about what you've been up to, and I thought maybe we'd start with Machine Learning Tokyo. I would love to hear about why you started it and what it does. Yeah, first of all, thanks so much for having me. I'm super excited. I love Weights & Biases, and I've visited in SF, so I'm super excited to be on this podcast. Yeah, MLT is a Japan-based non-profit organization, an ippan shadan hojin, and our core mission is to democratize machine learning. So we want to make machine learning and deep learning as accessible as possible to as many people as possible, because we believe that machine learning is going to be everywhere. It's going to be some standard component in the software stack in the very near future, so I think a lot of people should know what it is and be able to navigate it. And we mainly do this through open education, through open source, so we build a lot of open source projects, and open science, so we work with universities. And yeah, we're here in Tokyo and we support a research and engineering community of about, I think, four and a half thousand members. Wow, four and a half thousand. And so how does it work? Like, how do people join the community and what do they do? So it depends; there are many ways to join the community. You can just be an attendee of the meetups or join workshops or hands-on sessions, and then you can just join Meetup and you get all the information you need there on upcoming sessions. But there are also more active ways to join MLT, so if you want to contribute, if you want to work on open source, or if you want to, for example, host a workshop or lead a study session, you can join Slack and you can talk to me, and there are many ways to be more actively involved in the community. What inspired you to start MLT? So we started, I think, two and a half years ago, and it was basically just out of our own needs. We were two people, that's how MLT started. I'm a domain expert: I come from a very traditional academic background, I'm a trained linguist, and I was always working with text analysis and NLP. I was using very simple methods, and at some point during my master's I was working on sentiment and emotion and affect.
I realized that these kind of very simple statistical methods give us some intuition and some insight about a corpus about a data set but languages full of like very complex and very beautiful things like metaphors and humor and analogies and irony and sarcasm and you know that's not possible to to grasp with those very simple tools. So I think three or four years ago I started reading about machine learning and deep learning neural networks and I got super hooked and I realized okay having learning algorithm and having algorithms that learn from data directly and stuff from rules or lexicon might be a way to to understand language better or to to be able to process language better. So I started writing my first machine learning called three years ago but I also realized well like coming from a different background it's pretty challenging it's pretty difficult and for me back then I I knew okay I want to have this collaborative learning environment. I need to be surrounded by people with different backgrounds people that have different skills and no different things than I do and together or at least that was that was what I thought we could learn faster and that's exactly what happens. So you've asked my my co-founders also coming from a different background from an electrical engineering hardware background and he wanted to use machine learning and he still wants to use it for edge devices microcontrollers and yeah we we started very small and we just met every week and wrote machine learning code and every week more and more people joined even though it was kind of word of mouth and after like a few weeks there are so many people we didn't know where to put them anymore so we met in this open co-working space at Yahoo and with too many people so everybody wanted to write machine learning code and then we started like putting out our first meetups and every since it has been growing pretty fast so we started from very small but kind of you know out of our own need to because in Tokyo there was no such thing back then like two and a half years ago there were a lot of communities like great communities but that was no like place to actually build AI that was no place to work on hands on stuff so that's how it all started. That's a cool so you built the community that you wanted to be a part of that's so great. How did you frame it like when you were first saying hey come join me what was the thing to do like it was like let's learn ML together or read papers or how did you think about that? 
So the very first kind of I think first six months or so it was purely dedicated to going through tutorial so really learning about how to write machine learning code and learning about you know getting a conceptual understanding of different algorithms of the math but mainly to write code and I was how we started it is just you know going through as much stuff as possible and then once we kind of you know the team grew bigger and more people have joined us so after six months it kind of slowly started to broaden so we did a lot more things we did we started doing hands on deep learning workshops in the first year so we had deep learning engineers who were working as who were working full-time Japanese companies and they were giving five hour deep learning workshops where we focus on writing life code from scratch and training a specific model of training I don't know we first focused a lot on computer vision so we went through a lot of computer vision stuff and then gradually kind of moved into different areas of machine learning and like as the community kind of progresses and grows we we see that we go into different directions so now we have like a computer vision team that the does CNN architecture is and their own little ecosystem we have a team that is fully dedicated to NGI so running deep learning algorithms on hardware, microcontrollers and engine devices we have an NLP team that does research and natural language processing so and everything is fully community driven so there is no full-time employees or anything it's really how the community evolves and grows and that kind of broadens into different directions that's impressive so like how do you run a good workshop like like a five hour workshop you know I've seen really good ones in bad ones like how would you do it to make sure that it's a good experience for people I think it was learning by doing they in the beginning we really didn't know what we were doing so I think two years ago when we held our first deep learning workshops a lot of things were pretty difficult and pretty challenging because people come with different machines with different skill sets with different background knowledge with different software and hardware so it was pretty difficult but we kind of slowly we got a lot of feedback in kind of first iterations and worked and work with that feedback so things that made it easier for us is just you know focus on one thing that is really interesting to us where we see value that can bring value to us as instructors as deep learning engineers as well as to the communities with something that is very useful the second thing is like make sure that technically everything runs smoothly so we switched I think after a second or third workshop to Google Colette that makes it very easy like to just write code and there is no prerequisite except for having a Gmail account but that solves a lot of the technical issues and problems that we had. Yeah but does everybody like build the same thing together is that how you run it to everyone it's like you get you sort of like like say a problem everybody works together like how I guess like how much do you kind of coordinate like everybody doing the exact same thing versus that people going off on their own. 
So it depends what kind of workshop we're doing so if we have our standard deep learning workshop there's typically a topic and we already have prepared like a repository with the model that we're going to build we sit down with 50 people we do some theory so we do first like maybe an hour of conceptual understanding of what is going to happen what we're going to build and then Dimitri is for example he's like he's life coding from scratch so he basically walks you through from the very beginning to to getting your your performance metrics and so these kind of workshops are designed to do exactly this only this and people just follow along with the code and they can live code from scratch and this is something that people find really useful because especially like kind of the live coding aspect because sometimes when you're on your own you look at you know blocks of code and you kind of try to figure out what is happening try to figure out your own thing but it's useful if someone actually writes code with you and explains what is happening it's you learn just faster probably or this is at least what I what I find to be useful on the other hand we have much more open sessions so especially like our hardware sessions where I do edgei the only thing we provide is is a ton of hardware these are typically smaller groups maybe 20 25 people and then people come in they build teams they choose their hardware and they come up with their own idea and they build their own stuff and then at the end of the day each team presents what they have been working on so it really kind of depends on the session I guess is there like a different kind of culture in Japan and say in San Francisco like our language barriers like in a shoot all like like what's it like to be sort of I guess I know what it's like to be in San Francisco but what do you think that there's big differences coming from Japan? 
So I don't know San Francisco that well. I went to a lot of meetups, actually, and they're pretty cool. I think a lot more things are just happening in San Francisco, and I think a lot more things are supported, probably, in SF. In Japan, language is definitely an issue — it's a huge barrier. It is something that I've been constantly thinking about. In Japan there are amazing communities in machine learning. There are two super big machine learning communities: the TensorFlow User Group, which is very related, of course, to Google, and then Deep Lab, which is, I think, affiliated with Microsoft. Those guys are very big and they're very, very Japanese, so everything is in Japanese. And then there's us — I think we're similar in size, and we're English speaking. And yes, this is one thing that has been bothering me so much, because I'm always trying to find ways to, you know, not have these isolated communities. So this is a challenge in Japan — this is definitely a challenge, and we're working on it. But other than that, you see that communities are growing and that there is a huge demand for machine learning talent. So apart from the very Japan-specific problems, like language barriers, I think it's a pretty good and active environment to be in. Yeah, I remember I went to Japan last year, and I've worked off and on with Japan as a market, and I've always been impressed by how excited people are about, you know, machine learning, even going back 10, 15 years — there's a lot of enthusiasm for it. And actually I've been kind of wrestling with this — I just would like to find a way to translate our documentation into Japanese and keep it up to date. Yeah, I've been thinking about it that way. Yeah, I think that would be a good move. We were also only focusing on English, but there needs to be this bridge, and we need to start somewhere. So we also started translating — we worked with a TA from Stanford to translate their CS deep learning course notes into Japanese, to make it more accessible and to have bilingual resources for people. So we're also trying very hard to include as many people as possible. That's awesome. Hi, we'd love to take a moment to tell you guys about Weights & Biases. Weights & Biases is a tool that helps you track and visualize every detail of your machine learning models. We help you debug your machine learning models in real time, collaborate easily, and advance the state of the art in machine learning. You can integrate Weights & Biases into your models with just a few lines of code. With hyperparameter sweeps, you can find the best set of hyperparameters for your models automatically. You can also track and compare how many GPU resources your models are using with one line of code. You can visualize model predictions in the form of images, videos, audio, Plotly charts, molecular data, segmentation maps, and 3D point clouds. You can save everything you need to reproduce your models days, weeks, or even months after training. Finally, with Reports, you can make your models come alive. Reports are like blog posts in which your readers can interact with your model metrics and predictions. Reports serve as a centralized repository of metrics, predictions, hyperparameters, and accompanying notes. All of this together gives you a bird's-eye view of your machine learning workflow. You can use Reports to share your model insights, keep your team on the same page, and collaborate effectively remotely. I'll leave a link in the show notes below to help you get started.
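A minimal sketch of the kind of integration the segment above describes, using the public wandb Python client (wandb.init, wandb.log, sweeps). The project name, metrics, and sweep configuration here are illustrative placeholders, not anything from the episode; GPU utilization is logged automatically as system metrics once a run is started.

```python
# Minimal, illustrative Weights & Biases integration (assumes `pip install wandb`
# and a logged-in account). Project and metric names are placeholders.
import random
import wandb

def train():
    # wandb.init starts a run; inside a sweep, wandb.config is filled in by the agent.
    run = wandb.init(project="demo-project", config={"lr": 0.01, "epochs": 5})
    cfg = wandb.config
    acc = 0.5
    for epoch in range(cfg.epochs):
        # Stand-in for a real training step; log whatever metrics you care about.
        acc += cfg.lr * random.random()
        wandb.log({"epoch": epoch, "accuracy": acc, "loss": 1.0 - acc})
    run.finish()

if __name__ == "__main__":
    # Hyperparameter sweep: W&B samples the search space and calls train() per trial.
    sweep_config = {
        "method": "random",
        "metric": {"name": "accuracy", "goal": "maximize"},
        "parameters": {"lr": {"min": 0.001, "max": 0.1}},
    }
    sweep_id = wandb.sweep(sweep_config, project="demo-project")
    wandb.agent(sweep_id, function=train, count=3)
```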
And now let's get back to that episode. I mean, when you think about sort of the democratization of AI, what else do you think is important? How do you think about that? Maybe this is because of my personal background, because I am a domain expert, but I also see how important machine learning is and is going to be in the near future. If possible, we should have as many people as possible involved, even in technical stuff. There have been a lot of democratization efforts — if you look at H2O, for example, with AutoML, making it really very easy to experiment, but also, of course, other AutoML platforms from the tech giants. For us it's a lot of education that we do. We work with a lot of universities, and something that I personally like doing is working with research scientists or students coming from different backgrounds. I think machine learning could be super useful for people who work with a lot of data, and we've worked with a lot of super interesting people. For example, last year in summer, I think, we were at the Tokyo Institute of Technology, where we held a two-day bootcamp for ELSI. ELSI is the Earth-Life Science Institute, and those guys are amazing — they're astrophysicists, planetary scientists, computational biologists, chemists — you know, mind-blowing stuff. We had a room of people who all work with different kinds of datasets and problem sets and with different tools and techniques, and machine learning could be one way for them to get new insights and maybe even to advance science. So these kinds of things are, for me personally, super exciting — getting more domain experts involved in technical stuff, doing open education, doing open science. This has been, yeah, pretty interesting. What about people without kind of a math or programming background — do you think there's room for them to contribute too? Yeah, absolutely, I think so. You know, Jeremy Howard and Rachel, they've been doing the best job ever at getting domain experts on board. You do have to have some coding background, so you should be able to write some Python code, but going through the fast.ai courses, for example — it's a more top-down approach, and they're exactly democratizing machine learning, making it uncool, but getting so many more people involved. And this top-down approach allows you to get into deep learning without having to have a PhD in computer science from Stanford or a really strong math background. You build stuff — so you start with thinking about your problem and your data, and you build stuff, and then afterwards you start digging deeper into the math, for example, that you might need for your particular project or problem. I really like this kind of approach. That's very similar to what we've been doing with MLT as well, even though we also do a lot of fundamental work — we also have study sessions for machine learning math and other things — but I think there's definitely room for people who are coming from different backgrounds, and I think if they find it even potentially useful, they should look into it.
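As a concrete illustration of the top-down, fast.ai-style workflow described above — train something end to end first, dig into the math later — here is a sketch along the lines of the fastai vision quickstart. It assumes fastai v2 is installed and downloads the Oxford-IIIT Pets sample; exact function names can differ slightly across fastai versions.

```python
# Top-down, fast.ai-style workflow: get a working image classifier first,
# then dig into the underlying math as the project demands it.
# Assumes `pip install fastai` (v2); downloads the Oxford-IIIT Pet images.
from fastai.vision.all import *

path = untar_data(URLs.PETS) / "images"

def is_cat(filename):
    # In this dataset, cat breeds are capitalized and dog breeds are lowercase.
    return filename[0].isupper()

dls = ImageDataLoaders.from_name_func(
    path,
    get_image_files(path),
    valid_pct=0.2,          # hold out 20% of images for validation
    seed=42,
    label_func=is_cat,
    item_tfms=Resize(224),  # resize every image to 224x224
)

# Transfer learning from a pretrained ResNet; one call fine-tunes it.
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```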
What I mean you've probably seen people go from kind of not as to knowing a lot about ML and people ask me all the time kind of how do I get into this stuff do you like have any advice from the data that you're seeing and you know what folks should do if if you have no background and you really want to go deep on this stuff. Yeah for short so I think two things are super important the first thing is like don't neglect your background don't think that you have to start over from zero and you don't know anything before that leverage your background leverage your experience your professional experience your academic background whatever you whatever it is that you have been working on in the past years leverage that it's the same there are many examples for that for example you could be hardware engineer and you know a lot about hardware and now you're getting into machine learning and deep learning now leverage that background and that expertise and learn about machine learning and how to combine these two things in my case it's language so I've studied language as a system for many many years and I use machine learning and the combination of language and machine learning to kind of bring maybe interesting and unique insights to the particular projects that I'm working on I talked to a recruiter here in Japan and I asked him so what does the market need and he said well it's here in Japan it's not enough just to know deep learning you have to have some sort of specialization you have to have some sort of domain expertise some like way how you can use this kind of deep learning in combination with something else it could be software engineering it could be hardware it could be language it could be anything so this is the one thing and the second thing is when you're coming from a different background and you want to go into machine learning there's of course like two approaches either you you start with the fundamentals you start with math or you do what I just earlier mentioned top down you start with a project and you just write code built a project and then figure out details later and I think the most important thing here is to figure out what is interesting to you what would be something that really kind of catches your attention and you love working on and make that decision and then start working on that because the problem here is that there are too many options you could you could do too many things everything seems to be interesting but if you spend a little time here a little time there you will get maybe you know some shall understanding of a few things but you'll not advance as quickly as you might want so figure out what you want to do and leverage your background is probably my advice do you have um do you think that you see people being more successful kind of starting from the fundamentals or starting with a project because you mentioned those are sort of two different approaches and yeah people gravitate towards one of the other do you have a preference or can can both work both can definitely work I think we were just like only talking about domain experts and and people coming from different backgrounds but of course I think what the what the what the research but academia and industry needs just as much or even more is people with very very strong CS backgrounds with very strong math backgrounds that know how to optimize and know how to work on theoretical things so so I'm not saying like this is not important not at all I think of course this is still the norm and this is 
what probably employers want to see the most and if you're if you're coming from a strong CS or math background I think you you already have a strong foundation to to go very deep into machine learning and deep learning but I just want to say like there's room for for other people as well okay so this is a little bit out to the scope maybe of a ML podcast but I'm just fascinated by this so um and maybe it is well what about the starting a community like do you have advice on someone in an area like you where they want to find like-minded people I mean do you have any any advice on that like if I'm in the city where there isn't already like an ML group how would you go about finding people yeah and like so many people right me messages on that they're in some yeah yeah that's so great either in remote areas or in cities where like literally something like a machine learning community still doesn't exist and I would really suggest I would always say like go for it if there is no such thing out there be the first one to do it because MLT has like evolved into an amazing community like I'm like literally I'm amazed by how active and how engaged the communities and all those guys they have they have full-time jobs but they still find kind of time to um work on open source and to teach other people and to do these kind of workshops so it's it's pretty amazing so I will really kind of suggest to to think about starting a community wherever you are you have any practice with this we're getting it off the ground because it seems kind of daunting to me to to try to start that and keep people engaged how do you get people to keep talking I think the most important thing is to do something that you are really interested in because if you're starting a lot of things will depend on you and the keys I think somebody wrote it on Twitter recently the key also to MLT is consistency so we consistently just keep doing stuff that we think is exciting and interesting so start from what you're interested in start from your own problem set or from your own need and more people will follow and then like more practical things you know there's always the thing we started doing remote media so that that that is not the kind of burden of having to find a venue and a sponsor and other things so this is an option how to kick things off to find more people who are interested that makes it very easy like make that there's no easier way probably than to start like remote videos. On the other hand if you want to start something in your city you you might want to check out like first of all like for a small peer group around yourself and kind of try to figure out what you want to do and then start to look for a physical place and figure out if you want to do hands on stuff if you want to do like more educational stuff learn together and get it out there like try to reach as many people as possible and I think you know I just yesterday I talked to someone at journalist and he said to me wow there's no such thing for writers out there I want to start something for writers out there and I think it's kind of the same thing right there's a need for all these niche groups and and and communities so I think if you get it out there and if you do things that you're very passionate about people will follow do you have any thoughts on like diversity and inclusion and mmail and in these groups that that you create? Is that something that is top of mind for you? 
Yeah that's something that is very important to us luckily within mlt we're we're very diverse kind of four and a half thousand people in terms of you know countries and languages and skill sets and backgrounds and professional experiences so this is really super diverse but women are super underrepresented I think two years ago when we started on working on deep learning workshops we had 60 engineers and I was the only woman so I realized okay yeah so we really needed to do something about that so we're doing like very specific not only events but also projects that support diversity and inclusion we do a lot of women in machine learning events they're supported by google japan, mechari and other companies we also do projects that I just earlier mentioned where we had one of them was we had about 12 bilingual engineers that worked on translating some of the Stanford course notes into Japanese and having this kind of bilingual resources for people just to be more inclusive in general also to the Japanese community because we are literally in japan and we are very diverse but it's still kind of seems like there's a disconnect between between a Japanese community and and an English speaking community and I think it has never been more important we all know like tech in general is multi disciplinary machine learning should be as well multi disciplinary we need people with different skills with different expertise we need people with different backgrounds in general so this is something we all have to work on I think do you have any other suggestions for making a community feel more inclusive so in our core team we decided very early on that we we want to create an environment that is very collaborative and that is very inclusive that means that we really want to don't have this as kind of this elite math machine learning group we want to include as many people as possible and we want to have like decision processes we want to have the community involved in like what directions we take what kind of things we're we're tackling on next and we do like every project that we do and every workshop and every study session we kind of have that sort of mindset so when you look at our math sessions so last year we started doing remote math reading sessions so we're going through a book that walks you through some machine learning math and the so more than 1000 people signed up from all of the world so we have sessions in in the Bay area we have sessions in India in APAC here in Japan and the thing is it is very inclusive because decisions the people that join those sessions their levels of math are very different so we have complete beginners we have people that coming from completely different backgrounds but in our Tokyo sessions we also have mathematicians we have experts we have PhDs in math people that have taught math for many years and it's it's pretty amazing like it's a very interactive after the reading it's a very interactive discussion where people ask all sorts of questions and together we kind of brainstorm around things and try to our experts like Emil and Jason they try to explain mathematical concepts and it's been pretty amazing so I think really having this mindset whatever you do that you need this it's not like it's something that is actually enriching whatever you do is something that is very important and having that mindset is probably going to help a lot that's super cool that sounds really fun it is I'd love to it's been pretty fun yeah it's been pretty good what is something 
underrated maybe in machine learning that you think people don't pay enough attention to I think something still underrated in machine learning is data still oh my god yeah I think so like it doesn't matter who I talk to it's like always I feel like there's this kind of it's a troublesome thing to do right you know when I work with data you want you know right machine learning algorithms you want to train models you want to get good accuracy and and push accuracy of or metric it's not about data so data is kind of the least that people think about sometimes or this is a least kind of my my understanding of it and I think we should definitely think more about data and put put more emphasis on data maybe this is also because of my own background because I've been working with data pretty much all my own career and just three years ago started working with machine learning algorithms but yeah it all starts with data and it'll probably answer data I think tip you and just mentioned recently who who owns the data pipeline will own the machine learning in production or the machine learning system I guess I don't know if it's still that case maybe in SF maybe in the Bay Area people think more about data I don't know I mean I am I think I've got to similar to yours and so I feel like it is so unbelievably important I guess it's not possible for it to be properly rated for its contribution to them all cool and then when you think about like kind of making machine learning work in the real world for like real applications what's like the hardest part about getting it to work so in our case we love to experiment with new things and I think you know it's difficult when you're trying new things you kind of need to figure out a lot of stuff and generally I think in production environments there's a lot of experimenting and try to see what works so making a production pipeline work and deploying machine learning for different use cases has different challenges from from data all the way to software engineering to monitoring your model like how how it changes in in different real world scenarios so I think we need to even don't like things are taking off there's still like a lot of room like to work on these kind of things infrastructure things deployment things finding new use cases finding use cases that make a lot of sense for for machine learning at the same time I think this is super exciting so this is something that really kind of excites me probably the most is thinking about use cases and trying you know experimenting a lot and trying new things we don't work like at MLT we're we're not working on we do work on production things as well but it's not our main thing our main thing is just trying out new things experimenting and make make POCs so we don't actually deploy a lot of things on large scale of production so maybe I can't talk about like the main challenges here but what I can say is that we try like if we take edge for example we're trying out a lot of things we're working with different hardware we're trying to think about different use cases where these things can be deployed and like a lot of things you know just don't work out and fail but that's totally fine that's that's good as about is something that we kind of also need to grow and to figure out things but then at the same time we also built things at work and that are super interesting so yeah it's a lot of experimenting I guess yeah like that's okay so my final question if I am listening to talk and I get excited about you know 
joining one of your virtual events or something how do I find out more and how do I get more involved with MLT can I do that remotely? Yes you can definitely so we do as I mentioned earlier like on Meetup you can find all of our events and a lot of them are actually remote so if you want like to to be part of an event or a meetup or something like that you can just join meetup and we'll repost everything there there's also more active things so if you would like to work on open source or doing some other things or get more involved in general you can join our Slack group there's pretty much the whole community they're talking about different things so in more technical depth so you can also find people there to work on projects and do other things and so these are kind of the main two things the meetup for for events and maybe Slack for projects and other stuff awesome thank you so much it's great to talk with you yeah thank you so much for having
50.06834
41.87221
4s
Dec 28 '22 14:55
1qsvirkp
-
Finished
Dec 28 '22 14:55
2581.632000
/content/matthew-davis-bringing-genetic-insights-to-everyone-a0-b7pwkzmm.mp3
tiny
There's lots of genetic defect in everyone. We think the average healthy person has a couple of hundred genes with defective function in their genome. There's probably lots of what we think of as sub-clinical symptoms wandering around across the whole population. So I think that's the future vision of the company: everyone would benefit from having a full understanding of their genetic background. And it's a complicated problem that doctors don't understand, the patients certainly don't understand, and we're really at the frontier of understanding. You're listening to Gradient Descent, a show about machine learning in the real world, and I'm your host, Lucas Biewald. Matthew Davis is the head of AI at Invitae, a medical genetic testing company that applies a really wide range of machine learning techniques to the genetic testing problem, and I think it's one of the most interesting applications of ML today. I'm super excited to talk to him. Invitae is actually a household name in my house, because my wife runs a startup that sells to Invitae and I run a startup that also sells to Invitae. So you're one of the very few overlapping customers. So I feel like I know Invitae very well, but I was thinking, if that weren't the case, I would definitely not know Invitae. So I was wondering if you could describe what Invitae does and how I might interact with your product as a consumer. Yeah, sure. So for starters, we're a medical genetic diagnostics company, and I'm pretty sure that by volume of tests we're the biggest in the world. Which is amazing, because you're fairly new for a public company, right? Didn't you start in 2010 or something like that? Yeah, that's about right. So the company itself is about a decade old, and it's not a coincidence, because the availability of high-throughput, low-cost genome sequencing really came online in 2008, with Illumina coming out with a scalable platform. At that point it became clear that instead of analyzing one or two genes at a time, you could be analyzing lots of genes for less money. And I think the strategy that was clear to the founders was: here's a very narrow market with a very high margin that actually should be an addressable market of everyone with access to modern medicine, where instead the cost could be low and the volume could be high. And if you pursue that strategy, there are actually way bigger benefits to mankind, and also to shareholders of the company, because you'd start to learn things about medical genetics and disease — and probably most importantly, relationships to treatments — that you weren't going to learn if you took a small addressable-market strategy. So I think you're living this, but for someone who hasn't had a genetic test for a medical reason, what would be a scenario where you'd actually want that, and what would it do for you? Yes, so classically, diagnostics were not about genetics. They were about, you know, your cholesterol is high or some other hormone is low or whatever. Genetic diagnostics were first proven at scale in breast cancer, where we know there are genetic predispositions that would change your treatment strategy, where the risk is high enough — if your mother or your sister or your grandmother had breast cancer at age 70, well, people get cancer when they're old, but if it's at age 35, that's way scarier. And when we add a genetic analysis on top of that, we could further partition who, because they had this variant, were the early cancer patients.
And then your doctor can help you make the best practice decisions about how to avoid it. That's a proactive and a lot of other intermediates, including which type of drug would be most effective for you. Should you risk the downsides of chemo? Should you just wait? So it worked with breast cancer and then as we started being able to analyze more and more genes, we started discovering more and more things that it works for. I think one thing we keep in mind is that human geneticists, in general, geneticists, they historically study horrific things, whether you're studying fruit flies or amyes or a human, it's a big effect. There's a soul saying big mutations have big effects. We study things that could be hard to tackle in early age or make you grow tumors or have crippling nervous system diseases. And there are lots of people risk for that who don't know it. But I think the real future is lots of genetic defect in everyone. We think the average healthy person has a couple of hundred genes with defective function in their genome. And there's probably lots of what we think of as subclinical symptoms wandering around across the whole population. So I think that's kind of the future vision of the company is everyone would benefit from having a full understanding of their genetic background. And it's a complicated problem that doctors don't understand. The patients certainly don't understand. And they were really at the frontier of understanding. So that's a little bit of the history and a little bit of the future mission statement. So I'd imagine that a lot of people listening to this have done one of the kind of consumer tests that may be like ancestry or 23 in me. So how is what you do different from what happens there? Yeah, I mean, it's a great question. And it's one that pre-COVID writing on a plane, someone ask you what you do. And they're like, oh, like 23 in me, I'm like, oh, man, no. So I mean, the audience difference to the interaction with our customers is that historically goes through a doctor. It's a medical test. And you want that provisioned and administered by a medical professional. In 23 in me is, it's a fascinating company. But they've focused on things like whether or not you like cilantro, not whether or not you are risk-free disease. And they have tried to move into a diagnostic space, but they're not built for that. And we find the acknowledge that a couple of years ago, after many years of not wanting to offer a medical diagnostic procedure directly to patients, because we didn't want people to go with the information in hand, but not understanding an explanation that they could get from a medical caregiver. So that sort of difference is like we have medical caretakers in place, but we now have a strategy where we let patients initiate their orders. And that's really because there's a lot of the country that doesn't have access to one of the few thousand genetic counselors in the U.S. So there are places where it's a six month wait. There's places where you're just not going to go. And things to tell a medicine, things to, you know, suffering, engineering. It's easier now for us to let a patient who some other head breast cancer start their process themselves. We would further them to a telemess and genetic cancer who helps them by being their medical caregiver without them having to wait six months to go to a medical center that's a hundred miles away. And we think that's great. It does lead to a little more confusion. 
You know, like, well, we used to say we're not consumer-facing and now we have patient-facing border farms, but that's really the difference is we are trying to help people with a complicated medical problem not to find out your ancestry. Unless, of course, your ancestry has direct bearing on the medical risk. Got it, that makes sense. And so, I mean, how does AI fit into this? Like, you talk about a broad range when I've talked to you in the past, I've been kind of shocked by the number of different ML kind of fields that you draw from. So I think maybe you can give me an overview of the different problems where, you know, machine learning techniques can help with what you all are doing. Yeah. So, you know, we say AI and if it was an easy to adopt term, especially, you know, for the last few years. But when I think of AI, I really think of like every chapter of a textbook of a field of computer science, it's been around for many decades. And a lot of it, thankfully, is a machine learning until it's not. So some of it is like optimization algorithm and robotic planning and so forth, it has been around for a long time. And it's still making rapid advance in those fields. But maybe it's a little less well known to machine learning folks. And then a bunch of it is a machine learning approximations that can make a problem tractable that wasn't tractable before. So I mean, actual applications, you know, we have a key scaling problem in a lot of ways where like a manufacturing company and that our volume tends to almost double every year. And we have a laboratory that has to run assays with, you know, actual robots. We have rather complicated like standard operating procedures and business process models that you'd care for execution, not to mention like, you know, audit logging and accounting stuff, it's the medical field. So you have to be compliant and follow, you know, and not just like, kibbalas, which are complex, but also like contractual obligations to insurance companies and things like that. There's a lot of complicated process modeling. And then there's a lot of knowledge worker problems. So we have on staff, you know, dozens of PhD geneticist and biologists who have done this curriculum task of curating the medical genetic literature for any scrap of evidence that could inform whether this is very ant that seems to be breaking the function of a gene and a patient could actually do the causal factor that puts them at higher risk. Because we still don't know, most of the, most of the variants that are analyzed in medical genetics, which still uncertain what their eventual effectiveness would be. And that involves, you know, literature mining, all of the most contemporary and all of the methods for entity extraction and relationship modeling, thinking to ontologies. You know, we don't, we don't get into things like summarization because it happened, you know, even the fanciest, most expensive model, but it's not confident enough to write a medical report for someone, right? But the sort of language modeling that goes into something like GPT3, like we can use that for concept embeddings for extraction, for classification, for recommendation engines. So we have a lot of that NLP work that a lot of the rest of the world thinks of. And then got a fair chunk of computer vision problems, whether they're things like doping at processing computer vision or their computational biology problems. 
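To make the "extract entities, map their relationships, link them to ontologies" workflow mentioned above concrete, here is a deliberately tiny, dictionary-based sketch in plain Python. It is not Invitae's pipeline, and the identifiers and lexicon are illustrative; a production system would use trained NER and relation models and real ontologies such as HGNC or HPO.

```python
# Toy illustration of the literature-curation workflow described above:
# extract entity mentions from text, then link them to ontology-style
# identifiers so the result is structured and queryable. Purely illustrative.
import re

# Tiny "ontology": surface form -> (entity type, identifier). IDs are illustrative.
LEXICON = {
    "brca1": ("gene", "HGNC:1100"),
    "brca2": ("gene", "HGNC:1101"),
    "breast cancer": ("phenotype", "HP:0003002"),
    "loss of function": ("effect", "SO:loss_of_function_variant"),
}

def extract_and_link(sentence):
    """Return structured (mention, type, id, span) records found in a sentence."""
    hits = []
    lowered = sentence.lower()
    for surface, (etype, ident) in LEXICON.items():
        for match in re.finditer(re.escape(surface), lowered):
            hits.append({
                "mention": sentence[match.start():match.end()],
                "type": etype,
                "id": ident,
                "span": (match.start(), match.end()),
            })
    return sorted(hits, key=lambda h: h["span"])

text = "A loss of function variant in BRCA1 is associated with early-onset breast cancer."
for record in extract_and_link(text):
    print(record)
```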
And about half of my team is devoted to more core research, advancing future products, doing academic collaborations with folks. So they're really trying to struggle with the problem I stated before, which is like geneticists, traditionally focus on big diseases with big mutations. But there's a lot more subtle signal going on for almost everyone on the planet. And a sense that's a signal detection problem. And it's the higher complexity. It's silly to think about this way. But if you just imagine we have 25,000 genes working in combinations, right? How do you search a space of 25,000 combinations, 25,000 factorial combinations, man? So the hope is that things that were completely intractable before, by an emeration, could be tractable by approximation. And so that's one of the great hopes for computational biology is that we can produce a search base with machine learning. And then, so we cover computational biology knowledge and operations. That's a big breadth of stuff to worry about. And then on top of that, I think there are things like graphing vettings for heterogeneous networks, where there's lots of reasons to believe that heterogeneous entities out in the literature shouldn't be just treated as like word tokens that you learn with a language model. But instead you can layer on causality and known relationships. Like biology is this kind of fascinating field because if you if you really care about Newtonian mechanics, then you probably don't need a neural network approximator to tell you how fast the ball's going to roll down the complaint with a certain coefficient of friction in whatever way. Because you can physically model it really accurately. And in biology, if you open a biology textbook, there are all these cartoons. So this protein binds to this protein and they both bind to the DNA and then the RNA is made in whatever way. And they are, like they're not just cartoons that you memorize when you're a biology undergrad, they're actual physical models of a material process of the universe. But the uncertainty is way higher, they are rough drafts. And because it's a tiny little some microscopic machines, historically we don't just take the picture, I guess that's a listen was true because electronic microscopy is now getting really good in x-ray crystallography and some ways is really good at that. But for the most part, you do it by inference. You do some experiment and the readout is like you look at a different color to like jelly, yoke, agro, send a tray and it's all binding friends. So when you see like when it was like CSIT B shows and they're looking at the big bands of DNA, it's a very abstract version of the actual physical process. And that's where like it's great for machine learning because there's enough structure to that cartoon that you don't have to imagine every possible forespector. You have some constraints but it's uncertain enough that it's not in company in mechanics. So modeling it with uncertainty and then using those indirect observations to guide your search in a lot of ways, it's a perfect filter for using model based machine learning. Well, okay, so I'm taking you like a mental note of all the different applications that you mentioned. I have so many questions I need to run. But maybe we should start with the last one because it seems very intriguing, right? So like why would a company like yours care about modeling like the chemistry of molecules? What is that? Do you for you? 
Yeah, so I mean, we know that if you put a change in this DNA sequence, there's a high likelihood it's going to change what amino acid is put in the protein. Very, very predictably — we can predict that from basic biology knowledge. But we don't necessarily know that it's going to affect the function of the protein. And the easier ways, historically, to make computational estimates of that were, you know, compare the sequence of that gene across a thousand related species, or across 10,000 humans, and see: if it's always the same letter, it's probably important, because if it weren't important, evolutionarily it would float around. But there's actually quite a lot of flexibility in these proteins where they're still functional. So there might be some set of people where it's different and it doesn't actually matter. And if you're an actual biochemist, then you might go do experiments in the laboratory, like seeing how the proteins actually touch each other and discovering that the enzyme works better or doesn't work as well — and it's really expensive and time consuming to do that; it's a slow process and hard to scale. But if you had molecular models of those physical properties, then you could do in silico experiments and say, well, I can't be sure that the enzyme is not going to be as efficient, but based on the model as a whole — Is "in silico" even a real term? I don't know, did you just make that up? I love it. Yeah, right — so, I mean, that's not me, right, that's biology. Well, biology loves Latin. And yeah, that's a well-tested phrase in computational biology; it's been around for a long time. But yeah, that's the right answer — that's what it means. So you're doing this simulation, and then you can say, well, with some certainty, based on the parameterization from those actual biochemical experiments that other people have done, this looks like a big change, and therefore it's going to affect the function of the gene, and therefore we have more reason to believe — in a very Bayesian sense, our belief increases — that this is the cause of someone's disease. And is this something that you'd really do in the future, or is it in use now? Is this something that everyone would have to do to make a realistic model? I guess, how in use is this kind of modeling — what do you need to do this? Is this something you do every day?
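Two quick numerical illustrations of points from this stretch of the conversation: first, the combinatorial scale of gene interactions mentioned a little earlier (even restricting 25,000 genes to small subsets explodes quickly, which is why approximation matters); second, the "our belief increases" idea written out as a literal Bayes update. All of the probabilities below are invented for illustration; they are not Invitae numbers.

```python
# Two small illustrations, with made-up numbers (not from the episode or Invitae).
from math import comb

# 1) Why exhaustive search over gene combinations is hopeless: even small subsets
#    of ~25,000 genes blow up combinatorially.
genes = 25_000
for k in (2, 3, 4):
    print(f"subsets of size {k}: {comb(genes, k):,}")
# size 2 is ~3.1e8, size 3 is ~2.6e12, size 4 is ~1.6e16 -- hence approximation.

# 2) "Our belief increases" as a Bayes update: an assumed prior that a variant is
#    pathogenic, updated after an in-silico model flags it as damaging.
prior = 0.10                   # assumed prior P(pathogenic)
p_flag_given_path = 0.80       # assumed P(model flags variant | pathogenic)
p_flag_given_benign = 0.20     # assumed P(model flags variant | benign)

evidence = p_flag_given_path * prior + p_flag_given_benign * (1 - prior)
posterior = p_flag_given_path * prior / evidence
print(f"posterior P(pathogenic | flagged) = {posterior:.2f}")  # ~0.31 with these numbers
```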
I mean this is definitely a thing that our company does like through the team that does this and you know I think it's it's also an interesting example where I think it's the case where industrial research has more potential than traditional academic research just because of the volume right like the biggest academic collaborations for genome sequencing don't actually get to the same number of people as come through or are samples and they're not as enriched for people who actually have disease like the big population genome sequencing centers in China and the UK and you ask you know they're not generally systematic and going after people with disease like we have an ascertainment bias it's actually a benefit if we want to study disease because people with disease and their families come to the door and that means that we can do stuff that you can't do if you're working it the Broad Institute at MIT and Harvard or the you know I came rich with your P-Band from access to like we have access to data that you can use these methods on at Ponyl's yeah and how do you actually set up this problem like how do you like formulate it and trying to like put this my mind is just like how did I set this up is like a machine learning problem that I can actually train on like it like is it standard like what the loss function is and like what what could you actually observe to to put into this yeah right I don't think there's a canonically true answer to that question but you know we can talk a little bit about the pros and cons of approaches right so one thing is it's not like it can seem a recommender system where you recommend products and people could come in and buy them or they don't in fact like that's just that's diagnostics in general has this problem of no ground truth like people people die of symptoms on hospital beds and their doctors don't actually know in some sort of like you know Plato or Stalkle or Stalkle sort of way like why they died right it's just that we have a stronger belief about the causality so you can take a label data set and say like these people were diagnosed and they had that variant and I'll make a model that can predict that I'll come with us who were by signing with it but you're not actually dealing with ground truth because some of those people have the disease they had the variant and they had the disease but they actually had the disease because they smoked cigarettes for 50 years or because they were 90 years old or some other compounding factor was there right so if you want to try and think of it as belief then you can go down there like Bayesian probabilistic graphical model causality today a parole explainable AI path that I think people are excited about talking about but you have to know like a lot of human knowledge goes into that and it's not as simple as like I have some label data and I'm going to train an arbitrarily deep neural network to perhaps make the soft mix so you end up working a lot with like how do I take those physical models of what I think is going on in biology and you try and design the algorithm to do that whether it's with like causal graphical models or it's just knowing like from these feature vectors I can learn an out of encoded representation that should in theory account for these factors we know from the physical model are important and then I'm going to let the neural networks set the weights by showing it what are observations that are the closest to the ground truth that bullet or how but it sounded like there's sort of sub 
problems here that people work on right like you talked about you know kind of people like looking at like proteins and mixing them together and kind of seeing what happens like is that like a like a sub problem of this bigger problem where you could have different observations and build a different model around it? Yeah right so I mean we do have a web lab team that is like collecting basic molecular reality data but one of the awesome things about biology is that lots of other people are doing that right so you know lots of professors and lots of universities and lots of their grad students are collecting and publishing data in a way that is ingestable for us to learn some of those things but their places where you may identify a key deficiency in that knowledge set really well it's worth it for us to do this experiment because it would really help us prematureize what we think is missing as well and you know so then you have like from a from an industrial research perspective you kind of have to think about the cost benefit like is it worth spending up a wet lab initiative to do stuff that's hard to do at scale that's a fill on that you know you want that future vector it better be worth it because it's not cheap sure sure although it sounds actually kind of a lot of it's a it's a market of these civil arts like exploring a space of like hyper parameters for you know for it I mean I think it's I mean maybe it's more like if you had a product for a commander and you know everything that everyone had ever clicked on but it just still doesn't seem to have that much accuracy so you send out some design researchers to talk to your customers and front like sit in their house with them and talk to them and go like I see like oh that's weird everyone's buying a de-diss I didn't notice that before is that in the model let's go find out all the shoes everyone buys and then see if that pussy accuracy got it that makes sense but the thing is it's like if you if you had the whole life history of like me as an individual and everything I'd ever done then you might be able to start down the path of that modeling but that's crazy you know I'm gonna do that you just look at my eye click date and make some recommender that if it has 30% accuracy is going to make a bunch of money for a head company right but if you're telling about someone's health and like complex things like biology then you want it to be higher accuracy and you you got to go actually model stuff out deeper so I guess another another whole like field that you that you talked about doing is sort of the what do you call it like sort of medical and LP or like you know buy on ferrata x right you know and you talked about I mean this one thing I've been kind of curious about you know you see in a lot of progress very visible progress in NLP like you know notably like GPT-3 but also like these you know word embeddings becoming super popular has that has that influence bind from Alex is that like directly applies can you like find to in these models on like medical text domains or what's the city there today yeah right so I think there's two big problems in the industry that people would love to solve one of them is comprehending medical records and they did one is comprehending the medical literature and when you state the problems they sound the same right it's like I want to extract the entities map their relationships and then link them to ontologies so that I can structure the data and then make quick reads over it and if you can do that 
then you know like the challenges the practical challenges are things like can I show two one of our clinical scientists the right piece of literature at the right time to help them make the right insight about this genetic variant that's never been observed in someone before right and then if you look at the medical records it's like how do I take this like allegedly structured unstructured data and turn it into something that's actually structured so that we can like make trajectories of people's disease progression or predict the risk and so it turns out that training training language models on like you know couple books and your times articles and we can video that does not actually help that much but also kind of surprising like several years ago I did some experiments when I was still I used to be at IBM Research for I had to be extremely quick we did some experiments where we had domain specific purposes and general purposes we would train the same models and kind of to my surprise the bigger general corpus helped more than the specific corpus and that was like an early transfer learning kind of insight like take the biggest corpus you can get and then transfer learn is a good idea what's hard is the concepts are the same to humans but when you look at a medical record it says like mgm brca and that means maternal grandmother had breast cancer and you look in the the medical literature that's published academically it's been talking about the relatives and it doesn't even say breast cancer it says you know a latent new plays on of the tissue like you don't even know it's talking about the same thing right so mapping the concepts across this tricky and just this syntax right like the medical abbreviations and that's it's the almost like there it's you know it's almost like it needs its own language model so having those are some of the hard problems for contemporary methods to actually work on especially other the box but we usually take things like you know so we do take in code or decoder based sort of transform models and adapt them pretty readily with supervised training and it's definitely better than starting from scratch but it still requires like you know domain experts to link links stuff to get there or it takes you know some like weak supervision data programming sort of methods where people are writing roles that make a lot of sense to weekly label the data and you know it's not as good as human the whole expert data but you can kind of bootstrap yourself into having a better data set to train on so you know some of those methods work really well in biology yeah I don't know if that's that's really interesting do you have this sense that well I don't know I have the sense that recently NLP methods have improved a lot like when I look at you know scores that I'm used to from like a decade or two ago they just seem much better over the last couple years is the same thing happened in the medical field kind of right so like if you take you know if you take like the scores on question answer data sets made out like the models better at answering standard question answer questions and I am right super very impressive right like but but I don't think you would expect the same thing to be true with medical question answer and a bunch of like specialist doctors and whatever domain so like no one expects a chatbot powered by a GPT3 to be better at giving medical advice but it doesn't mean but that said like the language model that's learned could be extremely useful for 
facilitating a human expert and so I think that's where the hope is at this point it's this kind of like AI assistant you know better information retrieval better support for the expert is the current health point you got it so I guess you know like a general problem that a lot of people ask me about and know a lot of people listening to this kind of what kind of wonder about is how do you think about structuring your team like you talked about half the people doing kind of core research but then also it seems like you know what you're doing is very connected to what the company is doing like do you try to like literally separate the people that are doing the sort of like apply stuff and research stuff or do you separate it by the the sort of field of work or how do you think about that? Yeah it's a really good question and I think I suspect the answer that I have today will be different than the answer I have in a few years which I know is different from the answer that a few years ago and it feels like one of those things that like will keep reinventing like will what keep reinventing you know how to deploy software and will keep reinventing how to provision infrastructure and will come back to the same basic principles that people thought of a few decades ago but will keep refining it and I mean so right now like the company you know the company is effectively you know the company still works like I start up with very clear product driven vertical teams and the idea that we were going to in view a machine learning capability to the company is hard to to figure that out right and it's a little different if you're if you're a Google and like well you know the company is built on machine learning based information retrieval so we kind of expect everyone to take a machine learning approach to something right so you know the I guess the direct answer is like we have a team we call it a functional team everyone goes to meetings together hangs out together checks in together but people have different projects and you know it is definitely been hard for some of the team members like a common source of feedback is like I don't know what everyone else is doing because everyone is working on something else I'm used to working with four people on a specific project and we talk every day in a stand-up meeting and and this team like everyone's doing something different and so you know for the people who are the people of the team who like went to grad school and experience like what that's like to get a PhD where it's ultimately up to you to do your thing they're more comfortable with it because they're like yeah of course we're all doing your own thing and reality like every of the hope everyone's not doing your own thing right like I hope that there is cross-for-vilization and support and it's like kind of inherently matrix but the goal is you know we reserve some of the people's time for for research because if you don't explicitly kind of set aside the commitment then it will be absorbed by whatever like demand of the product team in this work time and then we set aside some people's time to develop platform that are modular and reusable with the hopes that we continue to review that throughout the rest of the engineering teams and then we set aside some people who are them like functionally assigned to specific engineering projects whether it's to realize one of the research projects into production or it's to leverage one of the platforms for a problem or maybe it's just like someone has a 
pretty shape forward problem and they need like a psychic learn model and it's going to take someone like an afternoon prototype in three weeks to get a production so we stick someone in there for a sprinter two and make sure it happens I say so the I guess in some sense it's because very zone defense kind of strategy like it has to be flexible right right and and do you then hire people who have sort of like knowledge of like multiple topics they seem like such kind of deep fields that are kind of different like it is a possible to find someone that knows about multiple these applications yeah so we hire people a specific expertise for sure like I am actually just extremely fortunate like I was a soft engineer who went to good at graduate degree in computational biology at a time when doing that probably also I'm going to do a lot of biology and then I went and worked at IBM in a research division with just this huge diversity of industrial interest so I was exposed to lots of different AI methods and that was not something that I knew was going to happen to me but was really fortunate and what that means is like I met people in different industries at different conferences to understand like oh there's this kind of boutique thing that was you know kind of popular two decades ago but continues to be a core technology for NASA or Toyota and not a lot of people they attention to it but man it can solve a lot of problems right which you're just not going to find like a course era course on so you know it's great because we can find people with that expertise and if they're as they're CS PhDs kind of fortunately right computer science PhDs are generally interested in stuff like if you if you practice an LP algorithms you're probably still interested in being computer vision right so I think that's a you know that's a fortunate thing right I can I can find someone with expertise and information retrieval and they can still make really meaningful contributions to other types of problems and other subject domains one of the harder things is getting the biology knowledge as all of the stuff that they can talk to the biologists to any other stakeholders and quickly understand the problems in that. That makes sense and I guess one of the things that you talk about a lot I think is the importance of engineering to making all this stuff work do you do you hire just like kind of pure engineers entertainment or do you rely on that's a team to provide that? Yeah I know I mean it gets really important and the research projects it's really important to be able to prototype things because like I hope I hope your listeners find the eloquent but like my experience in life is I may have some beautiful complex system in my head and I have a very little ability to communicate it to other people's brains and building a prototype for the helps and you need a diversity of skills and even a small team to make that happen like it's just a waste of everyone's potential to ask the algorithms expert to write some react friendly that you're going to throw away after you showed off to a stakeholder so it'd better to have a you know better to have a job as a programmer on him for that. 
Do you have a ratio that you shoot for a while I'm just kind of curious about this is like sort of algorithms to like implementers I think it just depends but we try to maintain like a bench of depth so that we can recombine it right like I think a lot of really high-impact projects can be done with like one algorithms person prototypes of things hands it off to one or two engineers who implement the thing and then we further made it off afterwards implemented to engineering team that's going to love and care for it in the long term and maybe come back to us if they need new features but to them it looks like you know software that could have come from anywhere other projects you need you know we have some some more challenging algorithmic problems where we have like we'd like to get a approach of probabilistic programming and you know there's not a lot of mature frameworks out there for that like Google and the array I both talk with there is some but you need some pretty heavy lifting on over the development some kind of fearless back-end engineering charts so to make anything happen and then once you have the ability to make anything happen then you also want to lay your in like the computational biology expertise to make sure the right modeling step that I described before it's happening so you know that could be a that could be a several-person team just to make the prototype because it's complicated and the tooling requires help but it's not as simple as like you know but back-end and a reaction from the end or something. What are the things that I've kind of been noticing you know we've at my company we've seen more and more interest in customers coming from pharma and kind of medical stuff and it always feels to me like all of our customers it's the biggest kind of culture clash like just like you know basics stuff that I feel like I haven't discussed in the long time like you know there'll be suspicious of like open source software and I'm like oh my god like I like why like you know it's like you know 1995 you know does that not happen and if you take because it's it's sort of a newer company and sort of you know more maybe CS focus or do you also kind of feel that kind of working with with biologists. No I think I don't think it's a problem here and like certainly I saw that problem with IBM customers at times I was lucky you know and I became a very huge investors in like 20 years ago and like it was clear to everyone why that continued to be the case but I would see it from other companies who like I would prefer the lower performance, more expensive proprietary thing thank you. I mean I think you know one of the virtues of NBT is that you does have any other conveyeria ethos and you know get their get their faster get their cheaper is a good idea. Totally. So I don't think there's any skepticism there but you know sometimes you collaborate with like the insurance agencies or the or the insurance payers or you know like Medicare and then you're into a whole perk of like it's not even individual skepticism right it's like in a semi institutional skepticism it's like codified in contrast. Yeah so I mean you know for us it's not a problem at all I think I would imagine you know the bigger the company the older the company it's probably true in every sector but a lot of the big old companies in technology got over it all right yeah that makes sense. 
Well, we always end with two questions, and I want to make sure we have time for them. They're kind of broad, so feel free to expand a little. One thing we always ask is: when you look at what people are doing in ML, what's the topic you think doesn't get enough attention? Maybe a skill you'd like to hire for that nobody's studying, or something you'd like to spend more time on if you could.

Reasoning, in general. If you go to a general AI conference, whether it's one in recent favor like ICML or NeurIPS, or the older AAAI-style conferences, lots of keynote speakers will talk about fast-and-slow AI, or System 1 and System 2, or whatever, but almost no one actually wants to do reasoning because it's so hard. And then you see communities of self-flagellating academics lamenting that they're only competing to get a higher F1 score on some published dataset that's been around forever, and what's the actual use of it all? The conversation often turns to: well, if we were doing some more complex reasoning thing, it would be more valuable, but it's just hard. That's why I said earlier that we're drawn to the probabilistic programming ideas: you can take a causal graphical model that is inherently explainable, and you don't have to Monte Carlo sample it until the end of time, thanks to variational inference and frameworks like Edward. I think that's going to push our ability to reason about really complex things, bring human expertise in, and let people help correct the models: the things we frankly just talk about doing but that are hard to do. There's also a bias against systems work at academic conferences. No one wants to write a quote-unquote systems paper in a workshop; they want to write an algorithms paper that gets cited 10,000 times. But that work is probably more important, because putting together everything that solves a problem is really valuable, and I wish we trained grad students to think about that instead of thinking about hyperparameter tuning, effectively. If I could snap my fingers and change one thing about the field, that would be it: pay attention to complicated systems, because they'll help you build things like reasoning engines.

That's really interesting. I've never made a connection between reasoning engines and systems; those seem like separate tracks. Is there something about making working systems that, in your experience, really requires reasoning?

Sure. Take an example like word embeddings or graph embeddings.
Once you have a representation of similarity, you can rank documents and calculate an F1 score, but you can also give the results to an expert and say, I found this thing for you, do you think it's the right thing or not? If they say yes, you can process it further and extract more information out of it for a specific purpose, and if they say no, you can ask why not, reason about the entities and relationships you've extracted, and actually refine your model from the user's feedback. But that's an HCI problem, an interaction problem, and you're not even going to start to touch it unless you're open to the idea of building some boring system that ties together user interfaces and back-end systems that are not all machine learning.

Totally. OK, so the final question is basically: when you look, across your career, at taking something from the prototype version to deployed in the real world and useful, where do you see the biggest bottlenecks or the biggest problems?

I think the biggest fundamental problem, when you work in industry at an existing company, but probably also when you have a startup and you're trying to get funding, is having buy-in from the product philosophy from the outset, and some willingness that the prototypes might not work. You need a foundational, definitely-going-to-work plan to make a product, but also to reserve 20% of the resources to try the crazy thing: we'll prototype it, and if it works even a little bit, great. You've got to have the person who's going to take it to market care about that idea. Otherwise you have a bunch of researchers hanging out making cool prototypes and then carrying them around like a toddler who made a thing: oh, look at this thing I made, don't you love me? I think almost every researcher I've known in industry could identify with the toddler in what I just said, because we all think we have some brilliant idea, and we make a thing, and we take it to people, and they're like, sorry, I don't need that right now. I don't understand, why do I need this? I already have a thing that recommends papers; I think it uses a regular expression. They don't care and they don't see the value. So you have to really get the buy-in at the beginning, or you can spend a lot of time making a hard and probably expensive thing happen and then it doesn't actually go anywhere. And it's more emotional than strategic: you have to be open to the idea that they might not see the value in what you want to do, and that helps you prioritize.

That's interesting. We've not heard that answer yet, but it really resonates; it makes a lot of sense. Thank you so much, this was a lot of fun. I really appreciate your openness.

You're welcome. I really appreciate it.

Thanks for listening to another episode of Gradient Dissent. Doing these interviews is a lot of fun, and it's especially fun for me when I can actually hear from the people who are listening to these episodes. So if you wouldn't mind leaving a comment and telling me what you think, or starting a conversation, that would inspire me to do more of these episodes. And if you wouldn't mind liking and subscribing, I'd appreciate that a lot.
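As a rough illustration of the embedding-and-ranking loop described above (rank candidates by similarity, score them offline, or hand them to an expert for feedback), here is a minimal sketch. The random vectors and the cosine-similarity ranking are illustrative assumptions, not the guest's actual pipeline.

```python
import numpy as np

def cosine_rank(query_vec, doc_vecs):
    """Rank documents by cosine similarity to a query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # one similarity score per document
    order = np.argsort(-scores)          # highest similarity first
    return order, scores[order]

# Illustrative data: 4 documents and a query, embedded in a 5-dimensional space.
rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(4, 5))
query_vec = rng.normal(size=5)

order, scores = cosine_rank(query_vec, doc_vecs)
print(order, scores)

# The ranked list can be scored offline against labels (precision/recall/F1),
# or shown to a domain expert whose yes/no feedback becomes new training signal.
```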
68.86649
37.48749
4s
Dec 28 '22 14:54
21k1eiuz
-
Finished
Dec 28 '22 14:54
2157.024000
/content/boris-dayma-the-story-behind-dall-e-mini-the-viral-phenomenon-vxc8fkqqxgm.mp3
tiny
I don't know if you've seen that account, Weird DALL-E; they have crazy images that they post, like the Demogorgon from Stranger Things holding a basketball. That is insane, that is so cool. And you go through it and there are amazing things.

You're listening to Gradient Dissent, a show about machine learning in the real world, and I'm your host, Lucas Biewald.

Boris Dayma is a machine learning consultant and a long-time Weights & Biases user; in fact, he started using us in such early days that I knew all of the users. Boris has gone on to build a model called DALL-E Mini, inspired by OpenAI's famous DALL-E project, and somehow DALL-E Mini has captured the imagination of the world, to the point where I've seen it in New Yorker cartoons and all over my Facebook feed. It's an amazing piece of work and I'm really excited to talk to him about it today.

So, maybe taking a step back for people who aren't familiar with DALL-E at all: could you talk about what DALL-E is, what DALL-E Mini and DALL-E Mega are, and how you came to work on this?

Yeah, so DALL-E came from that paper from OpenAI at the beginning of last year. It was really amazing, and actually the first time I saw it was a tweet you posted about it. I replied to that tweet and said, this is so cool, I'm going to build that, I want to build that. It was basically OpenAI's first really impressive image-generation model where you could type any prompt and get something that actually looked cool. Before that you had Image GPT or similar, which would do something very tiny; you'd get a bit of the idea. But DALL-E would do something more complex, like the avocado armchair, which was cool at the time. Now the avocado armchair is something simple, nothing impressive anymore; it's crazy, a few months ago I was still very happy when I got a good avocado armchair. So that's where it came from. By July of last year I wanted to build it. I didn't do anything for six months; I read the paper quite a few times, I don't know how many, a ton of times, and I didn't understand much of it. At some point Hugging Face and Google organized a hackathon where you had to build something cool in JAX, which is a programming framework from Google, and you got those cool computers from Google, the TPU VMs. I thought, OK, that's an opportunity to do something cool: I'm going to try to build a replication of DALL-E in terms of the results. And it turns out it worked pretty well. I studied the paper a lot, some people joined the team too, and somehow the program had pretty cool results for such a short time frame. Then I continued.

Can you describe how the program works? It really feels like magic. How is it actually set up?

I know, it felt like magic to me for so long, and even now, each time I read the paper I learn new things. Basically, the way it works: we have good models right now for NLP that transform text into text, say for summarization, or for translation, going from English to French. And we're trying to do kind of the same thing, except instead of going from English to French, we want to go from English to an image. But it's almost the same; it's just a translation.
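As a rough illustration of that "translation" framing, the sketch below shows that both modalities end up as plain integer sequences. The BART tokenizer is just one concrete text tokenizer, and encode_image_to_codes is a hypothetical stand-in for the learned image tokenizer discussed next; the codebook size and grid shape are assumptions.

```python
from transformers import AutoTokenizer
import numpy as np

# Text side: a standard tokenizer turns the caption into a sequence of integers.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
text_ids = tokenizer("a cat in a field", return_tensors="np")["input_ids"][0]
print("text tokens:", text_ids)            # integers drawn from a ~50k-entry text vocabulary

# Image side: a learned image tokenizer (VQGAN-style) would map a 256x256 image
# to a short grid of codebook indices. Random values stand in for that model here.
def encode_image_to_codes(image):          # hypothetical helper, not a real library call
    return np.random.randint(0, 16384, size=16 * 16)   # 16x16 grid, ~16k-entry codebook

image_ids = encode_image_to_codes(image=None)
print("image tokens:", image_ids[:8], "...")

# Training then looks exactly like translation: predict image_ids from text_ids.
```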
So the way you do it: when you do text to text, you encode the text; each piece becomes a token, just a unique number, and you try to predict that sequence of numbers, where each number corresponds to some text. We try to do the exact same thing, except that each number corresponds to a patch of an image.

I see.

So first you need an encoder that transforms an image into a sequence of numbers, the same way the tokenizer does for text. And once you have that...

Sorry, let me make sure I follow. You have an encoder that takes text and turns it into some kind of encoded vector?

Yeah, that's right.

And then the vector goes into a decoder that creates the image?

That's right.

But what did you say about the patches?

Yeah, that's exactly right, except that typically the decoder would output numbers that correspond again to text. Now you want those numbers to correspond to some kind of image. You could try to output RGB pixel values directly, but there would be too many, so it wouldn't be very efficient; that's what Image GPT was doing, and that's why it was limited to something like 16 by 16, very small. Instead, what OpenAI did is have each number correspond to a patch. For example, one number can correspond to a green patch, another to a blue patch with a yellow dot in the center; they all correspond to something more complex.

I see.

So you train a separate model, completely independent, trained separately and frozen later, that learns how to create those patches. Because you're limited in vocabulary, maybe you can only have around 16,000 different patches (I think that's what's used commonly), you want the patches that are going to be used the most, the ones that are most relevant. So you have a model that's trained to find those patches: it looks at a lot of images and tries to encode them into a codebook such that, when you reconstitute the images at the end, they're as close as possible to the originals. Once you've built that, once you're able to go from image patch to number, it's the exact same thing as doing a translation.

And are the patches in a grid, like a two-by-two grid?

That's right. I think for my model each patch is about 16 by 16 pixels.

Wow, it's amazing you don't notice that. Those are big patches.

They're not completely independent patches, because the encoder is convolutional, so there's a bit of overlap, which makes sure you don't see the patch boundaries.

I see, so they're kind of blended together.

Exactly.

Interesting. And is it the same as the attention encoder from the "Attention Is All You Need" paper, basically identical to that translation model?

Yeah, it's very similar.
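A toy version of that codebook lookup, assuming a 16,384-entry codebook and a 16x16 grid of patches: quantization snaps each patch's feature vector to its nearest codebook entry, so an image becomes a short list of integers. The random tensors stand in for a trained VQGAN-style encoder.

```python
import torch

codebook_size, dim = 16384, 256
codebook = torch.randn(codebook_size, dim)         # stand-in for a learned codebook

def quantize(patch_features):
    """patch_features: (num_patches, dim) -> one integer codebook index per patch."""
    dists = torch.cdist(patch_features, codebook)  # (num_patches, codebook_size)
    return dists.argmin(dim=1)                     # nearest codebook entry per patch

def dequantize(indices):
    """Indices back to vectors; a conv decoder would then render pixels from these."""
    return codebook[indices]

patches = torch.randn(16 * 16, dim)                # a 16x16 grid of patch features
codes = quantize(patches)
print(codes.shape, codes[:8])                      # 256 integers describe the whole image
```

In the real setup the codebook is learned jointly with the image encoder and decoder, then frozen, which is what is meant above by training that model separately.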
So what DALL-E does: it's just GPT. GPT reads the text, and then there's probably a special token; I don't have the exact details, because the code wasn't released with the paper. The paper is very detailed, but it's missing a few of the small details. So you have all the text, and then at some point you have the encoding for the image, and it predicts it; there must be a special token that says, now it's an image, so it switches to another modality. My model is slightly different, in the sense that it has a separate encoder, like a translation model where the encoder and decoder are separate. The encoder reads the caption, and then the decoder, which generates the image, is causal. The idea was that it might be more efficient.

Now, with a string of text, I kind of understand how you feed each token back into the decoder and it looks at the previous words. But how does that generalize to a 2D image? How do you do that?

For the 2D image you just put the patches next to each other; you don't treat it as 2D. That's actually an issue too, because you predict the image from the top left, left to right, down to the bottom right.

Oh, is that right? Just line by line?

Yeah, that's what's done. And that's a problem, because if you mess up at some point, it influences the entire rest. That's a limitation that diffusion models don't have, for example.

And how do you actually train this? I would think it would generate images that are typically just totally different; each image is quite different in terms of raw RGB values. So how does the training work?

The way it works: the image is first encoded into a sequence of numbers, and your text is also a sequence of numbers from your tokenizer. Your input is the numbers for the text, and the output is the numbers for the image. The image encoder is frozen at that point, you don't train it, so you just go from one sequence of numbers to another sequence of numbers. Then the decoder predicts, token by token, what the numbers for that image should be; the attention looks only at the previous patches. And you have the ground truth, because you know what the real image is, so you calculate the cross-entropy loss, the same as you would for any other problem, to see how wrong you are. It's kind of strange: you have a caption that says "a cat in a field", you have a ground-truth image, and the goal is to predict that ground-truth image. It's a bit counterintuitive that it works, because there are so many possible cats in fields that could be right, and you're trying to predict one specific one, the image you're looking at right now. But somehow it works; it learns, eventually, to associate concepts in a way that maximizes the likelihood. A simple way to think about it: say you want to predict a view of something by night versus a view of the beach during the day. The model learns that at night the tokens at the top of the image should probably be dark, black or very dark blue, so it increases the probability of those tokens; during the day it's probably going to be blue, maybe there's some sky up there. And then, when it predicts the next tokens, it already knows the previous ones.
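A compressed sketch of the training step as described: the encoder reads caption tokens, the decoder predicts image-codebook tokens with teacher forcing and a causal mask, and the loss is ordinary cross-entropy over the codebook vocabulary. The tiny model sizes and random data are placeholders, not DALL-E Mini's actual configuration.

```python
import torch, torch.nn as nn

text_vocab, image_vocab, dim = 50_000, 16_384, 256
text_emb = nn.Embedding(text_vocab, dim)
image_emb = nn.Embedding(image_vocab, dim)
seq2seq = nn.Transformer(d_model=dim, nhead=4, num_encoder_layers=2,
                         num_decoder_layers=2, batch_first=True)
to_logits = nn.Linear(dim, image_vocab)

caption = torch.randint(0, text_vocab, (1, 12))         # tokenized caption
image_codes = torch.randint(0, image_vocab, (1, 256))   # ground-truth image tokens (16x16)

# Teacher forcing: the decoder sees the ground-truth tokens shifted right,
# and a causal mask keeps position i from peeking at positions after i.
decoder_in = image_codes[:, :-1]
targets = image_codes[:, 1:]
causal = seq2seq.generate_square_subsequent_mask(decoder_in.size(1))

hidden = seq2seq(text_emb(caption), image_emb(decoder_in), tgt_mask=causal)
loss = nn.functional.cross_entropy(to_logits(hidden).transpose(1, 2), targets)
loss.backward()
print(float(loss))
```

At inference time the decoder instead generates image tokens one at a time, feeding its own predictions back in, which is the behavior clarified in the next exchange.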
So if you have part of the image, you need to predict the rest, and it's easier to predict.

Oh, so when you're doing the prediction, you're feeding in part of the ground-truth image and then trying to complete it?

You're feeding the entire image, but each token sees only what comes before it. The first patch doesn't see anything, but the second patch sees the one before it, and not its prediction, the one that was actually there.

Sorry, the second patch token sees the ground-truth first patch, not the predicted first patch?

Right, the second patch sees the ground truth of the first patch. And when you predict the bottom half, it already knows the exact ground truth of the top half. It also sees the prompt, which is supposed to help.

Right, but I would think that if you said, say, "a cat at night", and I had to guess a whole image to be as close as possible in RGB space, it might be better to just predict gray than to draw a specific cat at a specific point.

Yeah, that's actually a huge problem; I remember it from playing with colorization before. Predicting gray gives you the best chance if you don't know anything, so you just output a gray image. But in this case the loss is cross-entropy on tokens, so there's no advantage in predicting gray: you get the same loss for being wrong whether you predicted gray or blue when the answer was red.

Oh, so you're predicting a probability over the next token, and there's only a discrete set. I see. So you don't have the problem of it always producing washed-out grayish images.

Exactly.

I see. What was it like to build this? What kinds of issues did you run into? I'd think this would be one of those things that's just incredibly hard to debug when it's not working.

Yeah, it is quite hard, because you don't know why it's not learning or what's happening. What was good is that the first version of DALL-E Mini somehow worked well pretty fast. I know how it can go: you can have a little mistake, like an offset of one token, little problems, and nothing works. We used a pre-trained model, just a summarization model, and decided the decoder would be retrained from scratch to predict images, and it actually worked pretty fast. So later, when I worked on the larger model, I always had a baseline; you know whether it's getting better or worse. But there were some bugs where I spent two weeks or more trying to understand what was happening and why it didn't work.

Can you tell me about one of them?

Let me think of an interesting one. One was when I was trying to use ALiBi, which is a way of encoding position embeddings... no, sorry, I'm mixing it up; I was trying Sinkformer.
So Sinkformer is a way of adapting the transformer where you add a normalization, and it works for encoder models, but for a decoder you don't realize that some information leaks through from the next tokens. So my loss was going down very fast, close to zero, and I had no clue why; it was just that, when it normalizes, it gets some information from the future. That was really hard to understand.

But the biggest challenges were actually when you make it larger. Training those large models is very, very hard. When I had the small model I thought, OK, it's going to be easy to make a large one: I'll just add more layers, train it on a bigger computer, for longer, maybe with a bit more data, and it's just going to run. Unfortunately it didn't work, and that was a bit sad. First, you have to be able to split the memory well across all the devices. It's not easy, because the model doesn't necessarily fit on one device, so you need to spread the weights across devices; JAX has cool features for that, by the way, which were very helpful. But then the model becomes unstable: you get loss spikes that happen randomly. It starts well, the first hours your loss is in a good zone and you're so happy, and then suddenly you get a big spike. So you go, OK, fine, I'm going to restart from before the spike, and it gets through, but five minutes later there's another spike. It was really hard at that level.
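A rough sketch of that spike-and-restart routine, with a Weights & Biases alert so you don't have to watch the dashboard. It assumes a logged-in W&B account; the train_step, save_checkpoint and restore_checkpoint stubs, the project name, and the spike threshold are hypothetical placeholders for the real JAX/TPU training harness.

```python
import random
import wandb

# Stub training utilities so the control flow below is self-contained; in a real
# setup these would be the JAX/TPU train step and checkpoint helpers.
def train_step(step):
    return 2.0 / (step + 1) + (5.0 if random.random() < 0.01 else 0.0)  # occasional spike

def save_checkpoint(name): pass
def restore_checkpoint(name): pass

run = wandb.init(project="dalle-mini-demo")   # illustrative project name

ema_loss, spike_factor = None, 2.0
for step in range(2_000):
    loss = train_step(step)
    wandb.log({"train/loss": loss}, step=step)
    if ema_loss is not None and loss > spike_factor * max(ema_loss, 1e-8):
        # Loss spiked: fire an alert and roll back to the last good checkpoint.
        wandb.alert(title="Loss spike",
                    text=f"step {step}: loss {loss:.3f} vs EMA {ema_loss:.3f}")
        restore_checkpoint("last_good")
        continue
    ema_loss = loss if ema_loss is None else 0.99 * ema_loss + 0.01 * loss
    if step % 500 == 0:
        save_checkpoint("last_good")
run.finish()
```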
Something that was cool: as I was training it, I was making reports in Weights & Biases and sharing them online the whole time I was working on it. And what's crazy is that the community on Twitter got really engaged and was so helpful. Honestly, I don't know if I would have been able to build it this successfully without sharing the whole journey, because a few key elements that made the model a lot better came from replies on Twitter saying "maybe you could try that": the optimizer we used, Distributed Shampoo, which came kind of randomly through Twitter, or super conditioning from Katherine Crowson. These were random things I discovered along the way just by sharing the training publicly, so it was very beneficial for me.

That's amazing: sharing Weights & Biases reports on Twitter and getting feedback.

Oh yeah, I was getting feedback, and after a little while I could tell there was interest. I was showing the new predictions, and then I wanted a little demo online so people could play with it a bit, and people got even more engaged: "oh, it's better at that", "it's bad at this", and that gave me ideas about what to correct. Having it open helped me a lot, because in the end it almost feels like it wasn't just my work; I got free advice from everybody, so it was really good.

It would be fun to see all the Weights & Biases reports. I wonder if you could make a collection of them.

I did. I have a main report and I link to the main ones from it. If I linked to all of them there would probably be 50 or more, and some I did just for myself that I wouldn't share because they're a mess. Sometimes you reach a conclusion from a test, say, weight decay has no effect, but when you run so many experiments you forget why you don't use something. So I would always go back to my reports: I know I have those experiments somewhere, here are the runs that show you shouldn't use that. It was very convenient for seeing why I made certain decisions, or comparing against a previous run and figuring out what the difference between two runs was. It was a major help.

That's really cool. Do you feel like there's enough in the DALL-E paper to really reproduce it? I feel like there were key things you had to learn along the way to really make it work.

The DALL-E paper actually lays out the main ideas pretty clearly. There's the language-model part we just talked about, and there's the model that encodes the image; the one I use is not the same as DALL-E's, it's actually the one from Taming Transformers, where people added a GAN loss and a perceptual loss to make it a bit better. The trade-off is that it creates those weird artifacts we sometimes get on faces, where the original one would instead do something blurry. Then there's the model that predicts the next tokens: for OpenAI it's a kind of GPT model; mine is similar to BART, which is encoder-decoder, because I thought it might be more efficient. And then there's the CLIP model, which they released little by little, larger and larger; it's a model that has kind of revolutionized a lot of research on multiple modalities, text and images and now audio, and people are adapting it for 3D. It basically tells you how well a text and an image match: you give it a text and an image and it returns a score for how well it believes they match. For example, in the demo we generate more images than the nine you see, maybe 16, maybe more if there's not too much traffic (which never happens nowadays), and we have CLIP look at them and choose the nine it thinks are best, which actually improves the quality quite a bit.

Oh, so it's just the outliers that you don't show.

Yeah, the ones that are really bad; typically it catches them and doesn't show them. So the paper has the essential ideas; it's missing some details on how to train it and so on, but overall it's a good enough base to build something. I wish I could have just run the code immediately and trained it, but in a way I'm quite happy: the fact that it wasn't released pushed me, motivated me, to learn how to build it. I don't think I would have learned how it was built if I hadn't had to try to build it myself.

That's cool.
How sensitive is the performance to the details of how the model works, in your opinion? I always wonder this. When I look at the attention mechanism, we tell ourselves a story about the three vectors that get generated and how they're multiplied together, but I wonder how much the specifics of that really matter.

I think there are some details where whatever you put in, it works. When I started machine learning and had to build a convnet to detect cats and dogs, I remember thinking, what depth should I use, how many layers; I'd put 12 here, then 36, then fewer, then more, fairly arbitrary choices, and whatever you do kind of works. There are so many configurations that work, and you don't actually need to obsess over it. There are scaling laws and a bit of research, and it's hard to know exactly what works and what doesn't, but I tried to follow them a bit: some people have tried a few things, so I'm going to use the same ratio of width versus depth they have. I have a report where I tried a lot of variants, something like 100 different ones, and some converge, and some don't necessarily converge better but are more stable for some reason, so I tried a bunch and picked the one that worked best. Activation functions didn't seem to matter a lot: initially, when you're trying hyperparameters, you think "let me try different activation functions, it really matters", but overall whatever you pick is OK. Maybe there are small advantages to one, but it's hard to know. I took the one that was stable, but maybe if I had just changed the seed I would have gotten different results, so I don't know how much I can rely on some of those conclusions.
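The CLIP re-ranking step described a little earlier (generate 16 candidates, keep the 9 that CLIP scores highest against the prompt) can be sketched with the publicly available CLIP checkpoints. The solid-color placeholder images stand in for the generator's outputs; only the scoring logic is the point here.

```python
import torch
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a cat riding a skateboard"
# Placeholder candidates; in the demo these would come from the DALL-E Mini generator.
candidates = [Image.new("RGB", (256, 256), color=(i * 15 % 255, 80, 120)) for i in range(16)]

inputs = processor(text=[prompt], images=candidates, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = model(**inputs).logits_per_image.squeeze(1)   # one score per candidate image

best = torch.topk(scores, k=9).indices.tolist()
print("keep these 9 of 16:", best)
```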
Cool. What do you attribute the model's massive increase in popularity to? We've noticed that our metrics for reports are getting messed up because there's so much traffic to your report. What do you think is going on?

Yeah, so somehow people think the model is new, but it has been around for almost a year already. Over time I worked on it and little by little it became better, and the traffic, the people using the model, involved in the forums and talking about it, actually increased a bit over time. But I think when I trained the larger model it crossed a critical threshold: it became good enough for virality. I think some YouTubers tried it: they would put their name in the prompt, in different situations, like in a golf cart or whatever, and suddenly they would see something in the outputs that kind of looked like them, not really, it was not very good, but close enough, and they would put themselves in the craziest situations and get excited about it, and little by little it amplified. It just reached that threshold where it was good enough. What's fun is that the model is actually still training a little; I'm curious whether it's going to get much better, and there's still stuff to improve, so it's interesting that it already reached a threshold that interests people and there are easy ways to make it better still.

So how much data did you train this model on?

Probably around 400 million images, maybe 400 to 500 million. But the data is actually very important, and there are tricks here and there to make it work. When we trained the model at first, and you see this in a lot of the open-source models you can play with, a problem that happens a lot is you ask for, say, a view of snowy mountains, and it draws the snowy mountains and then writes "Shutterstock" or "Alamy" on top, because the model has learned that an image typically needs a stock-photo watermark. Which is horrible: the image is completely new and it has a horrible watermark on it. So one of the first things I did was say, OK, I don't want any of those images; how do I avoid that? It was a real problem: I looked online for how to detect whether an image has a watermark, and there were some things here and there but nothing great, and some people were generating datasets with fake watermarks to try to train detectors, which is already a big challenge. Then I realized I could just remove all the images that have "shutterstock" in the URL, and the problem is solved. That's the solution I took, and it works quite well: you basically never see a watermark.

Interesting. And how long does it take to train on 400 million images?

Well, when we didn't have a lot of data initially, after a while the model overfits, even more so because it was a bit smaller. The smaller model was around 400 million parameters, which is already quite big, but after maybe five or six epochs it would overfit, and that was the equivalent of roughly two weeks on a single TPU VM.
Now, when you have so much data, maybe more than your model size can handle, I don't know if you overfit that easily; maybe you can. But that's where you need to be very careful to have a good validation loss. There's a cool model called ruDALL-E, the Russian DALL-E, which is really nice, but when you use it, it looks like it overfits. I think it's not as good at composition, but it makes really nice images; it encodes images at a higher resolution than mine, so it has fewer of those artifacts, but sometimes you type a description and it shows an image it has clearly seen before. There's a lot of memorization there, so they overfit, and maybe they didn't have the right validation set. And there's a subtlety with the validation set too: if you just take a random subset of your training data and call it the validation set, it doesn't work. The reason is that, say, the Google logo is present on a ton of websites, so you have a ton of unique URLs with that Google logo but different captions. Maybe you deduplicate by image plus caption, because it's fine to have the same image with multiple descriptions, but that's not good for your validation set, because remember, during training each patch sees the previous patches, so the model will recognize the image and can ignore the caption. Your validation loss goes down, but only because it recognizes the images and predicts them, not because it's using the prompt.

I see, that makes sense. So you're really training on captioned images that you crawled from the web?

That's right.

I would imagine that would introduce all kinds of crazy artifacts. Does it generate, like, company logos?

Yeah, it can; it's actually good at creating logos. And it's funny, because something made me happy a while back: a person reached out and said, hey, my mom started a new business, she couldn't afford a graphic designer, I just used DALL-E Mini and gave her a logo, and it was good enough. I was so happy that it was helpful in that way. It can do surprising things I didn't realize would be possible, and the fact that it's open and so many people can use it is why I'm learning so much more. I realized I was barely testing the model before: I was putting a cat on a skateboard, while people come up with those crazy prompts. I thought I was creative when I put the Eiffel Tower on the moon, but it's ridiculous compared to what other people do.

That's awesome. Do you have plans for what you want to do next?

I have a lot of ideas; I don't know exactly where I'll go next. Obviously the diffusion models are very attractive because they produce very impressive images, so that's definitely something I want to look at.

How do those work? I'm not that familiar with diffusion models.

So diffusion: instead of predicting the image in one pass, patch by patch in a single shot, you iterate many times. You start with an image that's just random noise, imagine random pixels and random colors, and you try to remove that noise little by little. You pass it through the model maybe a hundred times, or a thousand times, some high number of steps, though there are ways to get away with fewer.
You run the same model many times, and each time it removes a little bit of noise, and at the end it turns into an image that's actually good. It's almost like a recurrent model in that sense, but the fact that you just remove noise little by little gives you a loss that's very friendly to train. So it's super promising; it's already proven to work very well with DALL-E 2 and Imagen. The downside is that those models are a bit more computationally expensive, so we still need some research on how to make them more efficient, which I think is interesting. So I'll probably look at that.
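A toy version of that denoising loop: start from pure noise and repeatedly subtract a predicted noise component. The predict_noise function is a random stand-in for a trained denoiser, and the schedule is deliberately simplistic; real DDPM-style samplers use learned noise predictions and carefully chosen schedules.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x, t):
    """Stand-in for a trained model that estimates the noise present in x at step t."""
    return 0.1 * x + 0.01 * rng.normal(size=x.shape)

num_steps = 100
x = rng.normal(size=(64, 64, 3))                 # start from pure noise
for t in reversed(range(num_steps)):
    eps = predict_noise(x, t)
    x = x - eps                                  # remove a little of the estimated noise...
    if t > 0:
        x = x + 0.05 * rng.normal(size=x.shape)  # ...and re-inject a bit, DDPM-style
print(x.shape, float(x.mean()))
```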
But my current model also has many ways it can be improved cheaply and quickly, so maybe I'll do a bit of that. And fine-tuning, fine-tuning it on your own art or your own data, I think that could be pretty cool. And people can use the same kind of model to generate sound, or music, or video; you can do so much with the same type of model, so it's pretty exciting where this is going, and I think it's going to go very fast.

Are there features we could add to Weights & Biases to help you with your work?

To be fair, I feel like I've been using all the features on this project. I have the pipeline, the model is trained, I have the checkpoints and resume from them, it's all tracked. For inference, I can't do it during training because it would be too expensive; I'd need to load the image decoder model, so it would be inefficient. Instead I have another machine that's linked to the checkpoints and regularly runs some inference. What could be added...

Are you using alerts for when your model starts to go badly?

I should, because I look at the training way too often, you know, on my phone. Whenever I have a little pause, or I'm out walking, I'm checking quickly: is it still training, or did the TPU crash? So yeah, maybe alerts would make me feel more relaxed.

Well, I have to tell you, it's been really fun to watch my friends who aren't in machine learning talking about DALL-E Mini. They're like, "I know Boris, I know the guy who made it", and they're impressed. So congratulations on such a successful model; it's really captured people's imagination.

Thank you, that's fun. I think it's cool to see so many people using it. I was a bit scared, because you could see it in negative ways too, since it's creating images, but overall the reaction has been pretty positive. People are happy that they can see, through the model, what the limitations and biases are and what it can be used for, and they can test it themselves; if I had to figure out all the limitations and biases myself, it would be impossible. And I like that it's used by people who can't draw at all, like me: even if the image is not that pretty, it's still so much better than what I could do. And for people who are actually talented, I'm happy to see that some of them use it as inspiration: they take the output from DALL-E Mini, then use Photoshop and do something crazy with it. It's really nice to see it being used that way.

Awesome. Well, thanks, great chat.

Thanks, awesome.

If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description, where you can find links to all the papers that are mentioned, supplemental material, and a transcription that we work really hard to produce. So check it out.
57.1608
37.73607
5s
Dec 28 '22 14:53
17c058s9
-
Finished
Dec 28 '22 14:53
3356.016000
/content/jack-clark-building-trustworthy-ai-systems-nv-f1gk8ybk.mp3
tiny
The challenge is like, well, shit, I didn't sign up for this. I wanted to do AI research; I didn't want to do AI research plus ethics and geopolitics, that's not my expertise. I think that's a very reasonable point. Unfortunately, there isn't another crack team of people hiding behind some wall to entirely shoulder the burden of this.

You're listening to Gradient Dissent, a show where we learn about making machine learning models work in the real world. I'm your host, Lucas Biewald.

Jack Clark is the strategy and communications director at OpenAI. Before that, he was the world's only neural network reporter, at Bloomberg. He's also one of the smartest people I know thinking about policy and AI and ethics, and I'm really excited to talk to him.

I typically get nervous when people ask me big policy questions about AI; I never really feel like I have much smart to say, and the goal of this podcast is mainly to talk about people doing AI in production. But when I started writing down questions I wanted to ask you, I realized I want to ask you all the policy questions, all the weird questions nobody ever asks me because I'd have no idea. So the question I seriously want to know, and it's such a cliché question, but I'm actually fascinated by how you'll answer it: what probability do you put on an AI apocalypse?

Oh, good, so we're starting with a really easy question and going from there. I like it.

Yeah, yeah. Is it like one in ten, nine out of ten, one in a million? What's your number?

Surviving an AI apocalypse? Most apocalypses, if they get to the point where they're happening, like, say, a global pandemic, which is something we're currently going through, make it quite clear that most of today's governments don't have the capacity or capabilities to deal with really hard challenges. So if you end up in some scenario where you've got large numbers of autonomous, broken machines doing bad stuff, I think your chance of being able to right the ship is relatively low, and you don't have a super positive outlook. I think the chance that we avert it and get ahead of it is actually quite high. But I think your question is more: if something kind of wakes up and we enter this very, very weird territory, what are our chances? And if we don't do anything today, our chances are extremely poor.

Well, OK, I think maybe I agree that surviving an AI apocalypse is probably unlikely. But my question actually is: what do you think the chances are of actually entering an AI apocalypse? And remember that all the apocalypse scenarios together can't sum to more than one, right? So in a way, a pandemic apocalypse, unless you think they're linked, should make the AI apocalypse less likely.

Yeah. It's kind of like at the beginning, when you started getting massive amounts of computerized trading on the stock market: what's the chance we enter a high-frequency-trading apocalypse?
And I think the somewhat considered answer is: it's really likely we'll have problems, but it's fairly unlikely the whole system topples over due to high-frequency trading. My answer on AI is pretty similar. It's really likely we're going to have some problems, because it's a massively scaled technology that does weird things really, really quickly, and it will do things in areas where finance deploys huge amounts of capital, so the opportunity for big things going wrong is kind of high. But the chance of a total apocalypse feels a little fantastical to me. That's partly because, for a real apocalypse, a really severe one, AI would need the ability to take a lot of actions in the world, and as you and I know, robotics is terrible, which actually protects us from many of the more extreme apocalypse scenarios. The way I think about it is: you develop a load of radical technology, and some of the greatest risks you face aren't the technology deciding of its own volition to do bad stuff; that very rarely happens. There's a chance that you kind of get black mold with technology: somewhere in your house that isn't being cleaned, where you don't have good systems in place, something problematic starts developing in an emergent way. Maybe you barely notice it, but it has really adverse effects on you, and it's also hard to diagnose the source of the problem and why it's there.

Interesting. So that's actually a much more concrete scenario. What form might that take? It sounds like you're mostly worried about the things we're doing now: we get better at doing these bad things, and that causes big problems. But what are the top-of-mind concerns for you?

I guess I'd frame my concern as: we're currently pretty blind to most of the places where this could show up, and we kind of need something that looks a lot like weather forecasting, radar and sensors for watching how misuse evolves. The sorts of things I've been worried about are scaled-up versions of what we already have. The recommendation system that pushes people towards increasingly odd areas of content or subject matter, quietly radicalizing people or changing how they behave without us realizing it. I worry quite a lot about AI capabilities interacting with economics: you already have an economic incentive today to create entertaining disinformation or misinformation, and you have to think about what happens when those things collide, when AI tools amplify that disinformation and the economic incentive behind it. I think there will be relatively few grand evil plans, and lots and lots of accidental screw-ups that happen at really, really large scale, really quickly, with self-reinforcing cycles. And that's the challenge: you not only need to spot something, you then need to take action quite quickly, and that's something society is traditionally really bad at. We can observe how things happen, but our ability to react against them is quite low.
But you do a lot of work on ethics and AI, and a lot of thinking about it, and those scenarios don't feel especially AI-specific; they seem like fairly general technology risk. Do you think AI makes it different?

I think AI represents delegation. Technology allows us to delegate certain things, and technology up until most practical forms of AI lets us delegate highly specific things that we can write down in a procedural way. AI allows us to delegate things that have a bit more inherent freedom in how the system chooses to approach the problem that's been delegated to it. "Make sure people watch more videos on my website" is a fuzzy problem; you're giving the system a larger space to work in. So the ethics aren't something humans haven't encountered before, but it's a form of ethics that has a lot in common with the military, or with how administrative states were run in the old days: the ethical nature of giving someone the ability to delegate increasingly broad tasks to hundreds of thousands of people. That's a classic ethical problem people have dealt with for hundreds of years, but with AI, now almost everyone gets to do that delegation, which really hasn't happened before. We haven't had this scale of delegation and this ease with which people can scale themselves up. So lots of the ethical challenges are: people now have much greater capabilities to do good and harm than they did before, because they have these automated tools that extend their abilities. How do you think about the role of the tool-builder in that context? Sure, you're building just iterations of previous tools, but the scope of areas in which those tools will probably be used is much broader than before, and I think that introduces ethical considerations the builders maybe didn't deal with previously.

I see. So in your view, AI allows single individuals to have broader impact, and therefore there are more ethical issues in which tools you actually make available to folks.

Yeah. A good way to think about this, an ethical challenge I find interesting, is language models. You have a big language model with a generative capability, and you want to give it to basically everyone, because it's analogous to a new form of paintbrush: it's very general, people are going to do all kinds of things with it. Except this paintbrush reflects, at mass scale, the implicit biases in the data it was trained on. So OK, it's like a slightly racist paintbrush. The problem is now different from just having a paintbrush: you've got a paintbrush with slight tendencies, and some of those tendencies seem fine to you, but some reflect things that many people have a strong moral view of as being bad for society. What do you do then? I've actually spoken to lots of artists about this, and most artists will just say: give me the paintbrush, so I can explore it and make interesting things. That feels fine.
But then I wonder what happens if someone is given this paintbrush and they just want to write text for an economic purpose. They may not know much about the paintbrush they've been given, they may not know about its traits, and suddenly they're unwittingly creating massive-scale versions of the biases inherited from the thing you gave them. That seems challenging, and it's where we as the technology developers have a lot of choice, an uncomfortable amount of choice. And a lot of these problems aren't easy to fix; you can't just patch this. You need to figure out how to talk about it and be honest about where it is.

Well, that's a really clever analogy. I've not heard that one before.

I think the theme is the scalability of a lot of this stuff. We're building tools that let people scale themselves in various directions, and the directions are increasingly creative areas, because we're building these scaled-up curve-fitting systems: we can fit really weird curves, including in really interesting semantic domains. But all the problems of curve fitting now show up in the things people build with it, which feels different and challenging. I don't have a grand theory here; it's more like, oh dear, this is interesting, this feels different.

It's interesting, because you speak of the language model as a hypothetical, "what if you had a language model", but OpenAI actually had this issue, and I'm curious how you thought about it at the time and how you reflect on it now.

So this is GPT-2, a language model that we announced and didn't initially release, and subsequently released in full. At the time, I think we made a kind of classic error: if you're developing a technology, you see all of its potential very, very clearly. You don't just see the thing you're holding in your hands; you see generations two, three, four and five and the implications thereof. I think we attached some of our worries about misuse of those later versions of the technology to the thing we were actually holding. Because what actually happened is we released it, we observed a huge amount of positive uses, some really surprising, like this game AI Dungeon, where the language model becomes a kind of dungeon master, which is genuinely a different form of game-playing that we wouldn't have expected, and the misuse was relatively small. And that's because it's really hard to build a misuse of a technology; it's probably as hard as building a positive use, and luckily most people want to do the positive uses, so the number of people doing the misuse is a lot smaller. I think that means being a responsible technology developer is going to be more about thinking how you can control some elements of the technology while making other parts accessible; maybe you still trickle things out in stages, but you're ultimately going to release lots of stuff in some form. Can you control how you'd expect a big generative system to be used while making it maximally accessible?
Because you definitely don't want a big generative model with biased tendencies providing generations to people in, say, a mock-interview process that happens before a candidate gets to a human interview stage. That's the sort of usage you can imagine, and it feels like one you really want to avoid. But you can imagine ways in which you make the technology really broadly accessible while carving out the parts where you say: this kind of use probably isn't OK. So I think these things become a lot more subtle, and I do think we anchored on the future more than the present, and that's one of the main things that's changed.

So does that mean, knowing what you know now, you wouldn't withhold the model?

I think you'd still do a staged release, but you'd do more research earlier on characterizing the biases of the model and the potential malicious uses. We did do some of that research; we did a lot more of it, after some of the initial models had been released, on characterizing the subsequent models we were planning to release. What I now think is more helpful is to have all of that front-loaded, so you're basically saying: here's the context, here are the traits of this thing that's going to be slowly released, and you should be aware of them. So yeah, I think we would have done things slightly differently. And the point is, what we're trying to do here is learn how to behave with these technologies, and some of that is about making yourself, as the developer, more accountable for the outcomes, because as a thinking exercise it makes you consider different things to do. So I'm glad, because part of the goal with GPT-2 was to bring a problem that we don't get to get wrong in the future to an earlier point in time, where we can try different ways of releasing, some of which will be good and some suboptimal, and learn from it. Because in five, six, seven years, new sorts of capabilities will need to be treated in a standardized way that we've thought about carefully, and getting there requires lots of experiments now.

That's interesting; I guess there are sort of two kinds of problems. My understanding of the worry with GPT-2 was actually malicious uses, which more information probably wouldn't help with. But there's also your accidentally-racist-paintbrush idea, which speaks to inadvertently bad uses. Both seem like potential issues, but do you now view malicious uses as less of an issue? Because I really could imagine a very good language model having plenty of malicious uses. I suppose you could say any interesting technology probably has malicious uses, so should we never release any kind of tool? How do you think about that?

Yeah, again, thanks for the really easy questions. Well, there are a couple of things. One of the things we did with GPT-2 was release a detector system, a model trained to detect the outputs of GPT-2 models. We also released a big dataset of unsupervised generations from the model, so other people could build their own detector systems. A huge amount of dealing with misuse is just giving yourself awareness. Why are police forces around the world and security services able to deal with organized crime at all?
A huge amount of dealing with misuse is just giving yourself awareness. Why are police forces around the world and security services able to deal with organized crime? Well, they can't make organized crime go away, because that's a socio-economic phenomenon, but they can tool up on very specific ways to detect its patterns. I think it's similar here: you need to release tools that can help others detect the outputs of the big models you're releasing. Avoiding malicious uses outright, though, is actually quite challenging, and it's unclear to me today how you can completely rule that stuff out. Some of how we've been approaching it is trying to make prototypes, the idea being that if we can prototype some malicious use and it's real, then we should talk to the affected people. The extent to which we would publicise that remains deeply unclear to me, because if you publicise malicious uses it's a bit like saying, look over here, this is how you might misuse the thing we released, which seems a little dangerous. I think we're going to need new forms of control of technology in general at some point. I don't think that's this year's problem or next year's problem, but at some point in the 2020s you're going to have these embarrassingly capable cognitive services which can be made available to large numbers of people, and cloud providers and governments and others are going to have to work together to characterize what can be generically available to everyone and what needs some level of care and attention. Getting there is going to be incredibly unpleasant and difficult, but it feels inevitable to me.
But just to be concrete: if you created, say, a GPT-3 that was much more powerful, do you think you would release it along with a detector, and that would be the sort of compromise?
I think you'd think about different ways you could release it. For some uses you might want controls, where you keep hold of the model and people access services around it; that could be one way. Another could be releasing fine-tuned versions of the model on specific datasets or for specific areas, because when you fine-tune a model you take this big blob of capability, put it on a new dataset, and it takes on the constraints of that dataset; in some sense you've restricted it. So you can do things like that. The challenge for a lot of developers creating models is going to be how to deal with the raw artifacts themselves, the models themselves. Here's a thing I think about quite regularly: it's not today, it's not next year, it's probably not even 2022, but definitely by around 2025 we're going to have conditional video models. Someone in the AI community, or some group of people, will publish research that lets you generate video over some period of time, a few seconds probably, not minutes, where you can guide it so it includes specific people doing specific things, and maybe you get audio as well. That's obviously a much harder case than just a language model or just an image model. I think that capability definitely gets quite a few controls applied to it, and it needs systems built for verification of real content on the public internet as well; it raises questions about that.
Yeah, I think we're heading into a weird era for all of this stuff. The advantages you get from releasing all of this stuff publicly on the internet are pretty huge, but I also think it's to some degree a dereliction of duty by the AI community not to think about the implications of where we'll be in three or five years, because I have quite high confidence that we can't stay in a state where the norm is to put everything online instantly. We're developing things that are, frankly, too powerful for that, and by we I mean the AI research community at large. But usually I'd turn this around and ask you: what do you think is the responsibility of technologists, how do we get to a more responsible place, and is that even necessary? And then you can ask me another one of your easy questions.
I don't know, it's funny. I feel like I really want to retain the right to change my mind on some of this stuff, and I've been kind of reluctant to say things publicly, because it seems like the ethics really depend on the specifics of how the technology works. On GPT-2, just as an example, I thought OpenAI's decision was intriguing, and different from what I would have done or what my instincts would have been, but it was provocative to say, hey, we're not going to release this model, and maybe the good thing about it was that it got everyone talking and thinking about it. I guess another thing I don't have a strong point of view on, but that seems interesting, is that at the moment every AI researcher is sort of asked to be their own ethicist.
You see a lot of ethics documents coming out; even open-source ML projects will have their own code of conduct. On one hand it seems a little highfalutin to me; I have this instinct of, come on, should I put out an ethics statement with the toaster that I sell? Something about it seems a little unappealing. But I also definitely see the other side: it's less about the power of any individual and more that the technology can compound and kind of run amok, so maybe people really should be thinking about it. Honestly, I don't know, and I'm curious what you think, because you're in this all the time. Do you think AI researchers are in the best position to decide this stuff? If it really affects society as profoundly as you're saying, it seems like everyone should get a say in how it works, right?
Yeah, so this is unfair, right? What's actually happening here is an unfair thing for AI researchers, which is that they're building powerful technologies and releasing them into a world that doesn't have any real technology governance, because it hasn't been developed yet, and into systems that will use those technologies to do great amounts of good and maybe a smaller amount of harm. And so the reaction is, well, I didn't sign up for this; I wanted to do AI research, not AI research plus the side effects and the geopolitics, and that's not my expertise anyway. I think that's a very reasonable point. Unfortunately, there isn't another crack team of people hiding behind some wall who can entirely shoulder the burden. There are ethicists and social scientists and philosophers and members of the public and governments; all of them have thoughts about this and should be involved. But the way I'd put it to AI researchers is: they're making stuff that's kind of important, and they should see themselves as analogous to engineers, the people who design buildings and make sure bridges don't fall over and who have a notion of ethics, or chemists, who have a notion of ethics because chemists get trained in how to make explosives, and you really want your chemists to have a strong enough ethical compass that most of them don't make explosives, because until you have a really resilient and stable society you don't want lots of people with that capability and no ethical grounding. Or people like lawyers, who have codes of professional practice. It's very strange to look at AI research, and more broadly computer science, and see a relative lack of this when you see it in other disciplines that are as impactful, or maybe even less impactful, on our current world.
I don't think AI researchers get to solve this on their own, but I think that culture of culpability, of thinking, actually, to some extent I am a little responsible here, not a lot, it's not my entire problem, but I have some responsibility, is good. Because the way you get societal change is millions of people making very small decisions in their lives. It's not millions of people making huge, dramatic decisions, because that doesn't happen at scale; it's millions of people making slight shifts, and that's how you get massive change over time. I think that's probably what we need here.
Hey, we'd love to take a moment to tell you about Weights & Biases. Weights & Biases is a tool that helps you track and visualize every detail of your machine learning models. We help you debug your machine learning models in real time, collaborate easily, and advance the state of the art in machine learning. You can integrate Weights & Biases into your models with just a few lines of code. With hyperparameter sweeps you can find the best set of hyperparameters for your models automatically. You can also track and compare how many GPU resources your models are using. With one line of code you can visualize model predictions in the form of images, videos, audio, Plotly charts, molecular data, segmentation maps and 3D point clouds. You can save everything you need to reproduce your models days, weeks or even months after training. Finally, with Reports you can make your models come alive. Reports are like blog posts in which your readers can interact with your model metrics and predictions. They serve as a centralized repository of metrics, predictions, hyperparameters and accompanying notes. All of this together gives you a bird's-eye view of your machine learning workflow. You can use Reports to share your model insights, keep your team on the same page and collaborate effectively remotely. I'll leave a link in the show notes below to help you get started.
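For reference, the few-lines-of-code integration mentioned in that break might look something like the following minimal sketch; the project name, config values and metrics are placeholders, and the training loop is simulated.

```python
# Minimal, hypothetical example of logging a training run with Weights & Biases.
# Requires a wandb account and `wandb login`; the values below are made up.
import random
import wandb

wandb.init(project="gradient-descent-demo", config={"lr": 3e-4, "epochs": 5})

for epoch in range(wandb.config.epochs):
    # Stand-in for a real training loop: log a fake, decreasing loss.
    loss = 1.0 / (epoch + 1) + random.random() * 0.01
    wandb.log({"epoch": epoch, "loss": loss})

wandb.finish()
```

Each call to wandb.log adds a step to the run's charts in the web UI.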
And now, let's get back to that episode.
Let me ask you another easy question. What do you think about military applications of AI?
Well, military applications aren't special, in the sense that this is a general technology that's going to be used in many different domains, so it will get used in military applications too. I mostly don't like it because of what I think of as the AK-47 problem. The AK-47 was a technological innovation that made this type of rifle more repeatable, more maintainable, and easier to use by people who had much less knowledge of weaponry than earlier systems required. You develop this system, it goes everywhere, and it makes the act of taking life, of carrying out war, massively cheaper and much more repeatable. So we see a rise in conflict, and we also see that this technical artifact to some extent drives conflict. It doesn't create the conditions for conflict, but it gets injected into them and worsens them, because it's cheap and it works. I think that AI, if applied wrongly or irrationally in a military context, does a lot of this: it makes certain things cheaper and more repeatable, and that seems really, really bad.
AI for military awareness is much more of a gray area. Some of the ways an unsteady peace holds in the world is by different sides that are wary of each other having lots of awareness of each other: awareness of troop movements, distributions, what the other side is doing. They use surveillance technologies to do this. And you can make a credible argument that the advances in computer vision we're seeing, which are being applied massively widely, if adopted at scale by lots of militaries at the same time, which is kind of what seems to be happening, may provide some diminishment of a certain type of conflict, because it means there's generally more awareness. I think something like the moral question of lethal autonomous weapons is really, really challenging, because we want it to be a moral question, but it's ultimately going to be an economic question. It's going to be a question that governments make decisions about, motivated by economics, by speed of decision, and by what it does to strategic advantage, which makes it really hard to reason about, because it won't be you or I making those decisions; it will be people coming at it with a radically different frame from the strong intuitive push against it that most of us would have.
Right, right. Let's see where else we can go. Actually, okay, this is maybe a less loaded question, but I'm genuinely curious about it. You recently put out this paper, I think it's called Toward Trustworthy AI Development, and as someone who builds a system that does a lot of saving of experiments and models and things like that, I thought it was really intriguing that you picked as the subtitle Mechanisms for Supporting Verifiable Claims. You draw this incredibly bright, direct line between trustworthy AI development and supporting verifiable claims, and I was wondering why those are so connected.
Well, it's really easy for us to say things that have a moral or ethical value and, in words, commit our organization to something, like, we value the safety of our systems and we value them not making biased decisions, or what have you. But that's an aspiration. It's very similar to a politician on the election campaign trail saying, if you elect me, I'll do such and such for you, I'll build this thing. It's not very verifiable; you're being asked to believe the organization, or believe the politician, and they can't give you much proof. Because AI is going to be really significant in society and is going to play an increasingly large role, people are going to approach it with slightly more skepticism, just as they do with anything else in their life that plays a large role and has impacts on them. They're going to want systems of recourse, systems of diagnosis, systems of awareness about it. Today, for most of this, we just fall back on people; we fall back on the court system
as a way to ensure things are verifiable. We have these mechanisms of law which mean that if I as a company make a certain claim, especially one with a judicial component, the validation of that claim comes down to things around my company, and the ability to verify it comes from legal recourse and auditing standards and things like that.
But just to be concrete, because some people listening won't have read the paper: when you say supporting verifiable claims, what's an example of a claim you might want to verify that would be relevant to trustworthy AI development?
A claim you might want to verify is, say: we feel that we've identified many of the main biases in our system and have labeled it as such. But we want the world to be able to validate the process by which we looked for bias, because it's a critical area, so we're going to use a mechanism like a bias bounty, getting people to compete to find bias traits in our system. So there you've made a claim, something like, I believe this is relatively unbiased, or I've taken steps to look for bias in it, but then you're introducing an additional thing, a transparent mechanism that prompts people to go and poke holes in your system and find biases in it. That makes your claim more verifiable over time, and if it turns out your system had some huge shortcoming you hadn't spotted, at least you had a mechanism for identifying and surfacing it. Similarly, we think about the creation of third-party auditing organizations. So you could have an additional step: I have a system, I'm making some claim about bias, I'm putting a bias bounty out there so more people are probing my system; but if I'm being deployed in a critical area, and by critical I mean a system that makes decisions affecting someone's financial life, any of the areas policymakers really, really care about, then I can say, okay, my system will be audited by a third party when it gets used in these areas. So now I'm not really asking you to believe me; I'm asking you to believe the result of my public bounty and the result of this third-party audit. All of this stacks on itself and gives us the ability to have some trust in systems. Another example might be: I'll make a claim about how I value privacy, and the mechanism by which I train my models and aggregate data will use privacy-preserving, encrypted machine learning techniques. There I've got a claim you can actually verify, because I have an auditable system that shows you how I'm preserving your privacy while still training on data related to you. The idea of the report is basically to produce a load of mechanisms that we, and a bunch of other organizations and people, think are quite good, and the goal over the next year or two is to have the organizations involved in the report, and others who weren't, experiment with these mechanisms, try them out, and document what we come up with.
Oh, cool. So I can join the red team too?
Yeah, so the shared red team recommendation takes a little bit of unpacking, because obviously if you're two proprietary companies, your red teams can't share lots of information about your proprietary products. But they can share the methods they use to red-team AI systems, and they can standardize on some of those best practices. That kind of thing feels really useful, because eventually you're going to want to make claims about how you red-teamed a system, and it's going to be easier to make a trustworthy claim if you used an industry-standard set of techniques that are well documented and that many people have used, rather than something you just came up with on your own. So yes, please join the red team; we want lots of people doing shared red-teaming, and we'll share the results back with you.
The red team structure, the way you describe it, and I'm sure this comes from security, but I'm not super familiar with the field: you have someone internal to your organization, an internal team that tries to break the system or find problems with it, and then you're seeking ways for that internal team to share insights with people at other organizations. They can't say, here's the proprietary system I broke and what I did, but they can say, when I sit down and properly try to red-team an ML system, here are the approaches I use, and here's a lot of context around them.
Right. It's not a red team exactly, but we've actually done a little bit of this ourselves: in a GPT-2 research paper we wrote about some of the ways we tried to probe the model for biases, because we think that's an area where it's generally useful, and especially useful, to standardize. Since then we've been emailing our methodology to lots of people at other companies. Those people can't tell us about the models they're testing internally, but they can look at the probes we're suggesting and tell us whether they find them useful. That shows how you can develop some shared knowledge without anyone giving away secrets.
Interesting. What I kept thinking as I was reading your paper is that I use all kinds of technology that I don't think has made verifiable claims. I rely on all kinds of things to work, and maybe they're making claims, but I'm certainly not aware of them. I sort of assume that internet security works; I now have all these things plugged into my home network that could misbehave. So do you think these are just best practices for developing any kind of technology, or is there something really AI-specific in them? And where would you even draw the line for what counts as AI that needs this kind of treatment?
I think some of it does come down to where you draw that line. AI is basically when you cross from a technology that can easily be audited and analyzed and have the scope of its behavior defined, to a technology where you can somewhat audit and analyze it and list out where it will do well, but you can't fully define its scope.
For a lot of systems, once you've trained them, you have this big probabilistic system that will mostly do certain things, but it has a surface area that's inherently hard to characterize fully. It's very difficult to list it all out, and mostly it doesn't make sense to, because only a subset of the surface area of your system is actually going to be used at any one time. So it is somewhat different. Bias bounties, for instance, are a kind of weird thing: it's sort of equivalent to saying, before we elect this mayor, or before we appoint this person to an administrative position, we want a load of people to ask them a ton of different questions about quite abstract values they may or may not hold, because we want to be confident they reflect the values we'd like someone in that position to have. That does feel a little different from normal technologies. I don't expect we get to a world where everyone verifies every claim they make all the time, because you don't have the time; I mostly get through my life relying on my own beliefs and on other people sticking to the rules of the game. But we all have some cases where we want to go deep on something that's happening in our life and audit every single facet of it. And I think the way to think about verifiable claims, quite broadly, is that as governments consider how to govern technology, how to let it do the most good while minimizing the harm, it's probably going to come down to the ability to verify certain things in certain critical situations. So it won't apply to the majority of life, just to the really narrow places where it has to happen, and you necessarily need to build quite general tools for verification and then apply them to those specific areas.
It's interesting: there's been a lot of complaining about AI research recently that a lot of plain research claims, which are maybe not so loaded and not so applied, aren't really verifiable either.
Yeah, and some of that is just compute. There's a minority of organisations with a large amount of compute, and a majority of organisations, and a huge swathe of academia, if not all of academia, with very real computational limits. That means that at a basic level you can't validate claims made by a subset of industry, because they're experimenting at scales you can't hope to reach. So some of this is about what really general tools we can create to deal with that kind of asymmetry of information. Because some of the issues of verifiability are less about your ability to verify a specific figure at a specific moment, and more about having enough shared cultural understanding of where the other person is coming from that you understand what they're saying and the premises behind it, and can trust them. It's less about demanding a certain type of verification and more about not being complete aliens to each other, coming from another cultural context or another political ideology, so that you have a strong shared understanding of the thing they're trying to get you to believe.
And right now, if certain organisations wanted to motivate academia to do a certain type of research, it would come across as: I come from this big-compute promised land, and I'm asking you to hear me when I list out a concern that only really makes sense once you've done experimentation at my scale, because that's what calibrated my intuitions. So we need to find ways to give these people the ability to have the same conversation, so we can improve that as well.
So are you going to give them a ton of compute? What's the relationship there?
Well, we basically recommend that governments fund cloud computing, which is a bit wonky, but one thing to bear in mind is that today a lot of academic funding is centered on the notion of some piece of hardware or capital equipment that you're buying, and as we know, that stuff depreciates faster than cars. It's about the worst thing to buy with pure research money at an academic institution; you'd be much better placed with a cloud-computing credit system that lets you access a variety of different clouds. So when we work with governments we generally push this idea that they should fund some kind of credit that is backed onto a bunch of different cloud systems, because you don't want the government saying all of America is going to run on Amazon's cloud, which is obviously a bad idea, but you can probably create a credit which is backed onto the infrastructures of the five or six large cloud entities and deals with the competitive issues along the way. And this is surprisingly tractable; some policy ideas are relatively simple because they don't need to be any more complicated, so we're lobbying governments to do this. The other thing to bear in mind is that lots of governments, because they've invested in supercomputers, really want to use those supercomputers as their compute solution for academia, and that mostly doesn't work. You mostly need a different form of hardware for most forms of ML experimentation. So you're also saying to governments: I know you spent all this money on this supercomputer, and it's wonderful, and it's great at simulation, and we love that, but please stop trying to use it for this.
Right, because if you're the US, you're like, we've spent untold billions on having the winner of the Top500 list, and we're in some pitched geopolitical competition with China, so of course we want to use this for AI. And you're like, yeah, but most people just want an ordinary GPU instance, and this thing isn't easy to slice up and hand out to people the way you can with AWS or Microsoft or whatever.
Interesting. Well, we're running a little short on time, and I'm curious, we always end with two questions,
particularly given your vantage point, because you really do view a lot of what's going on today, both from OpenAI and from the newsletter you put out. So the first one: what would you say is the topic that people don't pay enough attention to, the thing that matters much more than the attention it gets?
I think the thing that nobody looks at but that really matters sits in a very niche part of computer vision policy: the problem of re-identification of an object or a person that you've seen previously. What I mean is that our ability to do pedestrian re-identification is improving significantly. It's stacked on all of these ImageNet-era innovations, on the ability to do rapid feature extraction on video feeds, on a load of interesting component-level innovations, and it's creating a stream of technology that leads to really, really cheap surveillance, eventually deployable on edge systems like drones, by anyone. I think we're massively underestimating the effects of that capability, because it's not that interesting as research. It isn't an advance, it doesn't require massively complex reinforcement learning or any of the things researchers spend their time on; it's just a basic component. But that is the component that supports surveillance states and authoritarianism, and that is the component that can make it very easy for an otherwise liberal government to slip into a level of surveillance and control that no one would really want. I've been thinking about whether I can write a survey or something about this, because it's not that helpful for an organization like OpenAI to warn about it, it's sort of the wrong messenger, but it may be okay for me to write it myself. Because when you look at the scores and the graphs, it's all very hockey-stick, and it's getting cheap. So that's my very cheerful answer.
Wow, great answer, as expected. All right, here's the second question, which we always ask. Normally we're talking to more industry practitioners, but maybe you can apply this to OpenAI. When you look at the ML projects you've witnessed, and OpenAI has had some really spectacular ones, what's the part of going from conception to completion that looks the hardest, or maybe the most unexpectedly difficult? Watching, say, solving Dota, or beating the best humans at Dota, or GPT-2: where do things get stuck, and why?
Good question. I think there are maybe two parts where projects get stuck or have interesting traits. One is just data. I used to really want data to not matter so much, and then you look at it and realize that whether it's Dota and how you ingest data from a game engine, or robotics and how you choose to do domain randomization in simulation, or supervised learning, you're figuring out what datasets you have, what mixing proportion you give each of them during training, and how many runs you do, and that just seems very hard. Others have talked about this; it's not really a well-
documented science; it's something many people do by intuition, and it seems easy to get stuck on. The other is testing: once I have a system, how well can I characterize it, what tests can I use from the existing research literature, and what tests do I need to build myself? We spend a lot of time building new evaluations, because for some systems you want a form of eval that doesn't yet exist to characterize performance in some domain, and figuring out how to test for a performance trait that may not even be present in a system is really hard. It's a genuinely difficult question. So those would be my two areas.
Okay, I can't help myself; as you were talking I thought of one more question, sorry to do this, but I've wondered about it. The people I know, or that I've watched closely at OpenAI, have been spectacularly successful; they've been part of projects that really seem to have succeeded, like the robot hand doing the Rubik's Cube, and Dota. Are there a whole bunch of projects that we don't see that have just totally failed?
I don't know if you remember Universe; that was sort of a failure. We tried to build a system which was kind of like OpenAI Gym, but where the environments would be every Flash game that had been published on the internet. And that failed. It failed because of network asynchronicity: because we were sandboxing the games in the browser, you had a separate game engine that you were talking to over the network, and RL isn't really robust enough to that level of time jitter to do useful stuff. So that didn't work, and we have some public failures like that, which is probably healthy. We also have some private ones. A lot of it is people spending a year or two on an idea and it just not working out. Some people, and I won't name the project, though it's public, came up with a simple thing that worked really well, then spent six months trying to come up with what they thought would be a better, more principled approach, and the simple thing beat all the other things they tried, so they eventually published the system built on the simple thing. The big ones, the hand, Dota, GPT, tended to go okay, mostly because they come from a place of iteration: Dota came from prior work applying PPO, and I think evolutionary algorithms, to RL systems; the hand came from prior work on block rotation, and once you can do block rotation you can do a Rubik's Cube; GPT came from prior work on scaling up language models, just more of it. A lot of it happened iteratively, in public. We don't have an abnormal lack of failure nor an abnormal amount of success; I think it's a pretty reasonable distribution.
Awesome. Well, thanks ever so much.
87.16284
38.50283
4s
Dec 28 '22 14:51
28xigqz6
-
Finished
Dec 28 '22 14:51
2343.696000
/content/dominik-moritz-building-intuitive-data-visualization-tools-bcttibpleg8.mp3
tiny
When we designed Vega-Lite, we built it not just as a language that can be authored by people, but as a language where we can automatically generate visualizations. And I think that's also what distinguishes it from other languages such as D3 or ggplot: because it's in JSON, it is very easy to programmatically generate visualizations.
You're listening to Gradient Descent. Today we have Lavanya with us, who has been watching all the interviews in the background, but we wanted to get her in here asking questions. And we're talking to Dominik, who's one of the authors of Vega-Lite. We got excited to talk to him because we've been using Vega in our product, which we recently released. It solved a huge problem for us: we want to let our users have complete control over their graphs in a language that makes sense, and then we discovered Vega and it was a perfect solution to the problem we had. And then we talked to Dominik, and he had so many interesting ideas about the way machine learning should be visualized. I didn't even realize he came from a visualization background. So we have a ton of questions to ask him today.
Super excited. Can't wait.
I think the main thing, you know, you've done a bunch of impressive stuff, but the thing that is most exciting for us is that you were one of the authors of Vega-Lite. So maybe the best place to start, for people who don't even know what Vega is, is to describe what Vega is and what its goals are, and then how Vega-Lite fits into that context.
Yeah. So the way Vega came to be is that my advisor Jeff Heer, together with his graduate students Arvind and Ham, created a declarative way to describe interactions, building on ideas from functional reactive programming, which is a concept that's been around for quite a while. They adopted this concept for visualizations to describe not just the visual encodings but also the interactions fully declaratively. I think that became Vega version 2. Vega at that point was still fairly low level, in that you had to describe all the details of the visual encoding, as well as the axes and legends and potentially other configuration. Around the same time, my colleague Ham, who also worked on the first version of Vega and on this reactive version of Vega, was working on a visualization recommendation browser that at that point was called Voyager. I helped him with it, and we needed a visualization language to do recommendation in. So Ham and Jeff talked about the need for a high-level visualization language in which you can do recommendation, where you don't have to specify all the details but really only what's essential, which is this mapping from data to visual properties. I think they talked at the VIS conference in Paris, and on the flight back Jeff had the first version of it, which I think is still what we're building on today.
That's awesome. Sorry, before you go too far down this path, I'm going to ask all the dumb questions that I feel embarrassed to ask. I feel like I've heard the phrase declarative language for visualization many, many times, and I always kind of nod, but what does declarative really mean? What would be a non-declarative way to describe a visualization?
Yeah. The biggest distinction is between declarative and, on the other side, imperative.
In a declarative language you describe what you want, not how you want an algorithm to execute steps to get to where you want to go. Good examples are HTML and CSS: you describe what the layout of the page should be, but you don't tell the layout engine to move something by a couple of pixels and then move it again by a couple of pixels. Another good example of a declarative language is SQL, the database query language people use to query databases for analytics or, say, a banking system. In these declarative queries you describe what you want the result to be. You say, I want from this table the tuples, the rows, that have these properties, and you don't describe how you're going to get them. That's as opposed to an imperative algorithm, where you would have to write the search yourself; you would need to know how the data is stored, in what format, and whether it's distributed across multiple machines or not. In a declarative language you only describe what you want, and then that could run on a small embedded database or on a cluster of a thousand machines, and you shouldn't have to worry. For visualization, that means you shouldn't have to worry about how the visualization is drawn, how you draw a pixel here, a rectangle here or a line there. You just want to say, make a chart that encodes these variables.
So I guess, how declarative is it? I have a decent intuition for this, but people listening or watching may not. The most declarative thing might be, sort of, give me an insight about these variables, or just compare these variables, but that might be unsatisfying. At what level are you describing the semantics of what you're doing, versus saying, hey, give me these three pixels here? Do you say exactly the type of plot that you want, or is that inferred? How do you think about all that?
Yeah, we built on this concept called the grammar of graphics, which is a really cool concept that a lot of languages, even D3, have built on. The core idea is that a visualization is not just a particular chart type; it's not just a horizontal bar chart or a bubble chart or a radar plot. Instead, a visualization is described as a combination of basic building blocks, kind of like in language, where we have words that we combine using rules, which is a grammar. The words in the grammar of graphics are two things: one is marks, and the other is visual encodings. A mark is, for instance, a bar or a line or a point. An encoding is a mapping from data properties to visual properties of that mark. So, for instance, a bar chart is a bar mark that maps some category to x and some continuous variable to y, and that's how you describe a bar chart. And I think what's cool about this is that if you want to change from a horizontal to a vertical bar chart, or what some people would call a column chart, you don't have to change the type; you just swap the channels in the encoding.
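As an illustration of the grammar-of-graphics idea Dominik describes here, a minimal sketch in Altair (the Python API for Vega-Lite that comes up later in the conversation) might look like this; the toy data is made up.

```python
# A bar chart as a mark plus encodings, following the grammar of graphics.
import altair as alt
import pandas as pd

data = pd.DataFrame({
    "category": ["a", "b", "c"],
    "value": [28, 55, 43],
})

# A bar chart is just a bar mark with a nominal field on x and a
# quantitative field on y.
vertical = alt.Chart(data).mark_bar().encode(
    x="category:N",
    y="value:Q",
)

# Swapping the channels (not the chart "type") gives the horizontal version.
horizontal = alt.Chart(data).mark_bar().encode(
    x="value:Q",
    y="category:N",
)

# The chart compiles to a Vega-Lite JSON spec, which is easy to inspect
# or generate programmatically.
print(vertical.to_json())
```

The last line hints at the point from the top of the episode: because the chart is just a Vega-Lite JSON specification underneath, it is easy to generate and inspect programmatically.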
I have a question too. We see so many really messed-up charts that people make, because people get too excited, especially when they work with a really powerful visualization tool. And you've spent so much of your life designing a really good grammar for visualizations and looking at a lot of plots. So what's your recommendation for best practices when designing these visualizations?
I think it is actually making mistakes: trying things out and seeing how difficult or how easy it is to read the data in a particular chart. But before you go out and publish that chart and show it to the world, maybe think about what you can remove from it. A visualization is really showing what you want to show when it's showing the essence of the data. Also very important in any visualization design is following two basic principles, often called effectiveness and expressiveness. This goes back to work from Jock Mackinlay, who developed an automated system to follow these rules. The two rules are kind of oddly named, but essentially what they boil down to is: first, expressiveness means that a visualization should show all the facts in the data, but not more than that; the visualization shouldn't imply something about the data that doesn't exist in the data. And then effectiveness means making a visualization that is as easily perceivable as possible. One rule you can apply there is to use the most effective channels first. The most effective channels are x and y, that is, length and position; they are the best. After that come color and size and some other things. That's why bar charts, line charts and scatter plots are so popular and so effective: they use those very effective channels first. But sometimes you also have to go beyond effectiveness.
I always wonder, is there any room for fun or novelty in a good visualization?
Yeah, that's a good question. I like to think back to a paper from Tukey and Wilk, written in the sixties, one of the famous papers about exploratory data analysis and statistics. They talked about the relationship of statistics to visualization, and the paper is full of amazing quotes; it's kind of amazing to read it today, because almost everything is still true. One of the things they say is that it's not necessarily important to invent completely new visualizations, but to think about how we can take the visualizations that we have, the essential visualizations, and combine them in new ways to fit new opportunities. So I think there's a lot of creativity in making visualizations, even the simple ones, bar charts, line charts, scatter plots, but combining them in meaningful ways, and also in transforming the data in meaningful ways. There can be a lot of creativity in that.
Do you have a favorite visualization that you think is maybe underused, or that you'd like to see more of?
I think slope charts are kind of amazing.
What's a slope chart?
Naming charts, by the way, is an interesting concept; if you think in terms of a grammar, the concept of naming charts is kind of odd.
Yeah, totally.
I'll reveal a secret: at some point I want to write a system that automatically names a chart, or the other way around, you give it a name and it tells you what the specification is. Okay, but going back to slope charts: a slope chart is, imagine you have a categorical variable with two values, let's say two years, and you have data for those years. Now what you could do is plot it as a scatter plot: on x you have the years, and on y you have some numeric measure.
You could then draw the different categories that exist in both years as colored points, but it's actually hard to see trends between those years. If instead you just draw a line between them, trends or changes jump out at you. And I think that's great. So wherever you have categorical data in this kind of bipartite layout, drawing a line instead of drawing points is great. It's called a slope chart; that's one name, and there's one in the Vega-Lite example gallery.
Yeah, I'll have to link to that.
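A rough sketch of that slope chart in Altair, with invented data, could look like the following: one line per category connecting its value in the two years.

```python
# A slope chart: one line per category between two years.
import altair as alt
import pandas as pd

data = pd.DataFrame({
    "year":     ["2019", "2020", "2019", "2020", "2019", "2020"],
    "category": ["a", "a", "b", "b", "c", "c"],
    "value":    [10, 18, 25, 12, 15, 16],
})

slope_chart = alt.Chart(data).mark_line(point=True).encode(
    x="year:O",         # ordinal axis holding the two years
    y="value:Q",        # the numeric measure
    color="category:N"  # one line (and color) per category
)

slope_chart.save("slope_chart.html")  # or display it directly in a notebook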
So I guess, where do you draw the line between Vega-Lite and Vega? Is it always super clear what belongs where? They're both, in a sense, declarative languages for charts, one just higher level and one lower level. So where do you draw the line?
Maybe before we go there, one important thing to keep in mind is that Vega and Vega-Lite added things to the grammar of graphics. Vega-Lite in particular added, for instance, support for interactions, something my colleagues Ham and Arvind and I worked on together, where we added additional words or language constructs you can use to make charts interactive. And we also added composition. These are high-level concepts which then compile from Vega-Lite down to low-level Vega, in this case into signals, the functional reactive concepts that Vega has. And I think that also helps explain the difference of where what goes.
And sorry, what is composition, before I drop that?
Composition is being able to layer charts, or concatenate charts. We also have a concept called repeat, which is a convenient concatenation. And then faceting; faceting, another word for it is trellis plots, is a way to break down a chart by a categorical variable. So, for instance, if you have data for different countries, you can draw one histogram for each country, or one scatter plot for each country. Faceted charts are also great; faceting is a very powerful way to show your data if you have an additional categorical variable.
So is this where you make, sorry, like a whole array or matrix of charts? That's what I'm picturing with a faceted chart.
Yeah, right, it's like a grid of charts.
I see, okay. Cool. So that's composition. And then we were talking about Vega and Vega-Lite.
Yeah. I think the biggest difference between Vega and Vega-Lite really is the abstraction level. Vega-Lite compiles to Vega, so anything that's possible in Vega-Lite is also possible in Vega, but it requires about one or two orders of magnitude more code in most cases. That's one big difference. And how do we achieve that? Well, first, we have higher-level mark types in Vega-Lite. Vega only has a rectangle mark, and a few more, but Vega-Lite actually has bars as a concept, and with that you can have defaults associated with the high-level mark type which you then don't have to manually specify in Vega-Lite, because they get instantiated in Vega automatically. The other is sensible defaults, smart defaults: you don't have to specify an axis, we make one for you if you use the x or y encoding; if you use color, we'll make a legend for you; if you use faceting, we'll make a header for you, which is kind of like an axis. In Vega, you have to specify all the details of those elements. You can override the defaults in Vega-Lite, but by default we'll do something sensible. And that's really what Vega-Lite is: a high-level language and a compiler that compiles from the high-level specification to a low-level Vega specification. Right now we don't have a way to easily extend the high-level concepts we have in Vega-Lite, but we do have a little bit of an extension mechanism where you can add mark macros. For instance, box plots in Vega-Lite are just a macro, which actually compiles to a rectangle, a line, and the little ticks at the end, and there are a bunch of other things that are just macros. And one could actually build a language on top of Vega-Lite, and people have done that. Altair, for instance, is a Python wrapper, a Python API for generating Vega-Lite JSON specifications. There are other ones in Elm and R, and somebody made one in Rust, and one in JavaScript, oh, and in Julia; there's one in Julia as well.
I guess the R comment makes me wonder if you have any comments on ggplot. I feel like that's often a beloved plotting library; was it an inspiration for Vega at all, or did you have reactions to it?
So ggplot came out a long time before Vega and Vega-Lite, and it also builds on the grammar of graphics. At the time it really was the prime example of an implementation of the grammar of graphics in any programming language, though it uses slightly different terminology from Vega and Vega-Lite. ggplot has definitely been a great inspiration, and I can say we here, because my colleague Ham and I have talked to Hadley Wickham before; we're big fans of it. We actually considered using it for Voyager, but because Voyager was easier to build as a web application, interfacing from a web application to R would have been a lot more overhead than building something natively for the web.
Totally. Maybe switching gears a little: one thing I thought was interesting about your background is that it's also machine learning, and I thought that was pretty interesting and cool. I wonder whether machine learning has informed your thoughts about visualization at all, and then I'd love to hear if you have suggestions for the kinds of visualizations you think are helpful in the machine learning process.
Yeah, I think visualization and machine learning are really good fits for each other, and I can think of two things to talk about: where visualization is useful for machine learning, and where machine learning is useful for visualization.
Totally, yeah.
Maybe let's start with visualization for machine learning. I think one of the most important things in machine learning, and you can disagree with me, is data.
I think few people would disagree.
Okay, so we can agree that data is essential to machine learning: if you have bad data, your model is not going to do anything good. You can still create a bad model with good data, but good data is essential for a good model.
And so understanding the data that becomes part of your model, or is used to train the model, is really essential, and I think visualization is a really powerful way to understand what's in your data and what's happening there, especially in conjunction with more formal statistics. Formal statistics are only good when you know what you're looking for; when you're still looking around at what's in the data and what might be problems with it, that's when visualization really shines.
And you actually built a library to help with the exploration of data.
Yeah, so Voyager, and Voyager 2, and some other follow-up work from there, is a visualization recommendation browser. The idea is that rather than having to manually create all the visualizations and go through this process of deciding which encodings I want to use and which mark type I want to use, it just lets you browse, while still being able to steer the recommendations. The recommendations shouldn't go too far from where you are; they should stay close to what you've looked at before, but they should take away some of the tedium of manually specifying all the choices. Recommendations are great for two things: they make visualization less tedious, and they can encourage best practices, for instance good statistical practice. A good data analysis practice is to look at the univariate summaries when you start looking at a dataset: what are the distributions of each of my fields, each of my dimensions, and to do that before looking into correlations between dimensions. That's often difficult, because if you start looking at one field and think, ah, there's something interesting, now I want to know how this correlates with this other field, then you're off on a tangent. By offering you a gallery of all the dimensions and all the univariate summaries first, it becomes a lot easier to follow that best practice of looking at the univariate summaries first.
Can you do this at scale? Will it scale to millions of rows, and how do you even begin, if your dataset is that big, to find patterns in it? How does this scale?
So the software was built as a research prototype, a browser application where all the data has to fit in the browser, so currently it does not scale. But the interesting thing is that the number of rows shouldn't really matter too much, as long as we can visualize it.
We could probably have a whole episode about that. The number of rows shouldn't matter in what sense? It seems like it would make it more complicated to visualize; it doesn't necessarily make the visualization itself harder, but actually scanning through all of them starts to get impractical.
Yes, there are two issues. One is the computational issue of just transforming that data and then rendering it. The other is how to represent the data in a way that is not overwhelming to the viewer. But assuming we can do that for a couple of thousands of data points, or tens of thousands, or hundreds of thousands of data points, if you have many dimensions the recommendation aspect gets a lot more difficult, because now you have to think about how to represent all of those dimensions. So I think that's where the real challenge is.
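To make the univariate-summaries idea concrete, here is a hedged sketch (not Voyager itself) that builds one histogram per column with Altair and concatenates them into a small gallery; the dataset and column names are placeholders.

```python
# One binned histogram per column, concatenated into a small gallery,
# in the spirit of "look at univariate summaries first".
import altair as alt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "horsepower": rng.normal(100, 30, 200),
    "weight": rng.normal(3000, 500, 200),
    "mpg": rng.normal(25, 6, 200),
})

histograms = [
    alt.Chart(df).mark_bar().encode(
        x=alt.X(col, type="quantitative", bin=True),  # bin the field
        y="count()",                                   # count per bin
    )
    for col in df.columns
]

gallery = alt.hconcat(*histograms)  # a simple gallery of univariate views
gallery.save("univariate_summaries.html")
```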
So that's visualization for machine learning. The other way around, machine learning for visualization, is something that motivated the design of Vega-Lite: we built it not just as a language that can be authored by people, but as a language where we can automatically generate visualizations, and that's what distinguishes it from other languages such as D3 or ggplot. Because it's in JSON, it's very easy to programmatically generate visualizations, and we built a recommendation system on top of it. With a visualization language that is declarative and easy to generate, we can think about ways to automatically generate visualizations from programs or from models. One of those models is a model called Draco, which my colleagues and I have been working on, where we encode design best practices as a formal model, and then we can automatically apply those best practices to recommend visualizations. That can go beyond what I talked about with Voyager, where we recommend a gallery of visualizations, because you can consider a lot more aspects: the data, the visualization, the task the user wants to do, the context they're in, or the device they're looking at it on.
Can I jump in and ask, I don't know quite how to fit this into the flow, but I think one of the issues with visualizing data in machine learning, especially with a lot of the deep learning folks we work with, is that the data often isn't the sort of three independent variables and a dependent variable you see in stats classes; the data is an image, or an audio file. So even visualizing the distributions gets unwieldy, and it's a little unclear what you would do with that. Do you have thoughts about visualizing things with higher-order structure, like an image or a video or an audio file?
It's tricky, because a visualization is two-dimensional; maybe you can use color and size and so on to represent another dimension or two, but after four or five it becomes overwhelming. So if you have a dataset with thousands of dimensions, I think the way to do it now is to use dimensionality reduction methods, so t-SNE, UMAP, PCA, to reduce the number of dimensions to the, in some sense, essential dimensions. Or create some kind of domain-specific visualization; in a way, an image is a domain-specific visualization that maps your long vector of numbers to a matrix with a color encoding.
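As a minimal sketch of the dimensionality-reduction route Dominik mentions, assuming you already have high-dimensional feature vectors (the random array below is a stand-in), scikit-learn's PCA gives you 2D coordinates you can then plot as a scatter chart.

```python
# Project high-dimensional vectors (e.g. image embeddings) down to 2D
# so they can be drawn as a scatter plot. The random data is a stand-in.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
vectors = rng.normal(size=(500, 1024))  # e.g. 500 items with 1024-d features

coords = PCA(n_components=2).fit_transform(vectors)  # shape (500, 2)
print(coords[:5])

# t-SNE (or UMAP, from the separate umap-learn package) can be swapped in
# when non-linear structure matters, e.g.:
# from sklearn.manifold import TSNE
# coords = TSNE(n_components=2).fit_transform(vectors)
```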
Some of the techniques I've seen use more inherently explainable models that are still complex. A good example of that is GAMs — generalized additive models — which are linear combinations of functions applied to every dimension. Why is that more explainable? Because you can understand, for instance, the function that gets applied to each dimension individually, and you can also look at how those dimensions — or the functions applied to them — get combined in just a linear function, which is a lot easier to understand than some nonlinear combination of many, many dimensions. But what if you want the different dimensions to interact with each other, or allow for that? I guess, maybe taking a step back, can you make this a little more concrete for someone who hasn't seen it before — what kind of functions would you be imagining, and how would they be applied? Say you want to predict a quantitative variable, some number — I'll use the standard example, the housing price, the price of a house. You want to do that based on the available dimensions: let's say the size in square feet, the number of bathrooms, or the number of floors. What you can do is take a linear combination of the dimensions to get the price. With just a linear combination I could say: multiply the square feet by, I don't know, ten, the number of floors by twenty, the plot size by five, and the number you get out of that is the housing price. That's the simple linear model, where you essentially apply a weight to every individual dimension. Now what generalized additive models do is apply a nonlinear function to each dimension — it could be a log function or any other function, it can be as complex as you want. But because it's a function of a single dimension, you can visualize it very easily, just by putting the input value on the x-axis and the value after applying the function on the y-axis. So if you then want to know the price of a particular house — the predicted price of a house — in each of these charts, one per dimension, you just look up the corresponding value for your input, and then you sum them up. I see, so you can see exactly how much each thing contributed to your final score, your final prediction. Yeah, and a very good example, if you want to actually play with this and try it out, is a system called GAMut, which is a research project at Microsoft Research, where they built a system for doing exactly this task of understanding a model that is one of those GAM models. You can, for instance, compare the predictions for two houses and understand how much each dimension contributes to the predicted price, and it also makes it very easy to look at the general model, the whole model, in one view. And yes, you don't have the ability to have multiple dimensions interact to affect your output, but still these models work fairly well and are a lot more interpretable than a model that combines many, many dimensions and mixes everything up. Do you have thoughts on visualizations to help with understanding what's going on in much more complicated models, like, say, a convolutional network or a fancier type of network? Yeah, I think visualizations can help at different points, and I think visualizations are only as powerful — or only as useful — as the task that you design them for.
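To make the GAM idea above concrete, here is a hand-rolled sketch: the prediction is a sum of one function per feature, and each of those functions can be plotted on its own. The shape functions, coefficients, and house values are made up for illustration; this is not GAMut or any particular library.

```python
# Hand-rolled illustration of a generalized additive model (GAM): the prediction
# is a sum of per-feature functions, each of which can be plotted and read off.
import numpy as np
import matplotlib.pyplot as plt

def f_sqft(sqft):        # contribution of square footage (made-up shape)
    return 120 * np.log1p(sqft)

def f_floors(floors):    # contribution of number of floors (made-up shape)
    return 15_000 * floors

def f_plot_size(plot):   # contribution of plot size (made-up shape)
    return 5 * plot

def predict_price(sqft, floors, plot):
    # GAM prediction: just the sum of the per-feature contributions.
    return f_sqft(sqft) + f_floors(floors) + f_plot_size(plot)

house = dict(sqft=1800, floors=2, plot=4000)
print(predict_price(**house))

# Interpretability: plot each shape function; to explain a prediction you read
# one value off each chart and add them up.
xs = np.linspace(500, 4000, 100)
plt.plot(xs, f_sqft(xs))
plt.xlabel("square feet")
plt.ylabel("contribution to price")
plt.show()
```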
So I think in general, saying "oh, can you visualize this thing?" is impossible without a task — it has to be "can you visualize X for Y?" For instance, one could visualize a model for the purpose of understanding the architecture. When you have a complex network with many layers and many different complex functions at every layer, you might want to visualize it to see what functions are being applied, what parameters are being used, and how big each layer is. There are a couple of visualizations for that; the most popular one is probably the one in TensorBoard, which my colleague Ham actually started when he was interning at Google. Do you mean the parallel coordinates plot, maybe, or which visualization in TensorBoard? In TensorBoard it's the visualization of the graph, the data flow graph. There are these two kinds of views in TensorBoard: one where you look at your model outputs or your metrics, and one where you look at the model architecture — and I'm talking about the model architecture one. That can help you, for instance, debug what's happening, but it doesn't help you at all to explain a particular prediction. For that you might use a different visualization that does feature visualizations, or lets you inspect different layers and the attributions in different layers. Cool — we always end with two questions, and I want to make sure we have time for them; I think we should modify them slightly to focus on visualization. Normally we ask what subfield of machine learning people should pay more attention to, which I'm curious about, but maybe I'd also ask: what subfield of visualization do you think doesn't get as much attention as it deserves? For machine learning, I'm very excited that there's a lot more attention on understanding what's happening in these models. I'm also a huge fan of more classical AI methods — oh, I guess this is not machine learning anymore, but yeah, I'm very excited about constraint solvers. Whoa, we have not had that one before. Constraint solvers? I thought you were going to say something like SVMs, not constraint solvers. No, no — classical AI, not even learned. I thought you used ML to do constraint satisfaction these days. I guess — I don't know. Well, you can use ML now for learned indexes in databases. I don't know. Cool. I think classical AI methods are exciting because they allow you to describe a model, a concept, a theory, in a very formal way and then automatically apply it — very declarative problem solving, describing problems and solving them. And these solvers are amazingly fast today, so I'm pretty excited about them. In visualization — it's a science where we're trying to explain what makes a visualization good. There's been a lot of work on the high-level design of good visualizations — I talked about these principles of effectiveness and expressiveness earlier — and there are now systems to automatically apply them, and there are design best practices, and there are books, and people teach those in classes and so on. And then on a very low, perceptual level, there's some understanding of how we perceive colors and shapes and the gestalt of shapes, and how we see patterns.
But we don't have a good understanding of how those low-level insights about perception actually translate to those higher-level design practices. I think the two sides are slowly inching towards each other, but they're not there yet — they're still pretty far apart right now and slowly inching together. What I'm excited about is kind of the general-relativity question of how these two actually combine — we need a unified theory of how these two things relate. It's like physics: at the high level it's kind of like relativity, and at the small level it's quarks, and we don't know how they relate to each other. We know how the universe behaves and we know how little particles behave, but when you combine them, it doesn't work. So this crisis that physics has had for a while, we have it in visualization as well. Wow, really great answer — it's so evocative. I'd love to talk more about that kind of thing. Normally we end by asking people, on behalf of our audience, what the biggest challenges are in taking ML projects from conception to deployed. Do you have thoughts there? I think one of the trickiest things in deploying machine learning is metrics: coming up with good, meaningful metrics that you're optimizing. To me, machine learning is optimizing a function — but what is that function, and how do you make sure it's actually a meaningful function? And also that it's going to stay meaningful in the future, because we know from many examples that as soon as you start optimizing towards a metric, that metric becomes meaningless. So how do you ensure that a metric is meaningful right now, will be meaningful in the future, and is actually tracking what you care about? It's a difficult question, and I don't know whether there's going to be one answer — I don't think so. Train a model on a bunch of different objective functions and figure out which one you want? Yeah. But I kind of want to specifically ask about the biggest challenges around machine learning interpretation, and also, when you're training models, about using visualizations to debug those models — do you have any thoughts around that? As I said earlier, I think data is central to machine learning, so understanding data is crucial. And I don't know how much the methods and tools we have for general data analysis might have to be adjusted for machine learning. Where do, for instance, Tableau or Voyager — all these tools that are designed for exploratory analysis of tabular data — fall short when it comes to machine learning? As you were pointing out earlier, machine learning often has this high-dimensional data, images and sound. Can we design other representations — I don't even want to say visualizations, just representations — that help us see patterns in that data? Meaningful patterns, meaningful for the task of training or understanding models — that, I think, is going to be an interesting question for visualization tool designers who like to work in the machine learning space going forward. I feel like one way everybody working in machine learning misallocates their time a little bit, including me, is that we all spend too much time looking at aggregate statistics versus individual examples. Every time you look at individual examples, it's like, I can't believe I missed this stupid thing that is breaking my model or making it worse.
I wonder if the gap is that we have really good tools for aggregate statistics, but it's hard to quickly drill into individual examples, especially when your datasets get very large. I totally agree that we have very good tools for looking at aggregate statistics, and I think we also have reasonable tools for looking at individual examples — we can look at an image, that's okay, or at a row in a table. But where it gets really tricky is understanding the in-between: understanding the subgroups that exist. There is an immense number of possible subgroups in a dataset — if you have a million rows, that's a lot of subgroups — and only very few of them are actually meaningful. So understanding which subgroups are behaving oddly or negatively affecting your model, and looking at those, is a challenge that I see over and over again. It's probably not aggregate and not individual, but somewhere in between — knowing where and why to look — and to me that's a difficult one. All right, I think that's a nice note to end on. Thank you so much, that was really fun. Oh yes, thanks for listening to another episode of Gradient Descent. Doing these interviews is a lot of fun, and it's especially fun for me when I can actually hear from the people who are listening to these episodes. So if you wouldn't mind leaving a comment and telling me what you think, or starting a conversation, that would inspire me to do more of these episodes. And also, if you wouldn't mind liking and subscribing, I'd appreciate that a lot.
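A rough sketch of the "in between" analysis described above: instead of one aggregate metric or single examples, compute the error rate per candidate subgroup and surface the worst ones. The data, column names, and grouping choices here are all hypothetical.

```python
# Per-subgroup error analysis: somewhere between aggregate statistics and
# individual examples. All data and column names are made up.
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "device": ["mobile", "desktop", "mobile", "mobile", "desktop"],
    "label":  [1, 0, 1, 1, 0],
    "pred":   [1, 0, 0, 0, 0],
})
df["error"] = (df["label"] != df["pred"]).astype(int)

# Error rate and support for every combination of a few candidate grouping columns.
groups = (
    df.groupby(["region", "device"])["error"]
      .agg(error_rate="mean", n="size")
      .sort_values("error_rate", ascending=False)
)
print(groups.head(10))  # subgroups behaving oddly bubble to the top
```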
58.67574
39.94318
4s
Dec 28 '22 14:50
o0ar33s3
-
Finished
Dec 28 '22 14:50
2693.904000
/content/johannes-otterbach-unlocking-ml-for-traditional-companies-agq4zft2tuo.mp3
tiny
If you take those big models, you run into the problem that you already need compute power, you need infrastructure, you need ML ops, you need a whole department to actually make use of those models. Not many people have that, right? Especially the companies the models would be most useful for. You're listening to Gradient Descent, a show about machine learning in the real world, and I'm your host, Lucas B. Today I'm talking to Johannes Otterbach. He was originally a quantum physicist, then went into machine learning, and he's currently VP of Machine Learning at Merantix Momentum. Merantix is a really interesting company that develops all sorts of machine learning applications for customers small and large, and then actually deploys them into real-world systems and hands them off. So we really get into real-world applications of ML in factories and other places, the tooling required to make machine learning work, and also how to do smooth hand-offs so that customers are actually successful after the transition. This is a really interesting conversation and I hope you enjoy it. My first question for you: looking at your resume, you're one in a long line of people who moved from physics into machine learning. I'd love to hear what that journey was like — studying quantum physics, and I think you worked on a bit of quantum engineering or quantum computing, right? And now you do machine learning. How did that happen? That's a great question. Initially I was super excited about physics because, to me, physics was a way to understand the world — I'm really excited about understanding how things work and taking things apart to put them back together — and that always drew me to physics rather than engineering. I was on track to just have a career in physics, and then AlexNet came out and the ImageNet challenge happened. Like, holy crap, there is something really cool happening. It's always fun to tell people I did my PhD before ImageNet was a thing, because it makes me feel really old, but it was an exciting time. When I heard about it, I was reconsidering my career as a physicist anyway at that point, so I looked into what this AlexNet was about, and the ImageNet challenge, and discovered this whole field of data science and big data that was starting off at that time. That's a very natural transition for physicists, because we're good at statistics, we're good at modeling, we like math. I fell in love with this big data and data science world, and since then I've been continuously driving at understanding the language of data — and ML is just an expression of that language. That's why I fell in love with it, and now I'm here. And you did do some work in quantum computing, is that right? Do you think quantum computing has anything to apply to ML, or do you think ML has anything to apply to quantum computing? How do you think about that? That's a great question. I think it's actually mutually beneficial, and I see a convergence of those two fields in the near future. There are four different quadrants that we can talk about.
We have classical and quantum in terms of algorithms, and classical and quantum in terms of data — so you have quantum data and classical data, and you have quantum algorithms and classical algorithms — and you can start to think in those four quadrants. Right now a lot of effort is being put into applying quantum algorithms to classical data, and I think that is potentially the wrong way to think about it. We should think about quantum algorithms for quantum data, and maybe classical algorithms for classical data; the cross quadrants are a little more complicated to solve, and that's where I think the cross-fertilization is going to happen. And what is quantum data? Quantum data is essentially data that comes out of quantum states. I don't know how deep you are into quantum computing, but typically in quantum computing we don't talk about definite outcomes; we describe systems by wave functions, which are — loosely speaking, don't take this too seriously — the square roots of probabilities, quote unquote. What you get with this is data that has a phase and an amplitude, and if you start measuring it you get a lot of complex numbers and various different phenomena. That data typically takes an exponential overhead to read out into classical form: when you have a quantum state and you want to completely express that quantum state as classical data, you get an exponential overhead in storage. But what's the situation in the real world where we'd have quantum data? I can imagine how quantum computers produce it, but when would I be collecting quantum data? When you actually deal with quantum systems. If you want to understand molecules, for example — the very deep interactions of molecular properties are ruled by quantum rules — so if you want to simulate molecules, you'd rather do it in quantum space than in classical space. That's why today's early-stage quantum computers are more like simulators of other quantum systems: you use these computers to simulate quantum systems that you want to study in a very controlled setting, and then you deal with quantum data at that point. And are we actually able to simulate quantum systems in a useful way? I have experience with classical mechanics systems, and those simulations seem to break down very quickly, so I can only imagine that quantum simulations are much harder, and probably harder to make accurate. We are getting really good at this, and a lot of experimental quantum physics is essentially doing exactly that. We have toy models that we use to validate our mathematical theories. A good example is a field I worked in back in the past, quantum optics, where we have laser fields and single atoms, and we put them together in a certain fashion in these laser fields so that we can simulate materials that we have a really hard time understanding — for example, high-temperature superconductivity. We have certain mathematical models, statistical models, for how these things might come about, and in order to study the effects of these models, we use a very clean system that we have a high level of control over, try to simulate those mathematical models, and see whether they give rise to the phenomena that we see, for example, in materials with high-temperature superconductivity.
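A back-of-the-envelope illustration of the exponential overhead just mentioned: fully describing an n-qubit state classically takes 2^n complex amplitudes. The qubit count and byte size per amplitude below are assumptions for the estimate.

```python
# Classical storage needed to fully describe an n-qubit quantum state.
n_qubits = 30
amplitudes = 2 ** n_qubits          # number of complex amplitudes in the state vector
bytes_needed = amplitudes * 16      # assuming complex128, i.e. 16 bytes per amplitude
print(f"{n_qubits} qubits -> {amplitudes:,} amplitudes -> {bytes_needed / 2**30:.0f} GiB")
# 30 qubits already needs ~16 GiB; every additional qubit doubles it.
```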
So we use a much simpler system to simulate a much more complex system, in order to probe our understanding of the physical laws — in this case of those materials. Are there applications of ML to that? I feel like we've talked on this show to some chemists in different fields who have been using ML to approximate these kinds of interactions — is that an interesting field here? I think that's an interesting field, but actually I'm much more excited about a completely different avenue of applying ML to quantum systems. If you think about building a quantum computer, you have a lot of different qubits — they're the atomic units; you have bits in a classical computer and qubits in a quantum computer. To use and address these qubits, we have to control them very meticulously to make them do what we want, because you cannot just flip a switch from zero to one — you have to control everything between zero and one. It's very much like an analog computer, to a certain extent. Controlling these kinds of systems is where I think ML comes into play, because you can use, for example, reinforcement learning techniques to do optimal control of these quantum gates, to facilitate the two-qubit or three-qubit interactions you need to get a high-fidelity quantum computer. I think that might be one of the early applications of ML to quantum systems and quantum computers, and my firm belief is that we probably need modern machine learning techniques in order to scale quantum computers to the size where they are actually useful. Interesting. I feel like I've met a number of people in machine learning who are kind of refugees from quantum computing — they felt it didn't really have a path to real-world applications and moved into machine learning — and when I saw your resume I wondered if you were one of those people. But it sounds like you're pretty optimistic about the future of quantum computing. Yeah, I think the question is on which time scale. Quantum computing is still very nascent, and I feel that quantum computing will go through the same kinds of winters that machine learning went through for a while. When that will happen, I don't know, but we will see those winters coming. In my lifetime I want to see more impact on a shorter time scale, and I think machine learning is the right path for that. And I actually don't think I've shut the door — at some point I want to do a bit of quantum computing again, maybe take my ML knowledge to quantum systems to enable some better approaches there — but right now quantum computing is very much at the hardware level, and I'm a software guy. Cool — but tell me about your work at Merantix. Maybe start with what Merantix is and what you work on there. Yeah, sure. Merantix is a super cool construct. We have two separate units: there's Merantix Momentum, and there's Merantix Studio, which is the overarching company. Merantix Studio is actually a venture studio that focuses on deep tech in Berlin.
The idea is that we have pre-vetted industry cases, and we look for what we call entrepreneurs in residence who want to work on certain critical domains that we deem necessary to bring AI into broad adoption outside of just B2C businesses. The venture studio looks at those different use cases, then seats an entrepreneur in residence, gives them six months to a year to vet the use case, and then they build up the venture. Merantix Momentum is one of these special ventures, because we're actually not an independent venture — we're a 100% subsidiary of Merantix Studio — and we focus on the use cases that aren't big enough to build a venture by themselves, but where clients still need help in certain domains. We try to focus on use cases of clients that have actual problems and see how we can apply ML techniques, ML deployment techniques, and MLOps to help those customers in need. A classic example is visual quality control at manufacturers: they have no IT stack, no IT systems, but they have very hard visual quality control problems. So building a vision classifier based on a convolutional network just offers itself — we build that for them, make sure it's actually scalable, and then also help them put it into production close to the sensors. You can't build a whole venture around it, but Merantix Momentum can actually do it, and that's what we're here for. Well, I guess, why do you think you can't build a venture around that? It seems like it would be pretty useful to a lot of people. I think the question is how quickly you gain significant market cap. Eventually you can build a venture around this, but I think the adoption is not big enough yet to build your own venture around it — and in a way, Merantix Momentum is the venture that can do it, because in a sense we are a professional services department. We go in and say: you have a problem, you want a one-off machine learning model, we can help you get there. That's what we're doing, and that's kind of the venture around it — but you wouldn't build a venture just to go out and do visual quality control for company X, Y, or Z. So how does it work? I would think that doing this kind of thing for customers would be very hard to scope, because one of the challenges of machine learning is that you don't really know in advance how well a particular application is going to work, and downstream from that it's probably hard for a customer to estimate how different levels of model quality would really impact their business. So how do you make sure that a company is going to be happy at the end of one of these engagements, or do you just view it as sort of an experiment?
That's a really great question, and I think we've found some good directions on that. The key is to work with customers early to understand their needs. We have very intense engagements before we start our work, to make sure: is the use case actually solvable, how big is the challenge, what kind of data challenges will we meet, which kinds of approaches would we take — and we really take the customer on that journey before we say, now we start the implementation. The way we approach it is a staged approach: we have individual workshops, which we call the AI hub, then a pre-study, and then the actual implementation engagement, so the customer understands what can be achieved with which data and with which kind of effort, and then we start the implementation work. The implementation is of course a professional service, and there's still a bit of uncertainty and risk, but by then we've already mitigated the risk significantly. And often it turns out that some problems are not solvable that way, and then we go to a different type of model, which I'm also working on. What type of model is that — you want to solve the problems that aren't solvable? Is that what I just heard you say? No — problems that you can't just do as a client engagement. I see. There's a different funding strategy, which also exists in the US to a certain extent but much more so in Germany and Europe, which is publicly funded research projects. The German state or the federal government is interested in solving certain types of problems that span industries but are too hard for a single company to work on, because you have to bring many different domains together. So they fund consortial research, typically four to ten partners: application partners that bring their challenges, problems, and datasets; academic partners that bring in state-of-the-art research; and a professional services company like us, who really understand deep tech industry applications and how to make machine learning models robust. You engage in translational, transfer research to apply the academic results to industry problems, and once you've solved that, you have enough to bring it into a client engagement and a B2B relationship. Can you talk about some of the things you're working on specifically? Specifically, yeah — we have a bunch of research projects with big manufacturers and automotive companies in Germany. We're just about to finish a project on self-driving cars, autonomous vehicles — a very classic use case for Germany, I would say. The idea there is that car manufacturers don't really understand all the details involved in building, for example, a segmentation network or an optical flow application, but they are very good at understanding functional safety requirements. So it's really about bringing those two domains together: they say, we need autonomous vehicles but we don't know how to build the segmentation models and we need the domain expertise, and we say, we know how to build those segmentation models but we don't actually know what the safety-critical features are — how do we bring those together? That was a research project we worked on. That's cool — so you're doing segmentation on vision, basically from the vehicle.
Computer vision is one part of it — we were investigating synthetic datasets, where you have an essentially rendered dataset to pre-train those models on: optical flow, bounding box detection, person detection, some classic models. We also have other research projects that go much more into optimization problems, where you need to understand what manufacturing pipelines actually look like. A cool example — I unfortunately can't name the company — is that you have a critical element for building a car seat: metal bars. These metal bars, funnily enough, go through about 50 different manufacturing steps. Sounds crazy, but it's actually true, and those 50 manufacturing steps are distributed over 10 different factories of five different just-in-time partners. Wow — can you give me some examples of what these steps might be? It's hard to picture 50 steps for a metal bar. It's things like forming the raw metal into the raw rod, then the first processing to bring it into the right rod, then chroming the rod, then the first bending iteration, then you re-chrome, refinish, do the second bending, then the next step, and so on, until it's in the right shape. So there are a lot of these steps. Yeah — I didn't know about that either, and it's pretty crazy. And what happens is that in your manufacturing process, a mistake happens at step number 10, and you don't notice it until step number 15, when your metal bar is a little bit outside of specification. Typically what happens then is that you take the whole batch, put it in scrap metal, and start from scratch. The challenge now is: can we do something at, say, step 20 to bring that rod back into specification, so that at process steps 30, 40, 50 it fits back into specification? You can imagine this is a very high-dimensional optimization problem with a very sparse reward signal — a classic optimization problem. That's the kind of research project we're working on, and now the question is: what techniques from the field of ML can we use and transfer to those kinds of problems, and what kind of data do we actually need for that? So what would the choices be — what would you do differently at, say, step 20 that might make it useful in the end? We have to find out what kind of levers we have. There are different types of process steps: maybe you don't heat it up as much, or you over-bend it a little in one direction and re-bend it in the other, maybe you do a refinishing at some point. These are the levers we have, and we have to explore what the actual problem is. And here you start to see that the devil is in the details: what are actually the defects that matter? Is it a causal inference problem, is it a Bayesian learning problem? We don't know yet, because we just started this project — I wish I knew the answer, but then I would have already published something on it. So you're working on a really wide range of machine learning applications in the real world. That's right. You must be building a really interesting set of tools to make this possible — can you talk about the stuff you're building that works across all of these different applications? Yeah, that's a super question, because I think that's one of the things we do extremely well, and we have a lot of fun doing it.
Let's start a little bit further back, because one of the challenges we have, of course, being in Europe, is that a lot of companies have very little trust in cloud deployments. So you have to start with the customer and ask what's going on there, and one of the things people are really afraid of is vendor lock-in. So we had to build a tool stack that is truly cloud-agnostic: we can deploy on-prem, on GCP, AWS, Azure, you name it. That's the first priority — we need to understand how to build a stack that's completely agnostic of the underlying cloud. To do that we build on Terraform and Kubernetes; we make extensive use of those systems to automate a lot of deployment tasks — so, infrastructure as code. Now, once you start going into all of these files, you get lost fairly quickly, because the configuration files become very complicated, so we started to build tools to automate how we actually write deployment files. We have an internal dev tool that is essentially nothing more than very specifically pre-programmed template files for spinning up complete deployments automatically. That way we're completely independent of the underlying cloud, because we can just spin up the templates for a full deployment cluster. On top of that we can start using all kinds of other tools that we need in the clusters we deploy. We typically rely heavily on Docker, so we build a Docker image that we can then deploy in a pod that we manage using Kubernetes, with Terraform for the deployment. Then we use Seldon, and we use Flyte pipelines to automate complete learning pipelines — the CI/CD in that loop is done with Flyte. Right now we still have Cloud Build in there, but we're already thinking about how to get that out of the loop. So we're trying to be really cloud-agnostic and build a whole stack ecosystem of these MLOps and ML tools. And this stack that you're deploying into a customer's production environment — does it include training, or is it just for running a model for the customer? It really depends on what the customer actually wants. Right now we're working towards MLOps level 2 — I think that's what Google calls it — we're not quite there yet. So right now we still have a split between manually triggering a retraining, which we do internally using our stack in the cloud or on their on-premise system, and a separate manual step to actually deploy it into production. And we do both: we can do the deployment step and the retraining step using all of our infrastructure, and the target really doesn't matter, because since we built it cloud-agnostic we can, for example, do a retraining on our internal cloud — we mostly use GCP right now — and if the customer wants the model in their production stack, we train it on our cloud and then move it to their production stack on-prem. I guess, what have you learned building these tools? It sounds like you're making this stuff and deploying it, and there are many people trying to build these things — what are the lessons, actually, when these things get deployed into customers' systems? That is really, really hard. Why is it hard? Conceptually it's simple — what actually makes it hard?
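This is not Merantix's internal tool and not its actual templates — just a rough sketch, using Jinja2, of what rendering cloud-agnostic Kubernetes manifests from pre-programmed template files can look like. All names, images, and values are made up.

```python
# Sketch: render a Kubernetes Deployment manifest from a pre-built template, so the
# same template works regardless of which cloud (or on-prem cluster) it targets.
from jinja2 import Template

DEPLOYMENT_TMPL = Template("""
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ name }}
spec:
  replicas: {{ replicas }}
  selector:
    matchLabels: {app: {{ name }}}
  template:
    metadata:
      labels: {app: {{ name }}}
    spec:
      containers:
      - name: {{ name }}
        image: {{ image }}
        resources:
          limits: {nvidia.com/gpu: {{ gpus }}}
""")

manifest = DEPLOYMENT_TMPL.render(
    name="defect-classifier",
    replicas=2,
    image="registry.example.com/defect-classifier:1.4.0",  # hypothetical image
    gpus=1,
)
print(manifest)  # `kubectl apply -f -` works the same against any Kubernetes cluster
```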
It's actually not that hard if customers are okay with cloud deployments. What makes it hard is when they're on-prem with their own stack, because then the tools are not yet at the point where you can abstract away every kind of sysadmin work. You always have this touch point between how the hardware is actually managed and how you can deploy onto it. As soon as you have a Kubernetes cluster installed on-premise you're probably fine again, but until you get there you cannot abstract that system away. And then you also get the realities of the business, where you sometimes have to deal with IoT devices — and for deploying onto IoT devices we're not there yet; I think the tools fall short on that. But I think it's just a matter of time until we have more tools that are ready for IoT deployments. How do you think about monitoring the systems in production? I'd imagine these things could be somewhat mission-critical, but I didn't hear you mention production monitoring — how do you think about that? I think it's very important, and we do it. We're not necessarily deploying extremely mission-critical systems right now, so that's something we haven't done yet — I think we'll get there soon — but right now it's mostly measuring uptime and making sure the stack doesn't fold under the load. So it's standard production monitoring: uptime, load testing, throughput measurements, those kinds of things, not necessarily decision-making and audit trails. It's more like standard site reliability monitoring that can be automated fairly easily using Grafana or any other monitoring tool that you like. I thought you might want to talk about some of the tools that you've developed, like Squirrel and Parrot and Camille — could you describe what these are? Yeah, that's really cool. My personal favorite right now is Squirrel, just because we're about to launch it and release it out into the world, which is super exciting. The goal here is that if you look at the ecosystem, we've gotten very good at building ML models for training on single GPUs, but as soon as anybody tries for the first time to deal with multiple GPUs, you run into big problems. Many frameworks have come along that help you distribute a model, but nobody has really thought about how you distribute the data. There are not many frameworks out there — there are a few things we've looked at that try to solve this, and the ecosystem is getting bigger — but we decided we want to get to a place where we can really make data loading on distributed systems as easy as possible. And it doesn't need to be only for deep learning; it can be for a lot of different things. On top of that, we also want to build in potential access control layers — you want to pull this one from this bucket, the next one from that bucket, the third one from this bucket — and make sure that you can mix and mash them very well. That's what Squirrel is really about: making data access, data storage, and data writing as simple as possible, by just abstracting away the file system. It can be in a cloud, it can be local, it can just be pulled from the internet, and it should be easy to integrate into any kind of framework. That's really what we're doing there. And your plan is to make this open source?
The idea is to make this open source. And do you have a preference in terms of other open source tooling — do you guys standardize on your ML framework and things like that? What's your set of tools that you typically like to use? We're of course also standardizing as much as we can — you can imagine that with many, many customers, you want standardized tools. The standard framework is PyTorch; that's what we use internally for training these models. We're also betting a lot on PyTorch Lightning as an easy framework, and we're using Hydra, which is developed by Facebook, as an interface and entry point into those systems. Why did you pick PyTorch Lightning — what did you like about it? The idea is that it really abstracts away much of what ML training frameworks have to do: you're writing a data loader, you have an optimizer, you have a training loop, and you have a logger. When you look at typical GitHub repositories, everybody writes "for batch in dataloader, do all these things" — it's very repetitive code. So just abstract it away, use some software engineering so it's robust, and then you can go with that. It's especially important if you're doing production models, where you have to retrain and you need that to be stable. Software maintenance is, I think, one of the things that isn't really valued in the academic ML community, which surprises me, because a field that came out of engineering should value good code quality a bit more, I feel. So we have to do it ourselves and use tools that make maintenance and debugging of machine learning models easier, and frameworks are the way to go for that, because you don't want to build it yourself if a community can help you maintain your systems. Do you also use PyTorch to run the models in production? Some people change the format or do something to the model before it's deployed — do you just load up the model as serialized from PyTorch, or do you do anything special there? No, we take it from PyTorch directly, because right now our mode is to ship Docker containers around the world. I think eventually, for certain applications, we'll probably need to go to more standardized formats like ONNX or something like that — that will potentially change the game — but right now we're still shipping the binary in Docker. Where do you see gaps in the tooling right now? What parts of the stack feel mature, and what parts feel broken? What feels broken to me is that you have to plug many systems into many systems, and that feels a little bit sad, because it makes it really hard sometimes to stay abreast of the edge. I don't think there's anything lacking in the community right now; I feel more like the problem is that too many people are building too many tools instead of coming together, taking one tool, and bringing it to the next level. What happens is that people try to be different from others instead of making one tool that solves a lot of problems. A counterexample where this worked really well is the data science world: you just need two or three libraries — scikit-learn, NumPy, and pandas — and you're set. If you go into the MLOps space, I don't know how many tools are out there — you probably know better than me — and I just wonder sometimes why.
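A minimal PyTorch Lightning sketch of what "abstracting away the training loop" means, as described above: you declare the step and the optimizer, and Lightning owns the loop, devices, and logging. This is not Merantix's actual setup; the model and data are placeholders.

```python
# Minimal PyTorch Lightning module: no hand-written "for batch in dataloader" loop.
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class Classifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)   # logging is built in
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

data = DataLoader(
    TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,))),
    batch_size=32,
)
pl.Trainer(max_epochs=1).fit(Classifier(), data)  # Lightning runs the loop
```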
Yeah, that's fair. I definitely think there's always a moment where there's an explosion of ideas and tools and then things start to standardize, and I think we're still at that explosion stage. That's what makes it interesting to be in this world right now, no? I agree. I think there are a lot of abstractions we haven't figured out — for example, deployment to IoT. I'm super curious about that, and I haven't seen much development until recently. How do you deploy models in heterogeneous environments? How do you train in heterogeneous environments? I think there's still a lot of ML tooling that needs to get better — not everybody has a huge data center of homogeneous hardware, so how do we deploy or train models on heterogeneous hardware? Like I said — the other question I have is how you hand off these models to a customer. You say you give them a Docker container, but if they want to keep iterating on the model once they've taken it from you, are they able to do that? How do you think about that? Because it does sort of feel like machine learning projects are never really complete, if you know what I mean. I understand what you're saying. It depends on the customer; I don't think there's one rule that fits all. Some customers just come back and say, hey, we need a retraining or a refresh, can you do that for us, because they don't have an IT department. Some people want to jump-start their IT department: they say, okay, we know machine learning is the future, we don't have an IT department yet, but maybe we engage with you and you help us jump-start the engine — and then they continue with that code. It's always a conversation, of course, because it's also tricky for us to say, hey, we're offering our expertise, we put in a lot of blood, sweat, and tears, and then you take it to the next level — that's always a bit sad as well. So it's always a tricky conversation, but we're happy to help people, and I ultimately think that everyone benefits if the community just grows. Another question I wanted to ask you about: you've written a few thought pieces on AI — I don't know if you have a favorite, but I think one interesting one is your writing on the impact of NLP models on the real world. Could you summarize it for people who haven't read it? My perspective is that the NLP field seems to be doing a whole bunch of very amazing things — and I know people argue about whether this is real intelligence or not, or how much that really matters — but from my perspective as a technologist and enthusiast, I kind of can't believe how good text generation has gotten, and yet the impact seems smaller to me than I would have imagined given how impressive the demos are. How do you feel about that? No, I see your point, and I think that is exactly the reason why I like working where I am, because it's right in the middle of driving the adoption of modern AI techniques. I think the reason why you feel the impact is not as big as it could or should have been is that it's really, really hard to bring technology like that to people who are not technologists like us. That's really the challenge: you have to bridge that gap — there's this early-adopter gap that has to be bridged — and we are not there yet.
I'm also with you — I don't really want to get into the philosophical debate of whether it's intelligent or conscious or whatever. It's useful technology, so let's bring it to the people and let them have a better life with it, let's solve some problems with it — that's maybe the philosophical side. The practical side is that if you take those big models, you run into the problem that you already need compute power, you need infrastructure, you need ML ops, you need a whole department to actually make use of those models. Not many people have that, especially the companies the models would be most useful for. Take, for example, news outlets or media outlets: they are completely focused on a very different problem. They don't have technologists who can just take a GPT-2 or even GPT-3-sized model, put it into production, and then figure out the use cases — that's just not how the economics of these companies work. So bringing it to those people is really hard, and I think that's the reason we don't see that impact yet. It's going to come, but it's still going to take a few years. What do you think are the next things that we're going to notice, just as consumers, from the impact of these more powerful NLP models? I do think a lot of what will come is improvements in search. The richness we get from similarity clustering is significant, and we just need to figure out how to adapt that to the real world — if you run GPT-3-sized models, the search is slow, so we need some improvements there — but I do think we'll see a re-ranking on that front. I also think a lot of automation will happen around automated text generation, and that's a positive thing. I don't know how much time you spend on emails — I certainly spend a lot, and you probably do too — and it would be nice to just automate some of that away. I also talked to several customers in Germany that have this funky problem: they're in the logistics space, and logistics is a very old-school domain where you get very free-form order forms, and there are armies of people who do nothing else than take those emails — which are just free-floating writing — and turn them into structured text by manually copy-pasting into structured fields. It sounds easy; it's not, it's a very hard and tedious task. Once we bring these big models into that realm, I think there will be a lot of automation for the better. So I do think there's a lot of potential, and I'm very excited about the future of those models.
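A hedged sketch of the logistics use case just described: pulling structured fields out of a free-form order email with a pretrained named-entity-recognition pipeline from Hugging Face transformers. The email text is invented, and a real system would need a model fine-tuned on domain-specific labels (quantities, part numbers, delivery dates, and so on) rather than the generic entity types used here.

```python
# Extract entities from a free-form order email with a generic pretrained NER model.
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")

email = (
    "Hi, please ship 40 pallets from Hamburg to the Munich plant by Friday. "
    "Contact is Maria Keller at ACME Logistics."
)
for entity in ner(email):
    print(entity["entity_group"], "->", entity["word"])
# e.g. LOC -> Hamburg, LOC -> Munich, PER -> Maria Keller, ORG -> ACME Logistics
```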
Cool. You also wrote an article on AI and regulation that I wanted to touch on. I'm curious about your perspective on regulation — obviously it's coming, but I'd be interested in what you think about it, like what good regulation would look like. If only we knew, right? It's a good discussion. Being in Europe, one of the things I've had to learn is how you can use regulation to build the value systems of a society into your AI deployments, and that can be a good thing. I think regulation needs to address the reality that AI is an experimental technology, and we need to deal with those uncertainties, but also make sure we're not opening the door to extreme abuses, and give people and consumers the right to protest. How exactly to build that regulation, I don't know. What I appreciate about the regulatory frameworks we have in the EU is that we are more willing to iterate on regulations, which is good: we make a draft, we see how it works in practice, some things work, some things don't, and we try to adjust. A classic example is GDPR and the cookie banner — I don't know how many cookies you have to click away, it's really annoying, and people get that, and now we're trying to figure out how to build the regulation so we don't have to do this anymore. But it takes time, and I think it's a process. As a technologist, you're building software for humans — you don't build technology for its own sake, you build it to make something better, to make somebody's life better. But I guess, specifically, what's the regulation that you would like to see happen? What I would like to see is a sandbox environment for ML models, where you can say: I can run tests on real-world scenarios, collect data in the real world within a given risk frame, and then get a risk certification as I go up. I say, okay, I did my first test, that was an exposure of, I don't know, a million dollars in risk — just an arbitrary number, don't take it as a fixed price — and a certifier says, okay, that's great, now you can go to the next iteration phase. And then you build up this risk ladder, where a certifier is willing to back you up with insurance for a given risk factor, because only then can you actually take these experimental technologies out into the real world. Right now your hands are often bound — by data privacy issues, by copyright issues, by security concerns — and the regulatory uncertainty around that, especially for a startup that builds ML, is really, really high. So I would like to see protected environments where you're allowed to test things within a certain box. I think that would be good regulation, because the consumer can slowly gain trust and see what the technology can do in the real world, you start to see curiosity, and you have it under control to a certain extent, because you know that if the company does something wrong it's going to get penalized, and that's bad for the company. So I think that would be good regulation, and I'd like to see it in this form or another.
I saw you also wrote on ML and environmental impact, and that's something I care about a lot and have looked at. What are your thoughts there? Do you feel like people should be finding ways to train models with less compute, and how do you reconcile that with the fact that you're also doing model training in your day-to-day job? It's a complicated question. On the one hand, big models and ML models are really powerful and important; on the other hand, you need to make sure you're not burning up the planet with them. So my stance is: let's reuse those models as much as we can — fine-tuning, few-shot learning — once we've trained them and really invested that money, and make sure that the cost, both the carbon footprint and the monetary cost, amortizes. And that's what we're currently seeing: there's a lot of interest in pre-training these big models because they fine-tune very well. I just feel like there are too many people who want to build them from scratch rather than figure out what we can do with the existing ones, and I hope to see that change a little bit. That's my take on it. I like it — it's not just shunning it, it's pragmatic. That makes sense. We always end with two questions, and the second-to-last question is: what's a topic in machine learning that you think is understudied — something that, if you had more time, you would love to look into more deeply? If I had more time, I would probably put my physicist's hat back on and try to understand a lot of the optimization problems in machine learning. There's a whole field that is just ripe for discovery, which is the combination of loss landscapes and optimization problems in deep learning models and their connection to statistical physics. I think that's a really valuable direction: it can help statistical physicists understand certain things better, but statistical physics can probably also help the ML community understand much better what's actually happening under the hood. I would love to contribute to this much more, but it's very far from what I do every day. You know, I've seen papers on this topic and I always find them impenetrable, because I don't have the physics background people are assuming. Can you describe a little of what this says to someone like me, who knows some of the math and is interested but doesn't quite follow — is there an interesting result you could point to from this analogy? Physicists typically think in terms of what we call a phase diagram. The classic phase diagram is the different states of water: vapor, water, and ice. Similar effects happen in all kinds of other physical materials, and one of the funny things you can see is that these kinds of phase transitions — where you go from one phase to another, like from liquid to vapor — also happen in the optimization landscapes of machine learning problems. For example, when you tune the number of parameters in a model, you go from the model not being able to optimize at all to the model suddenly optimizing perfectly, and people describe this as a spin-glass-to-jamming transition — a very technical term, but it essentially means going from an almost quasi-frozen state to something that is very viscous, with very different physical properties — and you can see the same kind of thing in machine learning models.
So these are early indications that you can use the methods and tools developed in statistical physics to understand the dynamics that happen in machine learning models. Cool. And ultimately I think this will help us train these models much better, which would be super cool. Cool — well, on a much more practical note: when you think about all the models you've trained and put into production, what's the hardest piece of that at this moment? What is the biggest challenge from a customer wanting a model to do a particular thing, to that thing being deployed and working inside their infrastructure? I think actually getting high-quality data is really hard, because that's where the customer comes in, and you need to pick them up at that point and tell them it's not just data in and model out — you need high-quality data. We did a project on semantic segmentation of very fine, detailed defects on huge metal surfaces. These are tiny scratches of maybe five or six pixels in a thousand-by-thousand-pixel image, and you need to find the last one. Now, these images are recorded from various angles and labeled by different people, so in some images there's a scratch and in some there's not — same piece of metal, but in one view you see the scratch and in another you don't. Helping people understand how to label data, and how to bring the data to a quality where the model can actually pick something up, is really the complicated part. I think that's an understudied problem. How did you actually get the data labeled in this case? They had some experience with labeling — essentially an army of people who used the labeling tool — and we taught them what to label for. And did you build a custom tool for this, to find the scratches? We used open source software — I don't actually remember which one — and just adjusted it for the use case to make it quick and fast. Awesome. Well, thank you so much, this was really fun, and so many different insights — I love it. Thank you. Yeah, thank you. If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description, where you can find links to all the papers that are mentioned, supplemental material, and a transcription that we work really hard to produce. So check it out.
68.66155
39.23453
4s
Dec 28 '22 14:49
1n1o07ch
-
Finished
Dec 28 '22 14:49
2118.408000
/content/robert-nishihara-the-state-of-distributed-computing-in-ml-q83rkjrks5m.mp3
tiny
You have all these machine learning researchers, some of them with backgrounds in math or statistics, things like that, and they want to be spending more of their time thinking about designing better algorithms or better strategies for learning. But actually quite a few of them are spending quite a bit of time on the tooling side, building better tools or scaffolding, doing fairly low-level engineering to speed things up or scale things up.
You're listening to Gradient Descent, a show where we learn about making machine learning models work in the real world. I'm your host, Lucas Biewald. Robert Nishihara is the CEO of the company that makes Ray, a high-performance distributed execution framework for AI applications and others. The Ray project came out of his work at the RISELab at UC Berkeley, and prior to that he studied mathematics at Harvard. I'm super excited to talk to him.
So I'm curious about how Ray came to be and how you think about it, but maybe before we go into that, could you give a high-level overview of what Ray does and why we should use it?
At a high level, the underlying trend that is giving rise to the need for Ray is just the fact that distributed computing is becoming the norm. More and more applications, especially applications that involve machine learning in some capacity, need to run on clusters. They're just not happening on your laptop or a single machine. And the challenge is that actually developing and running these distributed or scalable applications is quite hard. When you're developing these scalable applications, you're often not only building your application logic, like the machine learning part; you're often also building a lot of infrastructure or scaffolding to run your application. We're trying to make it as simple as developing on your laptop, essentially: to let people focus just on building their application logic and then be able to run it anywhere, from a laptop to a large cluster, and take advantage of all the cluster resources, but without having to be experts in infrastructure.
What's the real challenge of making that work? Because, and as probably more of an ML person than a DevOps person people will want to kill me for even thinking this, it actually seems like a pretty simple idea. So what makes it hard to actually abstract away the underlying distributed system from the ML logic?
A lot of the challenge is actually being general enough. If you have a specific use case in mind, of course you can build a specialized tool for that use case. But then the challenge is that it often doesn't generalize to the next use case you have. Maybe you build some setup or some infrastructure for training neural networks at a large scale, but then you want to do reinforcement learning and all of a sudden you need a different system, or all of a sudden you want to do online learning and it's different. The challenge is really trying to anticipate these use cases, or, without even knowing what the future use cases will be, trying to provide the infrastructure that will support them. Ray achieves this by being a little bit lower level than a lot of other systems out there. So if you're familiar with tools like Apache Spark, for example: the core abstraction that Spark provides is a dataset, and it lets you manipulate datasets. If you're doing data processing, that's the perfect abstraction.
If you look at something like TensorFlow, TensorFlow provides the abstraction of the neural network. So if you're training neural networks, it's the right abstraction. What Ray is doing is not providing a dataset abstraction or a neural network abstraction or anything like that. It's actually just taking the more primitive concepts, like Python functions and Python classes, and letting people translate those concepts into the distributed setting. So you can take your Python functions and execute them in the cluster setting, or take your Python classes and instantiate them as services or microservices or actors. In some sense, the generality comes from the fact that we are not introducing new concepts and forcing you to coerce your application into those concepts. We're taking the existing concepts of functions and classes, which we already know are quite general, and providing a way to translate those into the distributed setting.
So what's something that would be painful to do in Spark, but would be easy to do in Ray?
For example, training neural networks at large scale, building AlphaGo, building an online learning application, or deploying your machine learning models in production. Those are some examples.
Let's take building AlphaGo as an example. Maybe this is going to annoy you, and I've got a question challenging Ray a bit, but AlphaGo seems almost like a very embarrassingly parallel learning problem, right? It seems like you could run a lot of learning at once and combine the results. Wouldn't that work on Spark, for example?
There are a lot of subtleties. If you're implementing something like AlphaGo, yes, you are running a lot of simulations in parallel, and that's one part of it. You're also doing a lot of gradient descent and actually updating your models. Each of these things individually is embarrassingly parallel, perhaps, but one of them is happening on GPUs and one of them is happening on CPU machines. There's this tight communication loop between the two, where you take the rollouts and the work you do in the Monte Carlo tree search and pass those over to the training part, and then you take the new models from the training part and pass them over to do the rollouts. So there's a lot of this sort of communication, and a natural way to express this is to have these kinds of stateful actors or stateful services that hold machine learning models that are getting updated over time. The way it's often natural to express these things is with stateful computation, which is different from what Spark is providing. So those are a couple of examples.
Is there something specific about reinforcement learning, which is actually your background and seems like it might have had something to do with making this, something core to reinforcement learning as opposed to supervised learning, that makes this more necessary?
I think the one reason we focused on some reinforcement learning applications initially with Ray is, beyond the fact that it's an exciting application area, the fact that it's quite difficult to do with existing systems. So when DeepMind is building AlphaGo, or when OpenAI is doing Dota, they're not doing it on top of Spark. They're not doing it on top of just TensorFlow. They're building new distributed systems to run these applications.
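To make the idea of translating ordinary Python functions into the distributed setting concrete, here is a minimal sketch using Ray's core Python API (ray.remote, .remote(), ray.get); the function and the numbers are illustrative only, not taken from the conversation:

import ray

ray.init()  # start (or connect to) a Ray runtime; on a laptop this runs locally

# An ordinary Python function becomes a distributed task via a decorator.
@ray.remote
def square(x):
    return x * x

# Calls return futures immediately; the work is scheduled across the cluster.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # blocks and collects the results: [0, 1, 4, 9, ...]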
And part of the reason for that is that these that reinforcement learning combines a bunch of different computational patterns together. Yes, there's the training part with the gradient descent. There's also these embarrassingly parallel simulations that are happening. There's also some kind of inference or serving where you take the models and you use them to do the rollouts. In some cases you actually you have some data processing components where you are storing these rollouts and then using them later. So it combines a lot of different computational patterns together. And it ends up being tough for specialized systems and you often want to. And you often end up benefiting from this scenario where you benefit from having a more general purpose system. A lot of that seems like it would overlap with a supervised learning. But it sounds like there's a kind of more things going on in parallel that are different. What's why do you think reinforcement learning specifically requires a totally different framework as a. I don't think it's that just reinforcement learning that requires a different framework. I think when people build new applications and want to scale them up, they often end up having to build new systems. And so for example, with companies that we see wanting to do online learning. Training component you're learning from your interactions with the real world, but then also taking those models that you're training and doing inference and serving predictions back to. Some application in the real world right to do this. There's often a streaming component where you have data streaming in a training component where you're updating your models and then a serving component where you're sending recommendations or predictions or whatever. Back out there in the real world. And to do this people are again, it's not just TensorFlow. Just a stream processing system was not just a serving system. People are end up building new systems to do this. But this is also an area because of raised generality of where some of the coolest applications we see and you can do the entire thing on top of. I think one thing you mentioned to me or maybe it was someone on your team mentioned to me in the past is that a lot of folks are not even doing machine learning on top of. Yeah, there's a mixture certainly a lot of people are doing machine learning, but a lot of people are also they're just their Python developers. They're developing their application on their laptop and it's too slow they want to scale it up. But they don't want the investments of needing to build a lot of infrastructure and and they're looking for a simple way to do that. So you're absolutely right a lot of these people even if they're not doing machine learning today they do plan to do machine learning. So we're going to start to see machine learning being integrated into more and more different kinds of applications. So it's often our users are not just like just training a neural network and isolation. Sometimes they are they're often not just deploying the model and production. Now they often have machine learning models that are integrated in interesting ways with other application logic or business logic. All of that logic will run on top of Ray or that it's just like the trickiest bits of the machine learning are the most complicated parts going right. Yeah, well, so to be clear, Ray integrates really nicely with the whole Python ecosystem. 
So if you're our users are they're using TensorFlow, they're using PyTorge, they're using pandas and spacey. This is part of why Python is so great, right, it has all these great libraries. So our users are using these libraries and then what they're using Ray for is to scale up their applications and to run them easily on clusters. It's not replacing all of these things, it's letting people scale them up. Or share the dimensions. I guess like slightly switching gears. I feel like a lot of people have been talking about reinforcement learning for quite a long time and there are such like a a lot of examples and go and absolutely love those examples, but I think maybe like an knock on it has been that it's not as no, maybe use an industry as supervised learning is that consistent with your experience or do you see reinforcement learning like catching on is more inside of more real world applications. And certainly not being used to the extent that supervised learning is being used. I think a lot of companies are exploring reinforcement learning or are experimenting with it to see. I think we do the areas where we see reinforcement learning having a lot of successes are in like optimizing supply chains or these kind of operations areas or some financial applications recommendation systems and things like that. Of course, we are that's one application area that Ray supports really well, but it's as far from the main focus of Ray. Or the only focus. I guess it's just because online learning I would view as more like best practice and I think like lots of companies are these trying to do modeling learning. Do you see do you have any way of knowing the sort of volumes of the different kinds of applications or do you have any sense of the relative. I think that's from like the tickets that come in, do you have any sense of what are the can you stack rank like the most common of uses of right that even possible. I don't actually know the exact breakdown. There are certainly a lot of people doing stuff on the more like machine learning experimentation training models. There's a number of people building their companies products or services and running them on top of Ray building like user facing like backends for user facing products. A lot of people who are it's like really just distributed Python right independent of of a machine learning and then there are a number of people and actually this is a really important use case. A number of people building not just like the end applications but actually building libraries that other people will use and scalable libraries. And that's exciting because Ray is it's not just good for building applications is actually great for if you want to build a distributed system. Because it is low enough level that if you were to build a system or library for machine learning training or data processing or stream processing or model serving. It can let you focus on just that application logic right just on your model serving application logic and or your stream processing application you know logic. And then Ray can take care of a lot of the distributed systems details that you would normally have to take care of like scheduling or handling machine failures or managing the resources or transferring data efficiently between machines right like typically if you want to build say system for. Stream processing you would have to build all that logic yourself not just the streaming logic but also the scheduling and the fault tolerance and so on and by. 
Taking care of that inside of Ray we can let library developers easily build these distributed libraries and that can all give rise to this kind of ecosystem let a lot of other developers can benefit from. Do you think ultimately it subsumes what spark does or does it live alongside it for different use cases. I think spark is the kind of thing where you know of course if spark were being essentially what we would like is if spark were to be created today instead of. Back when it was created and if Ray is living up to its promise and really delivering on what we're trying to do then our hope is that spark would be created on top of Ray and that for developers who want to build things like spark. And so we would make them successful or enable them to do that more easily so that's a little bit of how it like raise it's a lower level API one analogy is if you compare with Python Python has a really rich ecosystem of libraries like there's pandas and non-pias so on spark is a bit more like pandas and raise a bit more like Python if that makes sense. That kind of reminds me of the question that I want to ask which is is it important to you to support their languages. Do you see it is essential to it's funny because we've had a couple folks every time this podcast who've just been surprisingly negative on Python it's. It's not my you know most of the language but I love it for training machine learning but it seems like maybe there's some sensitive slow or hard to scale and where do you land on that. And that's something we're trying to do trying to address and your and your right of course it can be slow although. A lot of the way that libraries like TensorFlow and Ray and other libraries the numpy deal with this is that the bulk of the libraries written in C plus or see and and then they provide a Python bindings. So Ray like you mentioned is actually written in a language agnostic way is the bulk of the system is written in C plus and you and we provide Python and Java APIs and of course Python is our main focus that's where. I'm going to talk about the language of the language of the language of machine learning today and it's off it's one of the fastest growing programming languages. And so being able to have a seamless story for how to invoke the machine learning from the business logic is it's actually a pretty nice feature of Ray and down the road we do plan to add more languages. I think I said you like pragmatic SEO hat of one is worth the languages the people actually want do you think that like Python will stay the language of Frank of machine learning. 20 years you haven't said feeling of that. I don't think I have any particular we special insight here I could see that going either way. Make a stupid dug deep though and I feel like sometimes people building the tools get more fresh airs Python and people using the tools and that's certainly there things that a lot of new features and Python that are making people's lives easier more there's you know more happening in terms of typing and things like that. And you can really do anything with Python it's extremely flexible when we design APIs for example pretty much any API that we can imagine wanting for for Ray we can just implement that in Python and of course when we we say okay what should the API be in Java a lot of times. You run into limitations of the language you can't just have any any API that you want. But maybe that flexibility trades off with fundamental constraints on speed or do you know if you'll that way. 
And it trades off with something I don't know if it's the performance or something else interesting. Okay, so it's again the nothing like wanted to ask you about is when you started grad school were you imagine that you'd become. And then you started to imagine that it could become a company and important open source project or was it to me to need that you had a moment. Yeah, that's a great question. So when I started grad school is a very focused on machine learning research and I was actually coming from more of the theoretical side trying to design better algorithms for optimization or learning or things like that. And this this was definitely a change in direction although it was gradual. You have all these machine learning researchers who are some of them have backgrounds in math or statistics things like that and they want to be spending more of their time thinking about designing better algorithms or better strategies for learning. But actually quite a few of them are spending quite a bit of time on the tooling side or like building. Better tools or scaffolding for doing fairly low level engineering for speeding things up or scaling things up and we were in the situation where. We were trying to run our machine learning experiments but built these tools over and over and these were always like one off tools that we built just for ourselves and maybe even just for one project. And we thought at the same time we were in this lab in Berkeley which was surrounded by people who created spark and all these other highly successful systems and tools and we felt there had to be something useful here that we could build or we knew the tools that we wanted. And so we started building those and the goal from the start was to build useful open source tools that would make people's lives easier and we had the idea for Ray initially we thought we would be done in about a month. And of course you can get a prototype up and running pretty quickly but to really make something useful to take it all the way there's quite a lot of extra work that's that has to happen. So that's how we got into it. When did you feel like. Okay this could be a company the scope of what we wanted to do was pretty large from the start right it we didn't envision this as just a tool for machine learning or just a tool for reinforcement learning or anything like that it was really. We thought this could be a great way to do distributed computing and to build scalable applications and combined with the fact that from where we were sitting it seemed like all the applications or many applications are going to be distributed so. We what we wanted to build was quite large from the start and to really achieve that it's effort from a lot of different people and companies and actual vehicle to go about these kinds of large projects. We were seeing a lot of adoption a lot of people using it and a lot of excitement and that combined with the fact that and we thought it made sense as a business and combined with the fact that it was a problem that we thought was important and timely. That sort of those are the factors that led to us wanting to start a company. And it has the transition then from grad soon to start a studio. It's really exciting as you can imagine it's there's a lot of differences and there's a lot to learn that's for sure. 
But I'm working with really fantastic people, and even in grad school, before we started the company, we were working with a great group of highly motivated people, and we had already started thinking about some of the same kinds of problems: how do we combine our efforts to do something that adds up to something larger, and how do we grow the community around Ray? So it was a pretty smooth, gradual transition.
Have there been users or customers that have pulled your product requirements in surprising directions?
Yes, absolutely. I can start with one example on the API side. Some of the initial applications that we wanted to support, like training machine learning models with a parameter server, or even implementing some reinforcement learning algorithms, actually weren't possible with the initial Ray API. I mentioned that Ray lets you take Python functions and classes and translate those into the distributed setting. When we started, it was actually just functions, not classes. So we didn't have the stateful aspect, and that was pretty limiting. Functions are pretty powerful, you can do a lot with functions, but one day we just realized we were doing all sorts of contortions to try to support these more stateful applications. At some point we realized: we really need actors, we really need this concept of an actor framework. Once we realized this, I remember Philipp and I mapped out and divided up the work and tried to implement it really quickly. That just opened up a ton of use cases that we didn't imagine before. But there were still multiple steps to that. When we first had actors, only the process that created the actor could invoke methods on the actor, could talk to the actor. At some point we realized we needed handles to the actor that can be passed around to other processes and let anyone invoke methods on any actor. That was another thing: when we implemented these actor handles, it just opened up a flood of new use cases. So there have been a couple of key additions like this, which really increased the generality and the scope of the kinds of applications we can support. But there haven't been too many changes; it's actually been a fairly minimal and stable API for quite a while. So there's that. And I would say there are other important Ray users that have really pushed a lot and done a lot in terms of performance: improving performance, how can we keep making it better. Also on the availability side: they're running this in production, in really mission-critical settings, and how can we make sure that it's not going to fail, ever. And also the support for Java. That's actually something that came from the community, both initially adding the Java bindings as well as then doing a lot of refactoring to move a lot of the shared Python and Java logic into C++. So those are some examples. There have been pretty tremendous contributions from the community.
Actually, so that's not just feature requests, it's actually committing code? Absolutely.
How do you think about managing a large open source community? How do you do basic things like make a roadmap when people are coming and going and have different opinions on what to do?
It's a good question.
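A minimal sketch of the actor and actor-handle pattern described above, assuming Ray's public API; the Counter class is a made-up example, but the same handle-passing pattern is what makes parameter-server-style training expressible:

import ray

ray.init()

# A Python class becomes a stateful actor; its state lives in one worker process.
@ray.remote
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

counter = Counter.remote()  # an actor handle

# The handle can be passed to other tasks, so any process can call the actor.
@ray.remote
def bump(handle):
    return ray.get(handle.increment.remote())

# Four increments are applied to the single actor, one at a time,
# so this prints the values 1 through 4 in some order.
print(ray.get([bump.remote(counter) for _ in range(4)]))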
And I wouldn't say that we have totally nailed it just yet, but we have we use a lot of Google docs a lot of design docs on Google docs. We use Slack very heavily. So we have a Slack that anybody can join and that's a good way to you know, pull people ask questions or for users to ask questions anything from the roadmap to just some error message that they're seeing and or asking if there's anyone else using ray on slurmer something like that. And then a number of other things like just before the pandemic we're doing a lot of meetups. We're doing the this ray summit coming up this coming September. These kinds of events to really meet users in person or virtually and to just get a sense of what people are working on and that kind of thing. That's cool. If you ever had the situation or like someone submit to pull requests and they obviously put in a ton of work years like who I just don't want to go. Yeah, that certainly happens and we try to get in in front of that by having a design doc ahead of time and you don't want people to spend a huge amount of time on something like that before if people are not on the same page about whether that's even desirable or not. I think a lot of the time we're really getting a lot of those conversations are happening over Google docs and over the design docs and that kind of pushback is moved earlier in the conversation. Yes, but I believe there's this knock on ML researchers from some people and definitely not any ways and biases and people. I think some people that have met feel like ML research code is low quality maybe because they. Or in the thread at once to get the paper published and then to wash their hands and so they don't actually sort of see the maintenance life cycle and they don't learn to architect things well. Interesting as you started as an ML researcher and actually more of a theoretical and more research, I hear some people think are the worst. In this I mean, and you have to this very I think this is like very architecture heavy kind of tricky programming projects have you. Has it been like a transition for you to just a level like your skills around this or have you learned stuff along the way or do you feel like you've learned natural to it. So I've definitely learned a lot along the way and I think a lot of this was fill up the who I work with and one of my co founders. He's been building systems for quite a long time and has a lot of expertise in this area. So I think maybe there's less of a transition for him and then combined with the fact that we were in the amp lab and rise lab at UC Berkeley where people had created spark created and messos a lot of the just leading. And it's a very simple and distributed systems and Berkeley has also has a long tradition of creating great open source software. So if we were doing this in isolation it would probably look very different but we were in this great environment with all these experts we could really learn from. So I think that played a big role. That's a great what a great lab. Amazing. I'm going to make music projects. Yeah, and of course you're probably familiar with cafe about the deep learning frameworks like that also coming out of Berkeley at the same time or actually cafe was a little earlier. A lot of people want advantage of machine learning researchers building tools is that they know exactly what problem they're trying to solve. There's some advantages there as well. 
So I should say, you know, we can't name our customers here, but we can definitely say a lot of our customers are huge fans of Ray. That's great to hear. One thing that a lot of our customers really like is Ray Tune. I'm curious how that came about and what your goals are for that.
Our goals there are to build really great tools, ideally the best tools, for hyperparameter tuning. Hyperparameter tuning is one of these things that's pretty ubiquitous in machine learning: if you're doing machine learning and training a model, chances are you're not just doing it once but actually a bunch of times, trying to find the best one. This is something, again, where a lot of the time people are building their own tools. You can write your own basic hyperparameter search library pretty quickly; it's basically a for loop if you're doing something simple. But these experiments can be quite expensive, and if you're trying to make them more efficient or speed up the process, there's quite a bit you can do: stopping experiments early, investing more resources in the more promising experiments, or sharing information between the different experiments, like with population-based training or HyperBand or things like that. So there's quite a lot you can do to really make the experiments more efficient, and that's what we're trying to provide off the shelf, for people who want to do that at a large scale, in a way that's compatible with any deep learning framework they're using and just works out of the box.
So is part of the vision there to show people the kinds of libraries that you think should be built on Ray, so that they build more libraries? Which libraries should be a core part of your project and which should be third party?
In the long run, most of the libraries will be built by third parties, but I think it's important to start off with a few high-quality libraries that address some of the big pain points people have right away, and that are the kinds of things people would want to use Ray for or would otherwise have to build themselves if we didn't provide a library. We essentially started with scalable machine learning, trying to provide libraries that let people do that rather than build it themselves using Ray, and hopefully in the longer run other people will build libraries that really flesh out this ecosystem.
When you look at machine learning projects that you've been part of, or that you've seen, and you look at the whole arc from conception and experimentation to deployed and useful in production, where do you see the most surprising bottlenecks?
The obvious aspect is the bottleneck around scaling things up; this is one of the core things we're trying to address with Ray. One less obvious bottleneck is about interfacing machine learning models and your machine learning logic with the rest of your application logic, and one example where this comes up is with deploying or serving machine learning models in production. Web serving has been around for a long time, and you have Python libraries like Flask which let you easily serve web pages and things like that. So what's the difference between regular web serving and serving machine learning models?
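A rough sketch of what the hand-rolled "for loop" becomes with Ray Tune, using the classic tune.run API (newer Ray versions expose an equivalent Tuner interface). The objective is a made-up toy function, and ASHA is just one early-stopping scheduler in the HyperBand family mentioned above:

from ray import tune
from ray.tune.schedulers import ASHAScheduler

def objective(config):
    # Pretend "training": report a loss that improves over steps and
    # depends on the sampled hyperparameters.
    for step in range(10):
        loss = (config["lr"] - 0.01) ** 2 + 0.01 * config["layers"] + 1.0 / (step + 1)
        tune.report(loss=loss)

analysis = tune.run(
    objective,
    config={
        "lr": tune.loguniform(1e-4, 1e-1),
        "layers": tune.choice([1, 2, 3]),
    },
    num_samples=20,
    scheduler=ASHAScheduler(metric="loss", mode="min"),  # stops weak trials early
)
print(analysis.get_best_config(metric="loss", mode="min"))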
Superficially they might seem pretty similar: there's some endpoint that you can query, and in fact when people are deploying machine learning models in production, they're often starting with something like Flask, wrapping their machine learning model in a Flask server. Then you start to want to batch things together, or incrementally roll out models, or roll back, or compose models together. At the other end of the spectrum you have specialized systems or tools for machine learning model serving, things like TensorFlow Serving, and I think there's a PyTorch one as well. The challenge with a lot of these frameworks for model serving is that they're a little too restrictive: it's just a neural network behind some endpoint, a tensor-to-tensor API, a tensor going in and a tensor coming out. Often what you want is to have the machine learning model as part of the serving logic, but also to have other generic application logic surrounding that model, whether that's doing some processing on the input or some post-processing on the output, and really combining these things together. So that's one pain point I've seen quite a bit, and we're actually building a library called Ray Serve on top of Ray to really get the best of both of these worlds.
Cool, that's awesome. Okay, my final question is: when you look at machine learning broadly, research but also production, all of these things, what's a topic that comes to mind as something that people don't pay enough attention to, that's more important than the credit it gets?
I'm not sure if this is underrated, but one area that I think has a ton of potential is using natural language processing to help people ask questions about data, about all the information and data out there. For example, if you Google a simple fact, what year was George Washington born, or what's the capital of California, you immediately get an answer, and that makes it easier and more natural for people to ask interesting questions about facts and to realize that there's some ground truth out there. So if we can provide similar tools that let people ask questions about datasets, questions that are not simple facts you can look up in a database but rather have to be inferred by performing some computation, some filtering or some basic statistics, "what is the correlation between...", I think that's something that's becoming more possible and would be very exciting.
One thing you're thinking about is your Ray Summit. Can you tell me a little bit about what you're hoping to accomplish there and who should come to it?
The range of speakers goes from tech companies like Microsoft or AWS to companies in finance. If you're working on machine learning or scaling Python applications, this is going to be the best place to do that, and we're really excited about who you're going to be hearing from, including the creators of tools like pandas, as well as tons and tons of companies using Ray to really do machine learning or scale up their applications. It's an opportunity for the Ray community to see more about what everyone else is doing, to get to know each other better, and to really showcase some of those use cases.
Nice. I'm actually going to be there, and I think I'm giving a talk, so yes, you are.
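For reference, the "wrap the model in a Flask server" baseline described above looks roughly like this; the model here is a stand-in function so the sketch stays self-contained. The point is simply that arbitrary Python pre- and post-processing surrounds the model call, which is the kind of flexibility Ray Serve aims to keep while adding batching, scaling, and rollout on top:

from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in "model": a trivial scoring function, used only to keep this runnable.
def model_predict(features):
    return sum(features)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    features = [float(x) for x in payload["inputs"]]  # arbitrary pre-processing
    score = model_predict(features)                   # the actual model call
    return jsonify({"score": score, "count": len(features)})  # post-processing

if __name__ == "__main__":
    app.run(port=8000)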
I'm super excited about that. Awesome, thank you.
When we first started making these videos, we didn't know if anyone would be interested or want to see them, but we made them for fun. We started off by making videos that would teach people, and now we get these great interviews with real industry practitioners, and I love making this available to the whole world so everyone can watch these things for free. The more feedback you give us, the better stuff we can produce, so please subscribe, leave a comment, and engage with us. We really appreciate it.
65.3848
32.39909
8s
Dec 28 '22 14:49
3twyu6e0
-
Crashed
Dec 28 '22 14:49
3971.088000
/content/james-cham-investing-in-the-intersection-of-business-and-technology-t4lxx8bs1ky.mp3
tiny
There's still an enormous disconnect between what an executive expects to be able to do and what the software developer or the machine learning person or the data scientist actually understands is doable.
You're listening to Gradient Descent, a show about machine learning in the real world, and I'm your host, Lucas Biewald. James Cham is a partner at Bloomberg Beta, a fund that invests in machine learning and the future of work. He's invested in many successful companies, including my first company, CrowdFlower, and my second company, Weights & Biases. I've worked with him for a really long time, and he always has really smart things to say about technology trends. I'm super excited to talk to him today.
So James, you've invested in AI for a long time. You were the first investor in CrowdFlower, my first company, and you're the first investor in Weights & Biases. I was curious to know your perspective on how your thinking around investing in AI has changed over the last 15 years. Clearly the market has changed, but I was curious to understand how your thinking has changed.
When I invested in CrowdFlower, I didn't understand that I was actually investing in AI. I thought that there was a broader collective intelligence problem that you were solving, and I was really enamored with both crowdsourcing and flash teams at that point. And to be honest, I kind of still am. In some ways, I think of AI, or machine learning more specifically, kind of as a misnomer; I think it's actually a collective intelligence thing that's going on. So that's on the broad, theoretical side. And then the big change on the investment side, I think, is that we went from a place where people actively didn't want to invest, where there are a couple of folks that you and I both know whom I actively encouraged not to use the words "machine learning" because I thought it hurt their chances to raise money, to the world we now live in, where there's an incredible amount of investment. What's interesting about the incredible level of investment right now is that we're still at the cusp of getting actual great business results. So we're at that point where I think all the pieces are almost there, but not quite, and everyone feels that little bit of impatience: everyone kind of wants to get it, and the talent's not quite there, or the executives don't quite understand it. So that's an uncomfortable, but also really exciting, point to be in.
Well, do you think there's some chance that we're set up for disappointment?
We are always set up for disappointment. That's what we're like. So, Lucas and I, I'm lucky enough that every two weeks or so we have our little morning chat, and I feel like we have recurring themes, and one of them is this continued question of where we are in the market. You have to admit that over the last few quarters there's this sense that everything is coming together, right? But at the same time as you feel like everything's coming together, you're still looking behind you to say, oh goodness, in what way are we overselling, and in what way are people misunderstanding things?
And at least to me, it feels like there's still base levels of understanding that are missing and it still feels to me like like there are sort of opportunities to define the market in sort of like the right way rather than the buzziest silly way. When do you think investors kind of flipped from feeling like machine learning was a science project to machine learning was a good business to invest in. I mean you've always done kind of early stage stage investments. That's probably where the change happened the earliest, but but when was that and and what was going on that caused that change in mindset? I mean some of it is that well okay you know there's this little joke around Google and Facebook where you know sort of what do we start up really do we commercialize things that Google figured out five years ago right and then we bring it to the rest of the world and there's a little bit of that sense that that's like not ridiculous right that you saw sort of like the kind of changes that people were able to implement and build inside the big vangs and then realize that this is this should be more broadly available and so you had yet that on one side and in other side you had like sort of these remarkable well okay so I like how do I think about this? I think about the academic side you had a few things happened right on the one hand you had great results right to super impressive results but also there's a way in which like academic sort of figured out how to play the game in the sense that the machine learning world was sort of well defined enough now that people can compete on some basis that they understood. I remember there was this guy who gave this great pitch around how to think about advances in machine learning and he made the point that actually maybe it's really about the size of the data set. Do you remember who that guy was? Do you think that's still true? That was Lucas by the way. That was Lucas just to be clear just to be clear. Do I think that what do I think what is still true? Well you know I do think the size of the data set is incredibly important and I think you know maybe five or ten years ago I thought it was really the only important thing and that advances in algorithms seemed pointless to me at the time but you know I think in retrospect maybe it had a such a quite extreme view but you know at that time it wasn't clear that like deep learning worked much better than sort of traditional methods. There hadn't been a lot of improvements in algorithms for a really long time and so almost all the advances felt like it was coming from bigger data sets but now I look at you know open AI and deep minds and it feels like a lot of the advances that are happening there is you know on one hand coming from bigger data sets making more advanced modeling possible but also advances in compute. 
Okay so I've got a new one sort of like the extreme claim you used to make which is I actually think it's that with the availability of large data sets but also with the availability like the understanding that these large data sets were available it meant that everyone understood how to play the game right that it meant that you have a whole wave of academics and companies and corporations and groups and teams saying we can play with these sets of data in interesting and novel ways and so what that meant is that the thing that was the scarce commodity or the way that you basically laid that piece out meant that people were able to work on it and then that's where you get all these exciting advances in part because everyone agreed on sort of how to think a little bit about the data. You know what to ski to I think one of the things that that you did really well was maybe starting a real trend in content marketing among VCs when you and Shivan put out the machine intelligence kind of infographic where you laid out all the companies I was curious what what caused you to start it and then I feel like it became wildly successful and you stop doing it and many other people have picked up where you left off but without the same in my opinion quality that you had so can you tell us the story behind that? Sure I think you know the fun start when the fun started I think there was a sense that we were at the tail end and incorrectly there was a sense that we were at the tail end of a bunch of the investment around big data and that there were a lot of failed big data projects sitting around and so then the question was what are you going to do with all that investment understanding and collecting data and then one of the claims or one of the guesses was that you use that data for machine learning right and there are a bunch of AI applications and sort of my my old colleague Shivan's illness sort of had that like push that inside a lot I think in part because she felt it like just intuitively but also she was surrounded by a set of folks who were like playing around different places with it and then I think we were both sitting around thinking well this is just so hard to understand and we couldn't make heads of tails of it and then and then basically what happened was you know Shivan being just a really great synthesizer but also someone who's quite dog-ed sort of decided to go work with another friend of hers who figured out ways to cluster different types of businesses and so she basically then took that those you know clustered a bunch of different types of businesses that included a number of keywords around AI and then categorized it and then stuck in an app and you know I think that was like a I think there's like a two-month process to actually go through all of that and have all these horrible spreadsheets because it's super I mean there are products now that do this right but it is like super manual in some ways and what was exciting about it the moment she put it together so I give I give her all the credit for actually doing the real work then suddenly it felt like this world was legible for the first time and then I think we kind of assumed that there should be people working on this full time rather than having this just be a part-time job and then they would do a better job of it and I think that and so for a few years basically she wanted to take some time off right around the summer to like just do the state of what's going on and I think it made it it was really good not just 
because you know the categories were not always right but at least it gave something for people to agree or disagree on it and it made a bunch of connections for folks that I think you know it's still valuable to the state and so why do we stop I don't know like there are too many companies right and and part of it is there are too many companies part of it is like you know you think that like there should be I mean I do think there are new class of journalists who now think that way right who think that makes the computation no plus will in the study the work plus not sort of subject to like the day-to-day grind of reporting the Latin story and they should be coming up with those conceptualizations but I haven't totally seen I do you think it was a novel contribution at the time? So one thing that I know you are very interested in because you talked to me about it all the time is it's kind of how organizations function as as a collection of humans trying to work together towards a common goal I feel like you you think about that more than most and you think about machine learning more than most I was curious how you think or maybe how you've seen organizations adapts to machine learning becoming more mainstream within them and I'm curious if you have predictions on how organizations might continue to evolve as as machine learning becomes a bigger and bigger part of them. I mean we're we're not yet at the point right now where machine learning is boring enough that it could be adopted easily so we're still in the part of the market or part of the face where there's you know plenty of exploration and plenty of definition and ecosystem definition to be to be had but and and you see some of that in like sort of slightly misguided arguments around augmentation versus automation and I think you only have those sort of theoretical sort of questions when people don't have actual solutions they're dealing with day-to-day right but I think that there's definitely so that's that's the first part and then I think the the second part is that like management theorists have thought for a long time or talked about the idea of a learning organization that organizations will actually get better over time because they you know learn things and generally that's just been a metaphor right that's just been sort of because of the organizations are not people they they don't have minds they don't learn anything right you know sort of maybe those things get codified and processes are rules and part of what's exciting about machine learning sort of in the next like you know the pre-AGI version of machine learning is that we could actually digitize a bunch of decisions that gave bait on a day-to-day basis and we can actually literally learn from them right that like something is boring as you know do I go to this meeting or not go to this meeting or something as important as do I invest in this project or not all those things in the world we live in right now have almost no consequences no one actually follows up on a consistent basis to make sure or understand where the things work or not or they do it's incredibly expensive and difficult right you just think about the think about not you guys but maybe some other theoretical organization you know we'll have to spend all this time just digging down the figure up what product like what random marketing campaign actually happened or didn't happen or how well it worked and just the amount of automation people need to put in in order to like system and times that and 
what's exciting about sort of like at least to me what's exciting about like so the data rich ML world we could be living in is that those decisions we can now find out whether they actually work or not and then we can actually maybe consistently start making better decisions right now there are also a bunch of you're going to say something quite you're going to say well let's take your example of should I go to a meeting or not how do I ever even know in retrospect if I should have gone to a meeting like how how could an organization really learn whether or not it makes sense to go to a meeting okay so so I think there's okay so you know one of the other angles that I'm very interested in is like that intersection around machine learning and the social sciences and so you'll talk to like management folks that you know a rather on the on the AI side right there's all this question of what's the objective function and the interesting thing is that on the social sciences side they've learned the lesson right which is I don't know what have some objection function and it'll be good enough to like sort of manage but it'll never be perfect that actually you'll have to change over time because the most interesting systems are all dynamic they're dynamic because people are interesting right that you know sort of once you decide that what you decide that one metric is the right way to measure whether a meeting is good or not people will start to learn that and they'll start to game it they'll be like you know what whenever Lucas smiles twice then I'm going to go always make sure to make him I'll tell some stupid joke and he'll detract from the actual professor of the business right and so I think that the the illusion is that you'll come up with some perfect metric and I think the actual goal is to continually come up with you know metrics that slightly will change over time and you'll understand what works in a different works that doesn't work but that'll be okay right that in you know you think about an traditional organizational science there's this great paper I think called like um on the folly of wanting A and rewarding B right or measuring B and I think like that problem is going to be forever but that's part of the fun of the job right that's part of the fun of creating organization and social systems I totally agree with that but I feel like even I mean maybe I don't want to harp on this case too much but I'm curious because I always wondered myself if I should go to a particular meeting or not but how would you even make an imperfect measure of that like what what what what are you even imagine like looking at to um so you can certainly imagine if it's you can imagine it as like is the meeting useful to you you can also imagine it in terms of is the meaning useful for to increase the collective intelligence of the organization right and then you know sort of you can certainly do direct measures which we can just literally ask you how good was that meeting afterwards or we can literally ask the team how good was that meeting afterwards or we can literally look at the number of things you write after that meeting right or we can literally look at the number of times that you nodded or didn't nod so I mean which is just say like all those signals are increasingly cheap together and when they get cheap together that's when we actually get interesting innovation when it's incredibly expensive when you need a higher like McKinsey to do some study and then higher a bunch of people build some 
system like some very expensive bespoke system then it's not that useful right because then your ability to like move and play with the edges of your social system becomes too difficult and then your sort of your chance to actually design it on the fly and sort of continue to understand it like that's I think the the interesting the interesting edge around social systems interesting where do you see machine learning making a meaningful difference in organizations today? I mean in all the normal places right that we're now finally getting good enough to cluster large scale bits of information in ways that are meaningful so that we can provide consistent responses and so I think that that piece of it which is the you know the big version of machine learning you know finding the most critical decision you need to make the most digitized pieces and then finding ways to like sort of consistently improve and collected I think that that's that's sort of where most of the energy and opportunity is right now but that'll change right that'll change I think that the exciting does that make sense first of all? Yeah you know what I mean? Yeah yeah okay so let me take one slight digression as we're talking about this of course one of the as you talked as you asked this question the the real answer is that executives could know how to apply machine learning if only they understood a little bit more than what they learned from reading or watching a movie right and they're still an enormous disconnect between what an executive expects to be able to do and what the software developer or what the machine learning person or the they science is actually understands is doable and so I do have to make the pitch which I think I've done too many times to you which is I do remain convinced that the sort of like three to four hour class that you used to teach to executives on how to think about machine learning probably is the best like if you were to say like what's the best way to improve the way people think about machine learning you should make your boss's boss take a three hour course and just sit around and play with like a very simple machine learning model because in that process they will at least have some intuition about how incredibly powerful unsexy brittle finicky and incredibly scalable some of these models that you'll build will actually be well you know it's not the core of our business but I have passionate about doing it and really it's not that we you know shut down those classes there wasn't actually much demand for or maybe we didn't pursue it aggressively enough there's much more demand for the the tools that we build but I guess I'm curious you know when you did the class or you're probably having a bus yeah go ahead what's it no no maybe I'm I'm actually maybe I'm just softballing a you know a pitch to you but I'm curious you know it seemed like you really liked that class and really felt like you know your team got a lot out of it but really what was it that you feel like you you took away from those this couple hours of of building models so you know what what you did is you did like half and that you was to a wide non-technical or audience when you know a few technical fish folks and what you did is you gave a little overview and then you had them far up an IDE open up some things in Python have access in data of like I forget where are these socks what was what are the images oh yeah fashion fashion evidence that doesn't the audience yeah that's right that's right and then and then and 
then you had them and you gave them a very straightforward framework but you had them like sort of played around with slightly different approaches you gave them the opportunity to see the results and you gave them the opportunity to sort of like play with different parameters and then you introduce a few curveballs right and it was actually a very straightforward sort of exercise but it was and it was curated and it was like accessible to a wide range of folks and what was interesting about it was that for the first time rather than thinking about sort of like the grand vision of machine learning you had like a wide range of folks thinking about it from a very concrete like sort of the way that a developer would right where you're like actually dealing with data and you're thinking oh what does this actually do and you're thinking oh my goodness this totally broke but by the way I could also just apply this like 50,000 images instantly right which is an amazing feeling for someone and and it's a different kind of feeling that you get from building software right and and I think that that intuition I'm kind of convinced that you could teach this to Nancy Pelosi and she learned something and she'd make better policy decisions as a result of that I'm kind of convinced that if you you know sort of we've done a slight variation of this with a couple other executives and it worked really well and at least to me it feels like that that shift in mindset and also just like a little bit of finger feel meant that folks just had better intuition right and I think it made a huge difference and then they also have like better questions. What did it always surprise us me about VCs because they always come there's so many come from a quantitative background and I feel like there's so many investments being made is sort of the the lack of rigor in the decision making processes as far as I can see I'm curious at Bloomberg beta do you use any machine learning or any kind of is there any kind of feedback loop where you know something successful and then you decide to invest more than that only for top of funnel only for top of all I mean in our case we're seed stage investors right and so our process and our process for follow-on is very different from let's say some of the bigger like a bigger fund but I will remind you though like you know sort of part of the fun adventure is that the game is constantly shifting right if it was exactly the same game if the business models were exactly the same then it'd be like kind of like everything else if you know fun it would be root nice right and part of the excitement of the job but also part of the opportunity and the only reason it kind of exists is that there there are chances for new business models to emerge where the old metrics no longer make sense and and I think like those sorts of windows come around every so often and to be honest that's where like for there's that kind of uncertainty when they're either a key technical uncertainty or your key business model or market uncertainty that's where like the amazing opportunities come from. You have been doing venture for quite a while now and have seen a lot of you know big wins and losses. 
What has always surprised me about VCs — because so many come from a quantitative background, and there are so many investments being made — is sort of the lack of rigor in the decision-making processes, as far as I can see. I'm curious, at Bloomberg Beta do you use any machine learning, or is there any kind of feedback loop where you know something was successful and then you decide to invest more in that? Only for top of funnel. Only for top of funnel? I mean, in our case we're seed-stage investors, and our process, and our process for follow-on, is very different from, say, a bigger fund. But I will remind you: part of the fun and adventure is that the game is constantly shifting, right? If it was exactly the same game, if the business models were exactly the same, then it'd be kind of like everything else — it would just be rote, right? And part of the excitement of the job, but also part of the opportunity, and the only reason it kind of exists, is that there are chances for new business models to emerge where the old metrics no longer make sense. I think those sorts of windows come around every so often, and to be honest, that's where — when there's that kind of uncertainty, when there's either key technical uncertainty or key business-model or market uncertainty — that's where the amazing opportunities come from. You have been doing venture for quite a while now and have seen a lot of big wins and losses. Is there anything consistent in terms of what you saw at the beginning of a successful company, or is it that the venture market adapts to whatever that is and the opportunity goes away? I'm sure you reflect on this, because it's kind of your main job — are there any common threads that you see in the successful businesses that you've backed? I think inevitably there are arbitrages that exist, or there are ways to tell signal from noise, but because the market is clever, and you're dealing with founders who are really smart and care a lot about what they're doing, what you're going to end up seeing is that they'll end up imitating those signals of success, right? So there's a bit of a constantly shifting game where you're looking for a new signal to say this thing means these folks are high quality, or this insight is really important, and then they'll figure out: oh, you know what I should do, I should make sure I game Hacker News, I'll get all my buddies to go on Hacker News and we'll coordinate — and then that will no longer be a signal, right? Or: you know what I really should do, I should make sure all my friends are consistently starring my open-source project on GitHub. And once you've figured that out — this goes back to why I think these dynamic models are so much fun, right? That's the whole point of it, because then you march on and think, okay, what's another signal of success? I'm curious, at this moment, if I showed up and I was pitching an ML company and my customers were maybe the less tech-forward enterprises — I feel like I probably shouldn't name names, because some of them are Weights & Biases customers — but if my customer base was, like, Procter & Gamble and GE, would that be more appealing to you than if my customer base looked like Airbnb and Facebook? How would you compare those two — is one obviously better? I do think it entirely depends on the nature of the product and the nature of the solution. The way I think about it is that there's sort of a gradient of admiration, right? And in different types of markets, different people are higher up on that map in terms of admiration. In some markets — say some set of developer tools — it actually does matter a lot whether or not the early adopters come from the tech-forward companies, from Facebook or whatever. But in plenty of markets, and increasingly as machine learning gets mainstreamed, the questions will all be around business benefit, and then the question is: who are the companies that other people admire, or look up to, or aspire to become in those specific markets? And I think that's part of the shifting nature of the game. You say there's a gradient of admiration — was it always clear to you? Okay, so the secret, fun part of the game is when you figure out what that gradient looks like before everyone else does, and then you work with people who are high up there, right? You figure it out and you say, oh yeah, everyone's going to admire the data scientists at Netflix —
whenever that was true, right? And then you work with them, and then you come up with much better insights, or whenever it was true about whatever organization. So it's not that complicated to think about: you just ask people, who do you like, or who do you look to? And that constantly shifts. So one of the things we were talking about that I found intriguing: you mentioned that businesses that are focused on ML — even if they're not selling into ML, but using ML for applications in different industries — you expect them to have a different business model, potentially. My thought is that the business model would match the market they're selling into, but you felt differently. I'm curious to hear your thesis on that. Okay, so — as a VC I'm only right occasionally, and I believe most things provisionally — but I'm pretty sure about this one: I'm pretty sure that we underestimate the effect of technical architectures on emerging business models. If you go back to, say, Sabre, which IBM built for, I guess, American Airlines, when they had a bunch of mainframes: in some ways that business model — we'll charge a bunch of money to do custom development for you — partly comes out of the technical architecture, the way that mainframes were centralized in some other place, right? And the moment PCs come around, or start to emerge, there's a way in which we get maybe the best business model ever, the one Bill Gates creates, in which you charge money for the same copy of software over and over again. It's an incredible business model. That partly arises because Bill Gates and Microsoft and a bunch of folks were stubborn and clever and pushed through an idea, but part of it was also because there was a shift in the central architecture: you ended up with a bunch of PCs, and because there are different economic characteristics to how that technical architecture is rolled out, how it's developed, and how you get value from it, a different business model might make sense. And then you see the same thing for the web. When you have a ubiquitous client in, whatever, 1995, I think everyone realizes that means something, and it takes five or six years before people come up with the right way to talk about it — but subscription software really only makes sense, and only works, in a world where you have a ubiquitous client that anyone can access from anywhere. Not such a shocking idea now, right? Compare it to delivering CDs before that, or, before that, someone, I guess, getting a printed code that they were supposed to retype in. In each of those cases, some new dominant business model comes about because the technical architecture shifts. Of course, that only enables it — it's really the people who build the thing and market it and sell it who come up with the new dominant business model; they still have to do that. But it just strikes me that the shift we're going through right now around machine learning, or data-centric applications, or this change in collective intelligence, however you want to
talk about it — the nature of building those applications is different enough, and the technical architecture is different enough, that there should be some other business arrangement that ends up becoming the better one, for both consumers and for some new dominant vendor. You think about the machine learning model-building side — just the amount of data you're trying to own and control and understand and manage — and you think about how that changes what the scarce resource is. It just strikes me that there's something there. So to be honest, I'm constantly looking. In my mind, what's my grand dream? My grand dream is to meet that person working inside one of the big companies who's been frustrated because she's understood how the grain of the wood of machine learning lends itself to some new business, and then her boss's boss is like, that's stupid, we need to maintain our margins, or whatever. And so I'll admit, that's the grand dream: that I'll find that person and I'll invest and partner with them for a number of years. In your imagination, are there ways that model could look? I suppose it's a little hard to imagine these new things, but subscriptions have been around for a while — do you imagine a move to more usage-based pricing, or maybe companies that pay for your data and combine the data? I'm trying to picture what this could be. Right, so let me describe something. I led a little conference chat the other day, a little session about this — anywhere I go I try to lead a session on this because I'm kind of obsessed. Certainly usage-based is quite good and interesting, but I would contend that in some ways usage-based sometimes puts me as a vendor at odds with my client, because I just kind of want you to do more of the thing, right? And sometimes it's not really useful — I don't want to name names, but we are certainly in a world right now where people are wasting a lot of money on either compute or storage without clear business value, and someday they're actually going to figure it out and cause a lot of trouble. So I think that's the pro and con of usage-based. There are also some notions around data co-ops, where the realization is that as these models get better when we share our data, then if we share data, maybe we share the upside together — I think there are a bunch of folks trying variations of that. The dream, of course, is always to be in perfect alignment with your customer, and one way that happens is you have something like a value-added tax, a VAT, where you benefit when they benefit. But right now, in the world we live in, understanding that benefit is so hard, because it requires an enormous amount of infrastructure and management layers and A/B testing and so on — think of all the problems, all the reasons why it's never worked. And maybe someone will figure that out. Maybe all the objections we've had for the last X years about why this sort of benefit-driven business model doesn't work — maybe it'll work, with some twist or turn of how we think about machine learning models.
You had me convinced many years ago that a competitor would come along to Salesforce that would aggregate the data and use it in smart ways, and that Salesforce has this inherent disadvantage because they're so careful about keeping everybody's data separate and not building models on top of it. Do you still believe that's coming, or do you think there was some wrong assumption you were making, or has it happened quietly and I haven't noticed it? No, it hasn't happened yet. I mean, look, Salesforce is this enduring, great business that's going to last for decades and decades. That said, it still does strike me that there's an inherent tension. You think about all the trouble they spent convincing me, or convincing people like me, to work with them because we believe the data is safe in their cloud, right? And then the idea that I might share data with other clients is crazy and terrible, at least from that point of view. So there's that inherent tension in the traditional, or now established, SaaS view of the world, and I think it's very hard for the incumbents to move off of that way of thinking about the world — but harder yet is convincing their clients and their customers, who've been trained to think that way. There's a funny — maybe not funny — story around Microsoft, where Microsoft got in a lot of trouble at some point for sending information back to their main servers about how PCs were doing: they would crash, or there would be some bug report, and it would automatically get sent back. Huge scandal, because how could Microsoft be looking at and stealing all my information? And the hilarious thing — not hilarious to Microsoft — is that this was right around when Google Docs was starting, and in the case of Google Docs, Google literally sees every single thing I type. It's literally stored on their servers. And somehow, because it's a different configuration, or different expectations around the business, I'm okay with it. I think something similar will happen with some of these emerging machine-learning-driven businesses. Although it's interesting that you say that — you had a really interesting viral tweet at one point showing how much better Google's transcription was than Apple's. That was really interesting, and it actually made me think about the same point: Apple is so known for being careful with privacy, and Google is known for being much more laissez-faire, I guess, with people's data, but it's not clear to me that Google has used that perspective to create a huge advantage, at least in terms of market cap. Do you think over time Google's point of view will really serve it, or has something changed? Okay, so I think that case is a slightly different, nuanced thing. Why was that Pixel 6 voice recorder so much better? It's better in part because they had an on-device model — that was one part — and another part is that they just collected data in much more thoughtful ways. And what does that mean? It meant you had a very fast, very accurate, local experience. That's definitely true, but it's also confounded with the fact that Google is a very large organization right now, and they've got lots of things that they
worry about and lots of ways that they're unwilling to take risk. In my ideal world, someone who built the sort of technology that Google did around voice would have decided, you know what, this should actually be part of the SDK or some API, and we should just make this available for everyone, and developers should be building new products on top of it. I think that's the other thing we're on the cusp of, because we're just at this point where there's been this massive investment in infrastructure and research and tooling around machine learning, and we're right at the point where maybe people will build products that are actually good, right? We're just at the point where the lessons learned about how humans actually work, the lessons learned about experiences and user interfaces — all those things that actually add value for the end user — we're just at the point where there will be enough variation that some ideas will actually take hold. And so I'm excited about that part too. Are you starting to see that? Because — maybe I'm too impatient, but — I kind of can't believe how much better all aspects of NLP have gotten in the last few years. I feel like transcription is now solid, translation now basically works — you can certainly communicate with people who don't speak the same language as you by using a translation system — and Hugging Face and OpenAI's GPT-3 have just incredible demos. And yet I don't feel like it's impacting my life that much, except for asking Alexa to play me music. Well, that — but you're exactly right. We're at the point right now where I'm hoping your listeners are building products, because now it's easier to access, right? There's this talk about the democratization of machine learning — we talk about this often, I feel like — but I think it kind of misses the point. The point is that by making this more broadly available, it also means that the actual ordinary person on the edge, who might not have had access to try this before — the person with the crazy idea that will make a huge difference once we actually see it — can start working as well. And I think that's the part of the exciting thing that everyone misses as they talk about the way this whole world is shifting. But you're exactly right that, on the one hand, we should be super impressed with all the progress that's been made with voice and parts of NLP, and on the other hand we should be deeply dissatisfied, because the products and the product minds and the UI folks and the business minds have not yet figured out how to take advantage of those advances in ways that actually make sense and go with the grain of the technology.
One thing that I would imagine being hard as an early-stage investor investing in machine learning is that it's so easy to demo successful cases of machine learning — I feel like in no other field is it quite as easy to make a compelling demo — and yet it feels like to make a working product, it's often about getting the error rate down from, like, 1% to 0.1%, or something like that. Deep, deep trouble. Okay, so here's my secret — I'll give you one of my current secrets. I just assume it doesn't get better. If the application requires the thing to go from 95 to 98, or 98 to 99, I just do the mental exercise: okay, whatever, it doesn't get better — do users still get value out of it? And if users still get value out of it, because of the way they've configured the problem, then it's an interesting product, right? But if your pitch hinges on thinking it'll just be another month before we go from 98 to 99.5, then I'm like, well, I don't really know if I believe that. Think about this — this goes back to one of our earliest conversations around search quality, many, many years ago — what's the beauty of search? The beauty of search is that when it's wrong, I'm okay with it. And there are whole sets of products in which you can take advantage of the fact that it's super fast, it's consistent, and when it's wrong, I'm okay with it. If you do that over and over again and you find the products that do that, those are the interesting applications. For an investor, you're doing an extraordinary job of not bragging about your portfolio, but give me some glimpse into the future — what's the exciting stuff you're seeing lately? Well, there are two parts I want to highlight. On the ML infrastructure piece, I still think there are analogies or lessons to be learned from traditional software development. I think you guys have done such a good job of understanding so many pieces of that, but I still think — you think about QA, about figuring out how to do QA consistently — there are lots of lessons to be learned from normal software development to be applied to computer vision and structured data and those sorts of release processes, and there's a company called Collina that is in the middle of figuring out parts of that. And then you look at companies like — well, we talked about Sean Gourley earlier — you look at the publicly available stuff about Primer, and you imagine what they're actually doing under the hood. If you go to primer.ai and you look at their ability to synthetically gener— I mean, their ability to synthesize huge amounts of data and lots and lots of articles and just make sense of the world, and imagine applying that, in their case, to a bunch of national security use cases. If you look up various things that are happening in the world right now along with the word Primer, you'll see these demos, and — they can't show you what they're actually doing, but you get that sense of: oh, this is changing the way people are actually
doing things right now. So that's the sort of thing I'm excited about on the application layer. But then also, on the development side — going back to your point about my secret arb, where I just assume it's not necessarily going to get that much better — there's this great guy, Michael Cohen, at a company called Spark AI, and their big insight is similar to that line, which is: look, we want autonomous vehicles and we want them to be perfect, but they're not going to be perfect for a long time, so let's just make sure there's a human in the loop. You can think of it as: whenever the machine is uncertain about something right in front of it, it gets a response within a pretty short SLA to make a decision, and thus you can actually roll out these real-world applications with the realization that the model doesn't have to be perfect — we can actually have backup systems. And that perspective, assuming this sort of non-utopian view of what's possible with machine learning, is super exciting to me.
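For reference, a minimal sketch of the "human in the loop when the model is uncertain" pattern described above: low-confidence predictions get routed to a human reviewer under a short SLA instead of being acted on automatically. The threshold, the model interface, and the review-queue interface here are illustrative assumptions, not SparkAI's actual API.

from dataclasses import dataclass

@dataclass
class Decision:
    label: str
    source: str  # "model" or "human"

CONFIDENCE_THRESHOLD = 0.9  # illustrative; tune per application and per cost of a wrong answer

def decide(model, review_queue, example, sla_seconds=30):
    """Act on the model when it is confident; otherwise fall back to a human within the SLA."""
    label, confidence = model.predict_with_confidence(example)  # assumed model interface
    if confidence >= CONFIDENCE_THRESHOLD:
        return Decision(label, source="model")
    # The model is unsure: ask a human reviewer and wait up to the SLA for an answer.
    human_label = review_queue.ask(example, timeout_seconds=sla_seconds)  # assumed queue interface
    return Decision(human_label, source="human")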
I'm curious what you think about — and I guess this is a broad question — the ethical implications of machine learning. Many people talk about machine learning and ethics, and there are constantly issues in the news that come up with machine learning. What do you make of it? Do you feel like there are special ethical considerations unique to machine learning, different from other technology, or not? And how do you think about what kind of world you want and what regulations make sense? So, I think it's a good thing that we live in a world where people are more sensitized, on the one hand, and I'm very glad to see lots of people applying their minds toward it. On the other hand — and this might get me in slight trouble — there's a game that I play with friends of mine who are ethicists or who are thinking about the effects of technology. I think it's appropriate to ask these questions about what the implications of this or that are, but: if you were around in 1950-whatever and someone proposed the compiler to you for the first time — we've got this really great way of making software easier to develop and available at mass scale, et cetera — would you have allowed me to build a compiler? Just imagine all the harm that could come from a compiler — and imagine, honestly, all the harm that has actually come from compilers, everything from hacking to stealing money from people, et cetera. There's a way in which, given some current frameworks, there's a reasonable argument for why we should not have had a compiler — which seems, on the face of it, at least to me, crazy, absurd. So to me, there should be this sensitivity and there should be these sets of questions, but in some ways the questions should all be around what we do when we're wrong. And I think one of the beauties of machine learning is that embedded in machine learning, at its very core, is this idea that these are not fixed heuristics or business rules — these are kinds of guesses, and we just have to assume they'll be wrong sometimes. So once you think from that framework, once your executives understand that's how models actually work — that they're wrong, that they're never going to be perfect, otherwise you could just have a big if-statement — once you realize they'll be wrong, then you need to build the systems and the processes to deal with the fact that they could be wrong, and you also need to build a whole set of ethics and ways of thinking about questions more around responsibility rather than possibility. And I think that shift in the way you think about machine learning will be much more profitable, in the sense of being useful for humanity. What do you think? I guess it does feel like machine learning might not be as neutral as compilers in some cases, if you imagine it taking inherent biases that we have in our society and then encoding them in a very efficient system, so they can be deployed at bigger scale and with possibly less oversight.
Right, and I think — okay, that's only if you fall for the idea that we're trying to build an all-knowing god brain that will solve things for us perfectly. To be honest, oftentimes when you talk to executives, that's how they think about machine learning: if only we can get this perfect, we can rely on it forever. But instead, if we thought about it as a bureaucracy that is right some of the time but wrong too, if we thought about it as a possibly fallible system and we built in the support for that — because remember, the nice thing about machine learning is that, in the grand scheme, it's incredibly cheap to make these judgments, on the one hand, and also it's centralized. By being centralized and cheap and consistent, you actually have one place where you can go, and you can always say: fix it here and we can fix it everywhere. So that's one part of it. The other part that you highlighted, that it captures inherent biases — I think that's, in some ways, a problem with the way we anthropomorphize machine learning. One way to think about it is as this amazing genius mind thing; on the other hand, you could just think of it as an incredibly conservative attempt to cluster collective intelligence. If we understood that machine learning is derived from data, and data is by nature historical, and anything historical by nature happened in the past, then I think that changes your expectations a little bit about what the model can do, on the one hand, and it changes your expectations about what layers you need to put on top of it, because you can't just rely on the model: you're going to have to have both straightforward business rules to protect yourself and, to be honest, human processes that are actually thinking it through. And at this point I have to make the plug for one of my favorite papers, which is called "Street-Level Algorithms," which talks a little bit about that — we'll have to link to it. Have you read it? No. Okay, I've tried to make you read it many times; it's totally worth reading. You should get Ali Alkhatib or Michael Bernstein to chat about it at some point. But I think their core insight is that if you think about machine learning models as bureaucracies, or as processes that can be wrong some of the time, then you change your expectations — but you also change the ways you can take advantage of machine learning, which is to say: you fix it in one place, you fix it for everyone. Those sorts of inherent advantages go with the grain of the technology rather than against it. Have you ever gotten a pitch for a company and not invested because it made you uncomfortable from an ethical perspective? Oh yeah, plenty of times. There are plenty of times when I will say — on the one hand I'm utility-maximizing, but I have my own idiosyncratic definition of utility, and my definition of utility isn't mapped directly to just dollars; it maps onto ideas of who I am and what kind of person I want to be and what kind of world I want to be in. And I think that's true about all of these things: everyone pretends —
or rather, a lot of people pretend that they're straightforwardly utility-maximizing, dollar-maximizing, but that's not true. We all have tastes, and we all have things that we like or don't like, and good or bad reasons to say yes or no to things, and I think that reality is always sitting with us. Is there a company that you feel like you massively misjudged? Any wildly successful business where you go back and think about the pitch and feel like you missed something, or should update your belief system? I mean, constantly, right? The whole set of low-code, no-code companies that I dismissed — I don't know if you remember this conversation, but at some point when we chatted I basically said: you know what I really believe in? I believe in domain-specific languages. I think DSLs are a much more powerful way to express business applications, and the possibility for business applications, than all these low-code, no-code things. And I was totally wrong, totally wrong. I entirely misjudged the value-add of making something easy. Part of my head was like, well, a developer is valuable not just because they can write things in good syntax; they're also valuable because they have to think through complicated ideas, abstract them, and come up with good code to actually build something and get it to work. And what I misjudged was that there's a whole set of low-level glue things that people need every day, that are super easy to do, that fall right below the cusp of what companies think of as real, scary programming. That, I totally misjudged. Well, one topic that we've actually never talked about, but I kind of wanted to use this podcast as an excuse to ask you: I'm curious what you think about AI and consciousness. Can you picture AI becoming conscious? Is it something you could imagine happening in your children's lifetimes? What does that mean?
I guess — could you imagine that there's an ML system that gets to the point where you would not want to hurt it, where you would really care about its well-being? Okay, so there are a couple of different angles I go on with this. I think that's true right now. I feel bad for lots of the things I have to look after — I feel kind of bad when I drop my phone, right? I feel really guilty, I feel kind of bad about it. For your phone? For my phone, yeah. And I think there are lots of ways in which I, as a human, ascribe human-like characteristics to almost everything, from the weather to my camera to the screen to some computer program. Why do I get irritated with Chrome as if it's an actual person? It's just a bundle of numbers, right? So I actually think we're there already. I don't think that my willingness to endow moral worth or value to non-human things is something that's out there someday; it's actually something we do all the time right now. And although I am Christian, which you touched on before, I don't really take a magical point of view on consciousness. I think consciousness is controlling what I pay attention to, and the continuing dialogue to walk through and explain what I thought before, right? So I both value it — I think it's really, really important, and it's an incredibly important organizing principle for me, obviously, day to day — and I kind of think that lots of things are conscious already, in the sense that they already figure out ways to direct attention and organize and also tell stories about themselves. Does your Christianity not inform your thoughts about consciousness at all? It totally does. But I think there's a little bit of this angle where the things we learn about the world through science constantly shift, and so I'm actually quite open and willing to adapt and adjust based on how we end up changing our view of the universe. Does that make sense? Yeah, totally — it's coherent. But I guess, just to make it concrete — I told you I'd ask you this, and I don't know how you feel about it — I'm always curious whether people would go through the Star Trek transporter. If you saw a whole bunch of people go through the thing that disassembles their atoms and puts them back together somewhere else safely, and you were convinced that it would work, would you subject yourself to that? Would that alarm you or not?
Okay, so I have contradictory impulses. I get carsick, you know, and I get woozy standing up or walking over a bridge, so I'm sure there'd be that trepidation. But isn't there also this view: when you think about yourself right now versus yourself, I don't know, ten years ago, a bunch of the atoms have changed, have been replaced, right? In some ways we're going through this slow-motion transportation already; in some ways you're just speeding up that transformation, the rearrangement of those bits. I probably wouldn't want to be the first person to do it — maybe, I don't know, the hundredth — but meaning that I would not necessarily have some deep ethical, mystical reason to be concerned about it, because I kind of think we're going through it already. Like, literally: are you your set of atoms, or are you the pattern that your atoms are in? In some ways, you're the pattern. I'm not Christian, but that transporter, I think, makes me more nervous than it makes you. Well, but isn't it true, though, that your current material composition — the literal pieces of it — has changed pretty substantially? For sure. But — yeah, look, I just gave you my most tech-positive version of it, but sure, ask me tomorrow if I would do it and it'd be a little scary; let's find out. But don't you also believe that you're your pattern rather than your actual — like, who you are is the organization of these things inside you, rather than the actual substance of it? That's true, but I feel like I am going to experience the world through somebody's eyes, and I am concerned that my future self might not be inhabiting the body of the person that comes out of that machine. My wife strongly disagrees with my point of view, so I can see both sides of it. I'm just pretty sure that I wouldn't do it, no matter how many people went through it and told me it was safe. Okay, well, you say that now, but I will just remind you that our ability to adapt to circumstances and to change expectations is pretty dramatic. There are plenty of things you do now that would be super weird to the you from, like, 1999 — you're really young too, but you know what I mean — our expectations about what's normal or not normal shift consistently. Like staring at a phone all day long. Yeah, seriously, right? All right, well, final two questions. One question is: what's the aspect of machine learning that you think is underrated, or underappreciated, or under-invested in? I do think all of the HCI, social-system stuff really is under-invested in, and I think there are lots and lots of opportunities. It's interesting to me that the tools that annotators get right now are still so bad. It's interesting to me that the tools that data scientists use in some ways have not really changed since — when did your friend write that paper, like 2013? You look at that paper from 2013 and the tools in some ways have not changed enough. And so I think there are lots and
lots of opportunities there. And then I think there are lots of opportunities in generalizing from the lessons we've learned from human-in-the-loop. I think calling things "human in the loop" was kind of a mistake — there should be a better name for it, and if we had a better name for it, then everyone would think of all their jobs as human-in-the-loop. Because I kind of believe that, right? I believe that, in the end, if we're successful, every process will be slightly better instead, and we could be consistent and get consistently better, because our job as humans is to either figure out edge cases or create broad clusterings so that we can be consistent. So you care about the interface of humans and machine learning, how they can work together. I think at multiple levels, right? At the level of the person making the initial decision, at the level of the person learning from that, at the level of the people controlling that, at the level of the people benefiting from that. We're still in a world where so much of that is siloed — the way we think about it is siloed — and I think the way to unlock lots of business value, but also just straightforwardly good things for humanity, is if people at all levels of that game had a bigger view of what it is they're engaged in, which is a great game of collective intelligence. Hmm. All right, a practical question, which might actually have the same answer — that's never happened before when I've asked these pairs of questions — but when you look at machine learning trying to get adopted and deployed and useful inside of enterprises, where do you think the bottleneck is? Where do these projects get stuck? I think they're so often badly conceived and over-promised. And, we joked about this in the middle of this, I am still kind of convinced that if we offered your exec-ed class to every senior executive in the world, we would all make much, much better decisions and we'd end up with much, much more successful implementations. So I think that part is definitely true. I also think the other thing holding us back is that we still don't have great methodologies for thinking about how to build these systems — we're still in the software development world. Someone just gave me this history: software — random coding — becomes engineering when NATO decides it's an important thing in, like, 1968, and then we codified all this waterfall stuff, and it goes from waterfall to extreme programming to agile over the course of the last, whatever, 40 years. And what seems clear to me is that that methodology is mostly wrong for building machine learning models, and so we are still shoehorning these projects in as if they're software development projects, oftentimes, and thus wasting a bunch of time and money. Awesome, thanks James. Okay, thank you. If you're enjoying this interview series, the most helpful thing that you can do for us is leave us a review. It helps other people find the show, and really, we do these shows so that people watch them, and what I really want is for more people to find it, so if you leave a review, I'd really appreciate it.
So James, here's what I really want to know: how does your religion inform your thoughts on machine learning? Okay, so this might be both borderline kooky and heretical, so we'll just caveat it first that way. Fantastic. Okay. So I think there are a few different angles. The first is that, at least in my theology, part of godliness is the act of creation, and there's a way in which, as an investor, I put faith in the act of creation and in helping people make something new. So that's one part. And the creation of — however you want to talk about machine learning — I think there's a sense in which the models that we're building in some ways have inherent worth and dignity, as basically sub-creations of people: we are creating something new, and whether you want to call it life or whatever you want to call that thing, it is something fundamentally new and different and interesting, and that piece of it informs the way I think about both its capabilities and why it's important. But at the same time — and this is the part where other folks might have trouble — I do believe that we're fallen. I actually think that we want to be good but we're actually bad, and I think that anything we create in some ways has tragic flaws in it, almost no matter what. So in that way I'm actually much more forgiving about people, but also institutions, and also the models that we make: these things we're making both have great beauty and potential, and they're also tragically flawed, because we are. Oh yeah, that's definitely good — of all my guests, that was a great answer. I mean, it's kind of plausible, right? It's not crazy. I think we oftentimes all think we're good — we think we're good, but we actually know — it's not that I'm good, it's that I want to be good and I'm just always doing stupid things. And of course, like its creator, it'll never be perfect, and that also means there's this constant striving for improvement, which is the core of the understanding of gradients.
162.56033
24.4284
7s
Dec 28 '22 14:46
bhmr455o
-
Finished
Dec 28 '22 14:46
2737.128000
/content/adrien-treuille-building-blazingly-fast-tools-that-people-love-xapf15jyzyu.mp3
tiny
There's this area that's often like: I'm going to send you an email, and you just do a one-off exploration in a Jupyter notebook and tell me the answer, and I paste it into a PowerPoint presentation. That's a lot of how the rest of the company interacts with the data science team and the machine learning team, and that's kind of insane — it's so inefficient. And so the aspiration that I have for Streamlit is that, almost as a byproduct of existing workflows, the engineers working on those teams are empowered to bring their work directly and inject it into the entire company, and allow the whole company to make decisions and predictions and so on in the same way that they can. I think that would have a big, big impact, and they've already started to do so. You're listening to Gradient Descent, a show where we learn about making machine learning models work in the real world. I'm your host, Lucas Biewald. Adrien is the CEO of Streamlit and a good friend of mine. Before Streamlit he founded Foldit, a famous crowdsourcing project that enlisted millions of gamers to solve real scientific challenges. He also served as AI project leader at Google X and VP of simulation for the autonomous vehicle company Zoox, and was an assistant professor of computer science at CMU. I can't wait to get to all these things with him. I was kind of wondering how to do this — it's inevitable that it's just going to feel like we're talking as old friends. I don't know exactly the best place to begin, but I thought it might be interesting for you to tell a little bit of the story of your career. I know that when you were younger, you were super into music and you were a great guitar player, then I think you got into graphics, right? Now you're doing a really interesting company, and you've done some deep learning. So how does it all fit together? What's the arc? Yeah. Well, the arc is that I keep changing what I'm doing — half of the time because I realize that there's something else even cooler that I want to do, and the other half of the time because I realize I'm never going to be good at whatever it is I'm doing right now. So when I was in high school I wanted to be a guitar player, and I ended up going to this jazz club that was really hot in the '90s in New York called Smalls, and I saw this totally epic young guitar player, Kurt Rosenwinkel, who became very famous — he was just getting going at the time as well. And I was a high schooler who didn't even shave yet, and I was like, excuse me, Mr.
Rosenwinkel, would you please teach me how to play guitar? And he was like, yeah, come to my place in Brooklyn, like, tomorrow. So okay, I go there, and he becomes my guitar teacher, and it was absolutely one of the most inspirational episodes in my life, because here was someone who just lived in a musical dimension that I couldn't believe, basically, and I was so inspired every time I took lessons with him. And I was like, can I do this? I even asked him, do you think I could be a professional? He was like, yeah, I think you could. And I was like, hey, how often do you practice? And he was like, about 12 hours a day. And I was like, 12 hours a day, are you kidding me? And he's like, yeah, I only practice when I feel like it. And I was like, oh wow, I am not going to be a professional guitarist. So that was me realizing I was not going to be a professional guitarist. Then I wanted to do international relations, and I became disillusioned with that, and I got into math, and I ended up becoming a professor at Carnegie Mellon, working on basically machine learning problems and big data problems, and we had jobs running for hours every single night, and over days on end, actually, and that was really fun — I actually loved it. And we were using Python and all these things that are now very much part of the zeitgeist; we were using them, like, pre-1.0, back when it was, why would you use Python instead of MATLAB or something like that, in those days. Well, actually, maybe we should go before that: you made Foldit, right, which I think is kind of a famous thing — do you want to talk about Foldit and what happened there? Yeah, yeah, definitely. If the guitar one was an example of me realizing I was never going to be good enough, I think Foldit was an example of me seeing something else really cool and jumping at it. So what happened was, I had been working on this numerical stuff, and then right at the end of my PhD some biochemistry professors and I got this group together, and we had this idea: let's create a computer game out of protein folding. First of all, it was a really interesting scientific question, because it just so happens that it's very difficult to create simulations of protein folding — it takes a lot of computational power to solve. It's also a problem with enormous real consequences, because, in short, proteins are these machines in your body that carry out the basic functions of life; their shape determines how they do that, and folding is how they get that shape. So understanding how proteins fold, and why they fold into certain shapes, is literally like the origami of life itself. So here was this super interesting scientific problem, very difficult to solve by computers, and we had this totally crazy, kind of fantastical take on it, which was: let's turn it into a computer game — which may or may not be fun, much less have any kind of scientific impact. And we just ran with it and we did it, and it kind of blew up. Over a million people contributed to this really profound scientific problem, all over the world. Some of the best Foldit players in the world were people who scarcely thought they had a scientific bone in their body, and all of a sudden they're at the top of the leaderboards and the BBC is calling
them up and asking to interview them — this really happened. And so the game, though — in case people have trouble imagining this — I mean, I played this game, I was not good at it, but it was trying to rotate and manipulate these molecules, basically, right? Yeah, yeah. And why can a person do this but a computer can't? Yeah, well, okay, the initial intuition was — you know, why can you recognize a face, right? From a computer's perspective it's super hard to recognize a face: you need this giant neural network, and you need to measure all these things and convolve all these things, so in a sense you might say millions of equations are stacking up in order to create this face recognizer — and yet we can do it instantly. And similarly, in the case of a protein, technically speaking there's this geometric number of pairwise atom interactions going into it, and these atoms repel one another, they attract one another, as the case may be, so it creates this network of attractive and repulsive magnets, basically, and the ultimate shape is some kind of stasis. So you would look at that problem and think it's a crazy math equation to figure out what the fold would actually look like — and yet the scientists who work in this field develop an intuition that's very definite, and in fact they can say, this looks like a real protein that we know — we know its shape through a crystal structure — or, this looks like a protein that was designed badly by a computer. So in essence it had a similar flavor, which was that over time you could actually build an intuition for what looks right and what doesn't, and that was kind of the ur-idea that led us to believe that potentially that intuition could essentially be trained. And here you're training humans, actually. Yeah, that's right. And that's actually a really fun process. The way that we trained them — this is where you get into the game thing — is you actually build a simulation; we had a simulation of how a protein folds, and then you let people play with it. In essence, proteins are physical objects — they're a little different from the ones we normally play with, because they're suspended in water and stuff — but if you pull on them, they resist in some places, and they don't like to bang into themselves, and so as you play with them, as you flex them and pull on them, there is an intuition in the game about how these things work. It's no different than playing with Play-Doh or Silly Putty: at some point you start to understand the underlying material, and it's not a complete surprise, when you press on it, what's going to happen. So, long story short, I think it was hailed as certainly a sort of milestone in attempting to build a giant, large-scale human-computer computational complex, and we were also able to publish papers, in Nature in some cases, and in PNAS, with insights that had been derived from the players. So that was great.
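As a toy illustration of the "network of attractive and repulsive magnets" picture above, here is a Lennard-Jones-style pairwise energy summed over all atom pairs; a plausible fold is, roughly, a low-energy arrangement of coordinates. This is a cartoon of the idea, not Foldit's or Rosetta's actual energy function.

import numpy as np

def toy_pairwise_energy(coords, sigma=1.0, epsilon=1.0):
    """Sum a simple Lennard-Jones-style term over all atom pairs:
    strongly repulsive when atoms overlap, weakly attractive at longer range."""
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(coords[i] - coords[j])
            energy += 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    return energy

# A candidate "fold" is just a set of 3D coordinates; lower energy ~ more plausible shape.
coords = np.random.default_rng(0).uniform(0, 5, size=(20, 3))
print(toy_pairwise_energy(coords))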
But for me, one of the most fun things was actually that phase of: how can we build an actual game that gives millions of people the intuitive sense for what this thing is, when it's impossible to just tell them and then have them understand it, grok it. And then didn't you make a second game too? Yeah, yeah, we created a second game called Eterna, and that also — we published, with a similar idea — it was a scientific discovery game, we enlisted a bunch of people, and both these games are going strong, actually, so you can play them both right now. The real innovation in Eterna is that rather than just do everything in simulation on a computer, we were actually using high-throughput synthesis to build the molecules being designed by the players, so in essence your score was determined by a tiny little high-throughput experiment that was run — which I just think is so cool — and a lot of interesting stuff comes out of that; you're no longer relying only on your simulation, for one thing. So how did it work? The players proposed molecules and then you synthesized them? Yeah, they would propose molecules, and initially they would vote on them. The thing is that the cost of these experiments keeps going down, so the games are being designed against this super Moore's-law kind of change in biochemistry — in terms of what is possible to synthesize, how fast, what kind of experiments you can run, what information you get back. That's all shifting underneath the game, so we're actually redesigning the game over time as these things change. But yeah, that's it: they would design them, they would vote on them, we would synthesize them, we would share the results with everyone, everyone would get a score, everyone would look at what everyone else's molecule did, and then repeat. And where would the score come from? So the score was — okay, first of all, rather than working on protein folding, we were working on RNA folding, and, spoiler alert, COVID-19 is an RNA virus; in fact Moderna, the company that's famously one of the contenders to create a vaccine for COVID-19, is an RNA research company. So it turns out that RNAs have at least shoved aside, if not in some ways supplanted, proteins as a molecule of intense interest to biochemists and pharmaceutical companies — as a chemical substrate with which to build a whole new class of drugs that could potentially enter your body and then interact, on a super deep, quasi-computational level, with what they see in a chemical sense. So we were using RNAs this time around, which have slightly different properties than proteins — they tend to be bigger in some ways, and they're bendier, more flexible. What we would do is say: try to create an RNA that folds into this particular shape. Initially the shapes were essentially just things we invented that we thought RNAs could plausibly fold into, and over time they became more pharmaceutically interesting — in fact, the most recent challenges on Eterna do have to do with COVID. If this sounds intriguing to your listeners, I think it would be super cool if you take a look and play around with it; it's very, very current. But yeah, we gave them a shape, and they were trying to build RNAs — in other words, design sequences of nucleotides — that would naturally fold into that shape. We would take the most highly voted molecules
and synthesize them, and then basically figure out what shape they folded into — which you can do — and then we would use a sort of root-mean-squared-error distance function, just like in machine learning, to tell you how close you were to the shape. So the neural net, as it were, the black box, is the human mind, but other than that it was the same thing: a loss function, an input function. And then you do this thing over and over again, and ideally, through some kind of human-based gradient descent, the community as a whole would improve. That's just so cool. But I guess, where are we at now? Because I think about games like Go that are so well studied, and computers are getting better than humans — have artificial neural nets surpassed the human-based gradient descent at this point? Yeah, so basically yes, actually, in a way — but the other thing is that the game design shifts, and so it's similar to the real world: yes, we have better neural nets, but that doesn't mean we're all out of a job. If anything, it means new jobs are being defined, and it actually did kind of happen that way in the microcosm of these games, which is to say that raw "let me beat a computer at this task" tended not to be the most interesting thing that came out of it. In any case it was a moving target, because there was a universe of researchers trying to create better and better algorithms, using neural nets for that matter, and a lot of other statistical stuff, and so that boundary kept shifting. But the really interesting thing about having a large number of humans playing this game, and talking about it on forums, and creating a community around it, was that they ultimately came up with interesting ideas and shifted the game design, and did this very human thing of: what other interesting stuff can we do here? So for example, at one point some of the players in Eterna noticed that certain motifs — putting certain nucleotides in certain patterns — were more likely to create a stable RNA, and this was a purely human thing, it wasn't something we were necessarily looking for, and then we were able to rigorously prove they were right, and that's starting to cross into science. We have not automated that yet, so those are the kinds of things that to me are ultimately the more important outcome, rather than just "we temporarily beat the best computer algorithm in 2003 at this very specific computational task," which was ultimately not going to be a winning formula.
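For reference, a minimal sketch of the root-mean-squared-error scoring Adrien describes: compare the measured fold against the target shape, point by point. Representing each shape as an (n, 3) array of coordinates is an illustrative assumption here, not Eterna's actual scoring code (which, among other things, would need to align the two structures first).

import numpy as np

def shape_rmse(measured, target):
    """RMSE between two folds given as (n, 3) coordinate arrays; lower is closer to the target."""
    measured = np.asarray(measured, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(np.sum((measured - target) ** 2, axis=1))))

# Example: a fold that is off by 0.1 in every coordinate scores about 0.17.
target = np.zeros((4, 3))
measured = target + 0.1
print(shape_rmse(measured, target))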
Yeah there was always these two pieces of it which were if not they weren't really in conflict but they didn't really connect so one of them was creating computer games and we actually created a bunch that did all kinds of interesting things we created of computer games that like allowed us to like capture a kind of information with my student Alex Limpecker a bunch tons of information about how artists draw faces and we actually put out a game on iOS where you could like draw celebrity faces and then try and gas a celebrity it was like based on this like draw it game or something for you out there's called we got a bunch of people to play that and so we were literally like paying Google AdWords to get people to play our games to create these like esoteric scientific data sets to study these like recondite questions which was so so cool such a weird thing to do I guess and pretty weird compared to the other professors and at the same time we were also like writing papers on basically applying machine learning methods to crazy graphics problems we were like applying machine learning to snow simulations and to to the like transport equations we were running you know it was like like words full of equations and it was you know running jobs on clusters that literally took days or weeks to run and so we were doing hyper parameter searches and all this all this stuff that's now suddenly cool so yeah that was those are the sort of dual worlds that I guess probably somewhere to you there's always been this pull towards on the one hand like math and just they're like austere perfection that just fun of that and then also just creating things that people want to play with or use and sort of delight in like creating products basically well I wanted I wanted jump ahead to stream with to give it the time to deserve but we are skipping over a whole bunch of other amazing things that you did but I love to hear the story in your words of coming up with stream because I feel like I watched some of it and it appears to me like it almost just like popped into your head is sort of a complete idea that was sort of like immediately awesome so I'm curious to know what this gets what's for you yeah well it didn't quite happen that magically it's funny that on the one hand I was working on like machine learning problems in numerical math and on the other hand I like wanted to build products for people and like both communities around those products and weirdly I feel like those two things that come together in this product constrain but basically what's written with that is is it lets machine learning engineers and data scientists build little interactive artifacts that allow them to share their data sets their models with their predictions for the future etc with one another and inside of organizations and also with the world it's an app library for Python programmers and we can go into actually why it turns out that's actually a really important thing both for like people who want to show off their skills but also in big corporations that are really that needs to like export machine learning into the whole company I turns out that they both need this sort of superpower that Schummer provides but how it came about was I had worked on a project at Google that got canceled very like heartbreakingly to me and it was a very public failure you know if you choose to look at that that way in retrospect like all my failures were my successes about my successes were not a scarily successes so but you know it's kind of the 
story you're telling yourself at the time and so I took your really hard basically and I then took a job that I wasn't like super excited about you might in dating parlance you might call it a rebound and then I eventually basically took some time off and I started just writing code which had it long been a passion and a friend of mind named Lucas Bywalt was like hey dude let's go into the woods and like get an Airbnb and we'll write code together which I thought was the coolest idea so we went to the woods and we started writing and grown at CUNE and that was one of several projects I've also been working on a stock market simulation which actually also came out of conversation with you Lucas well the funny story there is I'm like Lucas I think that like and I'm telling them over dinner some like telling you some like statistical properties that I thought this stock market might have and I remember you were just like Adrian like do not invest on this assumption like people lose their shirt thinking stuff like this and I was actually really touched on it's like personal like knowing tension of actually investing and I was like I'm not even like well enough organized so like do that but like I was like well like kind of touched on like Lucas is like really looking out for me here on this set on this like math conversation we're having so I was working on a bunch of fun projects like that that were kind of mathy and they also needed to be able to play with stuff and see it and so basically naturally coming out of that workflow I just was like this I was trying to everything I was out there so I started writing my own tools that allowed me to basically take Python scripts I was writing and turn them into little interactive artifacts that would allow me to like play with them and see their properties more tangibly and then just like changing a number and rewriting the code or or writing a loop and then running a 6,000 times and that need kind of just snowballed like I'm skipping the part where we had some like heartbreaks and poets and stuff and I have been going to that too but it is true that on some level you know I wanted it a bunch of my friends wanted it some people who eventually became a cohort or as well like that's all work on this together some big companies started using it in terms of the way way way way way way way it doesn't skip the heartbreak in pimp minutes I mean that's like a way way yeah okay yeah yeah yeah rewriting into a wall at 90 miles near where they're pimpets because it it really seemed to me like I feel like I watched you kind of come over and say again it seems like it's the idea that's the core of what you have now I'm actually kind of surprised to learn that there was yeah yeah so the pivot was it all comes down to you throwing me off first of all you can showing me the way look at some of that and then throwing me off my head so what happened was I started using stream like as a way to understand you stock market simulations actually and the key thing was that once you built this model like you want to be able to change parameters really easily and then see how that affects the model and obviously the model just retry line it's not like super interesting but when it's like you know you get it when it's something that's like really just computations happening especially it's a non-tradal simulation of the future like there is crazy I mean it's one of the principles of like dynamical systems theory like you can change a number of tiny little bit and all of a 
sudden like totally different things start to come out and we're in the by-for-case and stuff and so it's really like fascinating and worlds get created and that was the original version of stream and in fact if anything we've come back to that but what happened was you invited me to go out in the woods and code neuro and that's some stuff and at the time weights and biases was you were ahead of me in the sense that I think you'd start at a company already but it was pretty like rough like you were like okay let's use weights and biases for this project and then like five minutes you're like it's not working forget it pretty we're not using weights and biases and so yeah so overall of you people who think the way to bias is as so polished and perfect I can remember it is still very much early so anyway we were doing this just in the wrong that stuff together and I was like oh this is really cool and I think I probably got like a little phomo that like how cool you're in city and company was and so I kind of started to work more on like deep learning style applications for stream but and when we initially fundraised we sort of had a super position of two products in some ways and what it basically happened is that we had some signal that was positive like people were using it and not just because we were biking them every day but we also just didn't have as much signal as we wanted and we in some parts of the company early on we like you know a company wanted us to install stream internally we put a ton of effort into it and then like it was quick it's after we did this like big install for them and so I remember talking to one of our investors who's like super highly respected has been around the bush and he like invited me for coffee and he was like Adrian like what are your milestones like you know don't don't just send us investment that extra like we're still building and that's you know you know that's like a little tough and and we were still building and searching basically and and we were we are opposed funderate so it wasn't like we were just totally in like bushwacking exploration though if like there was you know some kind of clock that that was ticking so we we actually wrote a huge slide deck which was like everything that the product can become and we shared it with everyone who was using streamlit and we basically gave him like an hour-long interview and we like data science the whole thing and we're like you know how much of you want this feature that feature that feature we like cluster them and everything and it actually happened it still happened that really the thing that people were most excited about was also the thing that had been actually kind of the urr like original thought which was that you want to once you've built a model or once you've built a simulation or you know once you've built actually even like an uncrevealed data set you want to be able to rapidly like interrogate it potentially in sort of ad hoc manners so you want like arbitrary code and you wanted to sort of elegantly do that without and that's you know that's a different product category than just like tabloor or something it's it's a little bit more computational and so we realized we should make this an app framework sort of basically a shiny for Python so I used that slide here some which it's and it needs to result in a web page that that's interactive and I I resisted it until I was worn down by my co-founders and then we all just agreed to do it and we just went long on that and we 
launched it and it found resonance, basically. So that's the story. Interesting, so, like, what did you add to it to make it do that? Because I feel like when I first saw it I thought, oh, this is Shiny? Really? Well, you could have saved me six months. Yeah, well, the basic thing was whether we were going to have widgets, and then the next thing, which is like, you know, yes, an interactive widget, you just say there is a widget and then it exists. So how is that, why is that so hard? One of the reasons why it's so hard is because if you really commit to, like, writing an app framework, then it implies a whole bunch of things down the line about how you expect the product to work, so it's not like, you know, lines of code for the prototype translates into how easy it is to get there from a product perspective. The other thing is, and this is trying to get a little nerdy, but there's a question of, like, the event model, and one of the things that makes, the thing that, you know, why is it hard to make a little app around your machine learning model, right? Why can't you just whip together a little Flask app with a React front end and it's like, boom, it's done? And basically the reason is because, well, actually it's because app programming is actually really hard, and the hard thing about it is that you have these events coming in, there's a whole event model, and then there's a state model, and then these things need to not mess one another up, and it all has to be reflected properly, and that turns out to be such a hard problem even for humans to, like, wrap their heads around that, you know, we're still seeing major advances every couple years in terms of, just from an API perspective, how to not make that, like, a nightmare of complexity. And so if you then bolt onto that in a naive way, oh, and there's also a neural net, and it's like, you know, god knows what it's doing, and there's these giant data sets and there's thousands of neural nets, you know, it really becomes insane. And so we came up with this, I would say, like, interesting and constrained perspective on this, which is basically: let's forget, throw out everything we knew about app programming and just pretend it's a Python script, and it just runs top to bottom, just as you would write a Python script, and then everywhere you say, you know, num_layers equals five, you're allowed to say num_layers equals slider. And if you'll notice, at no point did you actually say there was an event, at no point did you say, oh, there's a state that gets modified in this way when we get an event from this slider, you just said num_layers equals slider (a minimal sketch of this pattern appears after this transcript). And so that was kind of like, how do we get to that? And so we figured out how to get to that. It implies some constraints on, like, what we can do in terms of the apps that we create, but it also, like, massively, massively simplifies the thought process that you have to go through to create an app. And the way we phrase the product now is, like, turn your scripts into apps, which, usually when you think of creating an app, it's like, create an app from scratch, or lay out all the widgets and then implement it, right? But if you just think of it as, like, turn your scripts into apps, then it's a much more natural workflow. A lot of people didn't think at first that that was cool, and then they, like, try it, and then, like, within five minutes they're like, okay, it's super cool, I'm gonna, you know, tweet about it or something. That's certainly contributed to a lot of basically natural growth that's
not mediated by marketing or it's just kind of endemic or what endogenous to the community itself yeah I've been up to say you know we watch this stuff quite closely it waits and vices and it does seem like you have you know maybe the hottest program that data science is some machine learning people use so congratulations yeah well I mean that's really kind and I have to say it's it's been really fun and you know when you're on the inside you're always sort of focused on the worst case scenario like what's the how could this go wrong and how would we get into the employees if this doesn't land and all this kind of stuff so it's really like nice to hear it to hear someone say that too so I'm curious you know one of the things that's always driven me crazy about working with you as you always want to try some new programming language and I'm always kind of like can we use Python like a language that's like well documented that you actually know better than I than me you know kind of curious where you land on that now that you're doing a lot of Python like do you aspire to create this this type of construct for for other languages totally totally our investors are gonna like hate to hear this it's like this is just in no way it is just that effects the bottom line at all but I mean actually someone from the Haskell community tweeted we should see what we can learn from streamer and I was like that was the best album that in the world because those guys are hard-corded and actually had we written streamer in Haskell there was like all these cool optimizations we could have done because you know a lot more about the program in Haskell basically I mean and in Python you're like nothing about going on right you can just like literally you know change the direction of gravity and like one line of code and I was watching your podcast by the way and I saw Jerry me Howard say that the Python can't possibly meet a future of of machine learning which I which I unfortunately don't agree with I think that I wish that were true and he was I guess a big Julia proponent and I do think you know I do think actually the the key concepts in stream but actually are not specifically Python that thing that I was talking about where you just sort of think of your program as a script and there's no events and stuff I mean that's that you could write it in JavaScript you could write it in Julia and I just think of you super fun to do that so hopefully somebody will create enough profits that I can like in legitimately spend some time doing that I think that would be so fun actually if you want to do with me and the wood side I guess that's the front you hate this is something like fun I guess I learned Julia yeah or Elm if you considered Elm I'm gonna go I love Elm I love Elm got me started what what else do you dream of with the with the app like these sort of feel like the structures done or are there sort of like big things that you like you know there's like really big things that are missing actually and we've actually been calling this for feeling the open source promise which is to say that the way that we allow you to build apps is like in some ways like so new that a lot of things that are kind of obvious how to do in a traditional framework are like don't carry over a distribution necessarily and I think one of the you know basically the way that that works is that people do shouldn't is great for some use cases and then you can hit a brick wall where you're like oh and I also need to have persistence state 
that carries over from session to session but how do I do that there's no way to do it right and yet when we sold this thing to the world and we told them what we wanted it to be we said hey this is a general purpose app framework that's you know specialized for machine learning and data science but you know you should feel confident using it in all these use cases and particularly now I mean streamer is part of the you know it's part of the standard of work you know data science work by Chattubert and it's being used by a bunch of you know big sophisticated companies and and people are really pushing on its you know limits in many directions at once and we know that we lose people because they're just like I can't do this in streamer so I can't you know I can't go forward so we've we've set ourselves this task which is called fulfilling the open source promise which is basically take the big things that you can do in other app frameworks that you can't do in streamer and just address them one by one but not I mean we could actually do that in like five days if we wanted to just like throw the candidate but we wanted you want to do it like elegant way you want to do it you know I mean that sounds very like glibber cliche but like you know it's if for nothing else is part of the fun to actually think about like what's the real way of doing this properly with regard to this this thing we released custom components which is like a plug-in system that allows you to take like arbitrary react-outs or react or other kind of web components and plug them into your streamer app so that like dramatically opens this sort of footprint of possible things you can do it has it's now like as big as react-outs on level and react to sort of everything and now we're adding a way more visual customization and notably on our tubber 15 we are releasing by far the biggest and most like profound thing which is single click the play of any streamer app so the the thing that you've been working on on your laptop and the thing that you may have shared with the world laboriously by putting it on her IQ or on GCP or something and you can now literally push it back in and it's how to URL that everyone can see is that mean that you have to leave your laptop open for that URL to keep yeah no no no no that's actually a cool product too and I people have asked for that product there was actually an executive at Tolja who became like obsessed with streamer and he's like I just wanted to be able to like send in a slack message this app like what I'm looking at on my screen to my you know co-worker but I don't like put the entire thing on a third-party server and I don't want to you know and actually there's a really cool I love the idea I keep trying to like tell everyone we should really do this I feel like people would just be like I'll pay not pay if I'm back for that like you know just let's reflect this app you know off of the streamer server isn't it? Well you know it's so funny because I always use this weird thing that Twilio makes where it can kind of reflect stuff off my server so it's really sitting on the thing that I should pour that presentation tell that executive yeah forget what it's called but it's just like a it's like a bouncing like thinking the cloud where you can have a stable URL. There goes let's get this between you and me and monetize it let's cut this part out of the none. 
Yeah I think that was just cool probably but the way that the sharing works is we we instantiate your app so there's there's actually a lot of the work is taking your you know requirements not text file and taking other kinds of your app requirements and stuff and building an environment that reproduces your app and doing so in a way that's like same and non-infuriating but on the other hand it's also taking a lot of work to build this thing in one of the reasons why we feel confident like building it is because so many people are building it themselves in like a super ad hoc way and actually companies are building it themselves too and in many cases just being like can you build this for us? So it's like we don't feel like we are in a way we don't really feel like we're blazing a path at all we just feel like we're sort of standardizing whatever one's doing and then hopefully just making it like way easier to do so yeah that's that's why we think it's a really cool feature and and if you wanted to be private you know if it's public for free if you wanted to be a private it's a paid feature and you know that's that's the next step for stream but so you know let's see how it goes nice congratulations one thing I wanted to ask you about I don't know if this is too far out of left field but another thing that's been notable knowing you is how interested you are in meditation and I was already that connects to the work you do at all in stream later if you're kind of like working life is connected to the I guess like the spirit would you call it spiritual? Yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah yeah what is it like full question you know the funny thing on meditation is it's in possible tell whether it's God damn working or not so I can't answer your question um at the time when my projects had been canceled and when I was forced to like reckon with a definition of Adrian that didn't just involve like critical projects that everyone loved when after I'm out there without you know and I suddenly became very interested in meditation and so you know maybe I was seeking a challenge that I could win you know and can do what they were winning on a meditation and or maybe like a self for you know the the pain in life and it actually just wasn't my project being canceled which is fine but you know there were a number of like personal things in my life as well that were like just really painful and I think that I did something that everyone does which is that I took what were legitimate problems you might say in my life and I extrapolated from them more problems and then I extrapolated more problems and I essentially constructed like a prison in my mind this is sort of kind of a Buddhist way of looking at things and and I think that it's like a very very natural thing to do and it actually happens constantly like every second been and it's very harmful basically in the sense that it's if nothing else that's sort of taking you away from what's actually happening I think discovering meditation showed me that you could dissolve those extrapolations and in fact that life wasn't quite as bad as it seemed that my personal problems weren't as insoluble as I thought and that way more than string what or anything professional has altered the direction of my life in essence I think I do believe that meditation and it's not the only thing but meditation can help bring you like a little bit more in contact with reality I also think probably one of 
the most important things you can do as a product designer slash really anything or is being contact with reality and it's not as easy as you think or least as I thought to do that and therefore you know that might be a problem there yeah I but it's impossible to know if you if you like continue to meditate through your through running stream litter hasn't become less relevant well you know yes it became less relevant and also my life became worse again it became super depressed and I started again and now I'm feeling less depressed and you know I also started taking and I had to press and I had to press and this is like a good 30 minutes of meditation a day easy it only takes two seconds for me like meditation looks like spending a little bit of time every day like just observing my mind constructs and then destroy and constructing the destroy like infinite problems and solutions and fantasies and taking a little bit of time every day and just remembering that that's a happening and being not actually connected with anything real to me anyway that's it's kind of a joy it's kind of so good to think to remember it's like remembering to enjoy everything else in life that's worth enjoying it's just easy not to do and probably get into that so that is a good self-remeditation to me we always end with two questions which I'll keep a man I'm kind of curious how you'll interpret them and the first one is what is an undrient aspect of machine learning that you think people should pay more attention to? There's a lot of people who are very focused on this idea that Robin is our jobs and the computer are going to make all the decisions and I think that a much more plausible kind of outcome for machine learning as we understand it today is just to massively increase our ability to like measure the world basically you know not just have a camera security camera but actually know how many people are walking by and how fast they're like buying one of their manner of women and cars and you know all the kinds of things and understand what appliances are plugged into your wall and all this sort of kind of thing so I think that like in essence looking back on this time we're going to feel like 2019 2018 we're going to hit sort of like informational bedrock like we didn't know anything that was going on in the world before 2018 relative to the future and I think that perspective which is that it's like we're opening our eyes and seeing what's happening in the world at a totally new level of resolution is actually going to be a much more apt description of what the machine learning revolution brings interesting all right interesting it's it and and to the final question is this is simple it's basically what's the biggest challenges that they make it hard sake machine learning models and deploy them in the real world I think every machine learning tools entrepreneur will tell you that it's whatever their company is doing and I think it's a totally legit of the answer by the way so I suppose you'll tell me it is experiment tracking an hyper-pram would have a certain chance like it's how how would you answer that like even and I think it's legitimate you're clearly solving a huge pain point for people what is the piece that the requires streamlit yeah so I saw this at Google at Google X I saw this at Zix it went over and over again it's the machine learning teams and the data science teams are actually the gatekeeper to this really fascinating and exotic like storehouse or stuff you know like data sets and 
models and, you know, predictions of the future, and that is actually very difficult for other people to get at. It's really ad hoc, it's often like, I'm going to send you an email, and you just do a one-off exploration in a Jupyter notebook and tell me the answer and paste it into a, you know, PowerPoint presentation. Like, that's a lot of how the rest of the company interacts with the data science team and the machine learning team, and that's kind of insane, it's so inefficient. And so I think that the aspiration that I have for Streamlit is that, almost as a byproduct of existing workflows, the engineers working on those teams are empowered to sort of take their work directly and inject it into the entire company and allow the whole company to make decisions and predictions and stuff, and I think that would have a big, big impact, and they already have started to. So awesome, well, thanks, Adrian. When we first started making these videos, we didn't know if anyone would be interested or want to see them, but we made them for fun, and we started off by making videos that would teach people, and now we get these great interviews with real industry practitioners, and I love making this available to the whole world so everyone can watch these things for free. The more feedback you give us, the better stuff we can produce, so please subscribe, leave a comment, and engage with us. We really appreciate it.
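To make the "scripts into apps" idea from the Streamlit discussion above concrete, here is a minimal sketch assuming the open-source streamlit package; the model-explorer framing and the num_layers/learning_rate parameters are illustrative placeholders, not code from the interview:

```python
import streamlit as st

# The script simply re-runs top to bottom on every interaction;
# there is no event handler and no explicit state to manage.
st.title("Model explorer")

# Instead of `num_layers = 5`, declare a widget and use its value directly.
num_layers = st.slider("num_layers", min_value=1, max_value=10, value=5)
learning_rate = st.number_input("learning_rate", value=0.001, format="%.4f")

# Any downstream computation just reads the current widget values.
st.write(f"Training a {num_layers}-layer model with lr={learning_rate}")
```

Saved as, say, app.py, this would be launched with `streamlit run app.py`, which serves it as an interactive web page.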
120.00541
22.80837
1m 7s
Nov 21 '22 16:59
po19kcyb
-
Finished
Nov 21 '22 16:59
2833.008000
/content/sean-gourley-nlp-national-defense-and-establishing-ground-truth-rwgrq-yoziq.mp3
tiny
[00:00.000 --> 00:07.052] As we need to train machines up, they can help us establish ground truth so that when
[00:07.052 --> 00:13.044] new information comes available, we can measure it up against that and say, is this consistent
[00:13.044 --> 00:15.016] or is this contradictory?
[00:15.016 --> 00:18.032] Now just because it's contradictory to the ground truth doesn't make it false, but it
[00:18.032 --> 00:20.064] doesn't mean you want to look closer at it.
[00:20.064 --> 00:25.020] And this is kind of, I think, as we build up the defenses that democracies need, you know,
[00:25.020 --> 00:28.044] and I've talked about this, a Manhattan Project to establish ground truth.
[00:28.044 --> 00:32.036] It's going to take a lot of work and a lot of effort, but it's very, very hard to see
[00:32.036 --> 00:37.080] a democracy functioning if we can't establish information provenance, and so if we can't establish
[00:37.080 --> 00:41.000] whether information is likely to be part of a manipulative attack.
[00:41.000 --> 00:45.024] And if we don't have any infrastructure to kind of lean back on and say, well, here's what
[00:45.024 --> 00:49.040] we do know about the world and here's what we do understand with it.
[00:49.040 --> 00:54.092] And so this is a big problem, I think, for democracies and we need a way around it, it's an asymmetric
[00:54.092 --> 00:57.016] fight, but it's one that we have to win.
[00:57.016 --> 01:01.064] You're listening to Gradient Descent, a show where we learn about making machine learning models work
[01:01.064 --> 01:04.084] in the real world, I'm your host, Lucas Biewald.
[01:04.084 --> 01:10.068] Sean Gurley is the founder and CEO of Primer, a natural language processing startup in San Francisco.
[01:10.068 --> 01:17.016] Previously, he was CEO of Quid, an augmented intelligence company that he co-founded back in 2009.
[01:17.016 --> 01:22.028] And prior to that, he worked on self-repairing nano-circuits at NASA Ames.
[01:22.028 --> 01:27.096] Sean also has a PhD in physics from Oxford, where his research as a Rhodes Scholar focused on graph theory,
[01:27.096 --> 01:32.028] complex systems, and the mathematical patterns underlying modern war.
[01:32.028 --> 01:33.088] I'm super excited to talk to him today.
[01:35.096 --> 01:38.092] So Sean, it's great to talk to you. I really appreciate you taking the time.
[01:38.092 --> 01:42.020] The first thing I want to ask you, since you're an entrepreneur, and so am I, is
[01:42.020 --> 01:45.040] tell me about your company, Primer. I'm sure you want to talk about it.
[01:45.040 --> 01:50.028] We're a company specialized in training machine learning models to understand language,
[01:50.028 --> 01:53.080] to replicate different kinds of human tasks that run on top of language,
[01:53.080 --> 01:58.004] everything from identifying key bits of information, to summarizing documents,
[01:58.004 --> 02:02.044] to extracting relationships between entities for a knowledge graph.
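As a rough illustration of the task types named here, the sketch below uses generic open-source Hugging Face pipelines with their default public models; this is only an assumption-laden stand-in for the idea, not Primer's actual stack:

```python
from transformers import pipeline

document = (
    "Primer is a natural language processing startup in San Francisco. "
    "It trains machine learning models to summarize documents and to "
    "extract relationships between entities for a knowledge graph."
)

# Entity extraction: pull out the key bits of information in the text.
ner = pipeline("ner", aggregation_strategy="simple")
print(ner(document))

# Summarization: condense a longer document into a short abstract.
summarizer = pipeline("summarization")
print(summarizer(document, max_length=40, min_length=5))
```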
[02:02.044 --> 02:06.028] We also do a lot of work on language generation as well, and particularly
[02:06.028 --> 02:09.096] affect aware language generation. So we spent a lot of time trying to teach
[02:09.096 --> 02:15.000] machines not to hallucinate, which tends to be sort of one of the issues of these transformer-based
[02:15.000 --> 02:19.040] models. So it's really interesting when you're in this world, the machines that dream and
[02:19.040 --> 02:25.016] try to teach them not to, but the goal for us is to take human actions on top of text and
[02:25.016 --> 02:29.064] automate them at scale so that you can kind of find insights that no individual human would be able
[02:29.064 --> 02:33.016] to see by themselves. And we've had a lot of success in doing that over the last few years.
[02:33.096 --> 02:39.096] And as your goal to kind of make these like individual tasks available to someone who wanted to use them,
[02:39.096 --> 02:45.032] or is it to deliver these insights to a customer? I think the goal for us is ultimately to
[02:45.032 --> 02:50.036] provide these tools to the customers so that they can take actions that were done by humans and ultimately
[02:50.036 --> 02:55.080] automate them. Now you get the automation, but if you do it at scale, then all of a sudden you
[02:55.080 --> 03:00.004] do get these insights that no individual human would have found. What we found though as we've
[03:00.004 --> 03:05.008] gone through that is that the internal kind of data science teams within these organizations have said,
[03:05.008 --> 03:08.060] look, you know, we'd love to kind of have these different components that you've built. And so
[03:08.060 --> 03:13.032] we've also been able to sell the different API components to the users as well. But look,
[03:13.032 --> 03:17.064] if the end goal for us is to make this available for users with no technical knowledge and that's
[03:17.064 --> 03:23.080] where we're focusing. And do you have a particular end user or domain that you care about or is this like a
[03:23.080 --> 03:29.024] broad-based platform for insights? Yeah, looks like we've been focused on defense from day one.
[03:29.024 --> 03:34.092] And my background, my PhD work was in the mathematical dynamics of insurgency and so I spent a lot of
[03:34.092 --> 03:40.052] time in the world of intelligence and defense. I think they have a really particularly useful use case.
[03:40.052 --> 03:45.088] They spend a lot of time dealing with text-based information. Perhaps more than anyone else in the world.
[03:45.088 --> 03:51.000] So if you're an analyst, sitting there inside of a three-letter agency, you're going to be
[03:51.000 --> 03:56.044] dealing with hundreds of thousands of text-based documents coming across your feed every day. And I think
[03:56.044 --> 04:01.048] there's no surprise to anyone in the industry that that's just not a scalable human task. So we were
[04:01.048 --> 04:05.088] able to go into that. I think there's three things that make that really attractive for us. One is the
[04:05.088 --> 04:12.052] volume of text. I think the second is that any edge that you can get as an intelligence or defense operator
[04:12.052 --> 04:16.076] or analyst, you're going to want to take that. And then the third thing is, you know, we're seeing
[04:16.076 --> 04:21.032] really, really good defensibility once you're in and deployed in these organizations; there's a
[04:21.032 --> 04:26.004] two-year process to get in there. And so it's a good market to kind of land in once you've
[04:26.004 --> 04:32.084] deployed the technology and got it working. Has the state of the art in natural language processing
[04:32.084 --> 04:38.020] changed to enable a company like this. Or is there some specific insight that you felt you had?
[04:38.020 --> 04:43.032] How do you think about that? Like this moment for your company? Yeah, so when I started this,
[04:43.032 --> 04:47.088] it was sort of 2015, and I was watching, as you probably were, a lot of my friends and a lot of
[04:47.088 --> 04:53.040] our friends would have been playing with neural nets and doing image processing. And I remember Jeremy
[04:53.040 --> 04:58.028] Howard, this was showing me some of the stuff he was doing with the caption generation on top of
[04:58.028 --> 05:02.028] that images. And I'm ever watching that and seeing the caption generation piece and I was like,
[05:02.028 --> 05:06.092] this is going to come to language. These technologies are going to come to language. And so that was
[05:06.092 --> 05:12.084] sort of end of 2014, started 15 watching friends through that. And for me, I made a bet and said,
[05:12.084 --> 05:18.092] look, we've seen computer vision go from 30% error rates to 5% error rates with these new
[05:18.092 --> 05:23.096] neural approaches. And language felt like the next logical place that that would happen. I think,
[05:23.096 --> 05:28.052] to be honest, for the first two to three years of the company, the technology hadn't
[05:28.052 --> 05:33.040] caught up to the vision. But then we saw transformer-based models emerge and that's just been a game
[05:33.040 --> 05:38.044] changer. And what that's meant for customers is it's meant that these are actually, you know,
[05:38.044 --> 05:44.060] trainable, which means they can be customizable, which means that you can actually start to deploy them
[05:44.060 --> 05:49.056] to pretty diverse set of use cases. So you mean like fine tuned or something on their own
[05:49.056 --> 05:54.020] data sets? Yeah, so instead of having a kind of train with hundreds of thousands of documents and
[05:54.020 --> 05:58.060] data points and training examples, you know, you can start with a model that's gotten a pretty good
[05:58.060 --> 06:03.056] embedding structure from reading kind of general information. And then you can retrain that, obviously,
[06:03.056 --> 06:08.060] on a fraction of the information that would otherwise have been required. So I think that's probably
[06:08.060 --> 06:13.048] the single biggest thing. And that allows users to engage with this technology. I think, as
[06:13.048 --> 06:16.076] we'll talk about, you know, what's your return on investment for the time you have to take to
[06:16.076 --> 06:20.084] train this before you get a payoff? And that's come down, you know, significantly with these models.
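As a rough sketch of the kind of fine-tuning being described, here is a generic Hugging Face example that retrains a pretrained model on a deliberately tiny labeled set; the model name, example texts, and labels are made up for illustration and this is not Primer's pipeline:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# A tiny labeled set, standing in for the "fraction of the information
# that would otherwise have been required".
examples = {
    "text": ["The meeting is scheduled for Tuesday.",
             "Rebels seized the airport overnight."],
    "label": [0, 1],  # 0 = routine, 1 = security-relevant (illustrative labels)
}

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# Tokenize the texts so the Trainer can feed them to the model.
dataset = Dataset.from_dict(examples).map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=64),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()  # starts from pretrained weights, so few examples go a long way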
[06:22.004 --> 06:28.076] You do a wide range of kind of traditional NLP use cases, which ones have you seen the the biggest
[06:28.076 --> 06:32.092] change and maybe which ones have you still kind of not seen the improvement from this new technology?
[06:33.064 --> 06:37.048] Yeah, it's a good question. When we started, language generation, it was sort of, you know,
[06:38.004 --> 06:43.040] recurrent neural nets and LSTMs, and you couldn't really generate a sentence with any kind of
[06:43.040 --> 06:48.012] credible output, right? So like the idea of even kind of like doing a multi-paragraph summary of
[06:48.012 --> 06:53.016] a document was just, you know, science fiction. So this stuff that has technologies enabled that you just
[06:53.016 --> 06:59.040] couldn't have done. I think the second bit here is the idea of training a model with a few
[06:59.040 --> 07:05.048] dozen examples to pick up a relationship extraction between two entities. Again, that was a scientific
[07:05.048 --> 07:11.032] paper that you had to write. So like this stuff that's this is enabled that just wasn't even within
[07:11.032 --> 07:17.080] the realm. I think, you know, wait, wait, where hasn't it had as big an impact? I think
[07:17.080 --> 07:22.036] it's really, I mean, limited by the training data that you're so willing to throw it at. And, you know,
[07:22.036 --> 07:27.048] perhaps there are tasks and NLP that this wouldn't be appropriate for, but we honestly haven't seen
[07:27.048 --> 07:33.040] everything that we've given the training data for these models that have performed in a good way.
[07:33.040 --> 07:40.092] I think they make errors that the older NLP models don't make, but they make less errors. So you're
[07:40.092 --> 07:47.016] going to take that every time. And, you know, your name Primer is evocative, at least to me, of kind of
[07:47.016 --> 07:54.036] summarization. Is that, am I correct in making that connection? It actually comes from inspiration
[07:54.036 --> 07:59.040] in Neal Stephenson's book, The Diamond Age; if you're a science fiction fan, the subtitle of that
[07:59.040 --> 08:06.068] is A Young Lady's Illustrated Primer. And in that book, the protagonist is a nanotechnologist
[08:06.068 --> 08:13.088] who creates a nanotechnology book that is designed to educate the reader about the world. And, of course, the
[08:13.088 --> 08:18.036] world being what it is, it kind of falls into the hands of manipulation versus education.
[08:18.036 --> 08:23.064] Which I think is as a wonderful kind of theme. And so, you know, obviously underneath that is this idea
[08:23.064 --> 08:27.016] that if you could have a self-riding book that could educate us about the world, you know,
[08:27.016 --> 08:31.016] would be an a science fiction world and would be able to kind of do fascinating things with that.
[08:31.016 --> 08:36.060] And so, for us as a guiding principle, how do we train machines to observe the world,
[08:36.060 --> 08:41.000] and teach us about what that's saying so that we can be smarter about the world that we're living in?
[08:42.052 --> 08:47.048] So I guess there's some connection, maybe not directly. I guess I was feeling impressed that,
[08:47.048 --> 08:51.032] you know, I feel like summarization or text generation, like you said, has been kind of the most
[08:51.032 --> 08:56.068] interesting, maybe most impressive use of these, you know, the new kind of transformer technology.
[08:56.068 --> 09:01.016] And I was wondering if you sort of felt that was coming or, you know, if you were surprised by it.
[09:01.016 --> 09:05.008] Well, my thing always at the start was, you know, we're going to build a self-writing
[09:05.008 --> 09:09.032] Wikipedia. And that was going to ultimately be something that this was going to enable.
[09:10.028 --> 09:14.092] We were a long way away in 2015 from that technology even kind of existing. And so, you know,
[09:14.092 --> 09:19.064] it was a bet on this becoming available, and it turns out it's been a good bet. So I'll take the
[09:19.064 --> 09:24.060] win on the bet being right, but I don't know if I had any other right, you know, information. So maybe I'm
[09:24.060 --> 09:31.064] just lucky, but we'll take it. And I was kind of curious, you're one of the few people like me,
[09:31.064 --> 09:37.048] kind of like a second time founder doing something in sort of a similar space as your first company,
[09:37.048 --> 09:41.088] like I am, too. I'm curious if that kind of shaped your views, your new company, like kind of what you
[09:41.088 --> 09:46.052] were sort of thinking, you know, maybe doing differently and what, like, you wanted to keep from
[09:46.052 --> 09:53.064] from your last company. Yeah, look, I think it's kind of like you always sort of joke like,
[09:53.064 --> 09:57.080] you know, when you write your first novel, it's sort of the easiest because it's a sort of a
[09:57.080 --> 10:03.040] collection of all your experiences up to that point. Your second novel, it has to be something new.
[10:03.040 --> 10:09.000] To kind of carry that analogy on, your first novel's kind of a biography. So I think in your first
[10:09.000 --> 10:13.040] company, you know, for me anyway, it was that idea you'd always had the back of your head that you wanted
[10:13.040 --> 10:19.016] to make real. I think in your second company and it's been true for me, I've become more grounded in
[10:19.016 --> 10:25.064] the commercial realities of what's actually going to sell, what's going to scale, how big the
[10:25.064 --> 10:30.084] opportunity is, what are the kind of the mega trends that are unfolding and you've been very conscious
[10:30.084 --> 10:36.044] of wanting to catch those waves and having a kind of a large commercial market to go after having
[10:36.044 --> 10:40.060] defense ability and the space that you're in becomes really important. But I think overall the
[10:40.060 --> 10:45.024] biggest thing is just operationally. I think when you're creating your first company, you don't
[10:45.024 --> 10:50.044] really know what it's like to scale an organisation. And I think until anyone's been through that,
[10:50.044 --> 10:54.060] you don't really have that idea. I think what you've done at the second time, you know,
[10:54.060 --> 10:58.092] there's a lot of familiar signposts along the way where you're like, oh, this is what happens at this
[10:58.092 --> 11:03.096] time and that's fine and this is what happens at that time and that's fine. Whereas I think the first time
[11:03.096 --> 11:11.080] you see it, you sort of like, oh my god, is this the end? Is this what winning looks like? And the
[11:11.080 --> 11:16.084] second time you do it, you're like, no, I've got a few more data points and just having seen
[11:16.084 --> 11:20.020] something once before is night and day versus seeing it the first time.
[11:22.004 --> 11:27.040] Yeah, I guess I can I can relate to that. I'm curious too. I don't know if you think of yourself
[11:27.040 --> 11:30.092] this way, but like, you know, when you look at your background, it sort of feels like a data scientist,
[11:30.092 --> 11:35.032] right? Like you have a PhD in physics, I think, right? And then, yeah, it's really, you know,
[11:35.032 --> 11:40.052] it's really interesting kind of data stuff we could talk about on mathematics and war. I think
[11:40.052 --> 11:46.020] but do you think, I don't meet a lot of other data scientists that run companies? Do you think that
[11:46.020 --> 11:53.000] that bent like informs your leadership style? It's funny, I probably should be hanging out with other data
[11:53.000 --> 11:58.092] scientists that run companies too. I think me and Mike Driscoll, we were, we were, we were
[11:58.092 --> 12:03.072] tight with Pete Skomoroch, and we did kind of console ourselves with data scientist founder
[12:03.072 --> 12:08.068] therapy sessions. So you're probably right, there on balance is probably not a lot of us.
[12:08.068 --> 12:13.008] I think there's a few things that come through as a data scientist. You know, one is I think you
[12:13.008 --> 12:19.024] have an appreciation of the algorithms and I think the single biggest thing that I've seen is when it comes
[12:19.024 --> 12:23.008] to kind of product design, you're designing products that have algorithms at their heart.
[12:23.064 --> 12:28.020] It's not algorithms to optimize a product experience that the product is the algorithm.
[12:28.020 --> 12:32.012] The algorithm is the product and I think data appreciation is really, really important
[12:32.012 --> 12:37.000] and when it comes to kind of the side here of building a product and what a product market fit mains
[12:37.000 --> 12:41.048] and all of that. And it's not a direct translation from sort of the old world where you're
[12:41.048 --> 12:46.036] designing products that don't have algorithms at their heart. So I think I think that's one piece of it.
[12:46.036 --> 12:51.000] I think a second bit is that, you know, the reality is as you're growing this, these organizations,
[12:51.000 --> 12:55.040] you're never going to have all the data you need at the start. And so like if you're in a bigger
[12:55.040 --> 13:00.028] organization, I tell you a lot of friends that have come from LinkedIn and so on, you've got data that
[13:00.028 --> 13:04.060] you can optimize, you can run A/B tests on, you can do all of that. You know, when no one's
[13:04.060 --> 13:10.036] using your product because you're trying to get the algorithms to work, you don't have the
[13:10.036 --> 13:17.008] traditional kind of data science methodology, it's not that useful for you. So that's definitely a frustrating
[13:17.008 --> 13:23.016] piece, you know, you can't lean on that. So I think on the upside, you understand the algorithms,
[13:23.016 --> 13:27.048] but on the downside, you don't really have data to make decisions on. It's probably a bit of
[13:27.048 --> 13:34.068] both worlds, but I have to say it would be tough to be CEO and founder of a company if you didn't
[13:34.068 --> 13:39.016] have a good grasp of these kinds of technology. So it's a pretty steep learning curve. So I definitely
[13:39.016 --> 13:45.024] wouldn't trade the background, I guess. It's funny. I think myself, I wonder if I'm maybe less
[13:45.024 --> 13:50.060] data driven in some ways than other CEOs that don't come from a data background because it
[13:50.060 --> 13:56.028] I feel like sometimes people use data as almost like a wedge to reinforce their confirmation bias.
[13:56.028 --> 14:00.012] And I think as a data scientist, or at least for me, I feel like I maybe a little more skeptical
[14:01.000 --> 14:06.060] of the data because I work at it so much, which I think sometimes makes me maybe in some
[14:06.060 --> 14:09.048] realms less data-driven. I wonder if you identify with that at all.
[14:10.028 --> 14:14.020] Yeah, there's always skepticism; the question's always where you get the data from, and my
[14:14.020 --> 14:19.000] mind immediately goes to kind of what's wrong with the data. And that side of it I think is right.
[14:19.000 --> 14:25.064] I think in this, there's a lot more gut instinct than I think anyone kind of appreciates. I don't
[14:25.064 --> 14:31.088] think you can run a deep tech emerging company from data. Just using data as
[14:31.088 --> 14:37.048] your default decision framework is probably not right. I think where I spend a lot of time is in this kind of
[14:37.048 --> 14:43.096] like space between the scientific publishing and commercialization. I think perhaps more than anything
[14:43.096 --> 14:50.004] having a PhD and it being familiar with how science evolves allows you to sort of make these bets on
[14:50.004 --> 14:56.060] scientific breakthroughs that may be seen risky to the outside of it when you're following it and you
[14:56.060 --> 15:01.024] know what that trajectory of a kind of an emerging scientific breakthrough feels like. You can kind of
[15:01.024 --> 15:07.088] put your chips behind that, place a bet on it and, you know, in 12 months, 18 months, you can cash in on that.
[15:07.088 --> 15:12.076] And I think perhaps more than anything the benefit of a PhD in something like physics is a
[15:12.076 --> 15:18.060] familiarity with science and a familiarity with the scientific process and translating that into a
[15:18.060 --> 15:24.012] set of strategic bets that we can make as a CEO to position your company to best have upside with
[15:24.012 --> 15:29.000] what's going to unfold. And it may well be, as you're saying here, maybe I'm lucky. The other way to look
[15:29.000 --> 15:33.024] at it, a bit more generously, is I just had a really good grasp of where the field was going and
[15:33.024 --> 15:38.052] maybe I can claim some success on that. But the bet here is familiarity with science, and I think,
[15:39.016 --> 15:43.056] as we've seen here, you've got one hand on arXiv, one hand on your email, and between the two of
[15:43.056 --> 15:48.068] those, you're probably steering the company. Interesting. So where do you try to put your algorithms?
[15:48.068 --> 15:54.076] Are you trying to push the very state of the art in terms of things like architecture or are you
[15:54.076 --> 15:59.072] sort of like intentionally drawing from research and most of using results that you find?
[16:00.044 --> 16:05.016] Well, so it's interesting. There's two things. So one is research for sure, right? Like if you've
[16:05.016 --> 16:10.020] got breakthroughs and these aren't always the obvious ones, but absolutely right. Like science unfolds,
[16:10.020 --> 16:15.032] you want to take that learning and commercialize it. Now the commercializing of science can
[16:15.032 --> 16:20.020] everything from making it cost efficient to run through to kind of training it on the right data
[16:20.020 --> 16:26.092] through to kind of understanding how to kind of correct for the 15% false positives that pop up,
[16:26.092 --> 16:32.036] which you can't do in a kind of mathematically elegant way and it becomes a set of kind of rule-based
[16:32.036 --> 16:37.088] kind of corrections at the edge. So that all of that kind of is part of commercialization. But the
[16:37.088 --> 16:45.000] other side of it is there's a whole bunch of stuff that just doesn't fit the scientific publishing paradigm.
[16:45.000 --> 16:50.044] And a lot of language generation doesn't fit the scientific publishing paradigm because all
[16:50.044 --> 16:56.076] you've got is BLEU and ROUGE, and these are useless with regards to kind of any customer experience of
[16:56.076 --> 17:02.044] language generation. So in order to evaluate the quality of your language output, you've literally got
[17:02.044 --> 17:07.000] to put humans on top of this and kind of have them evaluate everything that you're doing,
[17:07.000 --> 17:13.032] which is incredibly expensive and it hasn't been part of the scientific paradigm. So there's very
[17:13.032 --> 17:20.076] little kind of publishing on language generation, I think largely because the ability to kind of like
[17:20.076 --> 17:25.056] get a decent F score is really really hard and you can probably go through a whole bunch of language
[17:25.056 --> 17:31.048] processing tasks that just don't have a decent F score measure or have a difficult F score measure
[17:31.048 --> 17:38.020] and such and such don't have a really an active scientific space. So it's been interesting kind of
[17:38.020 --> 17:42.084] tracking that through and I think the other thing here is science is still some of the best
[17:42.084 --> 17:48.076] inspiration, right? And in terms of like it can just sort of spark an idea and you're like, wow that's
[17:48.076 --> 17:55.024] a super cool attempt and that side of science is pretty valuable too. Well you know we're sitting here in
[17:55.024 --> 18:00.068] August 2020 talking about you know text generation. So I have to ask you what you make of GPT 3
[18:00.068 --> 18:06.084] right that obviously came out and people seemed very impressed how impressed slash surprised were you
[18:06.084 --> 18:15.000] by it's performance. I think the GPT 2 was the bigger jump right I think with GPT 2 came along
[18:15.000 --> 18:21.000] it was like wow these transformers scale and they scale really well right you know it's funny
[18:21.000 --> 18:26.020] that was exactly my reaction I didn't want to buy as the question but I totally totally agree
[18:26.020 --> 18:32.044] just prior to that, language generation, you know, via LSTMs, and that was, was pretty, was pretty bad,
[18:32.044 --> 18:38.028] like you couldn't you could make a sentence but you couldn't string two sentences together and so so that
[18:38.028 --> 18:44.020] was the first thing was GPT 2 was like wow now what were GPT 3 came and I think it's useful you know
[18:44.020 --> 18:50.036] was like oh it keeps scaling right like it doesn't seem to like have a finite kind of like
[18:50.036 --> 18:56.028] scaling effect at this sort of level of parameter space. So that's useful right but for me the big
[18:56.028 --> 19:02.052] jump was GPT 2 now what we found on that and you can take a different set of transformers you can
[19:02.052 --> 19:08.060] take XLNet or BART or BERT or, you know, whatever you want, but what you found is that the
[19:08.060 --> 19:14.020] party trick is language generation; I think the true value of that is the trainability of these models,
[19:14.020 --> 19:20.012] is that you can train them to do tasks that are sort of the traditional NLP tasks that you can train them
[19:20.012 --> 19:26.092] with a lot less data and it's super impressive to kind of see language generation but attempts in the value
[19:26.092 --> 19:32.012] for our customers it's basically saying oh with 20 training examples you can build this thing that
[19:32.012 --> 19:39.000] with 95 plus you know precision and 90 plus percent recall what automate your human task every time.
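As a small, hypothetical illustration of those commercialization thresholds, here is how you might check precision and recall for a batch of predictions with scikit-learn; the labels are made up.

```python
# Hypothetical precision/recall gate for a fine-tuned classifier (invented labels).
# Requires: pip install scikit-learn
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]   # human labels
y_pred = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]   # model predictions

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")

# A rough go/no-go check along the lines described above:
if precision >= 0.95 and recall >= 0.90:
    print("meets the bar for automating the human task")
else:
    print("keep a human in the loop")
```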
[19:39.064 --> 19:46.004] So I think the true commercial value of this is the retrainability; the party trick is the
[19:46.004 --> 19:50.060] language generation, although if you put on your hat, and we maybe will get to that later, of disinformation
[19:50.060 --> 19:55.000] and manipulation, there's definitely a whole industry that's going to spawn up around
[19:55.000 --> 20:01.008] language generation, but we'll get to that later maybe. Well, maybe we should move in that direction,
[20:01.008 --> 20:07.056] but I'm kind of curious, do you have any thoughts on why GPT-3 captured people's imagination
[20:08.052 --> 20:13.000] so thoroughly? It's funny, it was one of those ones where we saw the
[20:13.000 --> 20:17.088] paper get published, went through it, and the thing that captured me was the few-shot learning, which
[20:17.088 --> 20:22.052] was super interesting, and I think it got underplayed in the paper; the few-shot learning was probably
[20:22.052 --> 20:28.012] the most impressive piece of that work. Then I woke up like a month after the paper was published
[20:28.012 --> 20:33.096] and then all of a sudden entire VC Twitter was like going like bananas for GPT 3 and I sort of had
[20:33.096 --> 20:39.040] that moment I was like what's going on here and I just sort of scratched my head I think
[20:39.096 --> 20:46.060] OpenAI has, you know, done one thing incredibly well; we probably don't appreciate that the marketing
[20:46.060 --> 20:53.040] that they do is par excellence for, you know, the world of AI, right, it really is impressive, and how
[20:53.040 --> 21:01.008] they rolled out that release of GPT-3 versus kind of the GPT-2 kind of it's-too-dangerous-don't-touch-it:
[21:01.008 --> 21:06.068] GPT-3 was like, come and play with it if you're special, and it was a perfect influencer campaign
[21:06.068 --> 21:11.056] that was run beautifully, you know, and it's up there with the influencer campaign for Fyre Festival,
[21:11.056 --> 21:18.092] and I'm being nice. Now I feel like maybe you're not, I can tell you admire it. I'm
[21:18.092 --> 21:23.056] with you, yeah, but that was a wonderful influencer campaign, and they have everything to back it up with;
[21:23.056 --> 21:28.036] I actually think there's a lot more there, but in terms of the campaign they did, it was wonderful.
[21:28.036 --> 21:35.008] I think that captured sort of the mind of VC Twitter. I think the bit that people miss on this
[21:35.008 --> 21:41.048] is it matters what training data you've given to these machines and it matters a lot more than you think
[21:42.052 --> 21:46.084] right, and that's the bit that everyone sort of misses, that out of the box we can use this with a
[21:46.084 --> 21:53.096] few examples and it learns. People talk about steering it, or they talk about priming the system; what
[21:53.096 --> 21:58.060] you're trying to do is correct for the somewhat random nature of the training data,
[21:59.024 --> 22:04.012] and it's a really bad way to steer a model where you don't know what it's been trained on and you're
[22:04.012 --> 22:09.024] trying to give it kind of hints in order to keep it away from being racist and you know you don't know
[22:09.024 --> 22:15.024] what it's read it kind of feels like just the blind kind of like you know exploration so I think the
[22:15.024 --> 22:20.004] learning out of all this is this training data matters and the other bit I think here is that Twitter
[22:20.004 --> 22:27.000] is a wonderful medium for displaying outputs of models that have 30% precision because you don't see
[22:27.000 --> 22:31.096] the other 70% that it misses, and I think that's the other piece here, is that, you know, if you look at
[22:31.096 --> 22:37.080] 10 cherry-picked examples of these outputs, you know, you're going to see some great results, but as we know,
[22:37.080 --> 22:44.020] in the commercial world, for most applications, you know, human kind of precision is 90-plus percent, and if you don't
[22:44.020 --> 22:50.044] have 90-plus percent on your task it's very very difficult to commercialize it, and so I think that the race,
[22:50.044 --> 22:57.032] as we look at NLP tasks, is always, you know, the race to 95% precision, and that kind of is human-comparable.
[22:59.080 --> 23:05.048] Well, so you've touched on kind of AI and safety a couple of times in the last few minutes, and you
[23:05.048 --> 23:10.028] also kind of operate in a world that I think is considered a gray area to a lot of AI researchers,
[23:10.028 --> 23:15.072] right, you know, defense or military applications. I'm curious what you think generally about
[23:15.072 --> 23:21.032] especially natural language models and safety, and, you know, what should be done, how worried
[23:21.032 --> 23:27.040] you think people should be about misuse of these models, and what role you think you should play
[23:27.040 --> 23:32.084] as sort of like a leading company in the space? Well, I think first and foremost, if we want to be a global
[23:32.084 --> 23:39.032] superpower as America, you have to have defense, you have to have intelligence. You may not want to have
[23:39.032 --> 23:44.052] them, but then you don't get to be the global superpower. So that's the first thing to kind of just
[23:44.052 --> 23:49.048] accept, is that defense and intelligence are part and parcel of being a global superpower. It's also
[23:49.048 --> 23:54.044] part and parcel of defending liberal Western democracy, and there are plenty of other organisations
[23:54.044 --> 23:59.096] and governments in the world that don't want that to exist, so we need that. As you come back from
[23:59.096 --> 24:04.068] that, the second thing is to say, well, we want it, but we want it to be good, right? And so you say, well,
[24:04.068 --> 24:09.024] if I want it to be good, well, we need to bring artificial intelligence and the latest technologies
[24:09.024 --> 24:14.092] that we're developing to bear on that problem space. It's sort of a strange philosophical ground to
[24:14.092 --> 24:21.000] say, well, we need to have defense but it shouldn't be good, right? It's a strange position. Now as you go
[24:21.000 --> 24:26.060] through that you say, well, yeah, there are also ethical concerns and moral concerns. There are very,
[24:26.060 --> 24:32.028] very few organisations in the world that think more deeply about the ethical and moral implications
[24:32.028 --> 24:39.080] of war than defense and intelligence; they live and breathe this stuff, and we can kind of sit here
[24:39.080 --> 24:44.060] and armchair-quarterback from the Valley, but the reality is this is something that has
[24:44.060 --> 24:48.076] been thought about very deeply, with a lot of care, and the rules of
[24:48.076 --> 24:55.032] engagement are very very well defined, very well thought through, and have been shaped and kind of constructed
[24:55.032 --> 25:01.072] over many, many years. Now, a lot of them haven't imagined what AI does in that, but there's also
[25:01.072 --> 25:06.092] been a huge amount of work, you know, going back maybe over the last decade, with defense, with
[25:06.092 --> 25:12.060] intelligence, talking about these exact scenarios of what it means to have artificial intelligence
[25:12.060 --> 25:20.004] engaging in this kind of process. So for me here, you know, bringing this technology to bear in defense
[25:20.004 --> 25:25.032] and intelligence is something that I think is the right thing to do and it's a very very important
[25:25.032 --> 25:30.076] mission for myself and for our company. As we do that, we also realize we've got a responsibility,
[25:30.076 --> 25:36.060] that it matters. If we're generating models that classify things that are unfolding in the world
[25:36.060 --> 25:42.020] and saying, look, we've identified an event, and you misclassify that, that intelligence is now
[25:42.020 --> 25:47.024] percolating up a chain, which is going to have consequences, right? So there are very real consequences
[25:47.024 --> 25:50.084] when you talk about the precision of the models that you're working with; there are very real
[25:50.084 --> 25:56.020] consequences when you talk about what the data's been trained on, what the susceptibility of the
[25:56.020 --> 26:01.088] models that you've got is to outside adversarial attacks. So all of this becomes something that you need to
[26:01.088 --> 26:08.036] kind of work with and deal with. I think the ethical, you know, components of this are
[26:08.036 --> 26:13.072] woven into the decisions that we make, and, you know, it's something that's also moving, I think, pretty quickly,
[26:13.072 --> 26:18.084] and the one thing you learn in science and technology is that science and technology moves a
[26:18.084 --> 26:23.096] whole lot faster than the philosophical and ideological kind of foundations on which you can kind of
[26:23.096 --> 26:30.044] make decisions on top of, so you are by nature going to be in gray zones, and, you know, this is
[26:30.044 --> 26:34.060] something you've got to be kind of open to and say look we're going to navigate where perhaps
[26:34.060 --> 26:40.076] no one's ever thought about this before, and there isn't kind of a strong rule that you can fall
[26:40.076 --> 26:45.040] back to and say, hey, this is the answer, this is what you're supposed to do in the situation, because the
[26:45.040 --> 26:52.036] situation has never existed before. So, you know, it's something that we spend a lot of time on, with
[26:52.036 --> 26:57.040] both ourselves and also our advisors, spending time each and every week going through this stuff,
[26:57.040 --> 27:01.088] making decisions and trying to kind of navigate the best path that we can through this, but
[27:01.088 --> 27:05.080] I think it'd be a lie to say that this is really easy and these are clear black and white kind of
[27:05.080 --> 27:11.048] distinctions because we're dealing with stuff that simply didn't exist in the world before but we're also
[27:11.048 --> 27:16.084] dealing on the geopolitical scale with stuff that simply didn't exist in the world before and do you think
[27:16.084 --> 27:28.092] like at this moment in time, August 2020, do you think that for governments natural language
[27:28.092 --> 27:34.052] processing, like machine learning, is an important part of their defense capability?
[27:35.072 --> 27:40.004] Yeah, I think there's three places where it comes through. You know, the first is on the intelligence side:
[27:40.004 --> 27:44.052] there's too much information coming in, and simply put, if you don't have machines
[27:44.052 --> 27:49.032] playing some role in helping you navigate that information, you're going to have information that no
[27:49.032 --> 27:53.048] one ever sees and if you don't see information you can't bring it to bear on decisions that you're
[27:53.048 --> 27:59.064] making so step one the volume of information requires a natural language toolkit to actually help navigate
[28:00.076 --> 28:05.072] the second thing here is that the complexity of the world that we're in means that, you know, drawing
[28:05.072 --> 28:11.000] inferences between something that's happening in Russia and something that's happening in, you know,
[28:11.000 --> 28:17.000] East Africa is very very difficult for an individual that has to specialize: I'm an East African
[28:17.000 --> 28:21.048] specialist or I'm a Russian specialist. Machines don't have that limitation, right? They can look
[28:21.048 --> 28:26.036] further, they can look wider, they can draw inference across a larger set of data points because they're not
[28:26.036 --> 28:30.060] fundamentally constrained by the bandwidth of information they can consume. So I think as we move
[28:30.060 --> 28:36.020] to a more complex world it's essential to have machines that can make connections across domains
[28:36.020 --> 28:40.084] that humans aren't necessarily looking at. The third thing is, and this has sort of, I think, become
[28:40.084 --> 28:46.004] increasingly important is that more and more information has been generated by machines and that's
[28:46.004 --> 28:50.052] been used to manipulate and if you've got humans that are trying to filter through the output
[28:50.052 --> 28:56.012] of propaganda from China that's being machine generated, you've brought a knife to a gunfight; you're going to
[28:56.012 --> 29:01.056] lose that. And so as we look at things like, you know, the operations out of Pacific Command, you know, there's
[29:01.056 --> 29:07.056] a huge volume of information now that China's got its head around disinformation and manipulation;
[29:07.056 --> 29:13.000] you can't navigate this as a set of humans, it's just not possible, and if you try and do that
[29:13.000 --> 29:17.096] you're going to lose. So I think the disinformation landscape has necessitated a set of machines
[29:17.096 --> 29:23.000] that need to come into this. So can you be more concrete about the disinformation? Like,
[29:23.088 --> 29:28.084] should I be imagining sort of like, you know, Facebook bots, or what? So it's
[29:28.084 --> 29:35.008] actually evolved a lot. Like, so our standard kind of thing was Facebook bots, you know, back in 2016;
[29:35.064 --> 29:40.060] what you've got now is a manipulation ecosystem. So it's everything from state broadcasting, so if you're
[29:40.060 --> 29:46.060] in the sort of Russian example, you've got Russia Today and, you know, that sort of state broadcasting;
[29:46.060 --> 29:52.020] you've got state-supported broadcasting, so things like Sputnik in Russia; then you've got kind of
[29:52.020 --> 29:58.012] fringe publications which are supported, these can be kind of fringe versions of, you know,
[29:58.012 --> 30:02.060] Huffington Post, but it would be a fringe version of that where anyone can kind of submit; then you've
[30:02.060 --> 30:07.080] got social media; and then you've got sort of cyber-enabled hacking, right, where you may
[30:08.060 --> 30:13.056] falsely release a set of emails that have been doctored. So all of these components make up the
[30:13.056 --> 30:19.088] sort of ecosystem of information manipulation, and they actually marry together. So you can hack a set of emails,
[30:20.028 --> 30:27.032] falsify emails, spread them out, have them found on social media, have it, you know, amplified by
[30:28.020 --> 30:34.060] a third-party fringe voice on a user-submitted site like Huffington Post, but not Huffington Post proper;
[30:34.060 --> 30:39.048] you can have it kind of re-broadcast through Sputnik, and then end up on RT, and then be connected
[30:39.048 --> 30:46.044] back into Fox News, right? So that cycle allows layering of information to come in where you don't know the original
[30:46.044 --> 30:52.084] source of it you may not be aware of how it came to be and you may be hit with the information
[30:52.084 --> 30:59.048] from three different angles that makes it feel like it's a lot more kind of authentic and you
[30:59.048 --> 31:03.040] can do this with fake information; you can also do it with information that's actually real
[31:03.040 --> 31:08.036] but perhaps isn't as important as it's made out to be. So maybe there's a shooting that happens,
[31:08.036 --> 31:14.012] you know, which becomes front-and-center news when in reality it was just a local shooting, and if it hadn't
[31:14.012 --> 31:20.012] been amplified it never would have been on the radar. So you're not just in the world of, is this
[31:20.012 --> 31:26.084] real or is it fake; it's actually, whose agenda is being pushed and what organization is actually
[31:26.084 --> 31:33.016] pushing the agenda. And this is kind of where I think we're sitting now: it's actually a very complex
[31:33.016 --> 31:40.004] disinformation ecosystem designed to manipulate. How does machine learning enable that, though? Because
[31:40.004 --> 31:44.076] all those examples you said, I could picture that being done with just human beings, you know,
[31:44.076 --> 31:50.044] motivated human beings doing a lot of typing. I guess, has ML really changed this? Yeah, so I
[31:50.044 --> 31:55.016] think state of the art at the moment is humans at the Internet Research Agency sitting down,
[31:55.016 --> 31:59.000] and, you know, from what we know they have a set of objectives they have to hit,
[31:59.096 --> 32:06.036] they have a sort of scoreboard of topics they need to cover every day, and then they get rewarded based on
[32:06.036 --> 32:13.024] their performance; it's all very manual. I think what we're looking at is it generally takes on the order of
[32:13.024 --> 32:20.084] 18 months, 24 months for an emerging technology to become sort of weaponized, right? So we're not seeing
[32:20.084 --> 32:29.000] yet the weaponization of language generation; we had just started to see the weaponization of image
[32:29.000 --> 32:37.008] generation for fake images and fake profiles; we haven't seen the weaponization yet, really, although
[32:37.008 --> 32:43.088] we should expect it soon, of video generation. So language generation is a lot newer;
[32:43.088 --> 32:48.084] I think we're probably two years away from seeing that but there's obviously a very very clear
[32:48.084 --> 32:55.048] path that if you can generate, you know, all sorts of anti-vaccination articles that target
[32:55.048 --> 33:00.004] different demographics, and you can do that at the scale of millions, you're going to get some really,
[33:00.004 --> 33:05.072] really persuasive arguments that are able to be captured and propagated. So whilst it hasn't unfolded yet
[33:05.072 --> 33:13.048] because the technology is new, I think it's very very clear that this is a weapon that, if you
[33:13.048 --> 33:18.028] were going to take this on, is absolutely something you'd want to kind of have at your disposal.
[33:18.028 --> 33:22.044] And so I think that's one piece, but I think the second bit gets back to more of the traditional
[33:22.044 --> 33:27.088] data science, which is, you know, A/B testing at the scale of millions, and, you know, whilst you can't really
[33:27.088 --> 33:33.048] do that when humans are typing the stuff out, once machines are producing it you absolutely can. So I
[33:33.048 --> 33:38.052] think that gets you into a world where this is going to be a lot more prevalent.
[33:38.052 --> 33:44.012] The other bit that I'd flag, going back to science, is one of the most fascinating kind of areas of
[33:44.012 --> 33:48.020] scientific research at the moment has been, for me anyway, opinion formation and crowd
[33:48.020 --> 33:52.084] dynamics, right? This has got roots in kind of a little bit of epidemiology, it's got roots in a
[33:52.084 --> 33:58.044] little bit of stock market trading, it's got roots obviously in the world of idea formation and
[33:58.044 --> 34:04.084] diffusion of ideas, but this is an area where we're actually seeing that crowds can actually be very
[34:04.084 --> 34:10.060] manipulable. That research is happening, it's going on. Once you couple these other technologies into
[34:10.060 --> 34:16.012] that, I think we're going to start to see that you can move and manipulate large groups of people
[34:16.012 --> 34:20.076] through the information they're exposed to, and at that point you've got a fundamental issue with democracy,
[34:21.040 --> 34:27.008] right? And this is why it's such a big issue, right? We are based as a society on the free and open debate
[34:27.008 --> 34:33.064] and sharing of ideas to come to consensus in a democratic process to elect governments for us. Once we
[34:33.064 --> 34:43.000] lose faith in that, democracy dies, and there's a very very clear vector of attack with manipulation
[34:43.000 --> 34:49.088] by machines, and so we need defenses against that, and it's coming, and, you know, the defense and
[34:49.088 --> 34:54.028] intelligence sector has realized that; we're working very closely with them to help with that defense.
[34:55.000 --> 34:59.096] could you sketch out what a defense to that might look like because it doesn't seem obvious there's
[34:59.096 --> 35:05.056] a way to kind of prevent people from creating very persuasive content in fact you might argue that's
[35:05.056 --> 35:11.000] that's happening right now. Yeah, so I think that's right. Look, so one of the things to
[35:11.000 --> 35:16.044] recognize is this is an asymmetry, right? So with any asymmetrical conflict, you know, one side has the
[35:16.044 --> 35:21.048] advantage over the other, and, you know, I sort of draw the example with, you know, image generation: if you
[35:21.048 --> 35:26.028] generate a face of a person who never existed, you've got two options, right? If you want to know if that's real,
[35:26.028 --> 35:30.092] you can go and check every single person in the world and see if it's there, and if you get through
[35:30.092 --> 35:35.064] everyone and you don't find them, then it's fake. So obviously it's easier to generate an image than
[35:35.064 --> 35:39.048] it is to determine if it's fake or not. Now of course, as you go through that, there are
[35:39.048 --> 35:44.020] signs and tell-tale signs, right? They're a little too blurry, the ears are asymmetrical, the teeth don't
[35:44.020 --> 35:49.024] quite line up, right? So then people kind of figure that out and then they generate a new
[35:49.080 --> 35:53.064] image, and then the old techniques for identifying it aren't working anymore, and now you're in a
[35:53.064 --> 35:59.024] cycle of effectively what we've seen in cyber security, which is things like zero-day attacks, right, where
[35:59.024 --> 36:04.076] you get a new model that hasn't been seen before, and the statistical signatures of that model aren't
[36:04.076 --> 36:10.068] known to the defense systems. So it's a game of detection and deception, right? Can I deceive the algorithms
[36:10.068 --> 36:16.020] that are designed to detect whether this is real or not, or can I actually detect it and kind of stop it?
[36:16.020 --> 36:20.076] so that's one side of it now that's an image but if you go into language you know obviously there are
[36:20.076 --> 36:25.032] signals in here, and one of the ones that we spot and look at is the Zipf distribution. So if you look at
[36:25.032 --> 36:30.020] language, there's a Zipf distribution, which is, you know, the relative frequency of words that we use,
[36:30.020 --> 36:34.052] and each author has a kind of statistical signature of language, and machines have
[36:34.052 --> 36:38.084] a statistical signature of language, and so you can spot them, but if you generate a new model then the old
[36:38.084 --> 36:44.012] methods of detecting it aren't necessarily there.
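As a rough, hypothetical illustration of that idea, the sketch below builds a rank-frequency profile of a text and measures how far it sits from an idealized 1/rank Zipf curve; the sample strings are stand-ins, and this toy signature is not the detection system being described.

```python
# Toy "statistical signature" of a text: rank-frequency profile vs. a 1/rank Zipf curve.
# Requires: pip install numpy
from collections import Counter
import numpy as np

def rank_frequency_profile(text: str, top_k: int = 50) -> np.ndarray:
    """Normalized frequencies of the top_k most common words, ordered by rank."""
    counts = np.array([c for _, c in Counter(text.lower().split()).most_common(top_k)],
                      dtype=float)
    return counts / counts.sum()

def distance_from_zipf(profile: np.ndarray) -> float:
    """Mean absolute gap between the observed profile and an ideal 1/rank curve."""
    ranks = np.arange(1, len(profile) + 1)
    zipf = (1.0 / ranks) / (1.0 / ranks).sum()
    return float(np.abs(profile - zipf).mean())

human_text = "free and open debate is the basis of consensus in a democracy"    # stand-in sample
machine_text = "the the of of and and to to in in is is a a that that it it"    # stand-in sample

print(distance_from_zipf(rank_frequency_profile(human_text)))
print(distance_from_zipf(rank_frequency_profile(machine_text)))
# Profiles that deviate consistently from the expected curve, or from a known author's
# profile, can be flagged for closer inspection; new generators need new reference profiles.
```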
[36:44.012 --> 36:49.064] So you've got this whole game of detection and deception: has this been generated by a machine or not? But on the other side of it you've also got, you know, things
[36:49.064 --> 36:55.032] like claims that are being made. So if a claim is being made that 5G causes coronavirus, you can actually
[36:55.032 --> 37:00.052] trace that claim backwards: where did it first originate, how did it propagate?
[37:00.052 --> 37:06.012] And so it's not so much, is the language real or fake, but has it been propagated by grassroots or has it
[37:06.012 --> 37:12.044] been propagated through the network via actors that are intentional about that? Now to do that you've
[37:12.044 --> 37:18.012] got to classify a relationship between 5G and coronavirus, and as you look at that there are also two ways
[37:18.012 --> 37:22.060] to say it: you know, it's caused by, it's a result of, and so now it's kind of a relationship
[37:22.060 --> 37:27.072] classifier. And so you can do that; we deploy that technology looking at relationships for claim
[37:27.072 --> 37:33.080] extraction, you know, propagating them backwards, but we also look for things that counter that claim, right? So,
[37:33.080 --> 37:39.016] you know, coronavirus is not caused by 5G, or coronavirus, you know, was likely
[37:39.016 --> 37:44.028] caused by an infection that crossed over from a bat in a wet market, right? So these are claims that conflict with each other.
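The guest doesn't spell out their implementation, but as a loose, hypothetical sketch of what a claim-relationship classifier could look like, here is a toy text classifier over a handful of invented sentences; the labels and examples are mine, purely for illustration.

```python
# Toy claim-relationship classifier: does a sentence assert, deny, or not state a
# causal link? Invented data, purely illustrative.
# Requires: pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_sentences = [
    "5G towers cause coronavirus infections",
    "the virus is a result of 5G radiation",
    "coronavirus is not caused by 5G",
    "there is no link between 5G and the virus",
    "the outbreak likely started in a wet market",
    "researchers traced the infection to an animal source",
]
train_labels = ["asserts_cause", "asserts_cause", "denies_cause",
                "denies_cause", "other", "other"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_sentences, train_labels)

print(clf.predict(["scientists say 5G does not cause coronavirus"]))
# Claims labeled this way can then be traced backwards through the network to see
# where and how they first propagated, and matched against counter-claims.
```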
[37:44.028 --> 37:48.084] But ultimately the dynamic is, how do you get a ground truth,
[37:49.080 --> 37:54.028] right? How do you get a ground truth? And I think if we're looking at kind of a long-term kind of
[37:54.028 --> 38:00.012] game on this, we need to train machines up so they can help us establish ground truth, so that when
[38:00.012 --> 38:06.052] new information becomes available we can measure it up against that and say, is this consistent or is this
[38:06.052 --> 38:11.008] contradictory? Now just because it's contradictory to the ground truth doesn't make it false, but it does
[38:11.008 --> 38:17.048] mean you want to look closer at it. And this is kind of, I think, as we build up defenses for democracy: we need,
[38:17.048 --> 38:21.024] you know, and I've talked about this, a Manhattan Project to establish ground truth. It's going to
[38:21.024 --> 38:27.024] take a lot of work and a lot of effort, but it's very very hard to see a democracy functioning if we can't
[38:27.024 --> 38:32.028] establish information provenance, if we can't establish whether information is likely to be part of
[38:32.028 --> 38:37.072] a manipulative attack, and if we don't have any infrastructure to kind of lean back on and say, well, here's
[38:37.072 --> 38:43.088] what we do know about the world and here's what we do understand about it. And so this is a big problem, I
[38:43.088 --> 38:49.016] think for democracies and we need a way around it and so this is going to come down to you know
[38:49.016 --> 38:54.044] it's an asymmetric fight but it's one that we have to win. Do you think that it would be wise to
[38:55.000 --> 39:01.064] use the same kind of manipulation techniques to spread true information? Yeah this is interesting
[39:01.064 --> 39:06.028] right? So on the one side you've got the detection; I think on the other side you've got, what's your reaction,
[39:06.028 --> 39:11.040] right, what's the action that you take on top of this? I think at this point here, and you can go
[39:11.040 --> 39:15.032] into just kind of the health crisis kind of dynamic of COVID, and that maybe makes it a
[39:15.032 --> 39:21.096] little more real, and so if you've got stuff here, you know, around the diffusion of, you know, HCQ being an
[39:21.096 --> 39:28.028] effective treatment, or, you know, masks don't work, this is really dangerous, right? This is incredibly
[39:28.028 --> 39:33.000] dangerous propagation, and, you know, we've seen bot activity around masks don't work;
[39:33.088 --> 39:39.080] there have been coordinated attacks around pushing both sides of the masks issue. Sorry, why would that be true,
[39:39.080 --> 39:46.044] like who would stand to gain from pushing the idea that masks don't work? Well, so if you want to create
[39:46.044 --> 39:51.080] political division, which has been a stated goal of the IRA, the Internet Research Agency, you find any
[39:51.080 --> 39:57.016] hot-button issue that will divide a country, push it and push in the tribalism; you have an us and
[39:57.016 --> 40:02.028] a them and you lose the cohesion. Why do you want to do that? Well, if you don't have a unified
[40:02.028 --> 40:08.052] set of political consensus on anything it's very very hard to go to war it's very very hard to
[40:08.052 --> 40:15.016] rally the US to say don't invade Crimea if you can't even agree on masks, right? So, like, one way to
[40:15.016 --> 40:21.008] kind of neutralize the strongest military in the world is to ensure that the political factions will never
[40:21.008 --> 40:26.060] come to agreement about how it will be used, and Russia's been incredibly smart on that. And so one of
[40:26.060 --> 40:31.096] their kind of goals, if you look through, is to divide the nation so that you can't agree on anything, and
[40:31.096 --> 40:36.028] so one of the things has been masks. Now the added kind of benefit of the masks is that it kind of
[40:36.028 --> 40:43.056] ruins the health of society by having division on that, and it also ruins trust in the political system, which
[40:43.056 --> 40:48.092] is again to Russia's advantage. So there's absolutely been something that, if you're sitting there, this
[40:48.092 --> 40:53.056] has been one of the things that have popped up on your daily kind of topic board of things
[40:53.056 --> 40:58.060] you have to act on and we can see that kind of manifest from the way in which information is propagating
[40:58.060 --> 41:04.028] and the way in which bot type activity is engaging and so if you look at that you say well
[41:04.028 --> 41:08.004] right, there's nothing we can do about that, well, that's the wrong thing to do,
[41:08.004 --> 41:13.080] because not only are you creating political divides, lives are being lost, right? So it's a hard
[41:13.080 --> 41:18.036] position to hold that we shouldn't do something. I think the question then becomes, like, you do want
[41:18.036 --> 41:23.024] to propagate information out that is true and that does kind of, you know, conform to the
[41:23.024 --> 41:27.088] scientific, you know, consensus, but the interesting thing on that is masks were not a scientific
[41:27.088 --> 41:34.036] consensus, right? If you went early on, it was against WHO guidance, and so if you posted, and I had
[41:34.036 --> 41:39.024] conversations with Jeremy Howard about this, he posted on Reddit and they said, you can't put that here,
[41:39.024 --> 41:44.084] you can't post that masks are an effective solution and the reason you can't post it is because
[41:44.084 --> 41:51.048] this is pseudoscience because science hadn't come to a conclusion so it's really really tough right as you
[41:51.048 --> 41:56.012] go through this is to say well what is ground truth particularly if science hasn't figured it out
[41:56.012 --> 42:02.020] and then how do we police you know content that may or may not conform to this and so immediately as
[42:02.020 --> 42:08.044] you go through this you start to realize that it's a very very difficult problem however it's also
[42:08.044 --> 42:14.004] one that you feel like you've got to act on so I think we're going to have to be in a place where we do
[42:14.004 --> 42:20.084] use this technology to inoculate ourselves against kind of disinformation and one of the things here is
[42:20.084 --> 42:27.064] kind of to take the virus analogy: if you haven't been exposed to a political stance on masks,
[42:28.052 --> 42:34.084] you'll probably take whatever you're first exposed to, and if the first information you're exposed to is
[42:34.084 --> 42:41.008] that masks don't work, it's against a free society, if that becomes your first exposure, it's much much harder to
[42:41.008 --> 42:48.068] change your opinion than if your first exposure were, masks are a good idea, if you help me I help you, it's a
[42:48.068 --> 42:55.024] good idea. So one of the things you look at is, identify the manipulation campaigns early and inoculate
[42:55.024 --> 43:01.080] susceptible populations to the messages by exposing them to good, well-grounded ground truth.
[43:02.052 --> 43:07.032] With those similar techniques? Similar techniques, I think you're going to have to use similar
[43:07.032 --> 43:12.060] techniques, right? And this is kind of to go back to the book from Stevenson, you know, the line between
[43:12.060 --> 43:17.064] education and manipulation is a very very fine and often blurry line, and it's that dynamic, right,
[43:17.064 --> 43:22.084] is like, well, if I'm educating, I am manipulating, but the difference is I'm doing it for the benefit of you, I'm
[43:22.084 --> 43:28.028] doing it for the benefit of the society, not I'm doing it for my own benefit. And I think that's
[43:28.028 --> 43:32.068] kind of the dynamic here: undoubtedly we're training machines to understand the world in ways that we
[43:32.068 --> 43:38.060] can't, to do things that we can't. What we teach them, how we teach them, is really important because
[43:38.060 --> 43:44.076] they're going to then be tools that, you know, either benefit us or work to our detriment,
[43:44.076 --> 43:49.056] but that kind of dynamic is, they're undoubtedly going to see things that we can't see and they're
[43:49.056 --> 43:54.004] going to understand things that we just can't understand, and we need that because we can't navigate
[43:54.004 --> 43:59.008] this world without them. So, you know, they're here, but we need to act responsibly with what's
[43:59.008 --> 44:03.080] in front of us. Well, I have lots more questions, but I'm running out of time, and we always end with
[44:03.080 --> 44:09.024] two questions. If you look at the subtopics in machine learning, is there one that you think doesn't
[44:09.024 --> 44:13.048] get as much attention as it deserves, that you think is way more important than people
[44:13.048 --> 44:22.012] give it credit for? Yeah, I think it's information retrieval. So the world of IR is sort of machine learning,
[44:22.012 --> 44:27.056] kind of, I mean, it's sort of, you know, BM25 algorithms and so on and sort of that, but I think
[44:27.056 --> 44:32.028] information retrieval has been something that we've totally forgotten about, but it's so fundamental
[44:32.028 --> 44:36.092] to all database technology in the world, and yet we haven't really kind of given it the attention
[44:36.092 --> 44:41.088] that it deserves. So, you know, aside from some sort of researchers that I'm sure are trying to get a
[44:41.088 --> 44:47.056] paper submitted next week, information retrieval is not top of the list. But more information retrieval,
[44:47.056 --> 44:52.020] for sure. I feel like that was really the first major application of machine learning, at least that I
[44:52.020 --> 44:56.068] was aware of. Yeah, and we just haven't touched it; the volume of information retrieval literature with
[44:56.068 --> 45:02.012] these new kinds of technologies is pretty low, and yet underneath it, it's a search and recall problem.
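For readers who haven't met it, BM25 is the classic lexical ranking function referenced above ("BM25 algorithms and so on"); here is a compact, self-contained sketch over a toy corpus, using common default-ish values for the k1 and b parameters.

```python
# Minimal BM25 ranking over a toy corpus, illustrating the classic IR scoring function.
import math
from collections import Counter

docs = ["the cat sat on the mat",
        "dogs and cats are common pets",
        "information retrieval ranks documents for a query"]
tokenized = [d.lower().split() for d in docs]
avgdl = sum(len(d) for d in tokenized) / len(tokenized)
k1, b = 1.5, 0.75  # typical parameter choices; both are tunable

def idf(term: str) -> float:
    n = sum(term in d for d in tokenized)
    return math.log((len(tokenized) - n + 0.5) / (n + 0.5) + 1)

def bm25(query: str, doc_tokens: list) -> float:
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query.lower().split():
        f = tf[term]
        if f:
            denom = f + k1 * (1 - b + b * len(doc_tokens) / avgdl)
            score += idf(term) * f * (k1 + 1) / denom
    return score

query = "cat on a mat"
ranking = sorted(range(len(docs)), key=lambda i: bm25(query, tokenized[i]), reverse=True)
print([docs[i] for i in ranking])
```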
[45:02.012 --> 45:07.024] Interesting, I love it. You're the first person that said that; I think it's a great answer. And
[45:07.024 --> 45:12.036] okay, the final question is, when you look at the projects that you've had of kind of taking, you know,
[45:12.036 --> 45:18.012] machine learning from conception to, like, deployed in production and useful, where were the surprising
[45:18.012 --> 45:24.052] bottlenecks in that entire process? I think the surprising one remains the amount of
[45:24.052 --> 45:30.044] training data, right, and the importance of training data. I think coming in we knew
[45:30.044 --> 45:34.036] that data had to be cleaned, we knew that there were cost functions, we knew that there
[45:34.036 --> 45:39.088] would be deployment issues, we knew that there'd be security issues for deploying on-prem, the sensitivity of data; all of
[45:39.088 --> 45:46.052] that was known. I think coming into this, the importance of not just the volume, we also knew that there
[45:46.052 --> 45:52.068] would be a volume of training data, but what I didn't think at the top of this, and what surprised me, was that
[45:52.068 --> 45:57.064] the specificity of the training data drives the performance of the models in ways that
[45:58.036 --> 46:04.092] are just not obvious when you start out on this. And these things are kind of excellent prediction machines,
[46:05.056 --> 46:12.004] but they're also excellent cheaters and they'll find ways to cheat and find the right answer but
[46:12.004 --> 46:16.060] you know, it's because you gave them the wrong data, and I think that sensitivity to the data
[46:16.060 --> 46:21.016] is something that's really surprised me. Now the flip side of that is, if you start investigating,
[46:21.016 --> 46:25.056] you know, methods of exposing these models to the right data, you also get wonderful performance
[46:25.056 --> 46:30.076] in ways that go above and beyond sort of the general applications. So I think it's a blessing and a
[46:30.076 --> 46:36.012] curse, but I don't think, going into this, if I'd been told that that would be the thing that kind of drove
[46:36.012 --> 46:40.028] the most kind of performance, that I would have agreed with that. So that's probably the biggest surprise.
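As a hypothetical illustration of that "excellent cheaters" point, the sketch below leaks the label into one feature column and shows a model exploiting the shortcut; everything here is synthetic.

```python
# Synthetic demo of a model "cheating" via a leaked feature instead of learning the task.
# Requires: pip install numpy scikit-learn
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)
noise = rng.normal(size=(n, 5))                               # uninformative features
leak = (y + rng.normal(scale=0.01, size=n)).reshape(-1, 1)    # label leaked into a column

print("clean features:", cross_val_score(LogisticRegression(), noise, y, cv=5).mean())
print("leaky features:", cross_val_score(LogisticRegression(),
                                         np.hstack([noise, leak]), y, cv=5).mean())
# The near-perfect score on the leaky features comes from the shortcut, not from real
# signal, which is exactly the data-specificity failure mode described above.
```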
[46:41.080 --> 46:47.016] Well, thanks so much, this was really fun and fascinating. My pleasure, I've enjoyed it a lot, thanks Lucas.
[46:48.084 --> 46:54.052] Thanks for listening to another episode of Gradient Descent. These interviews are a lot of fun,
[46:54.052 --> 46:59.088] and it's especially fun for me when I can actually hear from the people that are listening to the episodes, so
[46:59.088 --> 47:04.060] if you wouldn't mind leaving a comment and telling me what you think or starting a conversation
[47:04.060 --> 47:09.008] that would make me inspired to do more of these episodes and also if you wouldn't mind liking and
[47:09.008 --> 47:12.012] subscribing I'd appreciate that a lot
59.27927
47.79087
51s
Nov 21 '22 16:58
2007beia
-
Finished
Nov 21 '22 16:58
2180.688000
/content/piero-molino-the-secret-behind-building-successful-open-source-projects-isivxjqwg-c.mp3
tiny
[00:00.000 --> 00:03.000] [MUSIC]
[00:03.000 --> 00:06.050] So this model was predicting the classification of the tickets, right?
[00:06.050 --> 00:14.060] And then we decided to build a model that was also suggesting which actions to take in response to this ticket.
[00:14.060 --> 00:21.040] And then there was also another model that was deciding which template answer to send back to the user,
[00:21.040 --> 00:23.080] depending on what they were telling us.
[00:23.080 --> 00:31.080] And so instead of creating all these different models, I thought that that was a really nice application of multitask learning and
[00:31.080 --> 00:36.060] so I made this so that you can specify multiple outputs of multiple different data types.
[00:36.060 --> 00:42.060] And in the end, we had basically one model that was capable of doing all these tasks using all these features.
[00:42.060 --> 00:51.060] And that was basically the base of Ludwig, and then I started to add also images and all other things on top of that, and more people started to use it.
[00:51.060 --> 00:57.060] You're listening to gradient descent, a show about machine learning in the real world, and I'm your host, Lucas B.
[00:57.060 --> 01:03.020] Well, Piero is a staff research scientist in the Hazy Research group at Stanford University.
[01:03.020 --> 01:10.060] He's a former founding member of Uber AI, where he created Ludwig, worked on applied projects and published research on NLP.
[01:10.060 --> 01:12.060] I'm super excited to talk to him today.
[01:12.060 --> 01:17.010] Alright, so Piero, I'd love to talk to you about your time at Uber and things you worked on,
[01:17.010 --> 01:22.060] but I think the thing you may be better known for and the main topic is probably your project Ludwig.
[01:22.060 --> 01:29.060] So maybe for some of the people that might be listening or watching, could you just describe Ludwig at a high level?
[01:29.060 --> 01:36.010] Sure, so it's actually a tool that I built when I was working at Uber mostly for myself.
[01:36.010 --> 01:44.060] I wanted to try to minimize the amount of work that it would take me to onboard a new machine learning project.
[01:44.060 --> 01:54.060] And what it resulted in is a tool that allows you to train and then deploy deep learning models without having to write code.
[01:54.060 --> 02:00.010] And it does so by allowing you to specify a declarative configuration of your model.
[02:00.010 --> 02:09.060] And depending on the data types that you specify for the inputs and the outputs to your model, it assembles a different deep learning model that solves that specific task.
[02:09.060 --> 02:13.060] And then it trains it for you, and then you can deploy it.
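To make "declarative configuration" concrete, here is a minimal sketch using Ludwig's Python API; the column names and file paths are invented, and the exact constructor and method arguments have shifted between Ludwig versions, so treat the details as approximate.

```python
# Approximate sketch of Ludwig's declarative workflow (invented column names and paths;
# API details vary across Ludwig versions).
from ludwig.api import LudwigModel

config = {
    "input_features": [
        {"name": "review_text", "type": "text"},    # what the model reads
    ],
    "output_features": [
        {"name": "sentiment", "type": "category"},  # what the model predicts
    ],
}

model = LudwigModel(config)
train_results = model.train(dataset="reviews.csv")      # hypothetical CSV with those columns
predictions = model.predict(dataset="new_reviews.csv")  # reuse the trained model
```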
[02:13.060 --> 02:23.060] So can we make this more concrete? So like what if I want, if my inputs were like bounding boxes, that's something that Ludwig would understand if it's images and bounding boxes.
[02:23.060 --> 02:30.060] It would then sort of choose a model and learn, say like, predicting classes or something like that, could it, would that work?
[02:30.060 --> 02:34.060] So it doesn't; right now there's no specific bounding box support.
[02:34.060 --> 02:37.060] It's a feature that I'm going to add in the near future.
[02:37.060 --> 02:46.060] But what you do in general is exactly that. So you specify your inputs and your outputs. And you specify what are their type.
[02:46.060 --> 02:52.060] So for instance, if you want to do image classification, then you can see your input is an image and your output is a class.
[02:52.060 --> 03:04.060] Or if you want to do information extraction from text, then you can have text as input and for instance a sequence is output where the sequence tells you what information you want to extract from the text.
[03:04.060 --> 03:11.060] And any combination of these inputs and outputs allow you to create a different model basically.
[03:11.060 --> 03:19.060] And is the idea that underneath the hood, it picks the best state-of-the-art algorithm for any particular kind of input and output, is that right?
[03:19.060 --> 03:22.060] So it works at three different levels really.
[03:22.060 --> 03:33.060] At the basic level, you don't specify anything, you just specify your inputs and outputs and their types, and it uses some defaults that in most cases are, like, you know, pretty reasonable defaults.
[03:33.060 --> 03:40.060] Things that are, for those kinds of types of inputs and outputs, state of the art in the literature.
[03:40.060 --> 03:47.060] But you can also have, you have full control over all the details of the models that are being used.
[03:47.060 --> 04:00.060] So for instance, if you're providing text, then you can specify if you want to encode it using an RNN, or you want to encode it using a transformer or a CNN or a pre-trained model like BERT.
[04:00.060 --> 04:03.060] You can choose among these options.
[04:03.060 --> 04:07.060] And you can also change all the different parameters of these options.
[04:07.060 --> 04:22.060] For instance, for the RNN, you can say how many layers of RNN, or if you want to use an LSTM cell or a GRU cell, or the size of the hidden state; all the parameters that you may want to change for those models, you can change them.
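A rough sketch of that second level of control is below; the option names are my recollection of Ludwig's text-encoder parameters and may not match a given release exactly.

```python
# Approximate sketch of overriding encoder defaults in a Ludwig config
# (option names are from memory and may differ slightly by version).
config = {
    "input_features": [
        {
            "name": "utterance",      # hypothetical text column
            "type": "text",
            "encoder": "rnn",         # could also be "transformer", "stacked_cnn", "bert", ...
            "cell_type": "lstm",      # or "gru"
            "num_layers": 2,
            "state_size": 256,        # size of the hidden state
        }
    ],
    "output_features": [{"name": "intent", "type": "category"}],
}
```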
[04:22.060 --> 04:34.060] And additionally, one thing that we recently introduced in version 0.3 is the capability to do hyperparameter optimization, so that you can say, I want to use an RNN, but I don't know how many layers I want to use.
[04:34.060 --> 04:41.060] And then you can say, I have this range between 1 and 10 and figure out which is the best parameter configuration for this problem.
[04:41.060 --> 04:47.060] And what does it do underneath the hood? Does it have some kind of smart system for finding the best set of hyperparameters?
[04:47.060 --> 04:55.060] Yeah, so first of all, the models that it trains are TensorFlow 2 models right now, and we are also thinking about adding additional backends, but that's what it is now.
[04:55.060 --> 05:01.060] So the output in the end will be a TensorFlow 2 model that you can use for whatever purpose you want.
[05:01.060 --> 05:15.060] And for the hyperparameter optimization process itself, there's a declarative configuration you can give, and you can specify if you want to optimize it using different algorithms.
[05:15.060 --> 05:23.060] At the moment there's only three supported, which is grid search, random search, and a Bayesian optimization algorithm called PySOT.
[05:23.060 --> 05:32.060] In the near future, we're going to add more; in particular, we want to integrate with Ray Tune, as many of those algorithms are already ready to be used.
[05:32.060 --> 05:37.060] And also you can specify where you want to execute the hyperparameter optimization.
[05:37.060 --> 05:52.060] If you have a laptop, maybe you want to execute it just on your machine, or if you have a machine with a GPU, you may want to, you know, exploit the GPU, or if you have a machine with multiple GPUs, you can run the training in parallel.
[05:52.060 --> 06:00.060] And also if you have access to a cluster, then you can run on the cluster, a Kubernetes cluster with multiple machines with multiple GPUs.
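A hedged sketch of what such a hyperopt section can look like follows; the field names reflect my reading of the 0.3-era documentation and may differ in other releases, and the search ranges are invented.

```python
# Approximate sketch of a Ludwig hyperparameter-optimization section
# (field names and ranges are illustrative, not authoritative).
config = {
    "input_features": [{"name": "utterance", "type": "text", "encoder": "rnn"}],
    "output_features": [{"name": "intent", "type": "category"}],
    "hyperopt": {
        "parameters": {
            "training.learning_rate": {"type": "float", "low": 1e-4, "high": 1e-1, "scale": "log"},
            "utterance.num_layers": {"type": "int", "low": 1, "high": 10},
        },
        "goal": "minimize",
        "metric": "loss",
        "sampler": {"type": "random", "num_samples": 20},  # or "grid", or "pysot"
        "executor": {"type": "serial"},                    # or a parallel / Ray-based executor
    },
}
```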
[06:00.060 --> 06:09.060] Does it include data preparation, data augmentation techniques, like, is that something you can do with it also? Because I know that's super important in many fields these days.
[06:09.060 --> 06:17.060] Yeah, so for data processing, there are a bunch of things that Ludwig provides, and a bunch of things that it doesn't provide.
[06:17.060 --> 06:25.060] In particular, because that's not, at least so far, been 100% the focus of the library.
[06:25.060 --> 06:37.060] So we provide some relatively basic functionalities, and if you have some specific need for pre-processing, we would suggest doing some pre-processing beforehand, before providing the data to Ludwig.
[06:37.060 --> 06:54.060] But things that Ludwig does automatically are, you know, for instance, normalization of features, some tokenization of sequence or text features, and for images resizing and cropping, all like pretty standard things, nothing crazy.
[06:54.060 --> 06:59.060] But something that is useful for having kind of an end-to-end kind of experience.
[06:59.060 --> 07:11.060] In terms of augmentation, currently we don't have any augmentation that you can do out of the box, but it's one of the things that we want to add in version 0.4 of the package.
[07:11.060 --> 07:23.060] I think one of the things that's striking about your library is, you know, I think some libraries try to help people that do write code, do machine learning without a deep knowledge of machine learning, but I think your library, if I recall correctly, says right in the description.
[07:23.060 --> 07:31.060] We're trying to make it possible to do machine learning without actually writing any code at all, so that seems like a grander ambition.
[07:31.060 --> 07:37.060] Can you talk a little bit about what made you come to that, and maybe what design decisions you made differently to try and enable that?
[07:37.060 --> 07:47.060] So I think, you know, to a certain extent, it's a little bit aspirational, right, because there is still something that you have to provide; in this case, it's a declarative definition of your model.
[07:47.060 --> 07:57.060] But I believe that it's so much simpler to write this configuration file than it is to write code that, to some intents and purposes,
[07:57.060 --> 08:04.060] it actually opens up the possibility for more people to try out and use these models, so to a certain extent that was the intent.
[08:04.060 --> 08:16.060] In terms of the design decisions, I think the main one that allows for this level of abstraction is probably the choice that I made to,
[08:16.060 --> 08:29.060] as you were saying before, be opinionated about the structure of the models, and the fact that there are some data types that are supported and data types that are not supported.
[08:29.060 --> 08:36.060] If your problem is within the realm of those data types that are supported, then I make it really easy for you.
[08:36.060 --> 08:45.060] If it's outside, then, well, either you can go and implement it yourself or you can extend Ludwig to actually incorporate also additional data types that you care about.
[08:45.060 --> 08:55.060] And those data types, the fact that you can compose them, the compositionality aspect of it, is what makes it general enough to cover many different use cases.
[08:55.060 --> 09:00.060] And that's probably the main secret sauce.
[09:00.060 --> 09:06.060] Well, it's not so secret because it's an open source project, but it's probably part of where the magic is.
[09:06.060 --> 09:08.060] That's where it is, really.
[09:08.060 --> 09:14.060] Can you describe how you would compose a data set? Can you give me a concrete example of that?
[09:14.060 --> 09:15.060] A data type, sorry.
[09:15.060 --> 09:34.060] Yeah, so again, one example, we've been through some examples, like, you know, text input, category output is a text classifier, but the interesting thing is that in some libraries what you have is they provide you with some templates; like, for instance, the tool Turi Create,
[09:34.060 --> 09:45.060] which I believe allows you to create models for Apple devices, does something similar, where you have, you know, a task which is text classification, and then you have to provide the text input and the class output.
[09:45.060 --> 09:50.060] And then there's another task that, again, gives you some templates that you have to fit into.
[09:50.060 --> 10:03.060] Ludwig works the other way around: you start from the data, and you look at the data that you have. For instance, you know, if you want to classify an article, maybe you don't have only the text, you also have information about who's the author.
[10:03.060 --> 10:12.060] And you also have information about the date when it was published, and, you know, maybe there is a subtitle and there's a separation between the title, the subtitle and the body.
[10:12.060 --> 10:22.060] So what you could do with Ludwig is you can say, well, the title is a text input feature, but also the subtitle is a separate text input feature and the body is a separate feature.
[10:22.060 --> 10:32.060] And the author is a category, because maybe, you know, I'm working for a website and the website has 20 different authors, and information about the author will allow me to figure it out,
[10:32.060 --> 10:43.060] because many authors maybe publish on a specific topic, and so that's additional signal that you will have when you're trying to figure out what class this new article belongs to.
[10:43.060 --> 10:54.060] And also time, because maybe at a certain moment in time there was a spike of interest in a specific topic, and so knowing that an article was published on a specific date helps you figure out what type of article this is.
[10:54.060 --> 11:00.060] And so with Ludwig it's super easy to specify all these different inputs from different data types.
[11:00.060 --> 11:14.060] It's just a list: you just say the name of your feature and the type, and it's a list of those things, and that's all you have to do to have a model that combines all these different inputs into the same architecture.
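A hypothetical configuration for exactly that article example could look like the sketch below; the column names are invented, and the feature-type names are as I recall them from Ludwig's docs, so worth double-checking for your version.

```python
# Approximate sketch of a multi-input Ludwig config for the article example above.
config = {
    "input_features": [
        {"name": "title",          "type": "text"},
        {"name": "subtitle",       "type": "text"},
        {"name": "body",           "type": "text"},
        {"name": "author",         "type": "category"},
        {"name": "published_date", "type": "date"},
    ],
    "output_features": [
        {"name": "topic", "type": "category"},
    ],
}
# Each input is encoded separately and the encodings are combined before the output decoder.
```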
[11:14.060 --> 11:19.060] What do you do if your data types or your data is inconsistent? Can Ludwig handle that?
[11:19.060 --> 11:21.060] What do you mean?
[11:21.060 --> 11:32.060] I guess, like, what if my input data had, I mean, missing values might be the simplest case, right, but I'm thinking of the cases that people come to me with, and, you know, they want to do some classification of a crazy data set.
[11:32.060 --> 11:37.060] You know, like maybe there's sometimes multiple authors, like maybe there's, you know, just thinking about all these.
[11:37.060 --> 11:39.060] How do you deal with that?
[11:39.060 --> 11:44.060] So, well, for cleaning, like, missing values,
[11:44.060 --> 11:53.060] Ludwig does some of it for you: you can specify the default filling value, or you can specify to fill with, you know, some
[11:53.060 --> 12:01.060] statistics, like with the mean, with the max, these kinds of things, which are pretty straightforward; Ludwig allows you to do all these things, so that's good.
[12:01.060 --> 12:13.060] But if the inconsistencies are bigger, like for instance in some cases there are multiple authors, well, you either treat it as a different data type altogether, for instance set is a data type in Ludwig,
[12:13.060 --> 12:19.060] so if you have multiple authors you can treat it as a set rather than treating it as a class, for instance as a category.
[12:19.060 --> 12:26.060] And so because there are multiple of those data types, there's actually a bag data type, geolocation is a data type, and so on and so on,
[12:26.060 --> 12:40.060] I think you will have a relatively easy time finding a data type that fits the type of data that you have, and again, if not, Ludwig is really easy to extend to add a data type that kind of matches your specific use case if you want to.
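For the missing-value part specifically, a hedged sketch of that preprocessing configuration is below; the strategy names mirror what I remember of Ludwig's options (fill with a constant, the mean, the mode, and so on) and should be checked against the release you use.

```python
# Approximate sketch of per-feature missing-value handling in a Ludwig config
# (strategy and option names are from memory and may differ by version).
config = {
    "input_features": [
        {
            "name": "price",                  # hypothetical numerical column
            "type": "numerical",
            "preprocessing": {"missing_value_strategy": "fill_with_mean"},
        },
        {
            "name": "author",                 # hypothetical category column
            "type": "category",
            "preprocessing": {"missing_value_strategy": "fill_with_const",
                              "fill_value": "unknown"},
        },
        {"name": "coauthors", "type": "set"}, # multiple values in one cell
    ],
    "output_features": [{"name": "topic", "type": "category"}],
}
```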
[12:40.060 --> 12:45.060] So do you have examples of people that used Ludwig that really couldn't write any code?
[12:45.060 --> 12:47.060] You know people that have tried.
[12:47.060 --> 13:00.060] Yeah, so there is this really interesting example that I've witnessed, I would say; there are a couple of articles online from a person who is an expert in SEO, search engine optimization.
[13:00.060 --> 13:21.060] And they wrote a couple of articles on an SEO blog about using Ludwig for doing some predictions that are specifically useful for SEO purposes, and I believe, you know, most of these people don't have a programming background, they cannot code, and so it was really nice to see people using it for that purpose.
[13:21.060 --> 13:41.060] And another funny example I have, so I don't know how much coding this guy did, but okay, so there was this application of Ludwig in a published article by the Max Planck Institute on analysis of some biological images, about, I think it was about worms or cells, I don't remember exactly.
[13:41.060 --> 13:55.060] But the point was that the person that was using it was a biologist, not a computer scientist, and what he told me is that he would not have been able to implement, like, he was using ResNets within Ludwig,
[13:55.060 --> 14:07.060] and he would not have been able to implement a ResNet by himself, and so Ludwig enabled him to actually do this kind of research that otherwise would not have been easy for him to do.
[14:07.060 --> 14:12.060] So those are some examples of what you were talking about, and I'm pretty proud.
[14:12.060 --> 14:15.060] Yeah you should be proud of that that's really impressive.
[14:15.060 --> 14:25.060] I guess Ludwig came out of your use cases, and obviously you're a very skilled coder; like, what were you working on at the time at Uber that inspired you to make Ludwig?
[14:25.060 --> 14:31.060] Yeah, so kind of the whole point is that I'm lazy and I don't want to do the same thing twice.
[14:31.060 --> 14:38.060] Well, I mean, twice is fine; the third time I basically try to automate it, for myself, for my own sake.
[14:38.060 --> 14:44.060] And so I was working on this project called COTA. There's a couple of articles online if you're interested about it.
[14:44.060 --> 14:51.060] It's a customer support model where basically at the beginning we were treating the problem as a text classification problem.
[14:51.060 --> 15:00.060] So we had the input tickets and we wanted to predict what type of ticket this was, because depending on the type they were routed to different customer support representatives.
[15:00.060 --> 15:08.060] And maybe just before you get too far into it, could you, like, describe what's the scenario, what's an example ticket and what would be an example class?
[15:08.060 --> 15:15.060] Yeah, so I mean, I was working at Uber, so an example was like, I don't know, my ride was canceled, I want my money back, or something like that.
[15:15.060 --> 15:29.060] And the classes, there were, like, about, I think, 2,000 different classes that the ticket could belong to, which could be, you know, an appeasement request or lost item or food not delivered, because there's also the Uber
[15:29.060 --> 15:38.060] Eats side of things, right? So there was, like, a really wide range of possible types of issues that could happen.
[15:38.060 --> 15:42.060] And again, at the beginning we were treating it as a text classification problem.
[15:42.060 --> 15:53.060] But then the PM working on this problem came to me and said, you know, there is availability of additional features, like, for instance, we can have some features from the user that is sending this message, for instance,
[15:53.060 --> 15:59.060] If they were using the driver app or the rider app or the Uber eats app when they were sending this message.
[15:59.060 --> 16:07.060] And so that was again additional signal we wanted to integrate into the model, and well, I did it once and that was fine.
[16:07.060 --> 16:12.060] But then they came back to me with additional features that were related, for instance, to the ride that they were taking.
[16:12.060 --> 16:22.060] And I said, okay, some of these features are numbers, some of them are binary values, some of them are categories; let's make something generic, so that if they come to me again with more features to add,
[16:22.060 --> 16:31.060] it will be really easy for me to do that. And so that's, you know, the part that covered the inputs, and then the same happened for the outputs.
[16:31.060 --> 16:42.060] So this model was predicting the classification of the tickets, right? And then we decided to build a model that was also suggesting which actions to take in response to the ticket.
[16:42.060 --> 16:52.060] And then there was also another model that was deciding which template answer to send back to the user, depending on what they were telling us.
[16:52.060 --> 17:05.060] And so instead of creating all these different models, I thought that this was a really nice application of multitask learning, and so I made it so that you can specify multiple outputs of multiple different data types.
[17:05.060 --> 17:31.060] And in the end we had basically one model that was capable of doing all these tasks using all these features, and that was basically the base of Ludwig. Then I started to add also images and other things on top of that, and more people started to use it within the organization, and then later on we finally decided to open source it, because we thought other people outside could also find some value in using it.
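To make the multi-input, multi-output idea concrete, here is a minimal sketch of what a declarative Ludwig configuration for a ticket model of this kind might look like, using Ludwig's Python API. The column names and feature choices are hypothetical illustrations, not the actual Uber setup, and exact feature type names can vary slightly between Ludwig versions.

```python
# Hypothetical sketch of a multi-input, multi-output (multitask) Ludwig model.
# Column names ("ticket_text", "app_type", ...) are made up for illustration.
from ludwig.api import LudwigModel

config = {
    "input_features": [
        {"name": "ticket_text", "type": "text"},         # the support message
        {"name": "app_type", "type": "category"},        # driver / rider / eats app
        {"name": "trip_distance", "type": "number"},     # a numeric ride feature
        {"name": "is_first_contact", "type": "binary"},  # a binary user feature
    ],
    "output_features": [
        {"name": "ticket_class", "type": "category"},      # which ticket type
        {"name": "suggested_action", "type": "category"},  # which action to take
        {"name": "reply_template", "type": "category"},    # which template answer to send
    ],
}

model = LudwigModel(config)
results = model.train(dataset="tickets.csv")  # assumes a CSV with those columns
```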
[17:31.060 --> 17:47.060] That's so cool. Do you anticipate more people moving to this model of not worrying about the underlying architecture of what's happening? And I guess, what should people then focus on if they're using Ludwig, if they want to make their model better?
[17:47.060 --> 17:53.060] What is there left to do? So I think there are two answers to this, I would say.
[17:53.060 --> 18:04.060] I believe, and maybe I'm wrong, but I believe that there are many more people in the world who don't know how to implement and deploy a model than people who do know how to implement and deploy a model, right?
[18:04.060 --> 18:21.060] And so I would say, I believe that there's also value that Ludwig can give to an expert, in particular because it makes it easy to compare different models, and it makes it very easy for you to have a baseline, for instance, and that's definitely something that is useful in many situations, right?
[18:21.060 --> 18:31.060] If you are a super expert and you're doing pure research and you're creating a new model, then, you know, probably you want to implement it from scratch and have full control over it.
[18:31.060 --> 18:41.060] But then there's the rest of us, the rest of the people who don't know how to implement and deploy a model and don't have the time and resources to learn it.
[18:41.060 --> 18:59.060] I think there's a lot of value to be unlocked by using a tool like Ludwig. And in terms of what you do if you're not writing your model, well, there are all sorts of other things, right? First of all, you can figure out the hyperparameters, both by hand and also automatically.
[18:59.060 --> 19:10.060] There are also other things, like you can try, for instance, to figure out which subsets of data the model performs better or worse on, and so have some sort of
[19:10.060 --> 19:22.060] outer-loop kind of explainability, and then try to make sure that your model is safe and that it is not discriminating, all these sorts of things. And usually the way you actually approach
[19:22.060 --> 19:44.060] these kinds of problems is to add more data in a specific way that tries to address and solve these problems in the behavior of the model, right? So I would say, in general, this is a piece of a human-centered kind of process, and so the human has a lot of things to do in this process, by labeling data
[19:44.060 --> 19:54.060] and adjusting the model and integrating the model into a broader application. So there's still a lot to do for the human in all of this.
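As a concrete illustration of the "which subsets of data does the model do well or poorly on" step, here is a minimal, framework-agnostic sketch in Python with pandas; the file and column names are hypothetical.

```python
# Minimal sketch: per-slice accuracy from a hypothetical predictions file with
# columns "app_type" (a slicing feature), "label", and "prediction".
import pandas as pd

df = pd.read_csv("predictions.csv")
df["correct"] = df["label"] == df["prediction"]

# Accuracy broken down by a categorical feature highlights slices where the
# model under-performs and may need more (or better) labeled data.
per_slice = df.groupby("app_type")["correct"].agg(["mean", "count"])
print(per_slice.sort_values("mean"))
```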
[19:54.060 --> 20:13.060] Is it part of Ludwig's scope to guide the human building the model toward things that are likely to help the model perform better? Like, I'll give you an example. I often help people who don't have a lot of experience train models, and you know, some of the mistakes they make are kind of surprising to people that are in the field, but make total sense
[20:13.060 --> 20:34.060] if you step back. Like, I've noticed in some cases people will have so many classes that they don't have even one example of every class, and then they're surprised the model can't predict that class when they've literally not provided an example of it. And I could think of lots of different ways that people can shoot themselves in the foot when they don't have experience with this type of thing.
[20:34.060 --> 20:39.060] Is it part of Ludwig's scope to help people avoid those bad situations?
[20:39.060 --> 20:53.060] So that's a really interesting question. I would say the scope is changing over time, to be honest, right? At the beginning, like I described, the scope was to build a text classifier, and then it became a much more generic thing over time.
[20:53.060 --> 21:11.060] So also with regard to what you're asking, it's something that we don't do specifically. Let's put it this way: Ludwig does nudge you in a certain direction, but it does so in particular for model architecture choices and for model training and building.
[21:11.060 --> 21:26.060] It has some defaults that are kind of reasonable and helps you figure out easily what to do with the parameters. What it does not do right now is what you describe, the more higher-level kinds of problems.
[21:26.060 --> 21:44.060] Whether the problem you're trying to solve is even solvable with a machine learning algorithm to begin with, for instance, that's something that is right now out of the scope of Ludwig; you basically start with something that you believe could be useful
[21:44.060 --> 21:51.060] signal, and, you know, the distribution of classes, for instance, only to that kind of extent.
[21:51.060 --> 21:59.060] This is slightly switching gears, but this has been a surprisingly interesting question recently. What do you think about Python as sort of the lingua franca of...
[21:59.060 --> 22:13.060] But what you're saying is very interesting, because, you know, there could be some even relatively easy checks that one could do beforehand and return to the user, saying, oh, you know, the classes A, B, and C don't have examples.
[22:13.060 --> 22:20.060] Maybe you want to provide them if you want to have good performance, or something like that. That could be easily added, so that's something that I would take into consideration.
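The kind of cheap pre-flight check being discussed here could be as simple as counting examples per class before training. A rough sketch, with a hypothetical label column and class list:

```python
# Rough sketch of a pre-training sanity check: warn about classes that are
# rare or entirely missing. File, column, and class names are hypothetical.
import pandas as pd

MIN_EXAMPLES = 5  # arbitrary threshold for this sketch

train_df = pd.read_csv("train.csv")
counts = train_df["label"].value_counts()

rare = counts[counts < MIN_EXAMPLES]
if not rare.empty:
    print(f"Classes with fewer than {MIN_EXAMPLES} examples:\n{rare}")

# Classes that are expected but absent (if the full label set is known).
expected_classes = {"appeasement_request", "lost_item", "food_not_delivered"}
missing = expected_classes - set(counts.index)
if missing:
    print("Classes with no training examples at all:", missing)
```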
[22:20.060 --> 22:35.060] ...machine learning? Like, do you think that Python is going to stay the dominant language for building models, or maybe there'll be something even more high-level, if your vision is that people don't even need to write code to build these models?
[22:35.060 --> 22:43.060] Yeah, I mean, there are several aspects to discuss here. I think it also depends on who the user is.
[22:43.060 --> 23:00.060] I believe that, for instance, you know, if you think about databases before SQL was invented, people had to code their own databases by hand. Well, not really SQL per se, but relational databases in general, the introduction of those kinds of management systems.
[23:00.060 --> 23:13.060] People had to implement their databases by hand and they were using files and hierarchies; the file system was basically an early example of a database, really.
[23:13.060 --> 23:32.060] And then there was this change in the paradigm of the way that people interacted with data, by using a language like SQL, which is more declarative: it doesn't require you to express how things should be computed, but rather what you want to compute.
[23:32.060 --> 23:47.060] And I think that a similar shift could happen also for machine learning, although this is true for a set of users which are, you know, the final users, the ones that use the models, much less so for the people that actually produce the models.
[23:47.060 --> 24:12.060] I actually love Python. I think it's a great language: really nice syntax, very simple to pick up, very simple to, you know, look at someone else's code and improve it and change it. So I think it's a great language, but I can also imagine that we could be moving towards languages that are probably a little bit more efficient.
[24:12.060 --> 24:26.060] And I mean, the efficiency of using Python right now comes from basically wrapping C stuff. Maybe there is a world where, you know, we start to write models in Rust, even if for us it's probably a little bit complicated.
[24:26.060 --> 24:41.060] Or maybe in Julia, I don't know; there could be some candidate languages to dethrone Python as the lingua franca for machine learning, although I don't see that happening in the very near future, to be honest.
[24:41.060 --> 25:02.060] How do you decide what default model you give someone for a certain configuration, especially when the research is changing so fast, and I would say especially maybe in natural language processing right now, which it sounds like is where Ludwig started? Like, does it ever get contentious?
[25:02.060 --> 25:17.060] I think that a lot of no-code users, if they have no experience in machine learning, are probably going to stick to the defaults, or at least, even if you do a hyperparameter search, you have to constrain it somehow to some set of defaults. How do you think about that?
[25:17.060 --> 25:34.060] This is a great point, and also, you know, there are many aspects, in my opinion, that... I mean, there are some researchers that are actually talking about these aspects, but they're not mainstream, in particular in research.
[25:34.060 --> 25:46.060] And those aspects are, like, you know, performance is one dimension that a potential user of a system like this may care about, but there are also other dimensions, it could be, you know,
[26:16.060 --> 26:28.060] default to using, you know, T5 as a model for encoding language, just because the number of users that could actually fine-tune a T5 model,
[26:28.060 --> 26:46.060] and also the degree of advantage that they would get over, like, a smaller model, may not be big enough to justify the increased computational cost, right? So I try to, you know, balance towards the inexpensive,
[26:46.060 --> 26:59.060] while leaving the option for the more expensive. So that's one thing I do. And on the other hand, this is something that I'm really interested in, and I'm starting to do some little, you know, research around it.
[26:59.060 --> 27:09.060] One thing that I want to do is a really large-scale comparative study. This is actually a little bit more about what I'm doing now than what I did specifically for Ludwig.
[27:09.060 --> 27:30.060] I'm really curious about doing a large comparative study among different models, with different hyperparameter optimization values, on different tasks, and maybe one interesting outcome of that could be something that looks like a recommender system that tells you: I have this new dataset, with this amount of data, of these data types,
[27:30.060 --> 27:50.060] what model do you suggest I use given these constraints? Because I think the constraints are important. You may say, I only want to see models that would take less than 0.01 seconds, less than 10 milliseconds, to run inference, and so maybe that would rule out some of the more expensive but also more effective models, right?
[27:50.060 --> 27:56.060] So suggesting something that depends on the constraints, I think, would be really useful.
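No such recommender exists yet, as the conversation makes clear; purely to illustrate the "filter candidates by constraints, then pick the best" idea, here is a toy sketch with made-up numbers.

```python
# Toy sketch of constraint-aware model selection. All model names, latencies,
# and scores are made-up placeholders, not real benchmark results.
candidates = [
    {"model": "small_cnn",    "latency_ms": 3.0,   "score": 0.81},
    {"model": "distil_model", "latency_ms": 8.0,   "score": 0.87},
    {"model": "bert_base",    "latency_ms": 25.0,  "score": 0.90},
    {"model": "t5_large",     "latency_ms": 120.0, "score": 0.92},
]

MAX_LATENCY_MS = 10.0  # user-supplied constraint: inference under 10 ms

feasible = [c for c in candidates if c["latency_ms"] <= MAX_LATENCY_MS]
best = max(feasible, key=lambda c: c["score"])
print("Suggested model under the latency constraint:", best["model"])
```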
[27:56.060 --> 28:13.060] Well, you know, now that we have a Weights & Biases integration, we could give you the data of all the users that chose to make their projects open, and that might actually give you kind of a real-world evaluation of the different things that work and don't work. It would be super cool to see if that was useful.
[28:13.060 --> 28:22.060] Absolutely. I mean, this is something that, you know, with your data you probably can already do, right? We could think about ways to collaborate on that, definitely.
[28:22.060 --> 28:24.060] We should think about that.
[28:24.060 --> 28:26.060] That would be fun.
[28:26.060 --> 28:34.060] To step back a little bit, one thing that I wanted to ask you is, I noticed that you've been doing NLP work for quite a long time, I think, you know, from before.
[28:34.060 --> 28:49.060] I'm kind of curious about the perspective of someone like you on kind of the new stuff that we're seeing, like,
[28:49.060 --> 29:03.060] do you feel like, you know, GPT-3 is a real step-function change in the quality of NLP that kind of changes the possible applications, or was it sort of inevitable? And how do you look at the field and how do you feel the field has changed
[29:03.060 --> 29:05.060] while you've been working in it?
[29:05.060 --> 29:13.060] Yeah, so I mean, I've been working in the field for at least 10 years now, basically, so I've seen quite a few waves.
[29:13.060 --> 29:22.060] Tasks that were interesting 10 years ago are still interesting today, so there are many things that were unsolved back then and are still unsolved right now.
[29:22.060 --> 29:32.060] We did make progress in terms of performance, but I would say the general framing of the problems and how we approach them hasn't changed a lot.
[29:32.060 --> 29:42.060] Now we're using neural networks where before we were using SVMs, but overall there was not a huge change, in particular in the way things work in industry, really.
[29:42.060 --> 29:55.060] But in particular the few-shot capabilities, and actually the capabilities for interacting with the model itself through language, that are shown by something like GPT-3,
[29:55.060 --> 30:09.060] those kind of change the paradigm of interaction with those systems. And I think... I am not sure of the commercial usefulness and applications of something like that.
[30:09.060 --> 30:24.060] But what I am sure of is that having a general system to which you could give a really small number of examples, and then the system takes that on and is able to perform the same kind of task
[30:24.060 --> 30:33.060] you've shown it, on unseen data, right off the bat, without needing specific training for solving those tasks,
[30:33.060 --> 30:39.060] that's a very compelling thing and something that may bring the industry in a different direction.
[30:39.060 --> 30:45.060] So I see an interesting world in the future where that shift happens.
[30:45.060 --> 31:02.060] Although I still have my questions; the jury is still out, we haven't settled on a final answer on how much, and in which scenarios, this actually works to the point that we can actually use it.
[31:02.060 --> 31:04.060] But let's see about that.
[31:04.060 --> 31:07.060] I'm curious to see what the near future holds.
[31:07.060 --> 31:09.060] Cool.
[31:09.060 --> 31:16.060] Well I can see we're running out of time and we always end on two questions and I want to give you a little bit of time to answer these questions.
[31:16.060 --> 31:27.060] The penultimate question that we ask is what is the topic in machine learning broadly that you think doesn't get as much attention as it deserves.
[31:27.060 --> 31:34.060] So I think now it's getting a little bit more attention than it was before, so I may be a little bit behind the times in giving this answer.
[31:34.060 --> 31:40.060] But I believe that something that I think it's very important is systematic generalization.
[31:40.060 --> 31:47.060] And again, there's been work from Marco Baroni, Brenden Lake, and also Josh Tenenbaum on the topic.
[31:47.060 --> 32:03.060] But it has not been at the forefront of research for a long time; still, it's something that is super interesting and something that, if solved, may unlock many applications of machine learning.
[32:03.060 --> 32:15.060] Right now we have a hard time applying machine learning in, for instance, scenarios where there's a lot of shift in the distribution of data over time, or in scenarios where we need to train from less data.
[32:15.060 --> 32:21.060] If we had a solution for systematic generalization, we would be able to apply machine learning models in these scenarios.
[32:21.060 --> 32:24.060] So I'm really looking forward to more research on that topic.
[32:24.060 --> 32:28.060] And could you define what systematic generalization means?
[32:28.060 --> 32:41.060] Yeah, I may be butchering it a little bit, but at least the way I see it, it's the fact that you have a model that can figure out a way to generalize beyond the training data, obviously,
[32:41.060 --> 32:43.060] but generalize in a way that is systematic.
[32:43.060 --> 32:54.060] So that it learns, and I can give a practical example, that all the specific instances of a specific phenomenon behave in the same way.
[32:54.060 --> 32:59.060] Like it realizes that for instance if you're talking about text, right?
[32:59.060 --> 33:10.060] that the model is invariant to the choice of entities, or is invariant to the choice of some synonyms, when it's returning its predictions.
[33:10.060 --> 33:18.060] And I think it's really important, because models that exhibit behavior like that are models that we can trust.
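To make the invariance idea concrete, here is a toy sketch of probing a text classifier for systematic behavior by swapping entities and checking that the prediction does not change; `classify` is a hypothetical stand-in for whatever trained model is being tested.

```python
# Toy sketch: probe a classifier for invariance to the choice of entity.
# "classify" is a hypothetical stand-in for a real trained model.
def classify(text: str) -> str:
    # Placeholder logic; a real test would call the trained model here.
    return "refund_request" if "money back" in text else "other"

TEMPLATE = "My ride with {name} was canceled, I want my money back."
entities = ["Alice", "Bob", "Chen", "Fatima"]

predictions = {name: classify(TEMPLATE.format(name=name)) for name in entities}
assert len(set(predictions.values())) == 1, f"Not invariant to entities: {predictions}"
print("Prediction is invariant to the entity:", predictions)
```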
[33:18.060 --> 33:24.060] And the final question is, and maybe this is where you can really rely on your experience at Uber:
[33:24.060 --> 33:33.060] What's the hardest part about taking an ML project from conceiving of the idea to getting it deployed in production and doing something useful?
[33:33.060 --> 33:39.060] Yeah, I think the answer to this changes a lot depending on the type of organization that you're working in.
[33:39.060 --> 33:45.060] Like, if you're in a startup, you do things one way; if you're in a bigger organization, maybe differently.
[33:45.060 --> 33:50.060] So I can speak in particular for the larger-organization kind of setting.
[33:50.060 --> 34:01.060] I can say that, in particular for researchers, something that is difficult is to put whatever you obtained in your research into production.
[34:01.060 --> 34:05.060] And there's at least two sets of problems why that's difficult.
[34:05.060 --> 34:12.060] One is a practical one, an engineering one: usually the infrastructure for deployment is not the same as the one you use for training your models,
[34:12.060 --> 34:26.060] and so there's a mismatch there that has to be filled. And also, maybe your models are a little bit too slow for what the needs are for inference at scale, and so there need to be some compromises there.
[34:26.060 --> 34:28.060] And that's one of the problems.
[34:28.060 --> 34:40.060] But the other problem, which in my opinion is more important because it's not a technical one and it's harder to solve, is a misalignment in the goals of what the model should be doing.
[34:40.060 --> 34:54.060] You may be optimizing your model for, you know, whatever metric you care about, or maybe you have a ranking problem and you're optimizing for the mean reciprocal rank, or whatever other metric you're using for both optimization and evaluation.
[34:54.060 --> 35:09.060] But in the end, in many real scenarios, those metrics are just proxies for what you actually care about. And what you actually care about, if you're building for instance a recommender system, is how many people are clicking
[35:09.060 --> 35:16.060] on the items that you are suggesting, and maybe, if there's a store, how many people are actually buying something.
[35:16.060 --> 35:25.060] You may have a model that has 20% better MRR offline; you deploy it and people don't buy more.
[35:25.060 --> 35:38.060] That's not a model that is going to be deployed, right? And so that's something that machine learning people usually don't think a lot about, and it's something that, in my experience, has been the main
[35:38.060 --> 35:47.060] friction between developing something offline and then getting something deployed for real in front of the users.
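For readers unfamiliar with the offline metric mentioned above, mean reciprocal rank (MRR) averages 1/rank of the first relevant item over a set of queries; a minimal sketch with made-up ranks:

```python
# Minimal sketch of mean reciprocal rank (MRR) over a batch of queries.
# Each entry is the 1-based rank at which the relevant item appeared,
# or None if it was never retrieved. The ranks here are made up.
ranks = [1, 3, None, 2, 1]

reciprocal_ranks = [1.0 / r if r is not None else 0.0 for r in ranks]
mrr = sum(reciprocal_ranks) / len(reciprocal_ranks)
print(f"MRR = {mrr:.3f}")  # (1 + 1/3 + 0 + 1/2 + 1) / 5 ≈ 0.567
```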
[35:47.060 --> 35:52.060] That makes sense. Thank you so much, Piero. It was a real pleasure to talk to you.
[35:52.060 --> 35:56.060] Yeah, thank you for the really interesting questions. It was really fun to chat with you too.
[35:56.060 --> 35:58.060] Yeah thank you.
[35:58.060 --> 36:01.060] Thanks for listening to another episode of Gradient Descent.
[36:01.060 --> 36:08.060] Doing these interviews is a lot of fun, and it's especially fun for me when I can actually hear from the people that are listening to the episodes.
[36:08.060 --> 36:16.060] So if you wouldn't mind leaving a comment and telling me what you think, or starting a conversation, that would inspire me to do more of these episodes.
[36:16.060 --> 36:20.060] And also if you wouldn't mind liking and subscribing, I'd appreciate that a lot.
44.5768
48.9198
1m 2s
Nov 21 '22 16:57
xav6m5dq
-
Finished
Nov 21 '22 16:57
2967.768000
/content/vladlen-koltun-the-power-of-simulation-and-abstraction-htdspsgblqo.mp3
tiny
[00:00.000 --> 00:09.012] I wanted to understand how we train intelligent agents that have this kind of embodied
[00:09.012 --> 00:15.044] intelligence that you see in us and other animals where we can walk through an environment
[00:15.044 --> 00:22.008] gracefully deliberately we can get to where we want to go, we can engage with the environment
[00:22.008 --> 00:28.072] if we need to rearrange it, we rearrange it, we clearly act spatially intelligently.
[00:28.072 --> 00:36.032] And by intelligently in an embodied fashion and this seems very core to me and I want to
[00:36.032 --> 00:41.092] understand it because I think this underlies other kinds of intelligence as well.
[00:41.092 --> 00:46.032] You're listening to gradient descent, a show about machine learning in the real world
[00:46.032 --> 00:52.080] and I'm your host, Lucas B. Vladlen Koltun is chief scientist for intelligent systems
[00:52.080 --> 01:00.072] at Intel where he runs a lab of researchers working on computer vision, robotics and mapping simulations
[01:00.072 --> 01:06.040] to reality. Today we're going to talk about drones, four-legged robots, and a whole bunch of cool stuff.
[01:07.076 --> 01:12.032] Alright, Vladlen, thanks so much for talking with us. I thought your title was somewhat
[01:12.032 --> 01:16.088] evocative. It's the chief scientist for intelligent systems at Intel. Can you say a little bit
[01:16.088 --> 01:22.072] about what the scope is? It sounds intriguing. Yeah, I prefer the term intelligent
[01:22.072 --> 01:31.044] systems to AI. AI is a very loaded term with a very long history, a lot of baggage. As you may
[01:31.044 --> 01:38.072] remember, the term fell out of favor for a very long time because AI over promised and under
[01:38.072 --> 01:47.068] delivered in the 80s and 90s. And when I became active in the field when I really learned
[01:47.068 --> 01:55.036] quite a bit about AI, the term AI was not used by many of the most serious people in the field.
[01:55.036 --> 02:01.084] People avoided the term artificial intelligence. People identified primarily as machine learning
[02:01.084 --> 02:11.028] researchers. And that persisted into, I'd say, the mid-2010s, actually. It's only very recently
[02:11.028 --> 02:19.060] that the term AI became respectable again and serious researchers on a large scale started to identify
[02:19.060 --> 02:28.040] themselves as artificial intelligence researchers. I now find the term intelligent systems
[02:28.040 --> 02:33.084] broader. First of all, because it doesn't have the word artificial. So if we're interested in
[02:33.084 --> 02:38.064] intelligent systems, we clearly are interested in artificial intelligence systems, but also in
[02:38.064 --> 02:43.068] natural intelligence systems, we want to understand the nature of intelligence. We are concerned with
[02:43.068 --> 02:52.032] intelligence, understanding it and producing it, inducing it in systems that we create. It's a more
[02:52.032 --> 03:00.056] neutral term with less baggage. I like it. I don't mind AI, but somehow I'm more predisposed to intelligent
[03:00.056 --> 03:05.004] systems. Cool, I love it. And I always try to take the perspective in these interviews of someone who
[03:05.004 --> 03:10.008] knows about machine learning or intelligent systems, but maybe isn't an expert in your field.
[03:10.008 --> 03:15.028] It'll be super easy in this interview because I know very little about robotics and a lot of
[03:15.028 --> 03:22.016] stuff that you've been working on. I'm very intrigued by it. I think anyone can understand how cool this
[03:22.016 --> 03:27.060] stuff is. I'd love to ask you about some of the papers that I was looking at. One that kind of
[03:27.060 --> 03:33.044] just struck me, both myself now and my younger self, as just unbelievably cool.
[03:33.044 --> 03:40.072] It was the paper that you wrote on quadruped locomotion, where you have a walking robot navigating
[03:40.072 --> 03:45.060] terrain. I think what was maybe most evocative about it was you said that you basically
[03:45.060 --> 03:52.024] trained it completely in simulation. And so then it's zero-shot learning on new terrain.
[03:52.024 --> 03:56.048] And I guess, could you say for someone like me, actually, who's not an expert in the field,
[03:56.048 --> 04:01.068] kind of what's like hard about this, like just in general, and then kind of what did your
[04:01.068 --> 04:10.048] paper offer that was sort of new to this challenge? Yeah, legged locomotion is very hard because
[04:10.048 --> 04:19.036] you need to coordinate the actuation of many actuators. And there is one very visceral way to
[04:19.036 --> 04:25.092] understand how hard it is, which is to control an animated character with simple legs,
[04:25.092 --> 04:31.020] where you need to actuate the different joints or the different muscles with different
[04:31.020 --> 04:37.012] keys on the keyboard. And there are games like this. And you can try doing this even with just four joints.
[04:37.012 --> 04:43.060] So try actuating four joints yourself. And it is, it's basically impossible. It's just brutally
[04:43.060 --> 04:51.084] brutally hard. It's this delicate dance where at the same time, in synchrony different muscles need
[04:51.084 --> 04:57.076] to fire just right, and one is firing more and more strongly, and the other needs to subside.
[04:57.076 --> 05:04.008] And this needs to be coordinated. This is a very precise trajectory in a very high-dimensional space.
[05:04.008 --> 05:09.028] This is hard to learn. And if you look at human toddlers learning it, it takes them a
[05:09.028 --> 05:15.076] good couple of years to learn it. This is even for human intelligence, which is awesome. And I
[05:15.076 --> 05:21.004] use the term awesome here in its original meaning. I don't mean awesome like a really good
[05:21.004 --> 05:29.044] cup of coffee. Awesome. Even for this level of intelligence, it takes a couple of years of experience
[05:29.044 --> 05:36.040] to get the hang of legged locomotion. So this is very, very hard. And we want our systems
[05:36.040 --> 05:45.004] to discover this, to master this delicate dance that as adult humans we basically take for granted.
[05:45.044 --> 05:52.008] And you can look at basically the most successful, I would say, attempt so far, which is Boston Dynamics,
[05:52.008 --> 06:01.052] which is a group of incredibly smart, incredibly dedicated, insightful engineers who are some of the
[06:01.052 --> 06:10.016] best in the world at this, a large group, and it took them 30 years. It took them 30 years to really get it,
[06:10.016 --> 06:19.028] to really design and tune legged locomotion controllers that are very robust. We did this in, depending on
[06:19.028 --> 06:26.072] how you count, about two to three years, primarily with two graduate students. Now,
[06:26.072 --> 06:33.068] these are amazing, graduate students. These are really extraordinary graduate students, but still,
[06:33.068 --> 06:39.044] the fact that we could do this in two, three years speaks to the power of the approach.
[06:39.044 --> 06:47.044] And the approach is essentially taking the system through a tremendous amount of experience in simulation,
[06:47.044 --> 06:57.044] and having it do all the trying and falling in simulation. And then the key question after that is
[06:57.044 --> 07:03.068] what happens when you learn in simulation and put the model, put the controller on the real robot,
[07:03.068 --> 07:11.084] in reality, will it work? And there are a few ideas that make it work, and a few pleasant surprises,
[07:11.084 --> 07:17.084] where it worked better than we expected. One key idea that was introduced in our previous paper,
[07:17.084 --> 07:24.048] the Science Robotics paper that we published a couple of years ago, is to empirically characterize
[07:24.048 --> 07:33.012] the actuators that are used on the real robot. So you basically measure, you do the system identification,
[07:33.012 --> 07:40.040] you measure the dynamics of each actuator empirically by just perturbing the robot,
[07:40.040 --> 07:46.008] actuating the actuator, and just seeing what happens, seeing how the system responds. And that means
[07:46.008 --> 07:53.012] that you don't need to model complex motors with their delays and the electro mechanical phenomena
[07:53.012 --> 07:59.060] that happen in the actuators. You don't need to model that analytically; you can just fit a little
[07:59.060 --> 08:06.008] neural network, a little function approximator to what you see. Then you take this empirical actuator model
[08:06.008 --> 08:12.080] into your simulated legged system, and then you have the legged system walk around on simulated
[08:12.080 --> 08:21.052] terrain. That's where the pleasant surprise comes, which is that we didn't have to model all the possible
[08:21.052 --> 08:28.096] behaviors of terrains and all the types of terrains in simulation. We didn't
[08:28.096 --> 08:36.080] have to model vegetation. We didn't have to model gravel. We didn't have to model crumbling. We didn't
[08:36.080 --> 08:47.020] have to model snow and ice. Just with a few simple types of terrains, and aggressively randomized
[08:47.020 --> 08:56.016] geometry of these terrains, we could teach the controller to be incredibly robust. And the
[08:56.016 --> 09:03.020] amazing thing that we discovered, which is maybe the most interesting outcome of this work, is that in
[09:03.020 --> 09:10.040] the real world, the controller was robust to things it never really explicitly saw in simulation.
[09:10.040 --> 09:21.028] Snow, vegetation, running water, soft yielding, compliant terrain, sand, things that would be
[09:21.028 --> 09:28.024] excruciatingly hard to model. Turns out we didn't need to model them. That's so cool. We've talked
[09:28.024 --> 09:35.044] to a whole bunch of people that work on different types of simulated data, often just for the cost savings,
[09:35.044 --> 09:41.084] being able to generate infinite amounts of data. It seems like, if I can summarize what they
[09:41.084 --> 09:48.056] seem to say, you often benefit from a little bit of real world data in addition to the simulated data.
[09:48.056 --> 09:55.004] But it sounds like in this case, you didn't actually need it. Did it literally work the first time
[09:55.004 --> 10:00.016] you tried it, or were there some tweaks that you had to make to the simulation to actually get it to
[10:00.016 --> 10:06.096] bridge the gap between simulation and reality? It worked shockingly well, and what helped a lot
[10:06.096 --> 10:14.024] is that Joonho just kept going. I love working with young researchers, young engineers,
[10:14.024 --> 10:20.080] young scientists because they do things that would seem crazy to me and if you ask me to predict,
[10:20.080 --> 10:27.004] I would say that's not going to work. But fortunately often they don't ask me and they just try things.
[10:27.004 --> 10:35.012] And so we would just watch Joonho try things out, and things kept working. So the fact that you don't
[10:35.012 --> 10:43.076] need to model these very complex physical behaviors in the terrain, in the environment, this is an
[10:43.076 --> 10:50.080] empirical finding. We basically discovered this because Joonho tried it and it worked, and then he kept
[10:50.080 --> 10:59.004] doing it and it kept working and it kept working remarkably well. So somehow it was very good that
[10:59.004 --> 11:07.036] he didn't ask me and others, is this a good idea? Should I try this? It seems like there are these obvious
[11:07.036 --> 11:13.052] extensions that would be amazingly useful. You could try to do bipedal locomotion, and then making
[11:13.052 --> 11:20.016] the robot, you know, actually engage with the world. Where does this line of inquiry get stuck?
[11:20.016 --> 11:26.056] Like it seems so promising? Yeah, we're definitely, we're definitely pushing this along a number of
[11:26.056 --> 11:35.076] avenues. I'm very interested in bipeds and we do have a project with bipeds. We're also continuing to work
[11:35.076 --> 11:41.028] with quadrupeds. We have multiple projects with quadrupeds, and we're far from done with quadrupeds.
[11:41.028 --> 11:46.080] There's definitely more. There's more to go. And then you mentioned interaction. You mentioned
[11:46.080 --> 11:52.080] engaging with the world, and this is also a very interesting frontier, and we have projects
[11:52.080 --> 12:00.072] like this as well. So ultimately you want not to just navigate through the world. You also want to
[12:00.072 --> 12:06.048] interact with it more deliberately. Not just be robust and not fall and get to where you want to go,
[12:06.048 --> 12:11.036] but after you get to where you want to go, you actually want to do something. Take something,
[12:11.036 --> 12:17.052] carry it somewhere else, or manipulate the environment in some way. What physics simulator did you
[12:17.052 --> 12:26.056] use? Is this something you've built? This was a custom one. This is a custom physics simulator built by
[12:26.056 --> 12:32.048] Jemin, Jemin Hwangbo, who led the first stage of that project. That's why I said, by the way, that
[12:32.048 --> 12:39.028] it took three years, because I'm including the previous iteration that was done by Jemin, which laid a lot of
[12:39.028 --> 12:45.084] the groundwork and a lot of the systems infrastructure we ended up using. So Jemin basically built a
[12:45.084 --> 12:55.012] physics simulator from scratch to be incredibly efficient. So it's very easy for these simulation
[12:55.012 --> 13:01.076] times to get out of hand. And if you're not careful, you start looking at training times on the
[13:01.076 --> 13:09.004] order of a week or more. And I've seen this happen when people just code in Python and take off-the-
[13:09.004 --> 13:15.028] shelf components. They get hit with so much overhead and so much communication. And then I tell them that
[13:15.028 --> 13:21.060] they can gain one or two or three orders of magnitude if they do it themselves, and sometimes it's
[13:21.060 --> 13:29.020] really necessary. And so our debug cycle was a couple of hours in this project. So that helped.
[13:29.020 --> 13:35.020] That's incredible. And that seems like such an undertaking to build a physics simulator from scratch
[13:35.020 --> 13:40.080] Was it somehow constrained to make it a more tractable problem, or how did he go about it?
[13:40.080 --> 13:50.000] So I think what helped is that Jemin did not build a physics simulator for this project. It's not
[13:50.000 --> 13:56.032] that he started this project and then he said, "I need to pause the research for about a year
[13:56.032 --> 14:02.024] to build a custom high-performance physics simulator." And then I'll get to do what I want to do. He built it up
[14:02.024 --> 14:09.028] during his PhD, over many prior publications. And it's a hobby project, just like every,
[14:09.028 --> 14:15.020] you know, self-respecting computer graphics student has a custom rendering engine that they're maintaining.
[14:15.020 --> 14:21.092] So in this area a number of people have kind of custom physics engines that they're maintaining just
[14:21.092 --> 14:26.072] because they're frustrated with anything they get off the shelf because it's not custom enough. It doesn't
[14:26.072 --> 14:31.020] provide the interfaces they want. It doesn't provide the customizability that they want.
[14:32.040 --> 14:36.048] Yeah, one of the things you'd mentioned in the paper, or one of the papers, was using privileged
[14:36.048 --> 14:40.032] learning as a learning strategy, which is something I hadn't heard of. Could you describe what that is?
[14:41.060 --> 14:48.048] Yeah, it's an incredibly powerful approach that we've been using in multiple projects.
[14:48.048 --> 14:56.096] And it splits the training process into two stages. In the first stage you train a sensory motor agent
[14:56.096 --> 15:03.020] that has access to privileged information. That's usually the ground truth state of the agent.
[15:03.020 --> 15:09.020] For example, where it is exactly what its configuration is. So for example, for an autonomous car,
[15:09.020 --> 15:14.064] it would be its absolutely precise ground-truth position in the world, down to the millimeter.
[15:15.052 --> 15:22.064] And also the ground truth configuration of the environment, everything that matters in the environment,
[15:22.064 --> 15:28.048] the geometric layout of the environment, the positions of the other participants, the other agents in
[15:28.048 --> 15:35.068] the environment, and maybe even how they're moving and where they're going and why. So you get this
[15:35.068 --> 15:43.004] god-sive view into the world, the ground truth configuration of everything. And this is actually a
[15:43.004 --> 15:50.024] much easier learning problem. You basically don't need to learn to perceive the world through incomplete
[15:50.024 --> 15:57.020] and noisy sensors. You just need to learn to act. So the teacher, this first agent we call it,
[15:57.020 --> 16:05.004] the teacher, the privileged teacher, it just learns to act. Then you get this agent, this teacher,
[16:05.004 --> 16:12.016] that always knows what to do. It always knows how to act very, very effectively. And then the
[16:12.016 --> 16:19.092] teacher trains the student that has no access to privileged information. The student operates only on real
[16:19.092 --> 16:25.076] sensors that you would have access to in the real world, noisy incomplete sensors, maybe cameras,
[16:25.076 --> 16:33.084] IMU, only onboard sensors, only onboard computation. But the student can always query the
[16:33.084 --> 16:38.056] teacher. It can always query the teacher and ask, what would you do? What is the right thing to do?
[16:38.056 --> 16:43.004] What would you do in this configuration? What would you do in this configuration? So the learning process
[16:43.004 --> 16:49.044] problem is again easier because the student just needs to learn to perceive the environment.
[16:49.044 --> 16:56.032] It essentially has a supervised learning problem now because in any configuration it finds itself,
[16:57.004 --> 17:00.008] the teacher can tell it, here's the right thing to do, here's the right thing to do.
[17:01.012 --> 17:07.092] So the sensory motor learning problem is split into two. First learning to act without perception
[17:07.092 --> 17:15.044] being hard, and second, learning to perceive without action being hard. Turns out that's
[17:15.044 --> 17:22.048] much easier than just learning the two together in a bundle. That's really interesting. So
[17:22.048 --> 17:30.000] the way you do the second part of the training, make sure I got this. This second model with the
[17:30.000 --> 17:39.052] realistic inputs, is it trying to match what the teacher would have done? Yeah. But it doesn't
[17:39.052 --> 17:45.060] actually try to figure out an intermediate true representation of the world. It's just kind of matching
[17:45.060 --> 17:51.012] the teacher. It doesn't somehow try to actually do that mapping from noisy sensors to real-world
[17:52.000 --> 17:57.060] state. Right. It doesn't need to reconstruct the real world state. So there are different
[17:57.060 --> 18:02.000] architectures we can imagine, with different intermediate representations, but the simplest
[18:02.000 --> 18:08.064] instantiation of this approach is that you just have a network that maps sensor input to action.
[18:08.064 --> 18:16.032] And then this network is just trained in a supervised fashion by the actions that the teacher produces.
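A minimal sketch of the student phase of this privileged (teacher-student) setup, assuming an already-trained teacher that acts on ground-truth state and a student that only sees sensor observations; the network shapes and the random placeholder data are hypothetical, not the architecture used in the papers discussed here.

```python
# Sketch of privileged "teacher-student" training: the student, which only sees
# sensor observations, is supervised by the actions of a teacher that was
# trained on privileged ground-truth state. Shapes and data are placeholders.
import torch
import torch.nn as nn

STATE_DIM, OBS_DIM, ACTION_DIM = 32, 128, 12

teacher = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(), nn.Linear(256, ACTION_DIM))
student = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, ACTION_DIM))
# teacher.load_state_dict(...)  # in practice: load the already-trained privileged policy

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(1000):
    # Placeholder batch: privileged state and the corresponding sensor observation.
    state = torch.randn(64, STATE_DIM)
    obs = torch.randn(64, OBS_DIM)

    with torch.no_grad():
        target_action = teacher(state)      # "what would the teacher do here?"

    pred_action = student(obs)              # student acts from sensors only
    loss = loss_fn(pred_action, target_action)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```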
[18:16.088 --> 18:22.032] I see, cool. Okay, so I'm really just cherry-picking your papers. They just seem kind of
[18:22.032 --> 18:28.096] awesome to me. And I was also pretty impressed by your paper where you type drones to do like crazy
[18:28.096 --> 18:38.040] acrobatics. Oh, what a tiger. Yeah. I thought you talked about the simulation in that one.
[18:38.040 --> 18:45.036] And it seemed like it must be really hard to simulate what actually happens to a drone as it kind of flies
[18:45.036 --> 18:51.060] in crazy ways. I mean, I'm not sure, but it seems so stochastic to me just watching a drone.
[18:51.060 --> 18:55.092] It's so hard to control a drone. I was actually wondering, it seems like it must have been a real
[18:55.092 --> 19:00.040] simulation challenge to actually make that work. Also, we should put a link to the videos because it's
[19:00.040 --> 19:07.076] super cool. Yeah. Yeah. This was an amazing project driven again by amazing students from
[19:07.076 --> 19:14.088] from the University of Zurich, Antonio Loquercio. And first we benefited from some infrastructure that the
[19:14.088 --> 19:21.036] quadrotor community has, which is, they have good quadrotor simulators. They have good models
[19:21.036 --> 19:29.084] for the dynamics of quadrotors. We also benefited from some luck, which is that not everything that can
[19:29.084 --> 19:35.012] happen to a quadrotor needs to be simulated to get good quadrotor control. So for example,
[19:35.012 --> 19:44.000] we did not simulate aerodynamic effects, which are very hard to simulate. So when a quadrotor goes close to a wall
[19:44.000 --> 19:53.012] and then gets aerodynamic pushback, it gets really, really hairy. We did not simulate that, and
[19:53.012 --> 20:00.008] turns out we didn't need to. Because the neural network makes decisions moment to moment, moment to
[20:00.008 --> 20:07.036] moment. And if it gets a bit off track, if it's thrown around, no problem. In the very next moment,
[20:07.036 --> 20:15.092] it adjusts to the state that it finds itself in. So this is closed-loop control. If it were open-loop control,
[20:15.092 --> 20:21.092] well, it would have failed. I see. Interesting. Were there any other details that you had to get right
[20:21.092 --> 20:27.092] to make that work? I mean, I'm really impressed to hear. It seems like you're sort of effortlessly
[20:27.092 --> 20:32.040] able to jump from simulation to reality. And everyone else that I talked to is like, this is like the
[20:32.040 --> 20:38.000] most impossible step, but something about these domains or something about something you're doing. It seems
[20:38.000 --> 20:46.064] to work really effectively for you. Yeah. Yeah. So we're getting a hang of this. And there are
[20:46.064 --> 20:56.024] a few key ideas that have served us well. One key idea is abstraction. So abstraction is really
[20:56.024 --> 21:04.040] really key. The more abstract the representation that a sensor or a sensor modality produces,
[21:04.080 --> 21:11.020] the easier it is to transfer from simulation to reality. So what do you mean, abstract? Can you give
[21:11.020 --> 21:16.088] me an example of both the abstract versus not? Yeah. Let's look at three points on the abstraction
[21:16.088 --> 21:22.016] spectrum. Point number one, a regular camera, like the camera that is pointing at you now,
[21:22.016 --> 21:28.072] and the camera that is pointing at me now. Point number one. Point number two, a depth map coming out
[21:28.072 --> 21:34.008] of a stereo camera. So we have a stereo camera, it's a real sensor, it really exists, and it gives us estimates of
[21:34.008 --> 21:42.016] depth. Let's look at that depth map. Point number three, sparse feature tracks that a feature extractor
[21:42.016 --> 21:49.036] like SIFT would produce. So just very salient points in the image, and just a few points
[21:49.036 --> 21:56.016] that are being tracked through time. So you're getting just the dots. So the depth map is more abstract
[21:56.016 --> 22:02.072] than the color image. Why is that? Because there are degrees of variability that would affect the
[22:02.072 --> 22:10.088] color image that the depth map is invariant to. The color of that rack behind you would massively affect
[22:10.088 --> 22:18.080] the color image but would not affect the depth map. Is it sunny? Is it dark? Are you now at night
[22:18.080 --> 22:26.056] with your environment lit by lamps? All of that affects the color image and it's brutally hard to simulate?
[22:26.056 --> 22:33.068] And it's brutally hard to nail the appearance so that the simulated appearance matches the statistics
[22:33.068 --> 22:39.036] of the real appearance because we're just not very good at modeling the reflectance of
[22:39.036 --> 22:47.012] real objects, we're not good at dealing with translucency and refraction, we're still not so great at
[22:47.012 --> 22:54.040] simulating light transport. So all the things that determine the appearance of the color image are
[22:54.040 --> 23:00.096] very very hard to simulate. The depth map is invariant to all of that. It gives you primarily
[23:00.096 --> 23:08.080] a reading of the geometric layout of the environment. So if you have a policy that operates on depth maps,
[23:08.080 --> 23:14.080] it will transfer much more easily from simulation to reality, because the things that we are not
[23:14.080 --> 23:24.056] good at simulating, like the actual appearance of objects, don't affect it. And then if you take something even more abstract,
[23:24.056 --> 23:30.048] let's say you run a feature extractor, a sparse, a sparse feature tracker. Through time, the video will
[23:30.048 --> 23:38.040] just be a collection of points like a moving dot, a moving point display. It actually still gives you
[23:38.040 --> 23:45.052] a lot of information about the content of the environment but now it's invariant to much more. It's
[23:45.052 --> 23:51.036] invariant also to geometric details and quite a lot of the content of the environment. So maybe you
[23:51.036 --> 23:56.088] don't even have to get the geometry of the environment and the detailed content of the environment
[23:56.088 --> 24:03.044] right either. So now that's even more abstract. And that last representation is the representation
[24:03.044 --> 24:11.084] that we used in the Deep Drone Acrobatics project. So the drone, even though it has a camera and it could
[24:11.084 --> 24:19.084] look at the color image, it deliberately doesn't. It deliberately abstracts away all the appearance
[24:19.084 --> 24:25.084] and the geometric detail and just operates on sparse feature tracks. And turns out that we could
[24:25.084 --> 24:35.060] train that policy, with that sensory input, in very simple simulated environments, and it would just
[24:35.060 --> 24:41.044] work out of the box in the real world. Well, that's so interesting. It makes me wonder, I mean, people
[24:41.044 --> 24:45.044] that we've talked to talk about sort of end-to-end learning with, like, autonomous vehicles versus
[24:46.016 --> 24:51.076] pieces. And I guess I would never have considered that if you kind of break it up more, have more
[24:51.076 --> 24:57.036] intermediate representations, it might make transferring from simulation to the real world easier,
[24:57.036 --> 25:04.080] but that actually makes total sense. Yeah. So I think, for example, the output of a lidar is
[25:04.080 --> 25:11.092] easier to simulate than the original environment that gave rise to that output. So if you look at
[25:11.092 --> 25:18.040] the output of a lidar, it's a pretty sparse point set. If you train a policy that operates on this sparse
[25:18.040 --> 25:25.004] point set, maybe you don't need a very detailed super high fidelity model of the environment. Certainly
[25:25.004 --> 25:31.052] maybe not of its appearance, because you don't really see that appearance reflected much
[25:31.052 --> 25:38.000] in the lidar reading. Interesting. I guess I also wanted to ask you about another piece of work
[25:38.000 --> 25:42.056] that you did that was intriguing, which is this Sample Factory paper, where you have kind of a setup
[25:42.056 --> 25:48.024] to train things much faster. And I have to confess, this is one where I kind of struggled to understand
[25:48.024 --> 25:55.036] what you were doing. I would love just kind of a high-level explanation. I'm not a reinforcement
[25:55.036 --> 26:00.008] learning expert at all. Maybe we could kind of set up what the problem is and kind of what your
[26:00.008 --> 26:07.004] contribution is that made these things run so much faster. Yeah. So our goal is to see how far we can
[26:07.004 --> 26:14.008] push the throughput of sensory motor learning systems in simulation. And we're particularly interested
[26:14.008 --> 26:22.000] in sensory motor learning in immersive three-dimensional environments. I'm personally a bit less
[26:22.000 --> 26:30.000] jazzed by environments such as board games or even Atari because it's still quite far from the real
[26:30.000 --> 26:35.004] world. Although you have done a fair amount of work on it, haven't you? If I'm not mistaken.
[26:36.008 --> 26:43.036] Right. So we've done some, but what's really exciting, I see, okay, definitely,
[26:43.036 --> 26:50.056] is training systems that work in immersive 3D environments, because that to me is the
[26:50.056 --> 26:57.052] big prize. If we do that really really well, that brings us closer to deploying systems in the physical
[26:57.052 --> 27:02.024] world. The physical world is three-dimensional. The physical world is immersive,
[27:02.024 --> 27:09.052] perceived from a first-person view, with onboard sensing and computation, by animals including humans.
[27:09.052 --> 27:15.084] And these are the kinds of systems that I would love to be able to create. So that's where
[27:15.084 --> 27:21.052] we try to go with our simulated environments. And these simulated environments tend to be, if you're not
[27:21.052 --> 27:27.044] careful, they're pretty computationally intensive. And if you just use, again, if you use out of the
[27:27.044 --> 27:34.064] box systems, you will notice a pattern here. If you just use tools out of the box and have some high
[27:34.064 --> 27:40.016] level Python scripting on top of existing tools, you'll basically have a simulation environment that runs
[27:40.016 --> 27:47.044] at 30 frames per second, maybe 60 frames per second. You're roughly collecting experience at
[27:47.044 --> 27:55.060] something that corresponds to real time. Now, as we mentioned, it takes a human toddler a couple of
[27:55.060 --> 28:01.092] years of experience to learn to walk. And a human toddler is a much better learner, a much more
[28:01.092 --> 28:10.008] effective learner than any system we have right now. So two years is a bit slow if you ask me for a
[28:10.008 --> 28:17.044] debug cycle. I don't want to have a debug cycle of two years. And in fact, what we need to do is take
[28:17.044 --> 28:25.044] this amount of experience and then multiply it by several orders of magnitude because the models
[28:25.044 --> 28:33.060] that we're training are much more data hungry and they are much poorer learners than the human toddler.
[28:33.060 --> 28:40.080] So then basically we're looking at compressing maybe centuries of experience until we get better at
[28:40.080 --> 28:45.028] learning algorithms and the models we design. But with the current models and algorithms,
[28:45.028 --> 28:52.040] the challenge is to compress perhaps centuries of experience into an overnight training run,
[28:52.040 --> 28:57.028] which is a reasonably comfortable debug cycle. You launch a run, you go home, you come back in the
[28:57.028 --> 29:03.060] morning, you have experimental results. That basically means that you need to operate, you need to
[29:03.060 --> 29:10.096] collect experience and use it for learning on the order of hundreds of thousands of frames per
[29:10.096 --> 29:17.076] second, millions of frames per second. And this is where we're driving. So in this paper, we demonstrated
[29:17.076 --> 29:26.008] a system architecture that, in an immersive environment, trains agents that act, collect experience, and learn
[29:26.080 --> 29:34.080] in these 3D immersive environments at on the order of 100,000 frames per second on a single
[29:34.080 --> 29:43.020] machine, a single server. And the key was basically a bottom-up, from-scratch,
[29:43.020 --> 29:51.020] from first principles system design with a lot of specialization. So we have processes that just
[29:51.020 --> 29:59.004] collect experience. Agents just run non-stop collecting experience. We have other processes that just
[29:59.004 --> 30:04.096] learn and update the neural network weights. So it's not that you have an agent that
[30:04.096 --> 30:11.004] goes out, collects experience, then does some gradient descent steps, updates its weights,
[30:11.004 --> 30:15.084] goes back into the environment, collects some more experience with better weights and so on and so forth.
[30:16.056 --> 30:24.040] Everything happens in parallel, everybody is busy all the time. And the resources are utilized,
[30:24.040 --> 30:31.084] very, very close to 100% utilization. Everything is connected through high bandwidth memory,
[30:31.084 --> 30:37.020] everything is on the same node. So there's no message passing. Because if you look at these rates
[30:37.020 --> 30:43.020] of operation, if you're operating at hundreds of thousands of frames per second, message passing is too slow.
[30:43.020 --> 30:49.012] The fastest message passing protocol you can find is too slow. The message passing
[30:49.012 --> 30:54.072] becomes the bottleneck in the system. So what happens is that these processes just read and write
[30:54.072 --> 31:00.032] from shared memory. They just all access the same memory buffers. When the neural network weights
[31:00.032 --> 31:06.000] are ready, they're written into the memory buffer. When a new agent is ready to go out collect experience,
[31:06.000 --> 31:12.088] it just reads the latest weights from the memory buffer. And there is a key idea that we borrowed from
[31:12.088 --> 31:19.036] computer graphics, which is double buffering. And double buffering is one of the very, very first things
[31:19.036 --> 31:26.040] I learned in computer graphics as a teenager, when we wrote assembly code. And basically, you know,
[31:26.040 --> 31:32.024] it's lesson one in computer graphics. How do you display, how do you even display an image? Double
[31:32.024 --> 31:38.040] buffering is part of lesson one. The idea is that there are two buffers. The display points to
[31:38.040 --> 31:43.060] the front buffer. And that's what's being displayed. That's the active buffer. In the meantime,
[31:43.060 --> 31:49.084] the logic of your code is updating the back buffer with the image of the next frame. When the back
[31:49.084 --> 31:57.068] buffer is ready, you just swap pointers. So the display starts pointing to the back
[31:57.068 --> 32:05.028] buffer, which becomes the primary one. And then the logic of your code operates on what used to be the front
[32:05.028 --> 32:10.072] buffer. So the back buffer becomes the front buffer, the front buffer becomes the back buffer, you keep going.
[32:10.072 --> 32:19.004] We introduced this idea into reinforcement learning again to just keep everybody busy all the time.
[32:19.004 --> 32:27.052] So the learning processes work on one buffer, while
[32:27.052 --> 32:35.020] the experience collectors have their own buffer that they're writing sensory data
[32:35.020 --> 32:40.016] into. And then they swap buffers. There's no delay. They just keep going.
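For readers who want to see the pattern concretely, here is a minimal sketch of the double-buffered, shared-buffer idea described above. It is purely illustrative, not the actual system from the paper; the buffer sizes, the single-process structure, and the stand-in environment call are assumptions made for brevity.

```python
# Illustrative sketch only -- not the actual system described in the paper.
# Two pre-allocated buffers play the roles of "front" (being learned from)
# and "back" (being filled by experience collectors); a pointer swap hands
# a full buffer to the learner with no copying and no message passing.
# In a real multi-process setting these arrays would live in shared memory.
import numpy as np

FRAMES, OBS_DIM = 4096, 128   # hypothetical sizes

buffers = [np.zeros((FRAMES, OBS_DIM), dtype=np.float32) for _ in range(2)]
front = 0                     # index of the buffer the learner reads

def collect(env_step_fn):
    """Collectors fill the back buffer non-stop; they never wait on the learner."""
    back = 1 - front
    for t in range(FRAMES):
        buffers[back][t] = env_step_fn(t)   # write one observation per frame

def learn(update_fn):
    """Learner trains on the front buffer, then the two buffers swap roles."""
    global front
    update_fn(buffers[front])  # gradient steps on a full batch
    front = 1 - front          # pointer swap: back becomes front

# Toy usage with stand-in functions.
collect(lambda t: np.random.randn(OBS_DIM))
learn(lambda batch: None)
```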
[32:40.016 --> 32:47.012] Hmm. Interesting. Would it be possible to scale this up if there were multiple
[32:47.012 --> 32:53.092] machines and there was a delay in the message passing? So the distributed setting is more complex.
[32:53.092 --> 33:01.092] We have avoided it so far. If you are connected over a high-speed fabric, then it
[33:01.092 --> 33:11.020] should be possible. We've deliberately maybe handicapped ourselves, still, even in a follow-up project
[33:11.020 --> 33:18.008] that we have now that was accepted to ICLR; we limited ourselves to a single node,
[33:18.008 --> 33:24.088] because we felt that we would learn useful things if we just constrained ourselves to a single node
[33:24.088 --> 33:32.072] and asked how far can we push single-node performance. And in this latest paper that was just accepted to
[33:32.072 --> 33:40.024] ICLR, we basically showed that with a single node, if we again take this holistic, end-to-end,
[33:40.024 --> 33:47.020] from-first-principles system design philosophy, we can match results that previously were obtained
[33:47.020 --> 33:54.040] on an absolutely massive industrial scale cluster. Yeah, I mean, your learning speed is so
[33:55.044 --> 34:00.080] fast to me. It seems faster than actually what I would expect from like supervised learning,
[34:00.080 --> 34:06.032] where you're literally just pulling the images off your hard drive. Am I wrong about that?
[34:07.036 --> 34:13.084] Oh, yeah. So in the latest work, basically the forward pass through the convnet
[34:13.084 --> 34:19.084] is one of the big bottlenecks. It's no longer the simulation. We can simulate so fast, we can
[34:19.084 --> 34:25.028] simulate the environment so fast, it's no longer the bottleneck. It's actually routine
[34:25.028 --> 34:31.028] operations, like even just doing the forward pass in the convnet. Amazing. So I guess one more
[34:31.028 --> 34:35.020] project that you worked on that I was kind of captivated by and wanted to ask about, because I think
[34:35.020 --> 34:40.064] a lot of people that watch these interviews would be interested in it too, is CARLA, which is kind of an
[34:40.064 --> 34:45.028] environment for learning autonomous vehicles. So can you maybe describe it and what inspired you to make it?
[34:46.008 --> 34:54.000] Yeah, CARLA is a simulator for autonomous driving and it's grown into an extensive open source
[34:54.000 --> 35:00.032] simulation platform for autonomous driving that's now widely used both in industry and in research.
[35:01.012 --> 35:07.076] And I can answer your question about inspiration in two parts. There is what originally inspired
[35:07.076 --> 35:15.084] us to create CARLA and then there is what keeps it going. And so what originally inspired us is actually
[35:15.084 --> 35:24.048] basic scientific interest in sensory motor learning and sensory motor control. I wanted to understand
[35:24.048 --> 35:32.024] how we train intelligent agents that have this kind of embodied intelligence that you see in us and other
[35:32.024 --> 35:40.016] animals, where we can walk through an environment gracefully, deliberately; we can get to where we want to go.
[35:40.016 --> 35:46.000] We can engage with the environment; if we need to rearrange it, we rearrange it. We clearly act
[35:46.080 --> 35:54.016] spatially intelligently, in an embodied fashion. And this seems very
[35:54.016 --> 36:01.012] important to me and I want to understand it because I think this underlies other kinds of intelligence as
[36:01.012 --> 36:08.040] well, and I think it's important for us on our way to AI, which is a loaded term. I think it's very important
[36:08.040 --> 36:14.016] for us to understand this aspect of intelligence. It seems very core to me, the kind of internal
[36:14.016 --> 36:20.088] representations that we maintain and how we maintain them as we move through immersive,
[36:20.088 --> 36:27.052] three-dimensional environments. So I wanted to study this and I wanted to study this in a reproducible fashion.
[36:27.052 --> 36:34.096] I wanted good tooling. I wanted good environments in which this can be studied. And we looked
[36:34.096 --> 36:42.056] around, and when we started this work there just weren't very good, very satisfactory environments for us.
[36:42.056 --> 36:48.008] We ended up, in some early projects, using the game Doom, which is a first-person
[36:48.008 --> 36:56.072] shooter that I used to play as a teenager and that I still have a, you know, warm spot for.
[36:56.072 --> 37:03.068] And we used Doom to good effect, and in fact we still use it in projects; it was
[37:03.068 --> 37:10.016] used, we used it in the Sample Factory paper as well. Sample Factory is another paper that is based on
[37:10.016 --> 37:18.000] Doom, essentially on derivatives of John Carmack's old code, which tells you something about the guy.
[37:18.000 --> 37:23.060] So if people still use your code 25 years later, you did something.
[37:26.000 --> 37:30.080] But Doom, if you just look at it, it's less than ideal.
[37:30.080 --> 37:39.092] I mean, you walk around in a dungeon and, you know, you engage in a certain diplomacy
[37:39.092 --> 37:46.072] of the kind that maybe we don't always want to look at, and we don't want our graduate students to
[37:46.072 --> 37:53.044] always be confronted with. I mean, there's a lot of blood and gore, and it wasn't designed
[37:53.044 --> 38:01.052] for AI; it was designed for the entertainment of primarily teenage boys. So we wanted
[38:01.052 --> 38:08.088] something a bit more modern that connects more directly to the kinds of
[38:08.088 --> 38:15.036] applications that we have in mind, to useful, productive behaviors that we want our intelligent systems
[38:15.036 --> 38:22.056] to learn. And autonomous driving was clearly one such behavior, and I
[38:22.056 --> 38:27.076] held the view at the time, which I still hold, that autonomous driving is a long-term problem. It's a long-
[38:27.076 --> 38:36.016] term game. It wasn't about to be solved, as people were saying when we were creating CARLA, and I
[38:36.016 --> 38:43.012] still don't think that it's about to be solved. I think it's a long-term endeavor. So we created
[38:43.084 --> 38:51.036] a simulation platform where the task is autonomous driving, and as an embodied artificial intelligence
[38:52.000 --> 38:56.064] domain I think it's a great domain. You have a complex
[38:56.064 --> 39:04.008] environment, you need to navigate through it, you need to perceive the environment to make decisions
[39:04.008 --> 39:11.060] in real time, the decisions really matter, if you get something wrong it's really bad, so the stakes are high,
[39:11.060 --> 39:17.084] but you're in simulation. So that was the original motivation; it was basic scientific
[39:17.084 --> 39:25.044] interest in intelligence and how to develop intelligence. And then the platform became very
[39:25.044 --> 39:32.072] widely used; people wanted it for the engineering task of autonomous driving, and people kept
[39:32.072 --> 39:38.072] asking for more and more features, more and more functionality. Other large institutions,
[39:38.072 --> 39:44.056] like actual automotive companies, started providing funding for this platform to be maintained and
[39:44.056 --> 39:51.012] developed because they wanted it, and we put together a team, led by German
[39:51.012 --> 39:58.008] Ros from the original developers of CARLA, who is now leading an extensive international team that
[39:58.008 --> 40:06.024] is really primarily devoted to the autonomous driving domain and supporting the autonomous driving domain.
[40:06.024 --> 40:12.064] So that's cool. I feel like maybe one criticism of academia, I don't know if it's fair or not,
[40:12.064 --> 40:18.048] is that it has trouble with incentives to make tools like this that are really reusable. Like, did you
[40:18.048 --> 40:24.040] feel pressure to write papers instead of building a robust simulation tool that would be useful
[40:24.040 --> 40:34.040] for lots of other people? Well, I maintain a portfolio approach, where I think it's okay for one thrust of
[40:34.040 --> 40:40.064] my research and one thrust of my lab to not yield a publication for a long time, because other
[40:40.064 --> 40:51.012] thrusts just very naturally end up publishing more, so it balances out. I
[40:51.036 --> 40:59.020] personally don't see publication as a product or as a goal. I see publication as a symptom; the
[40:59.020 --> 41:05.068] publication is a symptom of having something to say. So publications come out, they come out at a
[41:05.068 --> 41:12.088] healthy rate, just because we end up discovering useful things that we want to share with people.
[41:13.020 --> 41:20.072] And I personally find it very gratifying to work on a project for a long time and do something
[41:20.072 --> 41:30.048] substantial, maybe then publish, and if people use our work and it's useful to them, that is its own reward
[41:30.048 --> 41:39.028] to me. So even if there is no publication, if people find our work useful, I love it. I find it very, very
[41:39.028 --> 41:45.044] gratifying. Yeah, I can totally relate to that. Could I ask you more open-ended questions, since
[41:45.044 --> 41:51.076] we're kind of getting to the end of this? I guess I wonder, you know, when I look at ML applications,
[41:51.076 --> 41:58.008] broadly defined ML, you know, the one that is kind of mysterious to me is robotics, right? Like, I
[41:58.008 --> 42:04.048] feel like I see ML working all over the place, you know, it's just so easy to find, you know,
[42:04.048 --> 42:10.072] like suddenly my camera can search semantically, you know. But then, you know, I feel
[42:10.072 --> 42:17.020] like the thing that I can do that computers mostly can't do is kind of pick up an arbitrary
[42:17.020 --> 42:22.080] object and move it somewhere, and it seems like you've been really successful getting these things to work
[42:22.080 --> 42:28.096] to some degree. But I guess I always wonder, like, what is so hard about robotics? And is this, like,
[42:29.052 --> 42:35.076] do you think there will be a moment where this suddenly starts working and we see ML robot applications
[42:35.076 --> 42:43.084] all over the place, or is it always going to remain a huge challenge? I don't think it will
[42:43.084 --> 42:51.060] always remain a huge challenge. I don't think there is magic here. The problem is qualitatively
[42:51.060 --> 42:58.016] different from, you know, perception problems such as computer vision and being able to tell your
[42:58.016 --> 43:06.072] camera, you know, where is Lucas, and the camera will find you. Because the problem is qualitatively different,
[43:06.072 --> 43:12.048] but I don't think the problem is insurmountable. I think we're making good progress. So the
[43:12.048 --> 43:19.028] challenge is that to learn to act, you need to actually act. To act, you need to act in a living
[43:19.028 --> 43:25.044] environment. If you act in a physical environment, you have a problem, because the physical environment runs
[43:25.044 --> 43:31.068] in real time. So you're potentially looking at the kinds of debug cycles that we mentioned: a human
[43:31.068 --> 43:36.016] toddler or something takes a couple of years to learn, and in these couple of years, I mean, the
[43:36.016 --> 43:44.072] toddler is also an incredibly robust system as a control problem. So during this time, you know,
[43:44.072 --> 43:52.056] you run out of battery power, you fall, you break things, you need a physical space in which all of this
[43:52.056 --> 44:01.068] happens. And then if you're designing the learning algorithms, you need to do this in parallel on many,
[44:01.068 --> 44:06.040] many variations; you have many, many slightly different toddlers to see
[44:06.040 --> 44:12.064] which one learns better. So it's very, very hard to make progress in this regime. So I
[44:12.064 --> 44:21.028] think we need to identify the essential skills, the underlying skills, and I think many of these
[44:21.084 --> 44:31.020] can be understood and modeled in essentially our equivalent of model systems. So if you look at
[44:31.020 --> 44:38.096] neuroscience, for example, much of what we know about the nervous system was not discovered in humans,
[44:38.096 --> 44:48.008] in the human nervous system; it was discovered in model systems such as squids. A squid is pretty
[44:48.008 --> 44:55.076] different from a human, but it shares some essential aspects when it comes to the operation of the nervous system,
[44:56.048 --> 45:03.012] and it's easier to work with, okay, for very many reasons. Squids are just easier to work with
[45:03.012 --> 45:10.032] than humans. Nobody says that if we understand squid intelligence we will understand everything
[45:10.032 --> 45:16.080] about human intelligence and how to write novels and compose music, but we will understand many
[45:16.080 --> 45:23.068] essential things that advance the field forward. I believe we can also understand the essence of embodied
[45:23.068 --> 45:33.068] intelligence without worrying about let's say how to grasp wet slippery pebbles and how to pour coffee
[45:33.068 --> 45:41.012] from a particular type of container maybe we don't need to simulate all these complexities of the physical
[45:41.012 --> 45:49.004] world. We need to identify the essential features that really bring out the essence of the problem, the
[45:49.004 --> 45:58.000] essential aspects of spatial intelligence, and then study these in convenient model systems. That's what we try
[45:58.000 --> 46:05.052] to do with a lot of our work, and I think we can actually make progress, enough progress to bootstrap
[46:06.024 --> 46:13.084] physical systems that are basically intelligent enough to survive and not cause a lot of damage when they
[46:13.084 --> 46:19.036] are deployed in the physical world and then we can actually deploy them in the physical world and
[46:19.036 --> 46:27.012] start tackling some of these last-millimeter problems, such as how to grasp a slippery glass.
[46:29.012 --> 46:32.072] Is that so just because it's really last-millimeter? Because I feel like, something just like,
[46:33.036 --> 46:38.072] I mean, you would know better than me, but just the way fabric hangs or the way liquids spill, I
[46:38.072 --> 46:46.000] understand that those are incredibly hard to simulate with any kind of accuracy, as we would recognize it.
[46:46.000 --> 46:51.012] Do you think that that's actually in the details, and the more important thing is, like, what is the
[46:51.012 --> 46:58.040] more important thing then, to know how to simulate quickly, or where is the productive axis to
[46:59.004 --> 47:09.036] improve? But one problem that I think a lot about that seems pretty key is the problem of internal
[47:09.036 --> 47:17.036] representations of spatial environments that you need to maintain. So suppose you want to find your
[47:17.036 --> 47:23.004] keys you're in an apartment you don't remember where you left your keys you want to find your keys
[47:24.064 --> 47:31.060] so you need to move through the apartment and you need to maintain some representation of it or you
[47:31.060 --> 47:38.056] are in a new restaurant and you want to find the bathroom you've never been there before you want to
[47:38.056 --> 47:43.068] find the bathroom I've done this experiment many times you always find the bathroom and you don't
[47:43.068 --> 47:50.096] even need to ask people, right? How do you do that? What is that? So I think
[47:50.096 --> 47:57.076] these questions, these behaviors, tap into an important, what to me feels like an essential, aspect
[47:57.076 --> 48:04.016] of embodied intelligence, an essential aspect of spatial intelligence, and I think if we figure that out
[48:04.016 --> 48:12.048] we will be on our way. We will not be done, but we will be on our way. Then there is the very detailed
[48:12.048 --> 48:20.064] aspect. One of my favorite challenges, long-term challenges, for robotics is Steve Wozniak's challenge, which is
[48:20.064 --> 48:28.080] that a robot needs to be able to go into a new house that it's never been in before and make a coffee
[48:31.092 --> 48:42.000] So that, I think, will not be solved with just the skill that I mentioned to you. That does rely on
[48:42.000 --> 48:47.084] some of these last-millimeter problems, sort of the detailed actuation, and also reasoning about the
[48:47.084 --> 48:54.024] functionality of objects. And I think we're actually far. I don't think it's
[48:54.024 --> 49:00.024] going to happen next year; I think we're quite far, but it's a very exciting journey. Awesome, I love it.
[49:00.024 --> 49:06.072] thanks so much for your time that was a lot of fun thank you so much Lucas thanks for listening to another
[49:06.072 --> 49:12.064] episode of Gradient Descent. Doing these interviews is a lot of fun and it's especially fun for me when I can
[49:12.064 --> 49:18.024] actually hear from the people that are listening to the episode so if you wouldn't mind leaving a comment
[49:18.024 --> 49:22.056] and telling me what you think or starting a conversation that would make me inspired to do more of these
[49:22.056 --> 49:27.028] episodes and also if you wouldn't mind liking and subscribing I'd appreciate that a lot
54.11845
54.83838
47s
Nov 21 '22 16:56
1hn08xuw
-
Finished
Nov 21 '22 16:56
2038.896000
/content/nimrod-shabtay-deployment-and-monitoring-at-nanit-agwzytw7tcs.mp3
tiny
[00:00.000 --> 00:09.020] The focus, as I see it in the industry, has shifted from just making the models to
[00:09.020 --> 00:15.084] making them work well in the real world and being flexible enough to adapt to changes.
[00:15.084 --> 00:21.068] So I guess you can say that many times maintaining the model and making it good and
[00:21.068 --> 00:27.008] reliable out there is sometimes much harder than actually developing it.
[00:27.008 --> 00:33.016] You're listening to gradient descent, a show about machine learning in the real world and I'm your host, Lucas B.
[00:34.016 --> 00:41.016] Nimrod is a senior computer vision algorithms developer at Nanit and the father of two children.
[00:41.016 --> 00:45.088] Nanit develops smart baby monitoring systems and it's a product that I happen to use every day.
[00:45.088 --> 00:47.088] So I'm extra excited to talk to him.
[00:49.048 --> 00:54.092] So Nimrod, I'm super excited to talk to you about the article you wrote on ML in production,
[00:54.092 --> 01:01.056] but I'd say I'm especially excited to talk to you because you make maybe the app that I use the most these days,
[01:01.056 --> 01:07.000] the Nanit app. So my daughter actually turned one today and we've been using it for the last year.
[01:07.000 --> 01:12.012] And basically every morning my mother-in-law and my wife kind of discuss the stats from the previous
[01:12.012 --> 01:17.088] night's sleep. So I really, really love your app. I can say that honestly, and I was proud to discover
[01:17.088 --> 01:22.004] that you were customers of Weights & Biases. But I was wondering if you could start by maybe
[01:22.004 --> 01:25.088] talking about what your app does and what the history of the company is and how you think about that?
[01:25.088 --> 01:32.012] Yeah, sure. So first, I'm happy to be here. The whole company started with an idea from one of
[01:32.012 --> 01:39.008] the founders, who actually wanted to monitor his son's sleep during the night, since
[01:39.008 --> 01:45.064] he came from the whole world of process monitoring using cameras, and he wanted to
[01:46.068 --> 01:53.088] apply that to his son. It started as a project when he was at Cornell University and everything just
[01:53.088 --> 02:00.028] rolled from there, actually. And since we have a camera and he's from the field of computer vision,
[02:00.028 --> 02:04.068] we started with the cameras and started building a smart baby monitor using computer vision algorithms that
[02:04.068 --> 02:12.060] can track, you know, sleep and also the breathing motion, and let you celebrate the milestones of your
[02:12.060 --> 02:18.092] baby, for example, you know, falling asleep for the first time on his own, and sleeping through the night without
[02:18.092 --> 02:23.096] any visits from the parents, which is great for us, the parents, of course. And they're giving you
[02:23.096 --> 02:31.000] specific sleep tips in order to improve your baby's sleep. And actually the key, or I can
[02:31.000 --> 02:37.096] say what guides the company, is what value can we extract from the visual data that the camera collects.
[02:38.052 --> 02:45.088] So it's kind of obvious on sleep and of course on breathing for young babies, but this is also
[02:45.088 --> 02:50.084] the guideline that guides us for the next products and features: how to give value in terms of health
[02:50.084 --> 02:59.016] and wellness to our customers. And it's also really unique, since this product has two hats, basically:
[02:59.016 --> 03:06.028] it has the hat of a consumer electronics product, as you use it, and it's also a research
[03:06.028 --> 03:12.012] tool, which has started being used more and more recently; researchers are doing home sleep research.
[03:12.012 --> 03:17.016] So it's pretty cool that, you know, science and technology are working together and we get to deliver a
[03:17.016 --> 03:24.028] really interesting product. That is really cool and I think you know folks who are listening to this
[03:24.028 --> 03:31.024] who haven't had children yet might not realize how essential sleep is for your sanity as a parent,
[03:31.024 --> 03:38.060] and also how important sleep is for the sanity of your child. So I've thought much more about sleep in
[03:38.060 --> 03:45.000] the last year than I ever thought about before. One of the key advantages of the product is, you know,
[03:45.000 --> 03:50.052] as parents you get up at night for your children and you're drowsy, and you don't remember exactly,
[03:50.052 --> 03:57.064] did I get up two times, was it at 3 a.m., maybe it was 5, I don't remember. And then it just collects
[03:57.064 --> 04:04.076] the data and serves it to you clearly, in order to make, you know, a useful summary of the night,
[04:04.076 --> 04:09.064] and you can also, you know, make data-driven decisions if you want, and not by beliefs, because, you know,
[04:09.064 --> 04:16.036] this whole field of baby sleep is full of beliefs. Some say that this method works better, some say
[04:16.036 --> 04:21.072] the other, and here you get, you know, the facts. You get it: either the baby slept well, the baby slept
[04:21.072 --> 04:27.096] better, or the baby didn't sleep that well this night. And we also see that, you know, since parents are more
[04:27.096 --> 04:33.072] focused on the baby's sleep, babies with a Nanit also sleep better. They sleep longer,
[04:33.072 --> 04:38.084] their sleep quality is better, because everyone is, you know, in this process and they're focused.
[04:38.084 --> 04:43.064] So it's pretty amazing. That's really amazing. How do you know that babies with a
[04:43.064 --> 04:53.064] Nanit sleep better? So we have our whole user base, and we often send surveys to our customers, and
[04:53.064 --> 04:59.008] they actually respond to that, and we see in the statistics and in what they're telling us that babies
[04:59.008 --> 05:04.068] with Nanit sleep better, because you're more aware of it and the tips are actually useful,
[05:04.068 --> 05:11.016] so you're in a mindset of improving and of how sleep is important. So I guess that's very cool. So
[05:11.016 --> 05:17.016] can you break down, you know, this is supposed to be an ML podcast, though
[05:17.016 --> 05:21.056] parenting has been coming up a lot lately while we've been talking, can you kind of break
[05:21.056 --> 05:26.084] down the pieces that are kind of ML problems or computer vision problems that you need to
[05:26.084 --> 05:34.092] solve to make the app work? Yeah, so we use all sorts of computer vision algorithms in order to get a
[05:35.064 --> 05:41.000] good understanding of the scene. I mean, in order to know, for example, when the baby is falling
[05:41.000 --> 05:47.048] asleep on his own, and whether a parent comes to visit or not. All those are actually computer vision
[05:47.048 --> 05:52.076] problems that we need to solve, and we actually serve multiple models during the night in order to get
[05:52.076 --> 05:58.092] the whole scene understanding, and on top of that we take those outputs from the models and
[05:59.048 --> 06:06.028] serve you the data much more clearly. So there's a lot going on during the night. And so do you run
[06:06.028 --> 06:11.072] the models on the phone or do you run them in the cloud, how does that work? Mostly in
[06:11.072 --> 06:17.032] the cloud. We do have some algorithms that are running on the camera as well, but mostly in the cloud.
[06:19.000 --> 06:23.088] And can you give me some sense of like what the scale of this is like how much data your models
[06:23.088 --> 06:27.056] are handling, or how many streams of video you get in a typical night?
[06:28.028 --> 06:37.040] Yeah, so let's take a short example. We have more than 100,000 users and, you know, we have
[06:37.040 --> 06:44.084] a full night, which basically means that if we serve, for example, every 10 minutes or so, we get into
[06:46.044 --> 06:54.052] a few tens of millions of calls to models per night. So it's a nice scale. I mean, we get to
[06:55.016 --> 07:02.004] serve over tens of millions of requests per night across all our users.
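As a rough back-of-the-envelope check of that figure, here is an illustrative calculation; the night length, serving interval, and number of models per window are assumed values, not Nanit's actual serving parameters.

```python
# Illustrative back-of-the-envelope estimate; every number below is assumed.
users = 100_000          # "more than 100,000 users"
night_hours = 10         # assumed length of a monitored night
windows_per_hour = 6     # "serve ... every 10 minutes or so"
models_per_window = 4    # assumed number of models run per window

calls_per_night = users * night_hours * windows_per_hour * models_per_window
print(f"{calls_per_night:,}")   # 24,000,000 -> "a few tens of millions of calls"
```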
[07:02.060 --> 07:07.048] And these are pretty sensitive models, and I've noticed that you've never gone down, at least in my experience. It seems
[07:07.048 --> 07:12.052] like you do a really good job with reliability, and I would think you'd have maybe a higher reliability
[07:12.052 --> 07:18.044] bar than some other applications from folks I talk to. Yeah, well, you're right. Since babies are
[07:18.044 --> 07:22.076] actually the most important thing to the parents, we try to be as reliable as possible in terms
[07:22.076 --> 07:29.000] of robustness of the models and accuracy of the models, and also in terms of runtime, and to reduce
[07:29.000 --> 07:34.012] downtime as much as possible, because, again, everyone expects our algorithms to work all the time
[07:34.012 --> 07:40.084] and give them the data, especially when it comes to babies. So we're putting a lot of effort into that
[07:40.084 --> 07:47.088] as well. And I guess the sleeping model is important, but the one that seems like it must be kind of anxiety-
[07:47.088 --> 07:52.012] producing, and just talking about it is giving me anxiety, is the breathing motion monitoring.
[07:52.012 --> 07:59.056] You know, is that also an ML model that checks for that? Well, we use multiple models there, so
[07:59.056 --> 08:04.044] there are some models that are more machine learning, deep learning based, and there are some classic computer
[08:04.044 --> 08:11.048] vision models as well, and some other models too. And why do you use multiple models for
[08:11.048 --> 08:18.036] a single application? Well, we have many tasks that we need to solve in order to get this product
[08:18.036 --> 08:25.000] to be reliable and robust enough, especially when we're talking about breathing motion. So I guess when you
[08:25.000 --> 08:31.008] look at handling, you know, millions of requests per night, I guess what are some things
[08:31.008 --> 08:36.052] that you do to make sure that this is reliable and make sure that your compute spend is sane? Like, how
[08:36.052 --> 08:41.008] do you think about model architecture, and how do you deploy your models, and, you know,
[08:41.008 --> 08:47.016] what frameworks and tools do you use? So it's pretty interesting. Our team is actually
[08:47.016 --> 08:54.052] responsible for the whole flow, end to end. I mean, from developing and defining the task, all the research,
[08:54.052 --> 09:00.028] selecting the model architecture, even conducting proofs of concept many times; we'll probably
[09:00.028 --> 09:07.000] elaborate on that later because I think it's really important nowadays for practitioners in the industry.
[09:07.064 --> 09:12.068] Also the whole training process, of course, where you come into the picture with some great tools
[09:12.068 --> 09:19.000] helping us find which models and experiments are better; evaluating, which is actually pretty interesting,
[09:19.000 --> 09:25.008] because we try to construct evaluation metrics that also hold the product objectives inside as well,
[09:25.008 --> 09:30.028] because, you know, we're not building models in a vacuum, we're all tied to a product and a value to
[09:30.028 --> 09:35.080] give to our customers, so it's not always that straightforward; and then deploying to production,
[09:35.080 --> 09:42.068] including building monitoring systems, which should be our eyes out there eventually, and runtime optimization,
[09:42.068 --> 09:48.076] as you said, to not spend so much on compute. So it's a pretty complicated flow, but over the last
[09:48.076 --> 09:56.092] few projects we actually formed a nice formula for it, which I posted in a Medium blog post as
[09:56.092 --> 10:04.076] guidelines, which have proven to be successful the past few times. And it's actually a trend, at least
[10:04.076 --> 10:11.000] as I see it now. I mean, every time we go on Twitter or LinkedIn or whatever, there are people
[10:11.000 --> 10:18.060] talking about how to maintain and deploy and make good models in production, because there isn't any
[10:18.060 --> 10:24.028] silver bullet there, you know, and there are companies that are trying to solve the whole pipeline or
[10:24.028 --> 10:30.036] some part of it. So it's really interesting. I mean, the focus, as I see it in the industry, has
[10:30.036 --> 10:38.036] shifted from just making the models to making them work well in the real world and being
[10:38.036 --> 10:44.068] flexible enough to adapt to changes. So I guess you can say that many times maintaining the
[10:44.068 --> 10:51.016] model and making it good and reliable out there is sometimes much harder than actually developing it,
[10:51.016 --> 10:56.052] which is kind of amazing if you think of it I guess that wasn't exactly the focus like a few years ago
[10:56.052 --> 11:01.064] but we kind of got there. Tell me some stories about, you know, some stuff that you're running into, and tell
[11:01.064 --> 11:06.036] me specifically, like, maybe pick a model and what it does, and kind of
[11:06.036 --> 11:12.020] what were the issues that you ran into in the process of getting it deployed and running? Yeah, we can
[11:12.020 --> 11:17.088] take an object detector as an example; we use them, of course, you know, in the product. And in this case
[11:17.088 --> 11:21.064] would an object detector be like a baby detector or a parent detector, is that
[11:22.036 --> 11:30.004] a fair example? Yeah, it can be, let's say, for example, a baby detector. And so when you take a
[11:30.004 --> 11:36.036] baby detector and you actually want to start building it, you must be aware of, for example,
[11:36.036 --> 11:41.048] the evaluation, how it's going to be performed. I mean, that's a pretty common
[11:41.048 --> 11:46.068] pitfall. I mean, choosing the right evaluation metrics is pretty tricky, and I know
[11:46.068 --> 11:53.032] that, I can say for myself, I've had to recover from some bad decisions, you know, and it's actually
[11:53.032 --> 11:58.084] how you look at the model. And if you could break that down, I mean, so what would be a bad
[11:59.048 --> 12:03.024] evaluation metric for a baby detector? Because I can think, like, probably some people are listening to
[12:03.024 --> 12:08.052] this and thinking, okay, accuracy sounds like a pretty good metric, but what would
[12:08.052 --> 12:15.016] be kind of a metric that might lead you astray with a baby detection model? So, okay, let's, you know,
[12:15.016 --> 12:21.032] let's take just a toy example, and let's say we have a baby detector, and, you know, its
[12:21.032 --> 12:26.036] accuracy is, let's say, pretty good. But, you know, eventually, in the product,
[12:26.036 --> 12:33.008] we care more about the false positives than the false negatives, for example, okay? And how you look
[12:33.008 --> 12:38.036] at the evaluation metrics can really affect that. So if you give a little bit more weight to the
[12:38.036 --> 12:44.044] false positives, we saw, for example, a decrease in the accuracy on some metrics that actually
[12:44.044 --> 12:51.040] average everything at once, but eventually this is the right metric and we get much higher performance.
[12:51.040 --> 12:58.004] Or also the other way around: I mean, we had a model that has very high accuracy, but eventually, since
[12:58.004 --> 13:04.052] the product was aimed at trying to decrease false positives, the product metric was way lower.
[13:05.032 --> 13:10.076] So it's really how you look at it, and that's the tricky part, I think. So I guess what metric then could you
[13:10.076 --> 13:16.084] move to, and then what would you do to improve that metric? So once you define the metric, you
[13:16.084 --> 13:23.064] can always try and see where the weak cases are and then maybe how you can strengthen them,
[13:23.064 --> 13:30.028] whether it's more data, or a special kind of annotation, or augmentations. But again,
[13:30.028 --> 13:35.096] those are things that can be, you know, under the radar if you don't give them enough weight. I mean, that's a
[13:35.096 --> 13:41.080] common failure case that actually happened in the past. Wait, so could you explain one more time, like, what
[13:41.080 --> 13:48.012] happens in this failure case? Yeah, so let's say, for example, we choose an overall accuracy
[13:48.012 --> 13:53.056] measure for a baby detector, but we detected, sorry, the baby when it wasn't there, but we had
[13:53.056 --> 13:59.024] high recall, which compensates for that, and eventually we got a very high accuracy. But, for example, for
[13:59.024 --> 14:04.036] other product purposes the precision needed to be higher in order to give enough value to the
[14:04.036 --> 14:09.056] product. So it's actually another way of looking at this: looking at the precision as, you know, the biggest
[14:09.056 --> 14:15.072] parameter for us. And so once we changed to look at that, we could clearly see the problem and fix it.
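To make the accuracy-versus-precision point concrete, here is a toy illustration; the labels and predictions are invented, not Nanit data.

```python
# Toy illustration of why overall accuracy can hide a precision problem.
# Labels: 1 = baby present, 0 = no baby. All values below are invented.
y_true = [1] * 80 + [0] * 20
y_pred = [1] * 80 + [1] * 10 + [0] * 10   # detector fires on 10 empty-crib frames

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

accuracy  = (tp + tn) / len(y_true)   # 0.90 -- looks fine at a glance
precision = tp / (tp + fp)            # ~0.89 -- the false positives hurt here
recall    = tp / (tp + fn)            # 1.00
print(accuracy, precision, recall)
```

If the product mostly cares about false positives, the precision number is the one to track, even though accuracy alone looks acceptable.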
[14:16.068 --> 14:23.064] And how do you fix a problem like that? So, collecting the data in a much more dedicated way
[14:23.064 --> 14:31.000] for your problem. Maybe see whether you're actually collecting the right data and not just, you know,
[14:31.000 --> 14:37.096] randomly sampling the data at some point, but actually directing yourself to what the model will see
[14:37.096 --> 14:44.028] when it's in production. So you want to try to imitate that and collect data from those parts, in
[14:44.028 --> 14:49.032] order to train your model on what it's actually going to see and not on what's easy to collect.
[14:49.032 --> 14:54.092] That's probably one of the best solutions. So, collecting data for the cases where you think your model is
[14:54.092 --> 15:02.060] struggling and adding that, as opposed to random sampling, for example? Or maybe collecting the right data for
[15:02.060 --> 15:07.072] your problem. I mean, you can collect data in many ways, and collecting the data that
[15:07.072 --> 15:13.088] suits your problem is the first thing, actually, I think you need to do, and put a lot of
[15:13.088 --> 15:19.000] thought into it. It's actually my first bullet in the guidelines: start by defining
[15:19.000 --> 15:25.048] what's the right data for you. Don't just collect data and start working on the model, because you're going to waste time.
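A small, purely illustrative sketch of the kind of targeted data collection being described, selecting production frames where the deployed model is least confident; the log structure, field names, and thresholds are hypothetical.

```python
# Illustrative sketch: pick production frames for annotation where the deployed
# model was borderline, instead of sampling uniformly at random.
# `production_log` and its fields are hypothetical stand-ins.
import random

production_log = [{"frame_id": i, "score": random.random()} for i in range(10_000)]

def select_for_annotation(log, budget=500, low=0.3, high=0.7):
    """Prefer borderline-confidence frames; top up with random ones if needed."""
    hard = [r for r in log if low <= r["score"] <= high]
    picked = random.sample(hard, min(budget, len(hard)))
    if len(picked) < budget:
        rest = [r for r in log if r not in picked]
        picked += random.sample(rest, budget - len(picked))
    return [r["frame_id"] for r in picked]

to_label = select_for_annotation(production_log)
```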
[15:27.024 --> 15:34.028] And do you have ways of explaining to, like, you know, a business person how to justify the cost of
[15:34.028 --> 15:39.080] data collection in terms of some metric that they care about? Like, is that an important thing for
[15:39.080 --> 15:47.048] you? We try to keep a close connection between the product and the algorithm
[15:47.048 --> 15:55.064] performance, because, you know, data collection is very expensive, and our time and our resources are
[15:55.064 --> 16:02.084] very expensive, so we try not to make perfect models that will have no effect on the product. So, yeah,
[16:02.084 --> 16:08.084] I guess this process is pretty easy for us, because this is one of the first priorities when we start a project.
[16:09.088 --> 16:15.008] And are you also, in parallel, experimenting with different kinds of algorithms or doing hyperparameter
[16:15.008 --> 16:18.076] searches? Like, is that important to you at all, or is it really just the data collection?
[16:18.076 --> 16:25.024] No, no, I mean, the data collection is good, but we're actually doing all sorts of
[16:25.080 --> 16:31.080] hyperparameter tuning and choosing models, and we have a really organized methodology about what you
[16:31.080 --> 16:37.080] do first. So could you tell me your methodology? Well, maybe not all of it in particular, but I
[16:37.080 --> 16:42.020] guess the good thing to do is to start by trying to get the best model you can get, and try to
[16:42.020 --> 16:47.072] get an upper bound on the performance and, you know, ignore speed, ignore runtime, for example, just to see
[16:47.072 --> 16:54.004] what the upper bound is for the problem. Because in many cases, you know, the algorithms are working
[16:54.004 --> 16:59.096] on public datasets, and everyone, you know, detectors work on MS COCO and classification, for
[16:59.096 --> 17:06.084] example, on ImageNet, but it's not in all cases a good proxy for your problem. Medical images have
[17:06.084 --> 17:13.088] their own datasets, but in some other areas the data is not always, you know, natural-image style. So you've
[17:13.088 --> 17:20.060] got to try models and a lot of hyperparameter tuning; it's most of the work, from training, I mean,
[17:20.060 --> 17:28.068] it's not active work, but it takes a lot of time. And then once the model is deployed, do you stop there? I would
[17:28.068 --> 17:34.004] imagine you'd have kind of new problems that would come up, like, do you see data
[17:34.004 --> 17:41.000] drift as an issue for you, or how do you think about production monitoring? So we put a lot of effort
[17:41.000 --> 17:46.036] into production monitoring. I think it's really important and people sometimes underestimate that, because
[17:46.036 --> 17:53.032] once you deploy a model, I guess it's not ending, it's actually the beginning, because it's much harder,
[17:53.032 --> 18:00.036] and you need to invest in really good planning and making your monitoring systems
[18:00.036 --> 18:05.064] reliable enough to give you enough confidence once you deploy the model. That's all you can see,
[18:05.064 --> 18:12.004] and the performance on the tests that you get before you deploy the model is just a single point in time,
[18:12.004 --> 18:18.004] and after that you'll get many time frames of performance, so you need your monitoring to be
[18:18.004 --> 18:23.064] reliable enough to spot some shifts and maybe sudden drops and try to understand what happened.
[18:24.036 --> 18:31.080] So I guess I can say that we never stop with the models, we always look at the monitoring and see
[18:32.092 --> 18:38.060] where we can see any problems and what it's connected to. I think one of the issues is you don't
[18:38.060 --> 18:44.036] really have ground truth in production, so how do you know if there's a problem? It's true, it's pretty complicated.
[18:44.036 --> 18:51.056] So we always consider prediction distributions and common stuff like that, and we also use other routes
[18:51.056 --> 18:57.096] as well, for example, user satisfaction and maybe tickets they open, so we can spot maybe a problem
[18:57.096 --> 19:03.072] there that we didn't catch in our monitors. So we try to find the source whenever we can,
[19:04.036 --> 19:09.080] usually it's a matter of time for the company as well.
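As an aside, here is a minimal sketch of the kind of ground-truth-free check being described, comparing the current prediction-score distribution against a reference window; the two-sample KS test, thresholds, and synthetic data are illustrative choices, not Nanit's actual monitoring stack.

```python
# Illustrative sketch: flag drift by comparing last night's prediction scores
# against a reference window with a two-sample KS test. All data is synthetic.
import numpy as np
from scipy.stats import ks_2samp

reference_scores = np.random.beta(8, 2, size=5_000)   # scores from a "healthy" week
last_night_scores = np.random.beta(5, 3, size=5_000)  # scores from the latest night

result = ks_2samp(reference_scores, last_night_scores)
if result.pvalue < 0.01:
    print(f"Possible drift in prediction distribution (KS={result.statistic:.3f})")
```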
[19:09.080 --> 19:15.000] Interesting. I always wonder how people do this. I've heard different variants, but do you actually, like, you know, file a ticket against the ML team if you
[19:15.000 --> 19:20.004] find, like, a bad prediction, and what do you do with a ticket like that? Well, they don't
[19:20.004 --> 19:26.044] file it specifically to the ML team, but yeah, people file tickets for bad predictions, because
[19:26.044 --> 19:33.040] everything is actually based on that. So, you know, you can get wrong statistics and bad results, and
[19:33.040 --> 19:40.036] you're a parent, you want to get the data for your child, you pay for this product, and you want answers.
[19:40.036 --> 19:47.000] It's actually quite a challenge, since we have so many users and we need to keep our
[19:47.000 --> 19:54.004] models at a very high performance level in order not to create so many tickets for us and also to make the
[19:54.004 --> 20:02.020] experience for the user much better. So it's a challenge. I wanted to get to what you talked about in your
[20:02.020 --> 20:08.044] paper, or your Medium post, which was kind of the preparations before deploying a model to production.
[20:08.044 --> 20:15.008] Can you talk about how that works? Yeah, we try to simulate as much as possible how everything
[20:15.008 --> 20:24.028] will be in production. For example, we actually create a production-like environment and we also get
[20:24.028 --> 20:30.084] some of the users to use it; of course they are supportive and they are aware that there are going
[20:30.084 --> 20:37.040] to be changes. And we try to monitor everything we can in order to see that our model performed the
[20:37.040 --> 20:43.056] way we expect and that we don't see any issues. And, of course, in the meantime, in parallel, we also do all
[20:43.056 --> 20:50.028] those end-to-end tests of all our algorithms together, to see that the new model behaves as it should
[20:50.028 --> 20:55.096] and doesn't raise any special problems from anything new, whether it's a new block or maybe an improvement.
[20:55.096 --> 21:03.080] That's most of the work that's done there.
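A toy sketch of the kind of pre-deployment gate this describes, comparing a candidate model against the current one on a small production-like sample; the function names, thresholds, and predict-style interface are hypothetical.

```python
# Illustrative pre-deployment check: run the candidate model on a small
# production-like sample and compare its behavior to the current model.
# `current_model`, `candidate_model`, and `sample_frames` are hypothetical.
import numpy as np

def canary_check(current_model, candidate_model, sample_frames,
                 max_score_shift=0.05, max_disagreement=0.02):
    cur = np.array([current_model(f) for f in sample_frames])
    new = np.array([candidate_model(f) for f in sample_frames])
    score_shift = abs(cur.mean() - new.mean())            # mean score movement
    disagreement = np.mean((cur > 0.5) != (new > 0.5))    # flipped decisions
    ok = score_shift <= max_score_shift and disagreement <= max_disagreement
    return ok, {"score_shift": score_shift, "disagreement": disagreement}

# ok, stats = canary_check(current_model, candidate_model, sample_frames)
# if not ok: keep the current model and investigate before rolling out.
```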
[21:03.080 --> 21:09.056] Got it, got it. Could you tell me a little bit about how Weights & Biases fits into your workflow and how you use the Weights & Biases tool? Yeah, so with
[21:09.056 --> 21:15.040] Weights & Biases we manage all of our experiments, which is great. We also use your
[21:15.040 --> 21:23.040] visualization tools in order to compare between experiments, and since you have everything so
[21:23.040 --> 21:30.044] shiny and dynamic, we can also try different parameters and see what could have been, without
[21:30.044 --> 21:36.076] training the whole model over and over again, which saves time. I'm a pretty huge fan of
[21:36.076 --> 21:42.068] the reports that you can do, because, as I said before, we are really tied up with the product team about
[21:42.068 --> 21:51.096] the algorithms we build, so it actually gives us a way to show them what we do and visualize in
[21:51.096 --> 21:58.052] real time how each parameter affects the results, and we talk about what should be better for the
[21:58.052 --> 22:04.020] product, the algorithm team and the product team together. So, yeah, we use it a lot.
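For readers unfamiliar with this workflow, here is a minimal, generic sketch of the experiment logging being described; the project name, config, and metrics are made up, and the training calls are stand-ins, not Nanit's code.

```python
# Illustrative only -- a generic Weights & Biases logging pattern.
import wandb

run = wandb.init(project="baby-detector", config={"lr": 1e-3, "backbone": "resnet18"})

for epoch in range(10):
    # train_one_epoch() and evaluate() would be the project's own helpers.
    train_loss = 1.0 / (epoch + 1)                      # stand-in value
    precision, recall = 0.90 + 0.005 * epoch, 0.95      # stand-in values
    wandb.log({"train_loss": train_loss, "precision": precision, "recall": recall})

run.finish()
```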
[22:04.020 --> 22:11.008] And so you actually use reports to share results with the product team? Yeah, we also use reports to summarize and share
[22:11.008 --> 22:16.044] with the product team strengths and maybe model weaknesses, whether we want to deal with this now or
[22:16.044 --> 22:24.060] maybe deal with it later, and, for example, how changing parameters can help. It's also better for
[22:24.060 --> 22:29.064] mutual work and transparency, because sometimes you tend to be a little bit suspicious of things you don't
[22:29.064 --> 22:35.024] understand, and once we understand their job and they understand our job, I think the mutual work is much better.
[22:35.024 --> 22:39.072] And we've seen that once you talk about it and you explain, and they can understand your role,
[22:39.072 --> 22:44.068] then you can understand theirs, and we can make decisions which are much better for the company.
[22:45.096 --> 22:51.072] So it's actually pretty useful for us. Do you often go down paths where there's a
[22:51.072 --> 22:56.044] product feature you might want to make, but you're not sure if you're going to be able to make
[22:56.044 --> 23:01.080] the machine learning algorithm accurate enough or powerful enough to actually make the feature possible?
[23:01.080 --> 23:07.008] Do you ever get in situations like that? All the time. This is one of the main challenges we have
[23:07.008 --> 23:13.032] when working at this scale and working on such sensitive data. I mean, we've got so many
[23:13.032 --> 23:19.000] cool ideas and papers and works, and it's really hard to get them into production. This gap
[23:19.000 --> 23:26.020] is sometimes pretty big. I can, you know, just name one example that pops into my head: GANs, for example.
[23:26.020 --> 23:32.068] GANs, for example, are a pretty amazing example of that. They do marvelous things, but it's really
[23:32.068 --> 23:39.040] hard to get them into production. They often tend not to converge, and it works well on this
[23:39.040 --> 23:44.044] dataset but not on that dataset, and on this dataset it's not working well enough. So it's a pretty big
[23:44.044 --> 23:52.020] challenge: how to be innovative and give good and valuable features, but also be reliable and accurate.
[23:52.084 --> 23:56.068] And what might you do with a GAN? Again, I'm trying to picture that, like, I don't
[23:56.068 --> 24:05.016] want a deepfake of my baby. No, no, not deepfakes, but there are many other uses of
[24:05.016 --> 24:13.032] GANs that we can use, maybe for enhancing images and making nice fun features, you know, that you can
[24:13.032 --> 24:18.092] celebrate, like your baby with a different background and stuff like that. So there are all
[24:18.092 --> 24:24.068] sorts of things that GANs can be really useful for, but again, there's a big gap between an experiment and a
[24:24.068 --> 24:30.036] paper and actually getting into production. I mean, I know that in the last couple of years I've
[24:30.036 --> 24:37.080] seen a lot of advances, like almost a tsunami of advances in computer vision. Has anything been relevant to
[24:37.080 --> 24:44.004] you? Like, do you take recent stuff and get it into production, or is that stuff kind of too theoretical
[24:44.004 --> 24:49.040] to really matter for the practical stuff you're doing? We always try to take the state of the art
[24:49.040 --> 24:57.016] and try to adapt it to our domain and our fields, which is easier mainly in object detection,
[24:57.016 --> 25:04.060] which we talked about, so it's a task that's pretty much solved, let's say, pretty comfortable to get
[25:04.060 --> 25:11.056] into production. So, yeah, it's much easier. But there are other fields that we try, I honestly say
[25:11.056 --> 25:17.048] we try all the time; sometimes it's really hard to bridge this gap, but it's definitely something that
[25:17.048 --> 25:23.056] keeps us motivated, and we try to do it all the time. I mean, if you stay behind in this field, you probably
[25:23.056 --> 25:29.040] won't exist that long, this is for sure. Is there any, I guess, any paper or line of research that
[25:29.040 --> 25:33.088] you can talk about as being especially relevant to the work you do? I can talk about some nice
[25:33.088 --> 25:40.020] research we did lately, and all of it is actually somehow related; I mean, it's all using
[25:40.020 --> 25:45.048] the sleep metrics, which have the algorithms behind them. So, for example, during
[25:45.048 --> 25:51.064] the pandemic, during COVID, Nanit actually helped keep families together, you know, for
[25:51.064 --> 25:59.040] example when the grandparents could see their grandchildren, it allows that. And we also checked during
[25:59.040 --> 26:05.080] COVID what the effects on babies were, and we actually tried to study the difference between
[26:05.080 --> 26:11.088] children whose parents were essential workers and went to work as usual and parents that stayed at home.
[26:12.052 --> 26:20.012] And we actually saw that in the first few weeks, from the end of March, let's say, for the first few
[26:20.012 --> 26:29.000] weeks, we saw that the sleep of the babies actually got worse. But it actually improved:
[26:29.040 --> 26:36.028] after a couple of months we saw that the sleep of the babies whose parents stayed at home
[26:36.028 --> 26:42.020] actually got back to normal, which is pretty amazing. It actually means that babies are resilient
[26:42.020 --> 26:51.016] to the change and they adapt to it, which is kind of cool. Can I ask you, so this is, I mean, this is, like, I
[26:51.016 --> 26:56.052] think for a lot of parents the most drama-filled topic is kind of sleep training the baby, where
[26:56.052 --> 27:01.080] you leave the baby and let them cry for various lengths and kind of teach them to go to sleep on their
[27:01.080 --> 27:07.056] own instead of with you holding them. Do you have an opinion on that? Well, since I'm not a
[27:08.028 --> 27:14.060] sleep expert, I can only speak from my experience: it's important to let the baby fall asleep on their own,
[27:14.060 --> 27:19.000] I guess not at any cost. But do you have any data on that? I guess you do sort of track when the
[27:19.000 --> 27:25.080] baby falls asleep on their own? Yeah, yeah, we do. I'm not sure if we have any relevant research
[27:25.080 --> 27:30.060] that we've done in this field, but again, this is the beauty of Nanit, I mean, you can
[27:30.060 --> 27:36.060] actually test your assumptions, I would say, because if you believe in that, then, you know,
[27:36.060 --> 27:42.020] the objective data tells you whether it's right, so that's good, and if not, you might really
[27:42.020 --> 27:47.072] want to reconsider. But that's after you get the data; you can decide. Do you publish, like,
[27:47.072 --> 27:52.068] aggregate statistics like that on different things that help babies sleep? We do have research
[27:52.068 --> 27:58.020] that we've published; I'm not sure regarding what helps and what doesn't specifically, but
[27:58.020 --> 28:04.004] we did publish research about screen time and its effects on babies and, you know,
[28:04.004 --> 28:10.012] children, and it's actually pretty amazing. We found out that, for example, touch screens have a bigger
[28:10.012 --> 28:16.012] effect on the sleep of babies as opposed to, for example, television; I mean, television has less
[28:16.012 --> 28:23.008] effect, which pretty much amazed me. I mean, we saw that touch screens are causing fragmented sleep
[28:23.008 --> 28:27.088] and less sleep time overall, which is pretty, I mean, it's really amazing that you can
[28:27.088 --> 28:34.052] conduct research and see it quickly, because we have a large user base and engaged users that allow us
[28:34.052 --> 28:41.048] to ask these questions; this is also a good research tool. It's amazing, yeah, and it seems
[28:41.048 --> 28:47.008] like, I guess from your app, I feel like your benchmarks of sleep are actually a little less sleep than
[28:47.008 --> 28:52.020] I see in, sort of, the parenting books that I read. Do you think, because you're actually monitoring
[28:52.020 --> 28:58.052] it instead of getting self-reported data, do you see systematic bias in the self-reported sleep data?
[28:58.052 --> 29:03.016] Like it'll tell me like you know how my daughter is doing like kind of compared to averages
[29:04.012 --> 29:09.064] and it's funny because the app is kind of telling me she's doing pretty good but then when I compare it
[29:09.064 --> 29:14.020] to like you know books that I'm reading it seems like she's sleeping a little less than that average so
[29:14.020 --> 29:18.084] maybe you're just trying to be like you know positive and helpful but I also kind of wonder because
[29:18.084 --> 29:23.056] you know we try to write down every time she wakes up and like you know when she goes to sleep and when she gets up and
[29:23.056 --> 29:29.088] I always kind of feel like our written notes imply a little more sleep than the data actually shows
[29:29.088 --> 29:35.000] that she got, and so I kind of wonder if previous studies are relying on kind of parents' memories
[29:35.000 --> 29:38.044] and making us think that babies are sleeping more than they're actually sleeping.
[29:38.044 --> 29:45.040] So, what I can say about it, I guess something there is true. Also, I guess getting data for babies,
[29:45.040 --> 29:51.064] especially from babies, is really expensive. I mean, I'm not sure researchers can take, you know, thousands of
[29:51.064 --> 29:58.092] babies and then record their sleep, which Nanit actually can do. So maybe, you know, there's only a small portion;
[29:58.092 --> 30:04.044] this is why you see some big variance between studies about sleep. I guess that would be the reason,
[30:04.044 --> 30:09.064] I guess that's my assumption. Are there any other takeaways besides avoiding touch screens to help
[30:09.064 --> 30:15.008] a baby sleep, any conclusions you've come to with your large-scale data collection?
[30:17.040 --> 30:23.000] So most of the significant tips that we see are actually incorporated in the app.
[30:23.000 --> 30:28.020] So helping the baby fall asleep on his own is of course a remarkable one, because when
[30:28.020 --> 30:35.008] she wakes up during the night she can go back to sleep. So I guess what we see,
[30:35.008 --> 30:43.008] you know, we're trying to translate that and validate it, of course, and send it as tips if possible.
[30:44.052 --> 30:49.056] Cool. Well, I guess we always end with two questions that I want to make sure we have a little
[30:49.056 --> 30:54.084] time for. So, you know, the second-to-last question is: what is one underrated aspect of
[30:54.084 --> 31:00.068] machine learning that you think people should pay more attention to than they do? I would say building
[31:00.068 --> 31:07.032] a good process for deploying the models, I mean, making something that works as a system and isn't just
[31:07.032 --> 31:13.032] occasionally working. Sometimes people tend to go, yeah, okay, let's take the data, let's train it, okay,
[31:13.032 --> 31:19.080] it's very good, you know, on accuracy, okay, we can deploy it, and then the performance is bad and
[31:19.080 --> 31:25.008] now the model's in the air and it's much harder to fix it. So I'd say conducting this
[31:25.008 --> 31:31.048] methodology, this pipeline of how to work better, is something that people should pay more attention to, and I
[31:31.048 --> 31:36.036] think that's what we see, at least what I read on Twitter and LinkedIn and stuff like that; people are paying
[31:36.036 --> 31:40.036] more and more attention to that and I think that's important for the industry.
[31:41.048 --> 31:48.068] Are there other tools that you use to help with that, in building those pipelines? So we use
[31:49.024 --> 31:55.008] W&B, for example, for managing experiments and, you know, showing the reports and seeing
[31:55.008 --> 32:00.068] everything, which really helps us understand how exactly it's done; trying to simulate
[32:00.068 --> 32:05.016] production, like, this is what works for us, but I know there are several companies and there are
[32:05.016 --> 32:09.088] several products out there that can do many things, and this is why I wrote it as guidelines,
[32:09.088 --> 32:16.084] because probably some of the tips there could be useful for many people and some of them not so
[32:16.084 --> 32:21.040] much. Totally. And then I guess maybe you answered my last question, but I'll ask it anyway. So, like, you know, when
[32:21.040 --> 32:25.016] you look at like machine learning in general and making it work into production what what is used the
[32:25.016 --> 32:32.012] biggest challenge from going from like you know research to deployed model working for customers.
[32:32.012 --> 32:39.072] So yeah, as I said, I think this gap is sometimes really big, and there's the ability
[32:39.072 --> 32:46.036] to understand which paper is, you know, nice, but whether it will hold up in production. It's a pretty big
[32:46.036 --> 32:52.060] problem, you need to foresee it, and we've tried a lot of, you know, cool features that we saw in
[32:52.060 --> 33:01.016] competitions and papers, but they didn't hold up on our data, or maybe they weren't good enough, so we had to drop them.
[33:02.060 --> 33:06.012] Well, I really appreciate you being kind of public about your work and willing to do
[33:06.012 --> 33:10.052] case studies and things like that. I think it really helps a lot of people learn best practices as they
[33:10.052 --> 33:14.076] try to get models into production. So we'll share some links to some of the work that you've put out, but I
[33:14.076 --> 33:20.060] would say please keep doing it if you're open to it; it's super helpful for our community. Yeah, totally, I
[33:20.060 --> 33:25.024] hear you. This is how we learn and this is how we can share the knowledge, and I think the more people
[33:25.024 --> 33:30.036] share knowledge, the better it will be, and everyone can have greater productivity, which I think is
[33:30.036 --> 33:38.036] important. Totally. Thanks, I really appreciate it. Thank you so much. Thanks for listening to another episode
[33:38.036 --> 33:43.072] of Gradient Descent. Doing these interviews is a lot of fun, and it's especially fun for me when I can
[33:43.072 --> 33:49.032] actually hear from the people that are listening to these episodes. So if you wouldn't mind leaving a comment
[33:49.032 --> 33:53.064] and telling me what you think, or starting a conversation, that would inspire me to do more of these
[33:53.064 --> 33:58.028] episodes, and also if you wouldn't mind liking and subscribing, I'd appreciate that a lot.
40.23841
50.67039
1m 9s
Nov 21 '22 16:55
3iibvkte
-
Finished
Nov 21 '22 16:55
2721.120000
/content/will-falcon-making-lightning-the-apple-of-ml-kdrsnub9zea.mp3
tiny
[00:00.000 --> 00:01.036] [MUSIC]
[00:01.036 --> 00:04.036] >> Users are always going to tell you kind of incremental things.
[00:04.036 --> 00:06.004] They're always going to tell you they want this better.
[00:06.004 --> 00:07.092] They're never going to tell you they want the iPhone.
[00:07.092 --> 00:11.072] They're always going to tell you, can you make my Blackberry keyboard slide out instead or whatever, right?
[00:11.072 --> 00:13.088] Those inputs are going to usually improve the product,
[00:13.088 --> 00:17.024] they're not going to help you create like a leapfrog product, right?
[00:17.024 --> 00:19.020] >> You're listening to gradient descent,
[00:19.020 --> 00:21.068] a show about machine learning in the real world,
[00:21.068 --> 00:27.036] and I'm your host, Lucas B. Well, William Falcon started his career training to be a Navy
[00:27.036 --> 00:32.068] SEAL before becoming an iOS developer, and eventually the CEO of Lightning.AI,
[00:32.068 --> 00:37.032] which makes PyTorch Lightning, a very successful ML framework,
[00:37.032 --> 00:42.016] and Lightning AI, which is an awesome website that calls itself the OS for machine learning,
[00:42.016 --> 00:43.080] which we're going to talk a lot about today.
[00:43.080 --> 00:46.056] This is a super fun conversation, and I hope you enjoy it.
[00:46.056 --> 00:49.040] [MUSIC]
[00:49.040 --> 00:51.092] >> I thought it would be fun to start with your background.
[00:51.092 --> 00:54.064] We don't have a lot of people that went through Navy SEAL training on this podcast,
[00:54.064 --> 00:59.012] so could you tell us a little bit of your story and how you came to found Lightning?
[00:59.012 --> 01:00.044] >> Yeah, sure.
[01:00.044 --> 01:04.004] So I'm originally from Venezuela, so I don't know if people know that.
[01:04.004 --> 01:08.064] I'm actually born and raised there, so English is my second language,
[01:08.064 --> 01:12.040] which is why you'll hear me slip up today on a few things.
[01:12.040 --> 01:15.040] Card does not care what language a speaker is great.
[01:15.040 --> 01:22.008] So yeah, so I moved here when I was in my teens and then eventually ended up joining the
[01:22.008 --> 01:26.016] US military, and I went through SEAL training, and I was there for a few years.
[01:26.016 --> 01:32.016] If anyone knows about some classes, T7, T7, T7, T7, which is great.
[01:32.016 --> 01:38.026] And yeah, I came out injured actually, and so I basically got stashed on one of the
[01:38.026 --> 01:41.048] SEAL teams that does a lot of intelligence work.
[01:41.048 --> 01:47.068] So very interesting team, so I also happened to be a care big from just found I guess.
[01:47.068 --> 01:50.016] So there's a lot of cool stuff that we were doing there.
[01:50.016 --> 01:55.024] And when it was time for me to go back into training, this is when we pulled out of Iraq
[01:55.024 --> 02:01.040] in 2012, 2013, so then they gave me an option to leave or become a pilot or
[02:01.040 --> 02:04.024] something, and I chose to leave.
[02:04.024 --> 02:07.056] Maybe if I'd seen Top Gun, I would have stayed as a pilot potentially.
[02:07.056 --> 02:15.008] But it was a great time, and yeah, we did a lot of good work there, and very happy about
[02:15.008 --> 02:16.008] the time.
[02:16.008 --> 02:18.096] I think it really set me up for success for everything I did afterwards.
[02:18.096 --> 02:23.020] Turns out I didn't really like or care about school until I left the military.
[02:23.020 --> 02:25.064] And then how did you get into machine learning?
[02:25.064 --> 02:36.020] So I was at Columbia during my undergrad, and so around 2013 I want to say, and basically
[02:36.020 --> 02:43.052] people started telling me about this machine learning thing, and I wasn't super into math
[02:43.052 --> 02:45.032] or any of the stuff back then.
[02:45.032 --> 02:52.084] I started my degree in computer science, and for some reason the CS part was fun, but it wasn't
[02:52.084 --> 02:57.004] the most interesting part; I really gravitated towards math at some point.
[02:57.004 --> 03:01.032] And I think if you were doing anything with statistics or math in 2013, and you were touching
[03:01.032 --> 03:06.052] code, it's impossible not to run into SVMs and random forests and all this stuff.
[03:06.052 --> 03:10.072] I remember taking my first neural network class, and they were like, yeah, you've got this image,
[03:10.072 --> 03:15.060] and it was, you know, we've all seen this, the MNIST thing that Yann put together back in
[03:15.060 --> 03:18.008] the day, with like the carousel music.
[03:18.008 --> 03:22.080] And I was like, I don't know why it's useful, like I don't see the value of this.
[03:22.080 --> 03:27.064] And then many, many years later, you know, I ended up working with Yann, he's one of my
[03:27.064 --> 03:33.064] PhD advisors. And yeah, so at some point in my undergrad, I went into finance because,
[03:33.064 --> 03:36.048] you know, it was interesting, I guess.
[03:36.048 --> 03:39.076] And I went there to try to use deep learning on the trading floor.
[03:39.076 --> 03:45.024] And you know, finance today is probably maybe not so allergic to deep learning anymore, but
[03:45.024 --> 03:49.004] back then it was, right, because of all the interpretability problems.
[03:49.004 --> 03:54.080] So I didn't love that, and so I went back to school, I got into computational neuroscience,
[03:54.080 --> 03:59.036] and that's really where I learned about deep learning, and got really, really into machine learning.
[03:59.036 --> 04:03.004] And so really the science is trying to decode neural activity and trying to understand how
[04:03.004 --> 04:04.004] the brain works.
[04:04.004 --> 04:09.092] I still care a lot about that, and a lot of my drive is really the pursuit of science,
[04:09.092 --> 04:15.000] but I find that, yeah, a lot of the tools are really too limiting to enable science to advance and
[04:15.000 --> 04:16.080] do what it needs to do.
[04:16.080 --> 04:20.088] And then what were you seeing when you started Lightning, like what was the problem
[04:20.088 --> 04:24.000] you were setting out to solve in the very beginning of it?
[04:24.000 --> 04:31.028] So I don't think I was like explicitly when I started lightning, I was still at undergrad,
[04:31.028 --> 04:37.084] so this is around 2015, I was doing my research, and I wasn't like building lightning for lightning,
[04:37.084 --> 04:41.052] or anything like that, it was just my research code that I had internally.
[04:41.052 --> 04:48.040] And what I was trying to optimize for was how do I try ideas as quickly as possible without having
[04:48.040 --> 04:52.064] to rewrite the code over and over again, but in a way that doesn't limit me, right? Because
[04:52.064 --> 04:57.016] as a researcher, the worst thing that you can do is adopt something, and you
[04:57.016 --> 05:01.092] spend six months doing research, and then suddenly in the last few months you're blocked,
[05:01.092 --> 05:05.032] and you're like, "Oh my God, I have to rewrite everything," and then it discredits all your
[05:05.032 --> 05:06.032] results.
[05:06.032 --> 05:09.092] So flexibility was like number one thing that I cared about, right?
[05:09.092 --> 05:12.036] And so that's a lot of what I was solving.
[05:12.036 --> 05:16.084] And over the years, really, it did not have a public version until 2019, and so it took about four or
[05:16.084 --> 05:19.044] five years to get there.
[05:19.044 --> 05:24.008] But what I did during that time was just try so many different ideas, right?
[05:24.008 --> 05:29.008] So my first research was, like I said, neuroscience; a lot of that was using GANs, some
[05:29.008 --> 05:31.040] VAEs, then after that I moved to NLP, right?
[05:31.040 --> 05:33.016] When I started my PhD.
[05:33.016 --> 05:37.056] So Cho was one of the main authors on the seq2seq and attention papers.
[05:37.056 --> 05:41.048] So my first thing was to implement attention from scratch in a seq2seq network and
[05:41.048 --> 05:47.068] all that stuff, and, you know, it's just very rough if you guys have ever tried this.
[05:47.068 --> 05:48.068] It's not trivial.
[05:48.068 --> 05:51.040] I know Lucas has implemented this about two times now.
[05:51.040 --> 05:54.016] I tried to do it once and I really needed it.
[05:54.016 --> 05:55.016] It's not trivial.
[05:55.016 --> 05:56.056] Maybe it's not quite as hard
[05:56.056 --> 05:58.036] or daunting as it seems at first, I don't know.
[05:58.036 --> 06:00.072] I guess there's probably less resources when you did it.
[06:00.072 --> 06:04.056] Yeah, I mean, back then you're writing everything yourself.
[06:04.056 --> 06:07.072] Nowadays there's like attention heads and other stuff you can plug in, but back then,
[06:07.072 --> 06:12.036] you're calculating your own stuff, and then, you know, PyTorch only supports certain things,
[06:12.036 --> 06:16.016] so you're blocked, and it was really confusing.
[06:16.016 --> 06:17.084] So it was rough.
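For readers curious what the exercise described above involves, here is a minimal sketch of scaled dot-product attention written from scratch in PyTorch. It is illustrative only, not the implementation discussed in the interview: the function name and shapes are ours, and it leaves out masking, multiple heads, and everything a real seq2seq model needs.

```python
# Minimal scaled dot-product attention, written from scratch for illustration.
import math
import torch

def scaled_dot_product_attention(query, key, value):
    # query, key, value: tensors of shape (batch, seq_len, d_model)
    d_model = query.size(-1)
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_model)  # (batch, q_len, k_len)
    weights = torch.softmax(scores, dim=-1)                      # attention distribution over keys
    return weights @ value                                       # weighted sum of values

q = k = v = torch.randn(4, 10, 64)           # toy batch: 4 sequences, length 10, dim 64
out = scaled_dot_product_attention(q, k, v)  # shape (4, 10, 64)
```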
[06:17.084 --> 06:20.092] And we took that and we started working on complex numbers.
[06:20.092 --> 06:24.004] So Cho also introduced GRU units, right?
[06:24.004 --> 06:30.048] So we started working on complex GRUs, and the idea there was to help keep,
[06:30.048 --> 06:34.076] like, the gradient from exploding or zeroing out.
[06:34.076 --> 06:38.012] And so complex numbers can help you do that, especially for audio, right?
[06:38.012 --> 06:40.052] With some normalization techniques and all that.
[06:40.052 --> 06:44.064] But, you know, complex numbers are not something that PyTorch supported until, like, a
[06:44.064 --> 06:45.064] year ago.
[06:45.064 --> 06:50.052] So, you know, little old PhD me, I'm sitting there and I'm like, okay, I have to implement
[06:50.052 --> 06:54.056] this whole complex number library, which I did, and it's open source. Super slow.
[06:54.056 --> 06:55.056] Don't use it.
[06:55.056 --> 06:56.080] Use PyTorch's one, it's better now.
[06:56.080 --> 07:00.052] But, you know, it's like, willing to do what it takes, I guess, to get the thing done.
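As a side note on the complex-number support mentioned here, recent PyTorch versions do ship complex dtypes natively. A quick hedged sketch of what that looks like (this is the library's current API, not the speaker's old open-source library):

```python
# Native complex tensors in recent PyTorch versions.
import torch

z = torch.randn(3, 3, dtype=torch.cfloat)  # complex-valued tensor (float real + imaginary parts)
w = torch.randn(3, 3, dtype=torch.cfloat)
product = z @ w                            # complex matrix multiplication
magnitude = product.abs()                  # modulus, handy for the normalization tricks mentioned above
```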
[07:00.052 --> 07:05.056] But, yeah, I mean, you know, through all those learnings, eventually I moved into computer
[07:05.056 --> 07:06.096] vision and self-supervised research.
[07:06.096 --> 07:12.000] I think if you work with Yann, there's no way you don't do self-supervised learning at some point.
[07:12.000 --> 07:17.060] So I kind of fell into it, and this is like 2019, I think, before it, like, blew up, well, you know,
[07:17.060 --> 07:21.028] before the world found out about it; people had been doing this for many years.
[07:21.028 --> 07:23.060] And so all of that stress-tested Lightning, right?
[07:23.060 --> 07:27.060] And so, that was pretty flexible by the time that it got open source.
[07:27.060 --> 07:29.060] Like, I knew you could do a lot of this stuff.
[07:29.060 --> 07:33.048] And then when I joined Fair, it was a lot of like, oh, can we use it for this or that?
[07:33.048 --> 07:34.076] I'm like, yes, of course you can.
[07:34.076 --> 07:35.076] Let me show you how.
[07:35.076 --> 07:39.012] And it just took forever to explain all the possibilities you could use it.
[07:39.012 --> 07:43.004] And today, I think it's obvious that it can work for pretty much anything.
[07:43.004 --> 07:44.004] But it wasn't back then.
[07:44.004 --> 07:46.044] And, you know, we still learn as we go sometimes.
[07:46.044 --> 07:50.056] And someone finds that it's not flexible for something and we fix it and we move on, right?
[07:50.056 --> 07:52.028] But it's a lot of process.
[07:52.028 --> 07:54.012] It's taken a lot of years to get here.
[07:54.012 --> 07:58.096] So, when you go back to 2015, was PyTorch
[07:58.096 --> 08:02.016] actually, like, in use at the time, or was it just Torch, right?
[08:02.016 --> 08:04.008] I'm trying to remember what years these things came out.
[08:04.008 --> 08:09.088] But it's certainly an unusual choice to build on top of PyTorch in 2015, if that's even possible.
[08:09.088 --> 08:11.056] Like, how did that happen?
[08:11.056 --> 08:14.088] Well, so my original version wasn't on top of PyTorch.
[08:14.088 --> 08:17.044] So I had actually started in Theano, right?
[08:17.044 --> 08:22.008] So basically what happened, I was using Theano and scikit-learn mostly.
[08:22.008 --> 08:26.084] So I think I did what everyone does, where they take the model and they add the .fit to it.
[08:26.084 --> 08:30.020] And then you start, like, building off of that.
[08:30.020 --> 08:33.020] And so my original version of that was in Theano, right?
[08:33.020 --> 08:38.020] I don't know if you ever worked on Theano, I don't know when you started, Lucas.
[08:38.020 --> 08:41.020] Yeah, I think I might have touched Theano.
[08:41.020 --> 08:43.020] Very little.
[08:43.020 --> 08:46.020] I think I was using Keras on top of Theano.
[08:46.020 --> 08:47.020] That dates me.
[08:47.020 --> 08:48.020] Yeah, yeah.
[08:48.020 --> 08:49.020] No, for sure.
[08:49.020 --> 08:51.020] So I got really annoyed at it.
[08:51.020 --> 08:54.020] I mean, I think it was great to show proof of concepts for sure.
[08:54.020 --> 08:57.020] So I started using Keras immediately, right?
[08:57.020 --> 08:59.020] And I think that helped me unlock a lot of stuff.
[08:59.020 --> 09:03.020] But at some point I started running into limitations, and I'm sure that's changed.
[09:03.020 --> 09:05.020] But back then, that was true.
[09:05.020 --> 09:07.020] And so that happened.
[09:07.020 --> 09:09.020] And that's what I was like fine.
[09:09.020 --> 09:11.020] I guess I have to go and get into TensorFlow.
[09:11.020 --> 09:13.020] This I was like trying to avoid it, right?
[09:13.020 --> 09:17.020] And so my first version actually was built on top of the TensorFlow.
[09:17.020 --> 09:23.020] But the second that PyTorch came out, which was a few years later, I rewrote it all in PyTorch.
[09:23.020 --> 09:26.020] And mostly because it just felt more mathematical.
[09:26.020 --> 09:29.020] I could like see the math. It was easier.
[09:29.020 --> 09:35.020] Right? Whereas in TensorFlow, you had this duplicate layer where it was like a meta-language on top of the thing.
[09:35.020 --> 09:37.020] Which, again, has changed since then.
[09:37.020 --> 09:40.020] But back then, that's kind of how the world worked.
[09:40.020 --> 09:42.020] So yeah, it was very experimental.
[09:42.020 --> 09:46.020] You know, Torch back then was very hard to work with.
[09:46.020 --> 09:51.020] Oh, sorry, it was easy, but, you know, installing things and stuff like that was really difficult.
[09:51.020 --> 09:52.020] That's really interesting.
[09:52.020 --> 09:55.020] So, like, were you at all inspired by the way
[09:55.020 --> 10:01.020] Keras did things, or do you feel like Lightning was sort of in contrast to parts of Keras?
[10:01.020 --> 10:02.020] How did you think about that?
[10:02.020 --> 10:10.020] Because I sort of feel like Lightning plays a similar role to PyTorch as Keras, you know, plays to TensorFlow.
[10:10.020 --> 10:13.020] Do you feel like that's too simple or wrong?
[10:13.020 --> 10:18.020] Yeah, I mean, I think when I, you know, when I first released Lightning,
[10:18.020 --> 10:23.020] and we put it on the PyTorch thing, I called it the Keras for PyTorch.
[10:23.020 --> 10:27.020] Because at a high level, it kind of looked like it, but it really wasn't, right?
[10:27.020 --> 10:32.020] So I may be the cause of this confusion, unfortunately.
[10:32.020 --> 10:38.020] But yeah, like I just said, you know, I used Keras, I used TensorFlow, I used scikit-learn.
[10:38.020 --> 10:41.020] Right? So a lot of my inspiration obviously comes from a lot of these things.
[10:41.020 --> 10:45.020] Before I got into machine learning, though, I was an iPhone developer.
[10:45.020 --> 10:48.020] So I worked on iOS for a long time, right?
[10:48.020 --> 10:53.020] And so a lot of these ideas that people bring in, like callbacks and all these things, are actually ideas
[10:53.020 --> 10:56.020] that were introduced in Objective-C since the 70s, 80s, right?
[10:56.020 --> 11:01.020] So if you work in mobile, if you work in web, you've been exposed to these ideas.
[11:01.020 --> 11:09.020] So I would say a lot of my inspiration really was, I think, the, like, API simplicity, like that fit kind of thing,
[11:09.020 --> 11:12.020] came from something like scikit-learn, I would say.
[11:12.020 --> 11:17.020] And then I think that a lot of the callbacks and things like that...
[11:17.020 --> 11:21.020] I was actually very opposed to callbacks, it turns out. Like, you know, a lot of the hook names,
[11:21.020 --> 11:26.020] even if you see the way I name things, a lot of it is more inspired by, like, Objective-C and, like,
[11:26.020 --> 11:28.020] these super long names that Objective-C has.
[11:28.020 --> 11:30.020] Actually, you told me you'd done Objective-C.
[11:30.020 --> 11:33.020] So I'm sure you know what I'm talking about.
[11:33.020 --> 11:37.020] But yeah, it's a lot of, like, super long syntax names, right?
[11:37.020 --> 11:42.020] I'm curious why you like Objective-C, like, you know, I feel like most people hate it.
[11:42.020 --> 11:46.020] And I think like one of the reasons people tend to hate Objective C is the verbosity,
[11:46.020 --> 11:49.020] but it sounds like you see the sense in it.
[11:49.020 --> 11:52.020] Yeah, I mean the verbosity means you sort of don't have to think about it, right?
[11:52.020 --> 11:56.020] Like I hate when names are so short and you're like, what do you mean by this, right?
[11:56.020 --> 12:01.020] Like Objective-C is like, you know, viewDidLoad on this and that and that.
[12:01.020 --> 12:04.020] And you're like, that makes sense. I can read this whole thing, you know?
[12:04.020 --> 12:07.020] [laughs]
[12:07.020 --> 12:13.020] I think like, you know, all of them did inspire me and I would say, I think something I really like about
[12:13.020 --> 12:16.020] Keras was kind of the feedback that you get, right?
[12:16.020 --> 12:20.020] So the summary tables and all of that, like, that's inspired by Keras as well.
[12:20.020 --> 12:23.020] So I would say it's a combination of a lot of things, right?
[12:23.020 --> 12:28.020] But I would say most of the things that I've really thought about are kind of rooted in that
[12:28.020 --> 12:32.020] fundamental, like, Objective-C world and that iOS world, right?
[12:32.020 --> 12:39.020] And in fact, if you look at Lightning Apps, now, the new abstractions that we put into Lightning, a lot of them are kind of similar to that, right?
[12:39.020 --> 12:41.020] So they have a lot more elements of that.
[12:41.020 --> 12:49.020] So yeah, I think over the years things have evolved, but, no, I think Lightning has kind of its own soul and its own thing.
[12:49.020 --> 12:57.020] And it's started to become kind of its own paradigm that, you know, I hope does become a standard in the industry,
[12:57.020 --> 13:02.020] and I hope that it does inspire other people, especially in their APIs and how they write things, right?
[13:02.020 --> 13:04.020] Because I do think it works at scale.
[13:04.020 --> 13:10.020] So I'm not offended if people grab the APIs and do something with them, because it means that, you know,
[13:10.020 --> 13:14.020] at the very least we standardize ML, which is a win for everyone, right?
[13:14.020 --> 13:22.020] So what's the part of the lightning API that you feel super proud of that you feel like was different than what was around when you built it?
[13:22.020 --> 13:30.020] Yeah, so I mean, I would say the main two things in Lightning are the LightningModule and the Trainer, right?
[13:30.020 --> 13:36.020] And I think those are the two that everyone uses, and those two together allow you to abstract most of it away, right?
[13:36.020 --> 13:39.020] And so I think that's really what I'm proud of.
[13:39.020 --> 13:44.020] I think I'm proud of the trainer, really, I think has changed a lot, right?
[13:44.020 --> 13:51.020] And it's starting to become a standard across many other things, you know, outside of lightning because it is, it is a good API.
[13:51.020 --> 13:55.020] And I think, you just the simplicity of it, right?
[13:55.020 --> 13:59.020] The ability to see what's happening, change things and just see magic happen.
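To make the two abstractions concrete, here is a minimal, hypothetical sketch of a LightningModule paired with the Trainer; the class name, layer sizes, and random data are ours, chosen only so the snippet runs end to end.

```python
# A minimal LightningModule plus Trainer, roughly the pattern described above.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)  # routed to whichever logger the Trainer is configured with
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Random toy data, just so the example is self-contained.
data = TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,)))
trainer = pl.Trainer(max_epochs=1)
trainer.fit(LitClassifier(), DataLoader(data, batch_size=32))
```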
[13:59.020 --> 14:08.020] So yeah, and I would say, probably, honestly, the new stuff that we just released with LightningWork, LightningFlow, and LightningApp, you know,
[14:08.020 --> 14:20.020] it's taken us a few years to really think about this and figure out how do we take those ideas from building models, and how do you generalize that to building a full end-to-end ML workflow?
[14:20.020 --> 14:24.020] Right? Research workflows, production pipelines, all that stuff.
[14:24.020 --> 14:29.020] And that's just not an easy thing to do. So we wanted to do it in a way where
[14:29.020 --> 14:32.020] it felt like Lightning. Like, it has the spirit and the DNA of Lightning.
[14:32.020 --> 14:35.020] And you feel like you're using Lightning when you're using it, right?
[14:35.020 --> 14:39.020] So I'm very proud of that and that's something that was a team effort.
[14:39.020 --> 14:43.020] I mean, all of it, all of this by the way has been a team effort collectively.
[14:43.020 --> 14:48.020] I think I've seeded some ideas, but there's no way that we would have been here at all without the community and
[14:48.020 --> 14:51.020] the team here at lighting specifically.
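Since LightningWork, LightningFlow, and LightningApp come up here, the following is a minimal hedged sketch of how those pieces fit together, based on the API as it looked around the 2022 launch; the component names and the print statement are ours, and real apps wire together multiple Works, UIs, and cloud compute.

```python
# A tiny Lightning App: a Flow orchestrates one Work component. Sketch only.
import lightning as L

class TrainComponent(L.LightningWork):
    def run(self):
        print("this is where a training job (or any long-running step) would go")

class RootFlow(L.LightningFlow):
    def __init__(self):
        super().__init__()
        self.trainer_work = TrainComponent()

    def run(self):
        self.trainer_work.run()

app = L.LightningApp(RootFlow())
# Typically launched locally with `lightning run app app.py`, or with a cloud flag to run remotely.
```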
[14:51.020 --> 14:57.020] Yeah, I totally want to talk about the lightning launch that you just came out with recently.
[14:57.020 --> 15:03.020] I'm super impressed by what you did there, but I guess I'm curious before we go into that.
[15:03.020 --> 15:08.020] Like, I remember a moment where I think PyTorch had something called Ignite.
[15:08.020 --> 15:13.020] I think that was really similar to Lightning, or at least the PyTorch team thought it was similar to Lightning.
[15:13.020 --> 15:18.020] I'm kind of curious, you were actually working at Facebook, I think, at the same time that
[15:18.020 --> 15:23.020] Facebook was also sort of making a somewhat competitive piece of software, too.
[15:23.020 --> 15:28.020] And is that awkward? Like, do you have any sense, like, did it feel competitive at the time?
[15:28.020 --> 15:34.020] So two things. One Ignite is not done by PyTorch and it's not a Facebook product.
[15:34.020 --> 15:40.020] It is a third party product where all they're doing is hosting the docs for it, right?
[15:40.020 --> 15:46.020] So it's not actually built by Facebook or PyTorch. It just seems that way because of the way the docs have been structured.
[15:46.020 --> 15:53.020] So that's the first thing. The second thing is, you know, I was a researcher and a student,
[15:53.020 --> 15:58.020] and I was literally trying to build papers, not build software for machine learning.
[15:58.020 --> 16:02.020] So I wasn't like sitting around using tools, I'm looking around and stuff, right?
[16:02.020 --> 16:05.020] So I had no idea that they were around. I had no idea what mostly was around.
[16:05.020 --> 16:10.020] The only things I used were the ones I literally needed.
[16:10.020 --> 16:14.020] You've been in research. I'm sure there's a ton of stuff where you go,
[16:14.020 --> 16:18.020] "That's cool," but you never use it, because you don't care, because you're doing your research.
[16:18.020 --> 16:23.020] So I think it's a pretty normal thing for researchers to be pretty narrowly focused.
[16:23.020 --> 16:28.020] It wasn't until I launched it that people, Calfreddo and everyone else, were like, "Oh my God, it's kind of like this."
[16:28.020 --> 16:33.020] And I'm kind of like, "What is that thing?" I guess it's kind of like this, but it has its own DNA.
[16:33.020 --> 16:37.020] It's not surprising though. It happens in research.
[16:37.020 --> 16:41.020] You have people working in parallel on something because something has happened that unblocks it.
[16:41.020 --> 16:44.020] So it's going to trigger similar ideas in a lot of people.
[16:44.020 --> 16:47.020] But when they come up at the end, they're going to be very different things.
[16:47.020 --> 16:55.020] My analogy is always like, if you and I are like, "Hey, let's paint the face of a person, as you say, I describe the face."
[16:55.020 --> 17:00.020] I bet you and I are going to paint it differently, even though we're trying to do the same thing.
[17:00.020 --> 17:05.020] I guess what caused you to actually start a company around lightning? What was that journey like?
[17:05.020 --> 17:09.020] Very interesting, because the first adopter of Lightning was Facebook.
[17:09.020 --> 17:13.020] That kind of got us enterprise features very quickly.
[17:13.020 --> 17:16.020] I mean, I was really annoyed because I was literally trying to do my PhD.
[17:16.020 --> 17:21.020] I was like, "You know, we have this thing internally, it's called Workplace, where people message each other."
[17:21.020 --> 17:25.020] And I kept getting pinged by the Facebook teams, not at FAIR,
[17:25.020 --> 17:28.020] like the actual people building all the fun stuff.
[17:28.020 --> 17:34.020] And I didn't check that thing. I mean, we all try to triage emails,
[17:34.020 --> 17:36.020] you know, I'm not the best at emails, right?
[17:36.020 --> 17:41.020] So I hadn't checked the thing literally for like four months.
[17:41.020 --> 17:44.020] And then my manager came in and said, "Dude, you have to check workplace."
[17:44.020 --> 17:49.020] I was like, "Why?" And then I see these Facebook teams being like, "Hey, we want to use your thing.
[17:49.020 --> 17:53.020] I'm like, dude, it's a PhD project. Why would you want to do that?"
[17:53.020 --> 17:55.020] And they're like, "No, it's okay. We'll help you make it better."
[17:55.020 --> 17:59.020] I was like, "Fine." And so they took it and started working on it.
[17:59.020 --> 18:02.020] And we've been super tight with the teams since then.
[18:02.020 --> 18:06.020] But then it was crazy because then big companies started using it immediately.
[18:06.020 --> 18:09.020] It was like, someone would submit a PR and they're like, "Hey, can you fix this?
[18:09.020 --> 18:13.020] I'm like, "No, I'm not doing, I don't know, FFT research or whatever you're doing.
[18:13.020 --> 18:16.020] I don't want to fix that." And they're like, "But I'm at Bloomberg." I'm like, "That's cool."
[18:16.020 --> 18:20.020] All right, I guess I should help you out, right?
[18:20.020 --> 18:25.020] And so then, you know, as a developer, that's the best thing. You're like, "Cool, my stuff is for real."
[18:25.020 --> 18:34.020] Like, that's great. So I think when I had like hundreds of these, I was like, "Okay, well, these people are really struggling with this bigger problem, which is what we just launched, right?"
[18:34.020 --> 18:38.020] So let's go ahead and really solve that problem in a meaningful way.
[18:38.020 --> 18:43.020] But you know, it turned out that you couldn't do it alone, and you needed a ton of money and people and so on.
[18:43.020 --> 18:46.020] And so that's how we ended up here.
[18:46.020 --> 18:49.020] And I guess what year was that? Was that 2019?
[18:49.020 --> 18:54.020] Yeah, that was summer of 2019. And then I left Facebook in December of 2019.
[18:54.020 --> 18:56.020] So started the company.
[18:56.020 --> 19:02.020] January 2020. Before COVID, right? So Lucas, yeah, you've built a few companies.
[19:02.020 --> 19:07.020] You've been successful and I'm sure you know how hard it is to build a company.
[19:07.020 --> 19:13.020] Well, I mean, actually, here we are, summer 2022. How big is your company?
[19:13.020 --> 19:18.020] Yeah, good question. So we're about 60 people now all over the world.
[19:18.020 --> 19:24.020] And yeah, I think we've mostly clustered around New York, San Francisco and London.
[19:24.020 --> 19:27.020] And then we have people kind of everywhere else.
[19:27.020 --> 19:30.020] I was saying one thing that I'm really proud of in the company.
[19:30.020 --> 19:33.020] It's again, I'm not from the US, I'm not from Silicon Valley.
[19:33.020 --> 19:37.020] So I think that that's kind of been the DNA of the company now.
[19:37.020 --> 19:40.020] Like, we have a ton of people from like 20 different countries. And it's amazing.
[19:40.020 --> 19:44.020] Because everyone speaks all these languages and it's pretty cool.
[19:44.020 --> 19:48.020] It, yeah, feels pretty international. So I think for, like, a New York startup, this is great.
[19:48.020 --> 19:51.020] It's exactly what you want. Right? That melting pot.
[19:51.020 --> 19:57.020] That's awesome. What has the experience been like to go from kind of, like, a researcher,
[19:57.020 --> 20:03.020] a solo developer, to, like, suddenly running a really significantly large company?
[20:03.020 --> 20:08.020] Do you find time to think, to write code on your own still?
[20:08.020 --> 20:12.020] Yeah, good question. Maybe ask you this. Don't you feel like building a company is kind of like doing research?
[20:12.020 --> 20:14.020] There are a lot of parallels now.
[20:14.020 --> 20:18.020] You know, I do think there's some parallels, but you go first, tell me what you think the parallels are.
[20:18.020 --> 20:21.020] Yeah, so you know, what are you doing in research?
[20:21.020 --> 20:26.020] So you have, you have hypotheses and you're proven wrong most of the time.
[20:26.020 --> 20:31.020] And you've got to just try something quickly and then move on to the next thing and try the ideas and so on.
[20:31.020 --> 20:34.020] And so you find something that works right and then you dig into it.
[20:34.020 --> 20:37.020] That's no different than a company, right?
[20:37.020 --> 20:42.020] The difference is you have to do it through people, which is really hard, right?
[20:42.020 --> 20:46.020] So it's not just a solo person building and I think people forget this, right?
[20:46.020 --> 20:50.020] It's like if you want to build anything meaningful, you have to have a team.
[20:50.020 --> 20:52.020] You cannot do it alone. I'm at this point.
[20:52.020 --> 20:56.020] I have to tell you, like I just said, Lightning took about five years to go live.
[20:56.020 --> 21:01.020] If I'd been working with this team, we probably could have gotten there in a year, right?
[21:01.020 --> 21:06.020] Because it's a lot faster when you have really smart people around you and you're working together.
[21:06.020 --> 21:12.020] So I don't love this notion of, like, the solo founder who did whatever; that doesn't work, guys.
[21:12.020 --> 21:15.020] Like I don't do that, right? So it's been amazing.
[21:15.020 --> 21:18.020] So you have to build a company through people and that's really hard to do, right?
[21:18.020 --> 21:24.020] So people management, taking a vision and getting everyone to go towards that same vision,
[21:24.020 --> 21:26.020] where they don't even know what the output is going to look like.
[21:26.020 --> 21:31.020] That's really hard, because you're asking 60 people to just suspend disbelief and say, you know what?
[21:31.020 --> 21:33.020] Fine, we're going for it.
[21:33.020 --> 21:37.020] And when we get there, we'll see what it is, right? And so you have to manage that trade-off a lot.
[21:37.020 --> 21:46.020] And I think, honestly, you know, spending the first six years in the military, you know, even though I didn't do all the SEAL training that everyone does and become a full SEAL,
[21:46.020 --> 21:54.020] the stuff that I did go through, and especially leading small teams in training and on the SEAL team, actually did translate really well, right?
[21:54.020 --> 22:02.020] It's like, how do you get an aggressive bunch of people to go towards a goal really fast when you have no information.
[22:02.020 --> 22:06.020] And you have limited resources, right? It's like perfect.
[22:06.020 --> 22:16.020] That's really cool. Tell me more about that, I'm really curious. What are some of the sort of things that you learned about leadership in the military that you applied to running your company?
[22:16.020 --> 22:23.020] Yeah, I mean, like, you know, if you show up to BUD/S as a junior officer, right? So I was 20 when I started SEAL training.
[22:23.020 --> 22:27.020] You know, I got put in charge of about a 300-person class. Like, that's crazy, right?
[22:27.020 --> 22:36.020] And so you have to be accountable for everything, all their gear, where they are. And they're all 18, 19 year olds. They're all getting in trouble out in town. They're all doing really silly things, right?
[22:36.020 --> 22:42.020] So you have to deal with a ton of people issues. And you're, like, 20. You're learning on the job, right?
[22:42.020 --> 22:50.020] And then you show up to your first SEAL team and then you're, like, put in charge of a team. And those guys have been there for 30, 40 years. They're so much better than you in every possible way, right?
[22:50.020 --> 22:57.020] So if you show up trying to teach them, feeling like, hey, I'm here, big bad boss, I'm going to do whatever I want, right? That's not how it works.
[22:57.020 --> 23:04.020] So I think specifically, I can't speak for the whole military, but I can say in the SEAL teams and special operations, you're taught to lead from the front, right?
[23:04.020 --> 23:11.020] So as an officer, you are supposed to be the fastest runner or the best swimmer, all of that, because you're always leading from the front, right?
[23:11.020 --> 23:19.020] And so I still carry that here, right? So that's why I'm not, like, coding all the time right now, but I do want the team to be at a specific level, right?
[23:19.020 --> 23:27.020] And it can get there because I can push the team. So I think it's a lot about that. And it's the same mentality: if we're going through that door, I'm going first, right?
[23:27.020 --> 23:31.020] And I'm going to be there first, always, right? And so a lot of those lessons carry over.
[23:31.020 --> 23:39.020] So there are a bunch of civilian terms for this, whatever leadership style it's called, but, you know, that's kind of been ingrained in me since I was 20, basically.
[23:39.020 --> 23:57.020] That's really interesting. Do you think there are any, like, really striking differences about managing a company of, you know, mostly highly technical people distributed around the world, that you're surprised by, that's different than, you know, leading a team of 18- and 19-year-olds?
[23:57.020 --> 24:03.020] Yeah, I mean, for sure. So in the military, it's very like dictatorial, I guess.
[24:03.020 --> 24:09.020] You like, you make a decision and that's it. There's no there's no question right? No one questions or anything like that.
[24:09.020 --> 24:14.020] You of course take people's input and everyone has that, but at the end of the day you say something and it just happens, right?
[24:14.020 --> 24:21.020] And there's no like second guessing, whatever. In the civilian world, oh my god, there's questions and this and that and blah, right?
[24:21.020 --> 24:29.020] And so you have to really learn how to live in that world. So it's fascinating. I think the years that I spent in finance were the best kind of middle ground, right?
[24:29.020 --> 24:34.020] And I actually think a lot of veterans have a hard time adjusting to the civilian world probably for this reason, right?
[24:34.020 --> 24:40.020] Because the way you do things in the military is just so different. So you can't approach people that way; you have to learn the cues, right?
[24:40.020 --> 24:47.020] So in finance, it's kind of this hybrid like super aggressive ground, but you still have to learn how to talk to people.
[24:47.020 --> 24:56.020] And so if any veterans are watching this, I would urge you to go into finance first so you can have a soft landing and then go into tech, because, you know,
[24:56.020 --> 25:01.020] once you're in tech you're dealing with designers and creators, and people are very different there.
[25:01.020 --> 25:09.020] That's awesome. Do you think there's um, you have any role to play? This is a total aside. I'm just curious if you have any thoughts on this, but you know,
[25:09.020 --> 25:18.020] sometimes I feel like at least in Silicon Valley there's often like a lot of friction between military and tech, like working together.
[25:18.020 --> 25:28.020] Think about that at all. Like do you hope that there's military applications of lightning and do you think you can play a translation role or how do you think about that?
[25:28.020 --> 25:38.020] Yeah, I mean, look, I think that specifically with ML in the military, everyone's like, autonomous weapons, blah, right?
[25:38.020 --> 25:44.020] Like, that's what everyone jumps to, and, like, yes, that is an extreme use of it for sure, and that's not a use that I want to support, right?
[25:44.020 --> 25:50.020] I don't think any of us want to support that, especially having been in some situations where it's pretty clear that you don't want to enable more of that, right?
[25:50.020 --> 25:57.020] But I think what people don't understand is that some of these tools can be used in also positive ways, right?
[25:57.020 --> 26:05.020] Like, there are ways where you could, for example, I don't know, I mean, I don't want to get into it because people are going to judge all the parts, right?
[26:05.020 --> 26:14.020] But there are ways you can still use it in a good way. Translation, right? You're in the field and you're meeting someone in a new village and you can't speak to them, right?
[26:14.020 --> 26:25.020] How do you do that? And how do you, you know, a lot of what we a lot of the military has done during the war has been around winning hearts and minds in Afghanistan and Iraq.
[26:25.020 --> 26:32.020] And that's really making this connection with villagers and trying to understand what happens and trying to rebuild countries and so on, right?
[26:32.020 --> 26:41.020] And I think that a lot of AI could actually facilitate a lot of these things, right? Casualties. When you have casualties and you need to call something out, maybe the person can't speak, right? So translating or something.
[26:41.020 --> 26:51.020] So there are some great applications of it, but it's like anything. Like, yes, can the internet be used to find your long-lost family? Of course it can. But can it be used to traffic people? Yes, it can.
[26:51.020 --> 26:55.020] So what are you going to do, shut it down? You know, like it's hard. It's not a simple answer, right?
[26:55.020 --> 27:03.020] All right, so tell me about the new Lightning website. What's the best way to talk about it? Lightning, the operating system?
[27:03.020 --> 27:11.020] I'm curious to know how you conceive of it and how you built it. It's such an impressive launch, it's a very impressive demo.
[27:11.020 --> 27:14.020] I'd love to know about the process and your vision here.
[27:14.020 --> 27:21.020] Yeah, for sure. So if you go to lightning.ai today, you're going to see the new homepage for the Lightning community, right?
[27:21.020 --> 27:28.020] So I think the first thing to note is, you know, PyTorch Lightning has grown. The project is no longer called PyTorch Lightning. It's called Lightning now, right?
[27:28.020 --> 27:34.020] Because when it was just PyTorch Lightning, it let you do one thing, which is build models. So that's cool.
[27:34.020 --> 27:41.020] Except that when you build that model, there's a ton of other stuff you have to do around it. You need to, you know, wrangle data, and you have feature stores.
[27:41.020 --> 27:47.020] You need to manage experiments, right? You need to do a lot of the stuff that you guys are doing. Analyze it, understand what's going on.
[27:47.020 --> 27:55.020] So what we are now enabling the framework to do, so the framework is now lightning. It enables you to build models still. You can do that.
[27:55.020 --> 28:03.020] But now when you want to build research workflows or production pipelines, you can now do that within the framework as well in the lightning way, right?
[28:03.020 --> 28:08.020] And what we really want to do is allow people to connect the two stitch together the best tools in class.
[28:08.020 --> 28:13.020] So we're really thinking about it as kind of the glue for machine learning, right?
[28:13.020 --> 28:18.020] So if I want to use weights and biases, feature acts with this other thing, I should be able to, right?
[28:18.020 --> 28:24.020] And what we really, I think you should think about us like Apple, like we're really introducing kind of the iPhone equivalence, right?
[28:24.020 --> 28:29.020] So that people can build apps on there, so they can build their own apps and publish them.
[28:29.020 --> 28:39.020] But these apps are extremely complex workflows, right? They're not just demos or something like that. These are actual end-to-end production workflows or research workflows that can run in distributed cloud environments, right?
[28:39.020 --> 28:45.020] But they stitch together the best in class tools. So lightning AI today is really the page for where these apps get published.
[28:45.020 --> 28:53.020] So if you're trying to start a new machine learning project, you can go there, find something similar to what you're working on, run it on your infrastructure very quickly within minutes,
[28:53.020 --> 28:55.020] and then change the code and off you go, right?
[28:55.020 --> 29:00.020] And so I think some of the things that I'm super excited about, and you know, we have chatted a lot about this, is what are
[29:00.020 --> 29:06.020] some of those integrations we can do with partners, right? And so what are some of the great tools that we can enable, for example from Weights & Biases, there
[29:06.020 --> 29:10.020] and so people can embed into their apps in really cool ways that probably are not possible today, right?
[29:10.020 --> 29:21.020] And so it's really around that. I'd like to partner with every single framework and every single tool out there to help them shine and really provide the best capabilities of what they have for the community, right?
[29:21.020 --> 29:23.020] So I think that's what we're shooting for.
[29:23.020 --> 29:32.020] And I guess, how long has this been in the works? Like, how did you, like, I mean it seemed like a pretty different vision, as I understand it, than, you know,
[29:32.020 --> 29:40.020] PyTorch Lightning when it first came out. Like, how did you come to it? And was this always on your mind ever since you started the company?
[29:40.020 --> 29:44.020] Yeah, for sure. So that was definitely a vision from day one.
[29:44.020 --> 29:48.020] It's just, it's really hard to build up front so you really have to do the work for it.
[29:48.020 --> 29:55.020] But you know, Lightning, sorry, PyTorch Lightning, had already kind of started to do a lot of this.
[29:55.020 --> 30:00.020] I mean, you were some of the first partners there, right? So when PyTorch Lightning first launched,
[30:00.020 --> 30:07.020] you know, we have to go back to 2019, you know, I don't know, May, June, whatever it was, you had frameworks that were running.
[30:07.020 --> 30:14.020] And if you wanted to watch your experiments or something, it was really hard to do, right? You had to integrate something.
[30:14.020 --> 30:18.020] And so you had TensorBoard. I think you guys were probably live by then or soon after.
[30:18.020 --> 30:23.020] And it was like no one knew about these things because they weren't there, right? They weren't easy to use.
[30:23.020 --> 30:28.020] And so one of the first things we did, I personally used TensorBoard, right? So I used it back then.
[30:28.020 --> 30:32.020] And I was like, hey, you know what, I don't want to start it up myself. Let me just let this thing do it.
[30:32.020 --> 30:40.020] And so we started integrating that in there. And then very quickly, you know, your users started coming by and saying, hey, can we add Weights & Biases, and so on.
[30:40.020 --> 30:46.020] And then we kind of came up with these abstractions, and then suddenly people could use it implicitly. And that was amazing, right?
[30:46.020 --> 30:50.020] Because it started to stitch together tools. So that vision started back then already, right?
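A hedged sketch of the logger abstraction being described: the same Trainer can send metrics to TensorBoard or to Weights & Biases just by swapping one object. The project name below is a placeholder, and the wandb package needs to be installed and logged in.

```python
# Swapping experiment loggers in PyTorch Lightning without touching the model code.
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger, WandbLogger

tb_trainer = pl.Trainer(logger=TensorBoardLogger(save_dir="logs/"), max_epochs=1)
wandb_trainer = pl.Trainer(logger=WandbLogger(project="my-project"), max_epochs=1)
# Anything the model records via self.log(...) is routed to whichever backend was passed in.
```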
[30:50.020 --> 30:56.020] And then if you look at the accelerators, right? So we wrote this API called Accelerator, which lets you train on different hardware.
[30:56.020 --> 31:03.020] This is back in summer 2020. And it powers all of Lightning. But that's what it is, right? And it allows you to go between CPUs and GPUs and TPUs.
[31:03.020 --> 31:10.020] And I think we were the first framework to actually let you do that seamlessly, right? So PyTorch supported CPUs and supported GPUs,
[31:10.020 --> 31:17.020] but you had to rewrite your code over and over again, right? So we introduced for the first time the ability to go between GPUs and TPUs just like that, right?
[31:17.020 --> 31:23.020] And that really changed the game. And so that's been amazing, because that wasn't an integration. So it started to become a platform back then, right?
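And a quick sketch of the hardware-agnostic flags being referred to; the exact argument names have shifted a bit across Lightning releases, so treat this as the rough shape of the 1.x-era API rather than a definitive reference.

```python
# The same training script targeting different hardware via Trainer flags.
import pytorch_lightning as pl

cpu_trainer = pl.Trainer(accelerator="cpu", max_epochs=1)
gpu_trainer = pl.Trainer(accelerator="gpu", devices=2, max_epochs=1)  # e.g. 2 GPUs
tpu_trainer = pl.Trainer(accelerator="tpu", devices=8, max_epochs=1)  # e.g. a TPU v3-8
# trainer.fit(model, dataloader) stays identical; no model code changes between devices.
```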
[31:23.020 --> 31:31.020] And so kind of the question for me was, okay, how can we do more of this? Except that in the model, you're very limited to just these kinds of things, right?
[31:31.020 --> 31:36.020] But when you start talking about feature stores and deployments and all that stuff, you need something a little bit higher level.
[31:36.020 --> 31:43.020] And again, I'm lazy and I hate learning new things. So I was like, okay, how do we make it just as easy as Lightning?
[31:43.020 --> 31:48.020] So that if you know PyTorch Lightning, you already know how to build production systems. And so that's kind of what we released.
[31:48.020 --> 31:54.020] And the hard part was getting it to be exactly like Lightning. What is that DNA, what does the user experience feel like?
[31:54.020 --> 32:03.020] I'm curious how you think about product development and customer feedback. Like it felt like you created a lot, like, you know, from your own vision.
[32:03.020 --> 32:14.020] Like, how much of what you do is sort of, like, informed by your gut, and how much of it is coming from, you know, like a user saying, hey, like XYZ, could you make something that does this or this or this?
[32:14.020 --> 32:18.020] How do you think about it, what does your product development process look like?
[32:18.020 --> 32:26.020] Yeah, so I think I'm probably the worst person to ask us because I don't care what anyone's doing. I legitimately don't. I don't look at what people are doing. I don't care.
[32:26.020 --> 32:31.020] Right. We're going to do what we're going to do. And we, we're going to do things that I think are interesting.
[32:31.020 --> 32:38.020] And so we're going to basically form a thesis around something that we want to do and we will see the behavior of the users, of course, right?
[32:38.020 --> 32:47.020] But if you, if you only talk to users, we speak to users all the time by the way, right? So it's not about that. We take their feedback in, but users are always going to tell you kind of incremental things.
[32:47.020 --> 32:54.020] They're always going to tell you they want this better. They're never going to tell you they want the iPhone. They're always going to tell you, can you make my blackberry keyboard slide out instead or whatever, right?
[32:54.020 --> 33:04.020] So, so you have to have just a different mentality there where you take things with a grain of salt and you do take their inputs, but it's really those inputs are going to usually improve the product.
[33:04.020 --> 33:17.020] So they're not going to help you be kind of like generate, create like a leapfrog product, right? And so that's really where again, I just don't care what people are working on. I'm just going to do what I think should be done for machine learning and that's what we build next, right?
[33:17.020 --> 33:20.020] And sometimes we're wrong and sometimes we're right.
[33:20.020 --> 33:34.020] Do you think it's important to hire people with a machine learning background to do the kind of work that you do or do you look for people with kind of more like an operational or like engineering or database background?
[33:34.020 --> 33:46.020] So I guess first and foremost, I care that people are creative, driven and interesting in some way, like they just have interest and they're like not just the same kind of cookie cutter persona.
[33:46.020 --> 33:52.020] So that's the first thing, right? Then after that, yes, I want you to be good at your thing, whatever your thing is, right?
[33:52.020 --> 33:57.020] Now, specifically machine learning, like, yeah, it's nice to have, and if you claim it, I hope you know what you're doing with it.
[33:57.020 --> 34:05.020] If you're on the Lightning team, you 1,000% need to know, and every single person on the Lightning team is a PhD or came out of a PhD program, so they're all experts and stuff.
[34:05.020 --> 34:09.020] But everyone else who's around that, I just want you to be really good at your thing.
[34:09.020 --> 34:17.020] I don't care how you got that knowledge, right? I don't care. Like, remember, I didn't go to, well, I eventually went to fancy schools, but for most of my life I hadn't, right?
[34:17.020 --> 34:20.020] And so I didn't really care about that.
[34:20.020 --> 34:26.020] So yeah, I think machine learning is not necessarily a deal breaker, it just depends on your particular role, right?
[34:26.020 --> 34:32.020] Now, I could be wrong. How does the Lightning team fit into the broader company team? What's the distinction there?
[34:32.020 --> 34:42.020] Yeah, so the Lightning team works on all the open source stuff, and then we have people who work on all the closed-source stuff, right? So when you run Lightning apps on your own, you're using all the free stuff.
[34:42.020 --> 34:46.020] When you run it on the cloud, that's when you use some private, proprietary stuff, right?
[34:46.020 --> 34:53.020] So you can take a Lightning app, you fork it, you run models and all that stuff, you run it locally, but if you want to run it on the cloud, you just add the cloud flag.
[34:53.020 --> 34:59.020] And then that stuff is being built by the other people who are not Lightning team people, right?
[34:59.020 --> 35:05.020] And these people are infrastructure people, they're database people, they're from all sorts of walks of life, I guess.
[35:05.020 --> 35:17.020] And I think that diversity is always better in this world, because there are just a lot of unknowns, and you and I both know this, that, like, ML is evolving, like we just don't know what's going to need to be built next, right?
[35:17.020 --> 35:20.020] So we kind of have to have a research hat on a little bit.
[35:20.020 --> 35:28.020] Are there, like, top-of-mind applications that you hope get built on your Lightning platform right away?
[35:28.020 --> 35:31.020] Or the next things that you're excited about?
[35:31.020 --> 35:47.020] Yeah, so I think, I mean, top of mind right now is a few of these key partners that we've been working with for a long time, like you guys, where we want to make the tools just more widely adopted and bring more visibility to them, and have the ability for people to mix and match and more, right?
[35:47.020 --> 35:54.020] So it's really about the immediate partners; some of these include cloud providers, some of these include, like, the hardware makers and so on.
[35:54.020 --> 35:59.020] So we've had really good relations for a long time. So it's about enabling those tools to work first, right?
[35:59.020 --> 36:08.020] In terms of capabilities, I do think that we do want to make sure that people have a really good way to, I don't know, to do inferencing, for example, right?
[36:08.020 --> 36:16.020] So we're partnered with the cloud providers to do that, like SageMaker team and so on, and then I think for people who want to do anything with data, right?
[36:16.020 --> 36:22.020] So I'd love to partner with, like, the Snowflakes and the Databricks of the world to enable these things as well.
[36:22.020 --> 36:36.020] And then there's all the labeling things that people are starting to do as well, right? So I don't know if you guys are doing anything there, but obviously, you know, I'm happy to partner and any of these, but yeah, it's, I think it's those things that are immediately around the model development part, right?
[36:36.020 --> 36:40.020] There's a lot more that we can do, but we really want to focus on this part first.
[36:40.020 --> 36:49.020] Would you ever work with frameworks that aren't Pythurts like do like a psychic integration or XG boost or anything like that, is that within scope?
[36:49.020 --> 36:57.020] Yeah, for sure. I mean, it's crazy. People use Lightning for all sorts of stuff; people have actually run scikit-learn in Lightning.
[36:57.020 --> 37:03.020] I think somebody actually did that, and I was like, how are you doing this?
[37:03.020 --> 37:13.020] So yeah, honestly, I love integrating all of the frameworks. Like, you know, I'm long PyTorch in general, but I don't have anything against TensorFlow and JAX and Keras or any of these things, right?
[37:13.020 --> 37:30.020] Any partnerships there, we're happy to obviously work with and enable those tools as well. Like, again, I think that we've really evolved from where we were before to a point where we're saying, okay, we're able to support a lot more than we could before, so it's a function of having bandwidth, right?
[37:30.020 --> 37:37.020] Now that we can support a lot more than we could, we want to do that, right, and we welcome these partners as well. So yeah, we're happy to work with any framework.
[37:37.020 --> 37:43.020] I'm just curious: why are you long PyTorch over the long term?
[37:43.020 --> 37:53.020] I think that a lot of these frameworks have converged in functionality, I haven't gone back and used TensorFlow and I think it's probably changed quite a bit.
[37:53.020 --> 37:59.020] You know, we just have so much work already in PyTorch that I think we're just excited to continue improving that user experience.
[37:59.020 --> 38:03.020] I think if Google wanted to partner with the other ones, we'd be happy to do that as well.
[38:03.020 --> 38:10.020] But I kind of believe that you can't really do everything well, and so it's a function of having focus as well as a company.
[38:10.020 --> 38:17.020] And anything in particular in PyTorch — I think it's really become the standard for research and also production nowadays.
[38:17.020 --> 38:23.020] And yeah, I firmly believe that that team has done a really good job at continuing to push a boundary.
[38:23.020 --> 38:32.020] So I think that the energy, the way that the team thinks about things and how it's approached, even doing production workloads and inference,
[38:32.020 --> 38:40.020] it's just very unique and different. And I don't know, I like unique and different thinking, I guess, so I gravitate towards that.
[38:40.020 --> 38:54.020] I guess one of the things that I struggle with as we scale our company and our team, you know, we hire all these really creative smart people that have, you know, slightly different points of view, envision and stuff,
[38:54.020 --> 39:01.020] and kind of keeping things aligned, keeping consistency always feels like, you know, like a lot of work to me.
[39:01.020 --> 39:08.020] I'm curious how you've dealt with that if that seems like if that's been an issue for you as you scale up to 60 people.
[39:08.020 --> 39:15.020] Yeah, I think, you know, I think you always want to take everyone's input into account, but you also want to be opinionated.
[39:15.020 --> 39:22.020] That's a difference, right? And I think that when everyone just says whatever and then they'll do whatever they want,
[39:22.020 --> 39:32.020] you end up with something that isn't really cohesive, right? And so to some extent, you've got to be a little bit of the bad guy and just say, hey, you know what, cool, I get it, but we're going to go this way, right?
[39:32.020 --> 39:40.020] And that's just the way it is. And a lot of these micro-decisions get made — it's not just me, right? It's people on the team; we're encouraged to be opinionated.
[39:40.020 --> 39:48.020] And so, you know, it's kind of the same philosophy that we have for Lightning — it's like, cool, you don't like subclassing things? Cool, sounds good, go use something else, we don't care, right?
[39:48.020 --> 39:52.020] This is the way that we think it should be built, and that's fine.
[39:52.020 --> 39:56.020] Well, look, we always end with two questions that I want to make sure we get to them.
[39:56.020 --> 40:10.020] So the second last question is, if you had a little more time on your hands or, I guess, you know, if you had time to work on something else in ML research broadly, what would it be?
[40:10.020 --> 40:20.020] So, if I were back to doing just research right now, I probably would have continued on the self-supervised learning route — I still track that work.
[40:20.020 --> 40:32.020] I believe that — you know, we published a paper about this like a year ago, so I can talk about that — but I believe that a lot of the things that have been pushed into self-supervised learning, a lot of those advancements,
[40:32.020 --> 40:38.020] are actually not necessarily being driven by the methods, you know, like negative-sample this versus that.
[40:38.020 --> 40:46.020] I think it's actually been driven by the transforms, right? And so the paper that we published a while back — I would have continued on this line, as my answer, I guess.
[40:46.020 --> 40:58.020] The paper that we published a while back showed that we could achieve very similar performance simply using a plain VAE, without any of the fancy tricks, and actually we removed one of the terms of the ELBO loss.
[40:58.020 --> 41:09.020] And why we could do that is because we took the simpler transforms and used them, right? But then the way that we generated the negative samples was using the transforms, and then you reconstruct the original, right?
[41:09.020 --> 41:25.020] And so that actually created a really good learning signal, and what that showed me, and showed our group as well, was that, you know, it's not about a fancy negative-sampling algorithm and whatever thing you're doing with, I don't know, information theory, whatever thing you're coming up with.
[41:25.020 --> 41:30.020] It's that I think we're just embedding most of these things into the transforms, and the transforms are actually pulling the weight.
[41:30.020 --> 41:36.020] Which actually is kind of aligned with what data scientists have been saying forever: it's about the data, right? It's about the data.
[41:36.020 --> 41:42.020] So it turns out that we've just pushed all that knowledge into the transforms now, for images specifically.
[41:42.020 --> 41:52.020] And so I'm a little bit sad about that, but at a minimum, I would probably continue that route, exploring: how can I reduce the complexity of these algorithms?
[41:52.020 --> 41:58.020] I don't want these tricks, I don't want these weird learning rate schedulers and all this stuff.
[41:58.020 --> 42:06.020] I want a super simple, like, VAE loss or something super basic where I know why it works and I can pinpoint exactly why it's doing what it's doing.
[42:06.020 --> 42:16.020] And I think self-supervised learning has kind of lost its way, in that most of these papers are, like, a brand new paper that does this, and it's like, oh, they changed this one tiny term, and it's like, come on, guys.
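To make the idea above concrete, here is a minimal, self-contained sketch — not the paper's actual code; the tiny architecture, the noise/flip "transform", and the loss weighting are all illustrative assumptions. The point it shows is the one described: a simple transform corrupts the input, and a plain VAE is trained to reconstruct the original, so the transform itself carries the learning signal.

```python
# Toy sketch: simple transforms + a plain VAE objective (illustrative, not the paper's code)
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, dim=32 * 32, latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent)
        self.logvar = nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z).view_as(x), mu, logvar

def simple_transform(x):
    # "Simple" augmentations standing in for the transforms discussed:
    # a random horizontal flip plus Gaussian noise.
    if torch.rand(()) < 0.5:
        x = torch.flip(x, dims=[-1])
    return x + 0.1 * torch.randn_like(x)

model = TinyVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(16, 1, 32, 32)          # stand-in batch of images
x_aug = simple_transform(x)            # corrupted view
recon, mu, logvar = model(x_aug)
recon_loss = F.mse_loss(recon, x)      # reconstruct the ORIGINAL, not the view
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon_loss + 1e-3 * kl          # the paper reportedly drops/reweights an ELBO term; weight is a guess
loss.backward()
opt.step()
```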
[42:16.020 --> 42:18.020] Interesting.
[42:18.020 --> 42:29.020] Well, my last question is when you look at people that are trying to make machine learning work for real stuff, like, you know, companies like Facebook or Bloomberg or anyone.
[42:29.020 --> 42:36.020] And they're kind of going from like, here's an idea of something we want to apply machine learning to to like deploy and working in production.
[42:36.020 --> 42:41.020] Where do you see the biggest bottleneck right now in summer 2022?
[42:41.020 --> 42:45.020] You know, it's like that meme where it's like, expectation and reality.
[42:45.020 --> 42:49.020] I think that's why we we see all the time. I want to.
[42:49.020 --> 42:51.020] Yeah.
[42:51.020 --> 43:06.020] Yeah, you know, I think there's a lot of, I mean, like — we're just in unknown territory. The thing is so new that you stress-test it in a production system and things break, and you're like, ah, my chatbot says something weird, and you're like, yeah, well, no one's deployed a chatbot before.
[43:06.020 --> 43:11.020] So of course you're going to learn that lesson, right? So there's a lot of new unknowns that we're discovering.
[43:11.020 --> 43:21.020] But I think a lot of it is the explosion of tooling that's out there and the lack of a standard on how to use that tooling together, right. So I think that's a lot of what's holding us back today.
[43:21.020 --> 43:29.020] You know, I think there are many ways to solve that problem. I think that we're obviously taking a stab at that with the things that we've just introduced.
[43:29.020 --> 43:38.020] And so I honestly think that's a big part of it. Now, I believe that that's only a part of it. I think that the other one is,
[43:38.020 --> 43:52.020] yeah, this fragmentation — like, you know, everyone has to go from this to that to that to that, and then use this with this thing and that, and it's just like, ah, right? Like, if we just had a standard and everyone worked together, we could actually do well.
[43:52.020 --> 44:03.020] I honestly think there's, like, a super unhealthy, weird competitive thing in ML. Like, guys, this is a massive market. There's a ton of people who are going to pay for this thing — it's not about one tool or the other.
[44:03.020 --> 44:10.020] Everyone's using all the tools together, right? And so this unhealthy competition thing is actually causing a lot of these problems, right?
[44:10.020 --> 44:21.020] I think actually if the community worked together more and we had better communication and collaboration between frameworks and between open source projects and and you know tools like you guys.
[44:21.020 --> 44:34.020] then things would be that much easier, because we'd speak to each other, and then some random engineer sitting at, like, Facebook doesn't have to waste six months being like, man, if they had just done this one thing, it could have been so much easier, right?
[44:34.020 --> 44:40.020] Awesome. I hope you can find some ways to work together.
[44:40.020 --> 44:47.020] Just think of that one person — just be like, I will get you your career back, don't worry, right? That's the goal.
[44:47.020 --> 44:51.020] That's a really good goal. We're rooting for you. We'll make it work.
[44:51.020 --> 44:53.020] Thanks.
[44:53.020 --> 45:03.020] Yeah, thanks for having me. This is super fun. And by the way, I'm a big fan of everything you guys are doing, so I appreciate everything you've done for the ML community as well. Awesome. Likewise.
[45:03.020 --> 45:11.020] If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find
[45:11.020 --> 45:19.020] links to all the papers that are mentioned, supplemental material, and a transcription that we work really hard to produce. So check it out.
[45:19.020 --> 45:29.020] [Music]
61.18516
44.47353
58s
Nov 21 '22 16:54
3snshqc1
-
Finished
Nov 21 '22 16:54
2625.816000
/content/spence-green-enterprise-scale-machine-translation-a9btvxaki1q.mp3
tiny
[00:00.000 --> 00:02.000] [MUSIC]
[00:02.000 --> 00:06.072] Translation is in this sort of space of so-called AI-complete problems.
[00:06.072 --> 00:13.060] So solving it would be equivalent to the advent of strong AI, if you will.
[00:13.060 --> 00:16.080] Because for any particular translation problem,
[00:16.080 --> 00:19.060] world knowledge is required to solve the problem.
[00:19.060 --> 00:21.044] >> You're listening to gradient descent.
[00:21.044 --> 00:24.000] I show about machine learning in the real world.
[00:24.000 --> 00:25.040] And I'm your host, Lucas B.
[00:25.040 --> 00:29.020] Well, Spence Green is a machine translation researcher.
[00:29.020 --> 00:32.008] And also the CEO of a startup called Lilt,
[00:32.008 --> 00:35.084] which is a leading language translation services company.
[00:35.084 --> 00:38.096] He has been using TensorFlow since the very beginning.
[00:38.096 --> 00:41.052] And has been putting deep learning models in production
[00:41.052 --> 00:43.040] for longer than almost any of us.
[00:43.040 --> 00:46.032] I'm super excited to talk to him today.
[00:46.032 --> 00:48.040] I think the best place to start here is, you know,
[00:48.040 --> 00:51.048] you're the CEO of Lilt and you built Lilt.
[00:51.048 --> 00:53.084] Maybe you can just give us a description of what Lilt is
[00:53.084 --> 00:56.000] and what it does.
[00:56.000 --> 00:58.072] >> Well, I think it's important to sort of say
[00:58.072 --> 01:02.056] where the company came from and the problem that it solves.
[01:02.056 --> 01:06.088] And then, you know, I can kind of explain what it does.
[01:06.088 --> 01:08.080] I think what it does follows from that.
[01:08.080 --> 01:09.064] >> Perfect.
[01:09.064 --> 01:10.044] >> Yeah, that's great.
[01:10.044 --> 01:11.044] >> Where it started?
[01:11.044 --> 01:13.064] At least for me personally in my mid 20s,
[01:13.064 --> 01:16.008] I decided I wanted to learn a language.
[01:16.008 --> 01:18.072] And so I moved to the Middle East for about two and a half years.
[01:18.072 --> 01:21.096] And while I was there, two important things happened.
[01:21.096 --> 01:27.008] The first was that, as I was learning Arabic, I had a friend.
[01:27.008 --> 01:30.048] And I was talking to him one night —
[01:30.048 --> 01:32.044] he was, like, the building watchman in my building.
[01:32.044 --> 01:35.028] I was talking to him and I was like, what did you do in Egypt?
[01:35.028 --> 01:37.012] where he was from, and he said, I was an accountant.
[01:37.012 --> 01:39.020] I said, well, really? Why aren't you an accountant here?
[01:39.020 --> 01:41.048] And he said, because I don't speak English.
[01:41.048 --> 01:44.020] I was like, okay, we're in an Arabic speaking country
[01:44.020 --> 01:45.084] and you can't get a job as an accountant.
[01:45.084 --> 01:51.052] And it's because you just like can't get that people make a certain amount
[01:51.052 --> 01:52.072] of money if they speak English.
[01:52.072 --> 01:53.092] If they don't, they make less.
[01:53.092 --> 01:56.028] And I'd never really encountered that before.
[01:56.028 --> 02:00.024] And like six months or so after that conversation, Google Translate came out.
[02:00.024 --> 02:02.080] And I got really excited about that.
[02:02.080 --> 02:04.080] And so I left.
[02:04.080 --> 02:09.064] I left my job, went to grad school, and started working on MT.
[02:09.064 --> 02:13.004] And then a couple years later, I was at Google working on Translate,
[02:13.004 --> 02:16.040] where I met John, my now co-founder, and Franz Och,
[02:16.040 --> 02:20.048] who started the group at Google and did all the really early pioneering work
[02:20.048 --> 02:22.032] in statistical MT.
[02:22.032 --> 02:25.084] And we were originally talking about books a lot.
[02:25.084 --> 02:27.052] And why books don't get translated.
[02:27.052 --> 02:32.016] And we found that Google's like localization team that did all of their language related
[02:32.016 --> 02:35.004] work for the products didn't use Google translate.
[02:35.004 --> 02:39.084] And this was kind of amazing to me, like why would this be?
[02:39.084 --> 02:44.080] And the reason is is because in any sort of business setting or non-consumer setting,
[02:44.080 --> 02:46.024] you need a quality guarantee.
[02:46.024 --> 02:48.088] And so an MT system — a machine learning system —
[02:48.088 --> 02:52.088] can give you a prediction, but it can't really give you a grounded certificate
[02:52.088 --> 02:54.044] of correctness about whether it's right.
[02:54.044 --> 02:57.080] And that's what businesses want to book publishers or whatever.
[02:57.080 --> 03:02.008] So we started building these human-in-the-loop systems, where you need the human
[03:02.008 --> 03:06.020] for the certificate of correctness, but the crux of the problem is to make that intervention
[03:06.020 --> 03:07.040] as efficient as you can.
[03:07.040 --> 03:13.032] I mean, I guess my biggest question that I was thinking about that I've always wanted to ask you
[03:13.032 --> 03:19.016] is sort of like how different is the problem of translating something properly versus
[03:19.016 --> 03:25.016] sort of setting up a kind of human-in-the-loop system with the human translator to translate well?
[03:26.020 --> 03:28.076] Is it almost the same problem or is it quite different?
[03:29.080 --> 03:33.016] By translating it properly, what do you mean?
[03:33.016 --> 03:38.076] I guess I mean, so like Google translate is just trying to give me the best possible translation.
[03:38.076 --> 03:44.036] I sort of assume that what you're doing is like helping a translator be successful
[03:45.008 --> 03:49.008] translating something presumably by kind of guessing likely translations.
[03:49.080 --> 03:50.068] Yeah, right.
[03:50.068 --> 03:51.064] So it's a good question.
[03:51.064 --> 03:57.088] So the question is the mode of interaction with the machine and the way that machine translation
[03:57.088 --> 04:04.068] systems have been used really since the early 50s was when this line of research started.
[04:04.068 --> 04:09.096] It's funny that machine translation was like this really old machine learning task and
[04:10.076 --> 04:15.088] originally people thought the digital computers that were developed during the second world war for
[04:15.088 --> 04:21.016] bomb making and for cryptography the initial idea was well Russian is just
[04:21.072 --> 04:25.048] English encrypted in Cyrillic and so we can just decrypt Russian.
[04:26.012 --> 04:30.052] And so the initial systems that were built in the 50s weren't very good.
[04:30.052 --> 04:35.008] And so the like naive idea was well, just take the machine output and pay somebody to fix it.
[04:35.072 --> 04:44.076] And that this sort of linear editing workflow is what our work in grad school was about was like going
[04:44.076 --> 04:48.036] beyond that in some way like a richer mode of interaction.
[04:48.036 --> 04:53.000] And what we came up with was effectively a predictive typing interface.
[04:53.000 --> 04:54.092] Or there are two problems that we really wanted to solve.
[04:54.092 --> 04:58.044] One was when you're doing translation, the system makes the same mistake over and over again.
[04:58.044 --> 05:00.052] Documents tend to be pretty repetitive.
[05:00.052 --> 05:04.020] It's an annoying user experience and it's inefficient when the system just makes the wrong
[05:04.020 --> 05:05.072] prediction over and over again.
[05:05.072 --> 05:11.032] So the solution to that is to have a system that does online learning which was part of the work.
[05:11.032 --> 05:18.012] And the other was well how can you interact with a text string beyond just like using your cursor
[05:18.012 --> 05:19.056] and fixing parts of it.
[05:19.056 --> 05:22.020] And that is doing predictive typing.
[05:22.020 --> 05:25.088] So if you put those two together you want to do online learning and you want to do predictive
[05:25.088 --> 05:26.060] typing.
[05:26.060 --> 05:31.072] It's kind of a fundamentally different system architecture than the type of system you'd build
[05:31.072 --> 05:34.036] for, like, the Google Translate system architecture.
[05:35.016 --> 05:36.084] Although it seems fairly close.
[05:36.084 --> 05:41.016] Right? I mean like the predictive typing I would think you sort of have like a language model
[05:41.016 --> 05:42.044] and a translation model.
[05:42.044 --> 05:46.084] It's sort of the same, or at least that's how MT systems used to work — or at least in my memory.
[05:46.084 --> 05:47.096] Right? Is it?
[05:47.096 --> 05:53.056] That's the way that the statistical systems used to work and really it was it came down to doing
[05:53.056 --> 05:55.024] inference really rapidly.
[05:55.024 --> 06:00.068] Well yeah it came down to doing inference really rapidly and doing inference with a prefix.
[06:00.068 --> 06:07.008] So instead of just decoding a sentence with a null prefix you send it part of what the translator did.
[06:07.008 --> 06:11.080] The old system we actually had a paper on this a couple years ago how to do inference with a prefix
[06:11.080 --> 06:14.044] was an algorithmic problem that you had to solve.
[06:14.044 --> 06:20.052] The new neural systems just do greedy beam search so it's actually pretty straightforward to do that these days.
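For readers curious what "inference with a prefix" looks like in practice, here is a minimal sketch. The model below is just a stand-in — any seq2seq that maps a source sentence and the target-so-far to next-token logits would slot in the same way — and the vocabulary, token IDs, and pure-greedy strategy are illustrative assumptions, not Lilt's implementation.

```python
# Sketch: greedy decoding forced to start from the translator's typed prefix
import torch
import torch.nn as nn

class ToySeq2Seq(nn.Module):
    """Stand-in model: embeds the target-so-far and predicts next-token logits."""
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, src_ids, tgt_ids):
        h, _ = self.rnn(self.emb(tgt_ids))   # (the toy ignores src_ids)
        return self.out(h[:, -1])            # logits for the next token

def greedy_decode_with_prefix(model, src_ids, prefix_ids, bos=1, eos=2, max_len=20):
    tgt = [bos] + list(prefix_ids)           # force the prefix the translator already typed
    while len(tgt) < max_len:
        logits = model(src_ids, torch.tensor([tgt]))
        nxt = int(logits.argmax(dim=-1))     # greedy pick; beam search is the same idea
        tgt.append(nxt)
        if nxt == eos:
            break
    return tgt

model = ToySeq2Seq()
src = torch.tensor([[5, 6, 7, 2]])           # made-up source token ids
print(greedy_decode_with_prefix(model, src, prefix_ids=[10, 11]))
```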
[06:21.032 --> 06:22.068] And is that what you're using?
[06:23.096 --> 06:28.076] Yeah, it's — I mean, like everything in NLP these days, it's a Transformer architecture, and a
[06:28.076 --> 06:34.020] pretty vanilla one too. What our team really focuses on is domain adaptation — rapid and efficient
[06:34.020 --> 06:38.060] domain adaptation — so we do personalized models either at the user level or at the
[06:38.060 --> 06:40.012] workflow level for all of our customers.
[06:40.084 --> 06:45.016] Alright, and workflow means, like, a document set, or sort of like learning a specialized model —
[06:45.016 --> 06:46.092] is that right?
[06:46.092 --> 06:53.000] I think the way to think about it is more kind of like from your early days, which is,
[06:53.000 --> 06:57.080] anywhere that you have an annotation standard, you would have a personalized model.
[06:57.080 --> 07:03.056] So if you think about a business — like, a marketing workflow has a writing standard that may be
[07:03.056 --> 07:09.064] different than a legal workflow, and so you would have different models for each one of those workflows.
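A rough sketch of what per-workflow personalization could look like, given the description above: every (customer, workflow) pair gets its own copy of a shared base model, fine-tuned on that workflow's own data. The model, loss, customer name, and hyperparameters here are placeholders, not Lilt's actual setup.

```python
# Sketch: fork a general-domain model per workflow and fine-tune it on that workflow's pairs
import copy
import torch
import torch.nn as nn

general_model = nn.Linear(16, 16)             # stand-in for a general-domain MT model

def adapt_for_workflow(base, pairs, lr=1e-4, steps=3):
    """Copy the base model and take a few gradient steps on workflow-specific pairs."""
    model = copy.deepcopy(base)                # personalized copy; the base stays untouched
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                     # placeholder for a real translation loss
    for _ in range(steps):
        for src, tgt in pairs:
            opt.zero_grad()
            loss = loss_fn(model(src), tgt)
            loss.backward()
            opt.step()
    return model

# One model per (customer, workflow), each trained on its own confirmed translations.
marketing_pairs = [(torch.randn(16), torch.randn(16)) for _ in range(4)]
legal_pairs = [(torch.randn(16), torch.randn(16)) for _ in range(4)]
models = {
    ("acme", "marketing"): adapt_for_workflow(general_model, marketing_pairs),
    ("acme", "legal"): adapt_for_workflow(general_model, legal_pairs),
}
```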
[07:10.028 --> 07:10.076] I see.
[07:11.072 --> 07:16.044] I see — so you're actually training, then, thousands of models.
[07:17.008 --> 07:22.068] Yes, that's correct — so, that's right, there are
[07:22.068 --> 07:29.016] bunches of different models being trained continuously in production all the time right now and the way
[07:29.016 --> 07:34.028] you can think about like what the translator does and I think what's really interesting about this task is
[07:35.000 --> 07:39.032] in most machine learning settings like data annotation for supervised learning is some
[07:39.096 --> 07:42.084] operating costs you have to pay people to go off and do it.
[07:42.084 --> 07:47.064] It's an artificial task. Translation — you can think about it as they're just doing data labeling:
[07:47.064 --> 07:51.024] they're reading English sentences and typing French sentences, and as soon as they finish
[07:51.024 --> 07:56.092] that, you just train on it. Right, right — and do the models get noticeably better over time?
[07:56.092 --> 08:03.064] Yes. That's super cool. So I guess I'm curious about the sort of, like, technical details of just
[08:03.064 --> 08:10.004] making this work, but before getting into that, I'm curious — like, you started in 2014, is
[08:10.004 --> 08:16.076] that right? Early 2015 — we started 2015, yeah. So you've seen, like, such an arc, in terms of — I mean, I
[08:16.076 --> 08:22.052] feel like machine translation has had such big changes, at least from my perspective —
[08:23.040 --> 08:29.032] has that been sort of, like, hard to adapt to? Has it, like, kind of helped you? Have you had to, like,
[08:29.032 --> 08:36.028] you know, kind of learn new skills as it changed? Yeah, so — we, you know, we started the company
[08:37.000 --> 08:43.024] in late 2014 and the system that we had which we had built at Stanford over the course of about
[08:43.024 --> 08:49.064] 10 years, was competitive with Google Translate. And so then in December 2014, you know, the first sort of
[08:49.064 --> 08:54.052] neural MT paper was published — I mean, people worked on neural MT in the 90s, but it didn't work — and so
[08:54.052 --> 08:58.092] they sort of got it to work again. There were two papers published, one in December 2014, the other one
[08:58.092 --> 09:06.036] in January of 2015, and it was, like, you know, pretty promising, but nowhere near production-ready. And then
[09:06.036 --> 09:10.052] I think the thing that was really quite shocking was how quickly Google got that into a production
[09:10.052 --> 09:18.060] scale system which happened in the late summer of 2016 and so we you know at that point our system was
[09:18.060 --> 09:24.044] as competitive as anyone and then suddenly there was this huge leap in translation quality and so we had
[09:24.044 --> 09:33.080] to and we were sort of graduating all three of us John and I and a third guy right at the like
[09:33.080 --> 09:38.036] at this crossover point so we didn't really have any empirical experience with these neural
[09:38.036 --> 09:45.072] machine translation systems so we had to like build a neural mt system from scratch over over the course
[09:45.072 --> 09:53.032] of about six months and we went from the Stanford system was about 120,000 lines of code that
[09:53.032 --> 10:01.056] had been developed over a decade going to a system that I think was about 5,700, 6000 lines of code
[10:01.056 --> 10:06.076] and it's amazing it's really I mean it's really quite shocking I mean a bunch of that is like pushing
[10:06.076 --> 10:12.020] a lot of the you know pushing a lot of the functionality down into the framework which everything in the
[10:12.020 --> 10:18.004] Stanford system was, like, custom-built, you know. So I guess in 2016, what framework are you using — is this
[10:18.004 --> 10:27.080] Caffe, or is it even before that? No, we wrote it in TensorFlow from the beginning. So — wow. Yeah, it was,
[10:27.080 --> 10:33.072] I guess, an okay technology bet. Yeah, I think there's some push to move to PyTorch, but we've got
[10:33.072 --> 10:39.048] a pretty significant investment in TensorFlow at this point. Yeah, I would think so. And were you sure
[10:39.048 --> 10:45.008] that it was going to work? I mean, it seems like a really kind of painful experience for a startup
[10:45.008 --> 10:51.048] to do, like, mid-flight. It was terrible, yeah. I mean, you just kind of had to do it —
[10:51.048 --> 10:58.092] the results were so compelling. And I think that MT really is probably, of all the tasks within NLP
[10:59.048 --> 11:04.076] that, you know, deep learning has really revolutionized — I think you can make the case that MT is
[11:04.076 --> 11:11.008] probably the sort of most significant example. Like, the recent language modeling work, of course, is
[11:11.008 --> 11:17.080] really impressive, but MT just went from being kind of funny to being meaningfully good.
[11:19.008 --> 11:24.044] And I guess, how did you find enough, like, parallel corpora to make this work?
[11:26.052 --> 11:32.004] Well there's quite a bit of public domain data so for example the uin has to publish all of its
[11:32.004 --> 11:37.096] proceedings in its member languages there are news organizations like the AP that publish in different
[11:37.096 --> 11:43.040] languages; there are open source projects, for example, that publish
[11:43.040 --> 11:47.080] all their strings in a bunch of different languages, so you can train on that, and then you've
[11:47.080 --> 11:52.076] got web crawl too, which is where most of the training data comes from. I see, I see. It's funny — I
[11:52.076 --> 11:59.064] remember working on MT briefly at Stanford, and it felt really unfair that Google had so much more access to
[11:59.064 --> 12:07.032] data. It does help to have a search engine. I don't know — I mean, I guess if you're
[12:07.032 --> 12:13.008] mostly doing web crawl, then that makes sense. I remember just all kinds of weird artifacts — I
[12:13.008 --> 12:17.008] think we were training on sort of the EU data that was in all those languages, and it was just
[12:17.008 --> 12:23.016] kind of, like, biased towards, like, political meetings. Yeah. The output just seemed ludicrous sometimes.
[12:23.088 --> 12:29.032] So that I think that's in an enterprise setting that's the real value of domain adaptation and the
[12:29.032 --> 12:37.000] second thing that I think is interesting is the legacy approach to enterprise you know translation
[12:37.000 --> 12:43.064] within the enterprise is to just build a database of all your past translations, and if you've translated
[12:43.064 --> 12:48.060] something before, you just look it up in the corpus and retrieve it; otherwise you send it off to a vendor.
[12:48.060 --> 12:54.052] So big companies that have been doing translation for decades have this big corpus that they've built up
[12:55.016 --> 13:04.052] and so we train on that too and that sort of customer specific training is where you get the real
[13:05.032 --> 13:12.060] improvement versus just a big general domain system. I guess, at the end of the day, like — how much —
[13:12.060 --> 13:19.032] I mean, do you sort of measure your results in, like, how fast you can get a translation
[13:19.032 --> 13:25.048] done? Is that kind of your core metric? And I guess, if so, how does that change with the quality of the
[13:25.048 --> 13:30.044] translation? Like, do you kind of get diminishing returns, or as it gets close to perfect, can someone just,
[13:30.044 --> 13:39.064] like, cruise through a translation? Well, I think that there are — maybe I should say a few sentences about
[13:39.064 --> 13:45.048] you know what the how the customers how a customer would work with us. So an example of one of our
[13:45.048 --> 13:51.064] customers is Intel, and if you go to Intel.com, in the top right corner there's a drop-down and you can change
[13:51.064 --> 13:58.076] the site into 16 different languages, and that's all of our work. So that's — and so,
[13:58.076 --> 14:04.012] if you start looking that way you'll see translation all around you you'll see it on websites you'll see it
[14:04.012 --> 14:09.064] in mobile apps you'll see it when you get on the airplane and like get 10 language options for the
[14:09.064 --> 14:15.080] inflight entertainment system that's where this can be used and right now it's a problem that you can
[14:15.080 --> 14:20.084] solve with people — like, you can hire people to solve it. The problem is, the
[14:21.056 --> 14:25.096] amount of information that's being produced far exceeds the number of people that are being
[14:25.096 --> 14:30.060] produced in the world right now, and so you can't just solve it with, you know, just with throwing
[14:30.060 --> 14:38.036] bodies at it, and so that's why you need some automation. So, an example — like that Intel website —
[14:38.076 --> 14:45.048] from their side what they just see is us delivering words and the only real metrics that matter are
[14:45.048 --> 14:50.044] how quickly that gets done and the quality level that it gets done at and they don't really care whether
[14:50.044 --> 14:58.044] it's machines or humans or whatever is doing the translation work. On our side, the whole name of the
[14:58.044 --> 15:06.036] game is using using automation to reduce the production cost and the production cost per word and
[15:07.016 --> 15:14.012] so when you produce a word to give to an enterprise there's a translation cost and a QA cost and
[15:14.012 --> 15:18.068] workflow routing cost and there's a software hosting cost there's you know a bunch of different cost
[15:18.068 --> 15:24.060] buckets, and it's just minimizing that. But I would think — am I wrong? — that the majority of the
[15:24.060 --> 15:30.028] cost would be the human that's doing the translation. That's exactly right. So then the metrics that we
[15:30.028 --> 15:36.012] care about internally have to do with making that part more efficient. That's something that
[15:36.012 --> 15:42.004] translates into business value, in that it reduces the cost of what we provide to customers and it
[15:42.004 --> 15:46.068] makes it faster but those metrics are not the same metrics that our customers care about.
[15:47.040 --> 15:52.084] Are there cases where — kind of like what you worry about with, like, a self-driving car, where, like,
[15:53.040 --> 15:59.072] you know, it's so good that someone stops watching and the car crashes — like, does
[15:59.072 --> 16:04.068] your translation ever get so good that you worry that a translator might just start accepting,
[16:04.068 --> 16:12.052] like, every prediction, and quality might suffer? Yeah, this is a good question. I think it's more of a
[16:12.052 --> 16:18.068] risk — and this bears out empirically — in the linear post-editing workflow that I mentioned, where I
[16:18.068 --> 16:24.044] just give you some machine output from some random machine and ask you to correct it, and it's, like, not
[16:24.044 --> 16:31.008] very it's kind of a passive task and cognitively it's not very engaging and so people tend to just kind of
[16:32.004 --> 16:37.096] gloss through that and make mistakes. Whereas in the predictive typing it's like an active engaged task
[16:38.076 --> 16:45.016] and so if they're basically cheating there then it comes down to performance management on our part of
[16:45.088 --> 16:51.080] whoa this person did you know two thousand words in ten seconds like that doesn't seem that doesn't
[16:51.080 --> 16:59.048] seem right so you can kind of monitor that. And how do customers think about the the quality
[16:59.048 --> 17:03.072] is it sort of like an intuitive feel for it or they like spot checking it or how does that work?
[17:05.008 --> 17:10.076] It's — I think it's again in the same realm as an annotation standard, like your
[17:10.076 --> 17:16.068] world. We work with the customer to define what we call a text specification, which is:
[17:17.040 --> 17:22.036] what are the text requirements within each language and that usually follows from marketing guidelines.
[17:22.036 --> 17:28.068] They have their brand and style and copy editing guidelines and then how is that manifest in
[17:28.068 --> 17:35.048] Chinese and Japanese and German and French and so then we have a QA process where we have
[17:35.048 --> 17:42.060] raters go in and rate the sentences according to that framework, and then that's what we deliver back to them.
[17:43.024 --> 17:48.028] Oh, so you don't just deliver the result — you deliver an estimate of the quality
[17:49.016 --> 17:53.064] based on raters. That's cool. They must appreciate that.
[17:53.064 --> 17:57.080] Or is that industry standard to do that? No.
[17:58.076 --> 18:03.088] There's some vendors that are you know they'll implement like a scorecard and they'll
[18:03.088 --> 18:09.008] give you the scorecard back with the deliverable but we just try to keep it. We just count the number of
[18:09.008 --> 18:15.016] sentences where there's some annotation error and then we fix those but it gives you some sense for like
[18:15.016 --> 18:23.024] what the overall error rate is. Got it. So I guess I'm sure you've seen you know I think people have
[18:23.024 --> 18:29.040] pointed out that, you know, in translation there can be, kind of, ethical issues — like, I think
[18:29.040 --> 18:35.040] people noticed that, you know, Google, in languages where the pronouns are, you know, gender-specific, was kind of
[18:35.040 --> 18:40.092] making them male for sort of, like, traditionally male occupations. Is that something that you, like, think about
[18:40.092 --> 18:51.064] or, like, incorporate into your models at all? Well, I mean, to
[18:51.064 --> 18:59.024] just give you you know part of my work in grad school was on on Arabic and when you work with you know
[18:59.024 --> 19:05.072] Arabic corpora there's almost all male pronouns because it's coming from newswire and most of the
[19:05.072 --> 19:12.068] people who are active politically in the Arab world are male so that's the representation in the data and
[19:12.068 --> 19:20.036] so systems will tend to predict, you know, masculine pronouns for lots of different things. But then, in
[19:20.036 --> 19:27.016] the human-in-the-loop model, you have people who are there correcting that, and they can use the suggestion
[19:27.016 --> 19:34.092] or not and by that annotation you'll get a different statistical trend that the system will start to learn.
[19:35.064 --> 19:42.036] I see — so it's sort of self-correcting. Cool. I guess I really am interested to know about the
[19:42.036 --> 19:47.048] technical details of your systems, as much as you can share — I mean, you are a super early user of
[19:47.048 --> 19:52.020] TensorFlow and you have all these models running in production. I mean, can you, like, at a high level,
[19:52.020 --> 19:57.032] sort of tell me how the system works and how it's evolved? Like, is it, like, TensorFlow
[19:57.032 --> 20:03.000] Serving to serve these up, or how do you even, like, run all these models in production at once?
[20:04.060 --> 20:16.028] Yeah, it's interesting. So I think that maybe the most interesting part of it is — there are
[20:16.028 --> 20:24.052] interesting cloud problems to solve, of which there are several, but I think the big ones are:
[20:26.004 --> 20:31.096] you have a budget — if you're implementing predictive typing, you have a budget of about 200 milliseconds
[20:31.096 --> 20:39.040] before the suggestions feel sluggish and so that means that the speed of light starts to become a
[20:39.040 --> 20:45.088] problem and so you have to have a multi region set up because our community of people who are working
[20:45.088 --> 20:51.016] are all over the world — we usually hire translators within their linguistic community that are fluent in
[20:51.016 --> 20:56.004] that native language so we have people all over the world so the first thing is it has to be a multi
[20:56.004 --> 21:04.036] region system the second is it's doing online learning so you have to coordinate model updates
[21:04.036 --> 21:11.016] across regions and the third thing that I think is interesting is to make you know to make inference
[21:11.016 --> 21:17.072] fast commonly like in a big large scale system like Google translate you'll batch a bunch of requests put it on
[21:17.072 --> 21:22.036] you know, custom hardware, run it, and then return it. But if you're switching in personalized models
[21:22.036 --> 21:27.096] to the decoder basically on every request then you have to run on the CPU and you have to have a
[21:27.096 --> 21:34.084] multi-level cache to be pulling these models up and off of cold storage and loading them onto the machine
[21:34.084 --> 21:41.000] so that's been a lot of the engineering: to make it fast worldwide and to make the learning synchronized worldwide.
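A hand-wavy illustration of the multi-level cache idea just described: keep the hottest personalized models in memory and fall back to slower storage on a miss. The tiers, eviction policy, and loader below are stubs and assumptions, not a description of Lilt's actual system.

```python
# Sketch: LRU cache of personalized CPU models, backed by disk / cold object storage
from collections import OrderedDict

class ModelCache:
    def __init__(self, capacity=32):
        self.capacity = capacity
        self.hot = OrderedDict()              # in-memory LRU: model_id -> model object

    def _load_from_disk_or_cold_storage(self, model_id):
        # Stub: a real system would check a local disk cache first, then download the
        # checkpoint from cold storage and deserialize it for CPU inference.
        return f"model-object-for-{model_id}"

    def get(self, model_id):
        if model_id in self.hot:
            self.hot.move_to_end(model_id)    # mark as recently used
            return self.hot[model_id]
        model = self._load_from_disk_or_cold_storage(model_id)
        self.hot[model_id] = model
        if len(self.hot) > self.capacity:
            self.hot.popitem(last=False)      # evict the least recently used model
        return model

cache = ModelCache(capacity=2)
cache.get(("acme", "marketing", "en-de"))     # cold: loaded from storage
cache.get(("acme", "marketing", "en-de"))     # warm: served from memory
```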
[21:41.000 --> 21:50.044] And I guess, you know, you mentioned that there's, like, some notion of switching
[21:50.044 --> 21:57.064] to PyTorch — why, what would push that at all? This is where my expertise, like, my empirical
[21:57.064 --> 22:03.080] limitations, runs into a wall, I think. You know, the two things that I've heard from our
[22:03.080 --> 22:09.032] team are, you know, you can prototype faster in PyTorch than in TensorFlow, and then there've
[22:09.032 --> 22:15.016] been some backwards-compatibility issues — like, from TensorFlow 1 to TensorFlow 2 there tend to
[22:15.016 --> 22:21.080] be more breaking changes, and so we've got our system running in some TensorFlow 2 compatibility mode
[22:21.080 --> 22:27.056] with some frozen graphs from before and so that you know that that's been a little bit of a problem
[22:29.000 --> 22:36.028] I mean, I think one just notable thing from our perspective has been the sort of, like,
[22:36.028 --> 22:42.092] rapid ascendance of Hugging Face — has that been relevant to you at all, do you use it anywhere? We don't.
[22:42.092 --> 22:52.044] It's funny — when that paper, when the Transformer paper, came out — I went to grad school,
[22:52.044 --> 22:58.036] you know, Ashish Vaswani was a contemporary in grad school, and then Jakob Uszkoreit has been a
[22:58.036 --> 23:03.032] great friend of our company, and so we kind of called Jakob the next day and we're like, let's talk
[23:03.032 --> 23:09.032] about this and so we talked it through and we started working on it and it's a really tricky it was a really
[23:09.032 --> 23:14.084] tricky model to get working correctly and it took some time so we started I think that paper came
[23:14.084 --> 23:19.088] out on, like, a Tuesday, and I think our team started working on the implementation on,
[23:19.088 --> 23:26.004] like, Wednesday morning. Wow. Something like that. And, you know, it was December or January
[23:26.004 --> 23:31.064] before we had, like, a working model, and I think their Tensor2Tensor release, you know, helped
[23:31.064 --> 23:38.012] a lot — some of the black magic is in there, and that helped. So this was, like, mid-2017. But it's, you know,
[23:38.012 --> 23:43.056] it's it's tricky to get working right in production so I think having a library that you know people can
[23:43.056 --> 23:50.028] use more broadly that may not have the same you know internal resources to get these systems working it's
[23:50.028 --> 23:59.064] really, really valuable. Totally, totally. Do you think that, given your, like, sort of
[23:59.064 --> 24:06.068] latency and throughput requirements, your models are different at all from what Google
[24:06.068 --> 24:16.004] Translate might use? Yes — if you're running on custom hardware, you can of course afford to run
[24:16.004 --> 24:21.032] you know higher dimensional and more expressive models so we have to do quite a bit of work with knowledge
[24:21.032 --> 24:27.024] distillation to try to compress the models so that inference is fast on the CPU and we've also been
[24:27.024 --> 24:31.096] it's also been really helpful that Intel is one of our investors, and so their technical teams have helped us with
[24:31.096 --> 24:38.028] some optimizations to make it run faster on the CPU, and that's been really valuable. That's cool.
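As a generic illustration of knowledge distillation — not Lilt's recipe — a small CPU-friendly student can be trained against both the gold labels and the teacher's softened output distribution; the temperature, weighting, and tensor shapes below are illustrative assumptions.

```python
# Sketch: a standard distillation loss (hard CE + temperature-softened KL to the teacher)
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_ids, T=2.0, alpha=0.5):
    # Hard loss against the reference tokens.
    hard = F.cross_entropy(student_logits, gold_ids)
    # Soft loss against the teacher's temperature-smoothed distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random "per-token" logits over a 100-word vocabulary.
student_logits = torch.randn(8, 100, requires_grad=True)
teacher_logits = torch.randn(8, 100)
gold_ids = torch.randint(0, 100, (8,))
loss = distillation_loss(student_logits, teacher_logits, gold_ids)
loss.backward()
```

Another common variant in MT is sequence-level distillation, where the student is trained directly on the teacher's decoded translations rather than its logits.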
[24:39.000 --> 24:45.016] Are these different models at all for different language pairs? Yeah — the short answer is yes,
[24:45.016 --> 24:51.024] there's a general domain model for every language pair that the domain adaptation
[24:51.024 --> 24:57.040] starts from, and it basically just forks off of that, and then the model fork starts learning. And so we
[24:57.040 --> 25:07.024] change the general domain models much less frequently and so we just actually yesterday released new
[25:07.024 --> 25:12.052] models for English to Japanese and Japanese to English, and one of the researchers has been working on
[25:12.052 --> 25:17.024] a much, much deeper encoder — so I think the one that came out yesterday has, like, a 12-layer encoder,
[25:17.024 --> 25:21.088] whereas historically we've been running, like, a four-layer encoder or something like that. So now, over
[25:21.088 --> 25:27.048] the, you know, next little bit, we'll be moving more of our general domain models to, you know, some
[25:27.048 --> 25:32.076] of the current state-of-the-art model architectures. Your general domain models, though — those are different
[25:32.076 --> 25:39.024] for each language pair, right? Or are they sort of one? Oh, that's an important point. So, I think
[25:39.024 --> 25:43.088] one of the most exciting papers in the last couple of years was you know training multi-source
[25:43.088 --> 25:50.036] multi-target models and so Google had a paper you know last year or the year before but they just like
[25:50.036 --> 25:55.048] piled all the corpora together and trained this huge neural network, and this is really hard to think
[25:55.048 --> 26:00.076] about coming from, you know, the statistical MT days, because it's just, like, crazy to do in
[26:02.020 --> 26:09.040] a statistical MT system. But we use some groups of languages — so we'll group similar languages,
[26:09.040 --> 26:13.056] especially if they're low resource languages and we don't have much data and then you'll
[26:13.056 --> 26:18.044] have a system that's for you know five different languages or so but there's something about that that's
[26:18.044 --> 26:24.044] like, so appealing — like, I mean, I'm way out of date, so I never saw that working when I was in grad school, but I
[26:24.044 --> 26:30.012] love the idea. Yeah, it's a really attractive idea. And it sounds like it's actually kind of working?
[26:30.068 --> 26:36.012] It does work, yeah. So I guess — I don't know how much you feel, you know, comfortable
[26:36.012 --> 26:42.076] spouting off on this topic, but I'm really curious — like, I mean, do you have a feeling on how
[26:42.076 --> 26:49.080] far MT goes? Like, do you think that human-level MT is realistic? Like, you know, it's funny —
[26:49.080 --> 26:55.072] when you talk about companies wanting like quality guarantees I mean I would think just you know
[26:55.072 --> 27:00.092] having used a lot of Google translate in my life you know quality guarantees seem like it would be useful
[27:00.092 --> 27:04.068] but also it just seems like the quality of Google translate just isn't good enough that I would want
[27:04.068 --> 27:10.076] to put that you know on my website generally like do you expect that that is likely to change
[27:10.076 --> 27:19.088] yeah I guess I can offer like some sort of assorted comments on thinking about that thank you
[27:19.088 --> 27:27.016] in no particular order because I think they're both technical and sort of social issues to do with that
[27:28.028 --> 27:33.080] and I think there's sort of sort of like philosophical issue so let's start with the philosophical issue
[27:33.080 --> 27:40.052] You know, translation is in this sort of space of so-called AI-complete problems, so solving it
[27:40.052 --> 27:47.008] would be equivalent to the advent of, you know, strong AI, if you will, because for any particular
[27:47.008 --> 27:53.008] translation problem, world knowledge is required to solve the problem, and there are inputs that are
[27:53.008 --> 27:58.044] not in the string that are required to produce a translation in another language. Although — sorry to
[27:58.044 --> 28:06.020] cut you off — but I guess it feels like, based on what I've seen lately from Google Translate, it feels
[28:06.020 --> 28:11.088] less AI complete than I would have thought yes so that's the next comment that I'll make which is
[28:11.088 --> 28:18.012] that philosophical statement does not it doesn't mean that within business settings you should not
[28:18.012 --> 28:24.012] be using it and I'll give you an example so one space we've been looking at recently is like crypto
[28:24.012 --> 28:29.072] Well, like, four months ago, like, nobody knew what a non-fungible token is, so, like, how do you translate
[28:29.072 --> 28:33.072] that into Swahili and Korean? Well, an MT system's not going to give you the answer to that question,
[28:33.072 --> 28:38.092] because language is productive people are making new words all the time and machines are not making up new
[28:38.092 --> 28:44.068] words all the time people are and so philosophically you've got to have training data for the system to be
[28:44.068 --> 28:50.020] able to produce a result people do not need training data to do that so but then I think increasingly
[28:50.020 --> 28:56.068] there are a lot of business settings where it's good enough to solve the problem so you know if you go
[28:56.068 --> 29:02.060] for years you've been able to go to Airbnb and look at a flat and click "translate with Google" and it'll
[29:02.060 --> 29:09.008] give you a translation — it may not be perfect, but it's certainly enough to convince you you want to buy this —
[29:09.008 --> 29:14.028] you know, rent this flat. And I think there will be more and more cases where fully automatic machine
[29:14.028 --> 29:21.016] translation solves the business problem in hand I think that's absolutely true and then I think
[29:21.016 --> 29:28.084] there's a third part of it which is sort of social and organizational which is how soon you know
[29:29.080 --> 29:37.032] VP of marketing are you willing to let raw machine translation go on your landing page with no oversight
[29:38.028 --> 29:43.080] uh huh and that's one way to think about that is like how soon are you Lucas ready for a machine to
[29:43.080 --> 29:51.016] respond to all of your email all of my own email yeah well you don't have to say some of it probably
[29:51.016 --> 29:56.012] sure but like others parts of it a little bit a little bit dangerous I mean this might be kind of an
[29:56.012 --> 30:02.060] off the wall question but I have noticed my I think I have a slightly more polite writing style because of
[30:02.060 --> 30:08.060] Google's like predictive text algorithm like I kind of wonder if you're slightly shaping the translations
[30:08.060 --> 30:13.072] with with your predictions even if the translator is kind of getting involved in sort of making it match
[30:14.044 --> 30:20.092] oh yes this is a this is called priming so it's a it's a common feature of psychological research and so
[30:20.092 --> 30:24.084] one of the things that we showed in grad schools when you show somebody a suggestion they tend to
[30:24.084 --> 30:31.080] generate a final answer that's closer to the suggestion than if they start from scratch so I guess there's
[30:31.080 --> 30:35.040] some — I mean, I guess it's like, maybe it's better that I write slightly more politely? We don't know if
[30:35.040 --> 30:39.072] that's good, but it could just be pulling your writing down to the mean behavior — so,
[30:39.072 --> 30:44.044] you know, a mean level of performance — so I'm not sure that's great. Pulling me down or pulling me up, I don't know.
[30:44.044 --> 30:50.028] Yeah, there's going to be a level of performance, right. Do you think the translators
[30:50.092 --> 30:57.040] kind of learn to use your system as well like do you see productivity going up for an individual that's
[30:57.040 --> 31:03.024] doing this yeah this is so this we have an HCI team and this is one of the main things that they're
[31:03.024 --> 31:13.096] working on right now which is I think yeah I think I remember right when we started the company
[31:13.096 --> 31:19.096] one of my co-advisors, Jeff Heer, who started Trifacta — I was telling him, this was, like, really
[31:19.096 --> 31:24.028] early on, I was showing him, like, some of the stuff we were building, and we want to optimize this and we
[31:24.028 --> 31:29.016] wanted to do that — and he's like, let me stop you right there: in the early days of a company, you're just trying to
[31:29.016 --> 31:35.064] make things less horrible than they are than they are and and you're going to be in that phase for a long
[31:35.064 --> 31:42.012] time and before you sort of get to the optimization phase so I think for a lot of the last number of years
[31:42.012 --> 31:48.028] it was, like, you know, catching up on neural MT, making the system faster, multi-region, like,
[31:48.028 --> 31:54.084] making the system more responsive in the browser — and there was just, like, a lot of unbreaking work that was going on.
[31:55.096 --> 32:01.088] And now we've got some pretty convincing results that the highest — you know, the
[32:01.088 --> 32:07.008] thing that we really ought to focus on is how people use the system — that the greatest, you know,
[32:07.008 --> 32:12.092] the greatest predictive variable of performance is just, like, the individual's identity. And so when we
[32:12.092 --> 32:17.040] look at how people use it there's really high variance and the degree to which they utilize the
[32:17.040 --> 32:23.016] suggestions how they use the different input devices on the keyboard you know how they navigate and work
[32:23.016 --> 32:27.064] through a document so the team spending quite a bit of time on user training right now actually
[32:27.064 --> 32:31.096] Oh, so user training — not, like, work on the interface, but you're training people to use your tool?
[32:31.096 --> 32:37.032] Yeah. And would you ever consider doing, like, multiple suggestions — like, is that
[32:38.020 --> 32:44.068] possibly better yeah we the one of the reasons that this sort of predictive approach to MT didn't
[32:44.068 --> 32:51.064] work really well is because people the interfaces that were built up until our you know our work they use
[32:51.064 --> 32:57.048] like, a drop-down box, and it turns out when you put stuff on the screen, people read it, which slows
[32:57.048 --> 33:04.044] them down. So what you want to do is show them the one best prediction — the very best prediction you
[33:04.044 --> 33:10.004] can show them I see interesting I bet that's especially true when you're confident in your
[33:10.004 --> 33:16.060] predictions. Yeah. Cool. Are there many other surprises in terms of, like, your interfacing with humans? I
[33:16.060 --> 33:20.076] feel like my last company was, like, a labeling company, so I saw all these, like, kind of interesting ways
[33:20.076 --> 33:27.000] that the interaction between humans and machines works. Has, like, the way that you engage with the
[33:27.000 --> 33:33.024] humans changed at all over the years that you've been running this, besides training? Maybe one of the
[33:33.024 --> 33:37.056] biggest things that we learned is that historically within translation you know you know
[33:38.076 --> 33:44.028] in this translation world — I mentioned this, you know, MT work goes back to the 50s — and professional
[33:44.028 --> 33:50.012] translation, you know, predates agriculture or something; it's, like, a really old profession, right?
[33:50.012 --> 33:57.016] and so these people have been engaged with AI systems for like 50 years and for most of that
[33:57.016 --> 34:05.016] period of time the systems are really bad so there's like kind of a lot of bias against these systems
[34:05.016 --> 34:11.064] and people especially those who use them for a while when they weren't really good like they were kind of
[34:11.064 --> 34:17.024] reluctant to try them I think more broadly now people are using them because MT is a lot better but
[34:17.024 --> 34:22.092] we found that you know resistance to change was really significant and the way to get around that
[34:22.092 --> 34:29.008] was to align incentives better with the business model which okay what do people actually want more
[34:29.008 --> 34:35.088] than they want to not like embrace machine learning well they want to get paid they want to be recognized
[34:35.088 --> 34:40.044] for their work they want to be appreciated they want to have a good work environment work with good
[34:40.044 --> 34:48.068] people and so I think we found that focusing on those things when you did when you do those right
[34:48.068 --> 34:55.056] then people are really open to let me try this automation you know let me I'm okay with the fact
[34:55.056 --> 35:04.012] that you're changing the interface every week and I know like all that stuff yeah that makes it yeah
[35:05.096 --> 35:10.052] Is there also a feedback loop with the ratings? I think that might be an important thing too,
[35:10.052 --> 35:15.072] if you're then kind of rating the quality of the output. Yes — so, we have — I believe we just
[35:15.072 --> 35:21.008] submitted a paper to EMNLP, hopefully it'll get in — and we've been working on bilingual grammatical
[35:21.008 --> 35:28.012] error correction. So, what the reviewers do you can think of as another review step: we took an English input, we
[35:28.012 --> 35:34.004] generated some French maybe there's some bugs in the French and we give that to another person who then
[35:34.004 --> 35:38.092] is going to find and fix those bugs or maybe they make some stylistic changes or like who knows what they do
[35:39.056 --> 35:44.004] so that just becomes another prediction problem with two inputs the English and the
[35:45.032 --> 35:51.072] uncorrected, unverified French, or whatever you want to call it, and they're going to predict the verified
[35:51.072 --> 35:58.044] French and so you can use a sequence prediction you know architecture you know model for that
[35:58.044 --> 36:02.092] or you can use sequence modeling for that and so the team has been working on that for about the
[36:02.092 --> 36:08.060] past year and a half, and they've, you know, sort of got it working now — we announced that,
[36:08.060 --> 36:12.060] like, last fall, and we'll have it in production, I think, you know, sometime in the second half of the year.
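One plausible way to set up the two-input correction model described above — purely a sketch; the separator token, toy encoder, and fixed-length output are simplifying assumptions — is to concatenate the source and the draft translation into one sequence and train a model to predict the reviewed output.

```python
# Sketch: bilingual "grammar checker" as sequence prediction over (source, draft) -> verified
import torch
import torch.nn as nn

SEP = 0  # illustrative separator token id

class ToyCorrector(nn.Module):
    def __init__(self, vocab=200, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.enc = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, src_ids, draft_ids):
        # Encode "source <SEP> draft" as one sequence; predict a token per draft position.
        joint = torch.cat([src_ids, torch.full_like(src_ids[:, :1], SEP), draft_ids], dim=1)
        h, _ = self.enc(self.emb(joint))
        return self.out(h[:, -draft_ids.size(1):])   # logits for the corrected tokens

model = ToyCorrector()
src = torch.randint(1, 200, (1, 6))      # English token ids
draft = torch.randint(1, 200, (1, 7))    # unverified French token ids
verified = torch.randint(1, 200, (1, 7)) # reviewer-verified French token ids
logits = model(src, draft)
loss = nn.functional.cross_entropy(logits.reshape(-1, 200), verified.reshape(-1))
loss.backward()

# Note: this toy assumes the corrected output has the same length as the draft;
# a real system would use a full seq2seq decoder so insertions and deletions are possible.
```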
[36:12.060 --> 36:19.000] That's so cool. And I guess, in production, what would that mean? Like, once you finish editing
[36:19.000 --> 36:26.052] a document, it sort of goes through and makes suggestions? Yeah, it's a fancy grammar checker —
[36:26.052 --> 36:33.016] only it's a grammar checker that's data oriented instead of based on rules and it can learn things you
[36:33.016 --> 36:37.096] know, it can learn simple phenomena like spelling mistakes, but it can also learn stylistic edits.
[36:38.060 --> 36:43.016] But it sounds like it's also incorporating the source language too, right? Yeah, so that's how it's different
[36:43.016 --> 36:48.052] than, like, a Grammarly or, you know, the grammar checker that you have in Google Docs or whatever,
[36:48.052 --> 36:55.008] in that instead of, you know, only having one language to look at, the string that you're generating
[36:55.008 --> 37:01.016] is constrained by this other source language input so you can't just like generate anything you've got
[37:01.016 --> 37:07.016] this sort of very strict constraint which is the source language and do you plan to like do
[37:07.016 --> 37:13.032] a separate one for every single document stream or work stream that you have? Yes, you can use the
[37:13.032 --> 37:19.080] same infrastructure for that that you use for the translation. So cool. Well, we always end with two
[37:19.080 --> 37:24.068] questions and I want to give you a little time to chew on these. I guess it is kind of open ended but I
[37:24.068 --> 37:29.088] would be interested in thoughts on MT specifically: you know, what's like an underrated aspect
[37:29.088 --> 37:35.072] of machine learning or machine translation that you think people should pay more attention to,
[37:35.072 --> 37:43.016] or that you'd be thinking about if you weren't working on Lilt? Maybe it's around the question that you
[37:43.016 --> 37:49.024] posed earlier, which is the sort of human parity question with translation, where there was a paper
[37:49.024 --> 37:53.096] I don't know, two years ago, Microsoft had a paper saying "human parity has been achieved" and then
[37:53.096 --> 37:58.084] you know two weeks ago Google published a paper on arXiv saying "human parity has not been achieved"
[37:58.084 --> 38:09.096] and I think that in our you know in our application there's a lot to translation quality which is like
[38:09.096 --> 38:15.032] the particular message that you're trying to deliver to an audience which a lot has to do like how
[38:15.032 --> 38:23.000] the audience feels and certainly in my time in grad school I was really focused on just like generating
[38:23.000 --> 38:29.096] the output that matches the reference so the BLEU score goes up and I can write a paper and I think there's
[38:29.096 --> 38:35.000] a lot of interesting work to think about you know broader pragmatic context of the language that's
[38:35.000 --> 38:40.052] generated and is it appropriate for the context that you're in and for the domain and that's really
[38:40.052 --> 38:47.048] hard to evaluate but it's it's really worth thinking about whether it's in natural language generation
[38:47.048 --> 38:54.004] or machine translation or whatever else so so I think maybe thinking about that a little bit harder I would
[38:54.004 --> 39:00.028] spend some time on. Yeah, the BLEU score is funny because it seems like such a sad metric for translation:
[39:00.028 --> 39:06.028] like, it makes sense that it works, but it just seems so, like, simplistic. It's hard to, I mean, at
[39:06.028 --> 39:11.056] some point I feel like it must sort of lose meaning as the best possible metric, right? Well, people
[39:11.056 --> 39:17.088] studied it a lot and, you know, I think the conclusion was that it's the least bad thing we've come up with, and
[39:17.088 --> 39:23.048] over, you know, two decades of study it continued to be the least bad; nobody could come up with anything
[39:23.048 --> 39:29.072] that was as convenient and correlated better with human judgment. So maybe it's a testament to,
[39:30.052 --> 39:35.024] you know, a simple idea that people are still using 20 years later, or I guess, like, simple metrics are
[39:35.024 --> 39:40.092] better than complicated metrics. There might be a lesson there too, yeah.
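The BLEU discussion above can be made concrete with a tiny, purely illustrative computation. This is not anything from the conversation, just a sketch using the sacrebleu package to score one made-up hypothesis against one made-up reference, showing the kind of n-gram overlap the speakers are debating.

```python
# Illustrative only: score a hypothetical MT output against a reference with BLEU.
# sacrebleu expects a list of hypotheses and a list of reference streams, each
# stream parallel to the hypotheses.
import sacrebleu

hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # a single corpus-level score between 0 and 100
```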
[39:40.092 --> 39:46.004] I guess the final question we always ask what's the biggest challenge of machine learning in the real world but
[39:46.004 --> 39:52.028] I'd like to tailor it to you a bit of just like what's been the hardest part of getting these language
[39:52.028 --> 39:56.044] models to work and production you touched on a bit but I'd love to hear especially any part
[39:56.044 --> 40:02.020] that might be surprising to you know yourself as an academic you know before starting the company like
[40:02.020 --> 40:07.096] where the challenge has been yeah if I think back to when we started the company the research
[40:07.096 --> 40:15.072] prototype that we had you could translate you had to special you had to specialize it to one document so
[40:15.072 --> 40:21.008] like if you're going to translate a document you had to compile this part of it and then load it into
[40:21.008 --> 40:24.092] a production system and you could send it to the document it would translate it and if you send it
[40:24.092 --> 40:31.080] anything else it basically wouldn't work and I remember when we raised money for the company I told
[40:31.080 --> 40:35.056] the investors I was like yeah we're going to take this prototype and have a production product
[40:35.056 --> 40:40.076] like six weeks or something and what actually happened is it took us nine months and the problems we
[40:40.076 --> 40:48.004] had to solve turned into an ACL paper and this is you should not do this this is very bad and I think
[40:48.004 --> 40:56.012] I really underestimated how far it is from like kind of a research prototype that's actually a
[40:56.012 --> 41:03.032] pretty effective system to an MVP for something like what we do which is taking any document from
[41:03.032 --> 41:08.052] any company and generating a reasonable output and doing that with the learning turned on and the inference
[41:08.052 --> 41:14.036] and all that stuff like getting to a large scale production system which is probably not surprising
[41:14.036 --> 41:19.000] to anybody who's worked in these production scale MT systems, but
[41:19.064 --> 41:26.068] the amount of detailed large scale engineering work that has to go into that was surprising to us,
[41:26.068 --> 41:30.052] I think, even having worked on Google Translate. Could you give an
[41:30.052 --> 41:36.092] example, like what was something you ran into? Because it does seem like that shouldn't take nine months, like,
[41:36.092 --> 41:46.044] what? Well, in those days, in that original system, you had to be able to
[41:46.044 --> 41:54.052] load the entire bi-text into memory so the systems stored words as atomic strings and you had to
[41:54.052 --> 41:58.020] have all the strings in memory to be able to generate a translation so we do a lot of work on
[41:58.020 --> 42:03.000] what's called a compact translation model where you can load the entire bi-text into a running production
[42:03.000 --> 42:09.064] node and the lookups happen fast enough that you can generate an output. I think in the neural setting
[42:10.020 --> 42:17.056] what's been really challenging is you can't do batching and you know you can't just like put it on
[42:17.056 --> 42:23.080] GPU or a TPU because the latency constraint that you have so that's meant a lot of work on
[42:23.080 --> 42:33.072] CPU and the way the production infrastructure swaps personalized models onto the production nodes and it seems
[42:33.072 --> 42:39.008] like conceptually simple but when you actually get down into it you're like wow we've been at this
[42:39.008 --> 42:43.048] for two months and we're still not quite there yet what's happening and that's sort of been
[42:43.048 --> 42:48.068] our experience I think. Interesting and I guess at the time there's probably a lot less stuff to help you.
[42:49.040 --> 42:55.040] Yeah, there was no Kubernetes, you know, there was none of that type of infrastructure.
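To make the "words stored as atomic strings" problem above more concrete, here is a toy sketch. It is not the actual compact translation model described in the interview; it just shows the general idea of interning every word to a small integer ID so a bi-text lookup table takes far less memory on a single production node.

```python
# Toy illustration only: intern words to integer IDs so a bi-text table stores
# tuples of small ints instead of repeated Python strings.
vocab = {}

def intern(word):
    """Return a stable integer ID for a word, assigning a new one on first sight."""
    return vocab.setdefault(word, len(vocab))

bitext = [("the house", "la maison"), ("house", "maison")]
table = {
    tuple(intern(w) for w in src.split()): tuple(intern(w) for w in tgt.split())
    for src, tgt in bitext
}

id_to_word = {i: w for w, i in vocab.items()}
query = tuple(intern(w) for w in "the house".split())
print(" ".join(id_to_word[i] for i in table[query]))  # -> "la maison"
```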
[42:56.020 --> 43:00.004] Awesome, well thanks so much, this was really fun, and thanks for sharing so much about how you've
[43:00.004 --> 43:04.012] approached the problem. Yeah, thanks for having me, it's always good to chat with you.
[43:04.012 --> 43:09.056] If you're enjoying Gradient Descent, I'd really love for you to check out Fully Connected,
[43:09.056 --> 43:15.000] which is an inclusive machine learning community that we're building to let everyone know about
[43:15.000 --> 43:22.068] all the stuff going on in ML and all the new research coming out. If you go to wandb.ai/fc you can see all the
[43:22.068 --> 43:27.072] different stuff that we do, including Gradient Descent, but also salons where we talk about new research
[43:27.072 --> 43:33.040] and folks share insights, AMAs where you can directly connect with members of our community, and a
[43:33.040 --> 43:40.020] Slack channel where you can get answers to everything from very basic questions about ML to bug reports on
[43:40.020 --> 43:45.040] Weights & Biases to how to hire an ML team. We're looking forward to meeting you.
50.87021
51.61795
1m
Nov 21 '22 16:53
up5hykby
-
Finished
Nov 21 '22 16:53
2794.680000
/content/cl-ment-delangue-the-power-of-the-open-source-community-sjx9fsnr-9q.mp3
tiny
[00:00.000 --> 00:02.000] [MUSIC]
[00:02.000 --> 00:05.000] I think through the open source model,
[00:05.000 --> 00:07.060] you can do things a bit differently.
[00:07.060 --> 00:11.060] We have taken the inspiration of open source for infrastructure and
[00:11.060 --> 00:12.020] databases.
[00:12.020 --> 00:14.016] You know, like with companies like, you know,
[00:14.016 --> 00:17.044] Elastic, MongoDB, that have shown that, you know,
[00:17.044 --> 00:20.096] you can as a startup empower the community in a way and
[00:20.096 --> 00:24.060] create like a thousand times more value than you would by building a
[00:24.060 --> 00:26.000] proprietary tool.
[00:26.000 --> 00:26.040] Right.
[00:26.040 --> 00:28.024] You're listening to gradient descent.
[00:28.024 --> 00:30.076] A show about machine learning in the real world.
[00:30.076 --> 00:32.096] And I'm your host, Lucas Biewald.
[00:32.096 --> 00:36.024] Clément Delangue is CEO and co-founder of Hugging Face,
[00:36.024 --> 00:39.024] the maker of Hugging Face Transformers Library,
[00:39.024 --> 00:42.068] which is one of the most, maybe the most exciting
[00:42.068 --> 00:45.036] libraries in machine learning right now.
[00:45.036 --> 00:46.064] In making this library,
[00:46.064 --> 00:50.092] he's had a front row seat to all the advances in NLP over the last few years,
[00:50.092 --> 00:52.096] which has been truly extraordinary.
[00:52.096 --> 00:56.020] And I'm super excited to learn from him about that.
[00:56.020 --> 00:58.096] All right. So my first question is probably a silly question,
[00:58.096 --> 01:02.000] because almost anyone watching or listening to
[01:02.000 --> 01:05.016] this would know this, but what is Hugging Face?
[01:05.016 --> 01:10.064] So we started Hugging Face a bit more than four and a half years ago,
[01:10.064 --> 01:15.024] because we've been obsessed with natural language processing.
[01:15.024 --> 01:19.048] So the field of machine learning that applies to text and
[01:19.048 --> 01:24.024] we've been lucky to create Hugging Face Transformers on GitHub
[01:24.024 --> 01:29.016] that became the most popular open source NLP library.
[01:29.016 --> 01:34.080] That over 5,000 companies are using now to do any sort of NLP, right?
[01:34.080 --> 01:37.008] From information extraction, right?
[01:37.008 --> 01:39.096] To text, you want to extract information.
[01:39.096 --> 01:42.080] So platform like check, for example, for homework,
[01:42.080 --> 01:46.040] is using that to extract information from homeworks.
[01:46.040 --> 01:48.088] You can do text classification.
[01:48.088 --> 01:51.008] And so we've companies like Monzo, for example,
[01:51.008 --> 01:55.016] that is using us to do customer support emails, classification.
[01:55.016 --> 01:57.052] They receive a customer support email.
[01:57.052 --> 02:02.008] Does it relate to which product team, for example,
[02:02.008 --> 02:08.004] is that urgent, not urgent, to many other NLP tasks, like text generation
[02:08.004 --> 02:11.008] for auto-completes or really can
[02:11.008 --> 02:16.088] like any single NLP task that task that you can think of.
[02:16.088 --> 02:23.084] And we've been lucky to see adoption, not only from companies, but also from scientists,
[02:23.084 --> 02:30.032] which have been using our platform to share their models with the world.
[02:30.032 --> 02:34.040] Test models of other scientists.
[02:34.040 --> 02:37.072] We have almost 10,000 models that have been shared.
[02:37.072 --> 02:41.072] And almost a thousand datasets that have been shared on the platform
[02:41.072 --> 02:46.016] to kind of like help a scientist and a practitioners
[02:46.016 --> 02:51.088] build better NLP models and use that in the product or in their workflows.
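As a small, hedged illustration of browsing those shared models programmatically, here is a sketch using the huggingface_hub client. The search term and the five-result cap are arbitrary choices, not anything from the conversation, and the exact attribute names on the returned objects can vary between client versions.

```python
# Sketch: list a handful of models shared on the Hugging Face Hub.
from itertools import islice
from huggingface_hub import HfApi

api = HfApi()
for model in islice(api.list_models(search="sentiment"), 5):
    print(model.id)  # repository id of each shared model
```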
[02:51.088 --> 02:57.052] And so hugging face transformers is the library that's super well known.
[02:57.052 --> 03:02.036] And then the platform is a place where you can go to use other people's models
[03:02.036 --> 03:04.072] and publish your own models.
[03:04.072 --> 03:05.064] Do I have that right?
[03:05.064 --> 03:06.028] Yeah, exactly.
[03:06.028 --> 03:10.096] We have, like, a hybrid approach to building this technology.
[03:10.096 --> 03:16.084] And we feel like we need to combine the extensibility of open source
[03:16.084 --> 03:21.068] and the practicality of, for example, user interfaces.
[03:21.068 --> 03:26.092] So we cover really a kind of full range, meaning that if you are a company,
[03:26.092 --> 03:30.080] you can do everything yourself from our open source.
[03:30.080 --> 03:32.004] Not talk to us,
[03:32.004 --> 03:37.052] and not even go to huggingface.co; do everything from pip install transformers.
[03:37.052 --> 03:42.060] If you want a bit more help, you can use our hub to discover new models,
[03:42.060 --> 03:46.036] find the model that works for you, understand these models.
[03:46.036 --> 03:51.008] To even in a more extreme way, if you're like a software engineer,
[03:51.008 --> 03:56.000] or if you're new to NLP, or even new to machine learning,
[03:56.000 --> 04:02.016] you can use our training and inference APIs to train and run models.
[04:02.016 --> 04:06.064] And we're going to host this inference and this training for you to make it very,
[04:06.064 --> 04:14.076] simple, so that you don't have to become an NLP expert to take advantage of the latest state of the art in NLP models.
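To ground the "do everything from pip install transformers" path described above, here is a minimal sketch, assuming the transformers package is installed. The task and the example sentence are illustrative, and the default checkpoint the pipeline downloads is whatever the library currently ships, not something specified in the interview.

```python
# Minimal sketch of the fully self-serve path: install the library and run a model.
#   pip install transformers
from transformers import pipeline

classifier = pipeline("text-classification")  # downloads a default model from the Hub
print(classifier("My payment failed and I need help right away!"))
```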
[04:14.076 --> 04:15.056] That's so cool.
[04:15.056 --> 04:19.000] I mean, I want to zoom in on hugging face transformers first,
[04:19.000 --> 04:23.064] because it feels like it might be one of the most popular machine learning
[04:23.064 --> 04:25.004] libraries of all time.
[04:25.004 --> 04:27.092] I'm kind of curious what you attribute that success to.
[04:27.092 --> 04:29.092] Like when did you start it and what were you thinking?
[04:29.092 --> 04:33.012] And what did you learn along the way?
[04:33.012 --> 04:36.096] Yeah, I mean, it might be, I don't know if it's the biggest machine learning library of
[04:36.096 --> 04:37.076] all time.
[04:37.076 --> 04:41.000] It's definitely the fastest growing because it's fairly new.
[04:41.000 --> 04:47.036] We released the first version of it two and a half years ago, which is not a long time ago in the grand scheme of open source.
[04:47.036 --> 04:47.080] Right.
[04:47.080 --> 04:51.080] If you look at all the kind of like most popular open source,
[04:51.080 --> 04:57.052] you see that they usually need a very long time of maturation.
[04:57.052 --> 04:57.084] Right.
[04:57.084 --> 05:02.024] So the grand scheme of open source transformers is very much still a baby.
[05:02.024 --> 05:04.080] But it grew really, really fast.
[05:04.080 --> 05:06.008] It really blew up.
[05:06.008 --> 05:11.096] We're at 42,000 GitHub stars, over a million pip installs a month.
[05:11.096 --> 05:16.012] I think we have 800 contributors to transformers.
[05:16.012 --> 05:27.056] And the main reason why I think it's successful is to me because it really bridges the gap between science and production,
[05:27.056 --> 05:35.016] which is something that makes it very new and that not a lot of open source and not a lot of companies manage to do.
[05:35.016 --> 05:44.028] I strongly believe that machine learning is quite different from software engineering 1.0, or software engineering,
[05:44.028 --> 05:50.012] or computer science; even if computer science has science in the name of it,
[05:50.012 --> 05:52.056] it's not a science driven topic.
[05:52.056 --> 05:57.096] Right. If you look at good software engineers, they don't really read research papers.
[05:57.096 --> 06:01.056] They don't really follow the science of computer science.
[06:01.056 --> 06:03.024] Machine learning is very different.
[06:03.024 --> 06:05.056] It's a science driven domain.
[06:05.056 --> 06:05.096] Right.
[06:05.096 --> 06:14.024] It all starts from a couple of dozen key research and NLP science teams around the world that are creating new models,
[06:14.024 --> 06:20.020] like, you know, BERT, T5, you know, RoBERTa, all these new models that you've heard of.
[06:20.020 --> 06:24.068] And I think what we managed to do with transformers is to, you know,
[06:24.068 --> 06:33.064] give these researchers a tool that they like to share their models, to test models of others,
[06:33.064 --> 06:39.044] to go deep into, kind of like, the internals of the architecture of these models.
[06:39.044 --> 06:50.016] But at the same time create an easy enough abstraction so that any NLP practitioner can literally use these models.
[06:50.016 --> 06:55.016] Just a few hours after they've been released by the researchers.
[06:55.016 --> 07:02.056] And so we created that there's some sort of magic, some sort of like a network effect or some sort of magic.
[07:02.056 --> 07:07.040] When you bridge the two, we don't understand all the mechanics of it yet.
[07:07.040 --> 07:14.012] But yeah, there's some sort of a network effect for each time there's a new model released, you know,
[07:14.012 --> 07:17.072] like, the researchers releasing it within Transformers.
[07:17.072 --> 07:23.000] People are hearing about it, they're talking about it, they want to use it, they test it in Transformers,
[07:23.000 --> 07:25.008] they put it in production, and it works.
[07:25.008 --> 07:29.032] So they want to support it more, the scientist is happy that their research
[07:29.032 --> 07:32.068] is seen, is used, is impactful.
[07:32.068 --> 07:35.064] And so they want to create more and they want to share more.
[07:35.064 --> 07:41.016] So there's, yeah, this kind of like virtuous cycle that I think allows,
[07:41.016 --> 07:47.016] that allowed us to grow, yeah, much, much faster than traditional open source.
[07:47.016 --> 07:53.000] And that kind of like struck a chord with the market and with the field of machine learning.
[07:53.000 --> 07:57.056] Yeah, because as an entrepreneur, I'm always kind of fascinated by how these virtuous cycles,
[07:57.056 --> 08:02.068] you know, get started. Like, when you go back two and a half years ago, when you were just first
[08:02.068 --> 08:07.016] starting the Transformers project, like, what was the problem you're trying to solve?
[08:07.016 --> 08:10.076] And what inspired you to even make an open source library like this?
[08:10.076 --> 08:15.048] I could probably give you kind of like a smart, uh, thoughtful answer.
[08:15.048 --> 08:17.024] No, I want the real answer.
[08:17.024 --> 08:22.044] Yeah, yeah, yeah, the real truth is that we didn't think much about it.
[08:22.044 --> 08:25.096] You know, we've been using open source for for a while.
[08:25.096 --> 08:34.004] We've always felt like in these fields, you're always standing on the shoulders of giants,
[08:34.004 --> 08:37.088] of other people that worked on the field before.
[08:37.088 --> 08:43.064] We've been used to this culture of, you know, when you do science, you publish a research paper
[08:43.064 --> 08:49.096] and, you know, for research in machine learning, you often publish, you know, the open source
[08:49.096 --> 08:57.024] right alongside the paper, right? And so, since day one at Hugging Face, you know, we've always
[08:57.024 --> 09:05.000] done a lot of things in the open, sharing in open source. And here, for Transformers,
[09:05.000 --> 09:11.016] it started really, really simply with BERT, which was released in TensorFlow.
[09:11.016 --> 09:17.000] And Thomas, who is a co-founder and chief scientist, was like, oh, it's in TensorFlow.
[09:17.000 --> 09:23.000] We need it in PyTorch, right? So I think two days after BERT was released,
[09:23.000 --> 09:30.044] we open sourced PyTorch BERT. And that was literally the first name of the repository.
[09:30.044 --> 09:36.044] And it blew up, people started using it like crazy. And then a few weeks after,
[09:36.044 --> 09:41.096] I don't remember what model was released. I want to say RoBERTa, but no, RoBERTa
[09:41.096 --> 09:48.028] was much, much later. But another model was released, maybe it was GPT, actually. I think it was
[09:48.028 --> 09:54.020] GPT, the first GPT was released. And I think same thing, it was really just in TensorFlow and
[09:54.020 --> 09:59.088] we were like, okay, let's add it. And we felt like, all right, let's make it so that, you know, it's
[09:59.088 --> 10:05.048] easier for for people to try both. Because they have different capabilities, good at different things.
[10:05.048 --> 10:10.084] So we started thinking about, you know, what kind of abstraction we should be able to make it easier.
[10:11.080 --> 10:18.084] And very much like that, you know, it went organically. And at some point, you know, like researchers
[10:18.084 --> 10:23.016] were like, you know, when I release a new model, can I release it within Transformers,
[10:23.016 --> 10:30.092] and we'd say, okay, yeah, just do that. And they did that and, you know, kind of like a snowball, it
[10:30.092 --> 10:36.092] became bigger and bigger and brought us to where we are now. That's a really cool story. I didn't
[10:36.092 --> 10:44.052] realize that you were trying to port models from TensorFlow to PyTorch. I mean, now you work
[10:44.052 --> 10:51.032] with both TensorFlow and PyTorch, right? Yeah. Did you feel at the time, I guess,
[10:51.032 --> 10:55.080] a preference for PyTorch, or why was it important to you to move something to
[10:55.080 --> 11:01.088] PyTorch? I think the user base was different, right? So we've always been passionate about
[11:01.088 --> 11:07.064] you know, democratization or like, you know, making something like a bit obscure, a bit niche,
[11:07.064 --> 11:14.076] making it like available to to more people. We feel like that's how you get the real power of
[11:14.076 --> 11:23.008] technology is, is when you take something that is in the hands of just a few happy few and you make it
[11:23.008 --> 11:28.068] like available for more people. So that was, you know, the main goal. You know, there were
[11:28.068 --> 11:35.000] people who were using TensorFlow, and people who were using PyTorch.
[11:35.000 --> 11:42.012] We wanted to make it available to people using PyTorch. We were using PyTorch ourselves extensively.
[11:42.012 --> 11:48.076] We think it's like an amazing framework. So yeah, we were happy to make it make it more available.
[11:48.076 --> 11:54.076] And the funny thing is that as we got more and more popular, at some point we've seen the other
[11:54.076 --> 12:01.040] movement, in the sense that people were saying, at some point we were actually named PyTorch Transformers,
[12:01.040 --> 12:07.016] and we started having a lot of people working in TensorFlow. It was like, guys, it's so unfair. Why
[12:07.016 --> 12:14.084] can't I just use, you know, Transformers if I'm using TensorFlow? And so that's when we extended to
[12:14.084 --> 12:21.000] TensorFlow and dropped the PyTorch Transformers name, dropped the PyTorch in the name, and became
[12:21.000 --> 12:26.068] transformers to support both. And it's super interesting because if you look at our integration
[12:26.068 --> 12:32.084] of PyTorch and TensorFlow, it's more comprehensive. It's more complete than just having
[12:33.048 --> 12:38.068] half of it that is PyTorch and half of it that is TensorFlow. But you can actually kind of like on the
[12:38.068 --> 12:46.020] same workflow in a way on your same kind of like machine learning workflow. You can do part of it in
[12:46.020 --> 12:52.060] PyTorch. So for example, when you want to do more like the architecture side of it, PyTorch is
[12:52.060 --> 12:58.060] really strong. But when you want to do kind of like, you know, serving, TensorFlow is integrated
[12:58.060 --> 13:03.096] with a lot of tools that is heavily used in the industry. So on the same workflow, you can start building
[13:03.096 --> 13:10.044] your model in PyTorch and then use it in TensorFlow, within the library, which we think is
[13:10.044 --> 13:16.060] pretty cool because it allows you to take advantage a little bit of the strengths and weaknesses
[13:16.060 --> 13:24.004] of both frameworks. So do you get a chance to use your own software anymore? Like, does
[13:24.076 --> 13:29.048] Hugging Face build applications at this point, or are you just making these kinds of tools for other
[13:29.048 --> 13:37.048] people? Yeah, we play with them a lot. You know, I think one of our most popular demos ever was
[13:37.048 --> 13:43.072] something called Write With Transformer, which was basically kind of like a text editor powered by
[13:43.072 --> 13:51.088] some of the popular models of Transformers, that got, I think, something over a thousand books,
[13:51.088 --> 13:58.052] the equivalent of a thousand books has been, like, written with it. It's some sort of like what you
[13:58.052 --> 14:05.072] have in your Gmail autocomplete, but it's much more silly and creative. So it works really well
[14:05.072 --> 14:11.048] when you have kind of like the syndrome of the, can you say that in English? The syndrome of the
[14:11.048 --> 14:16.076] white page, when you don't know what to... Oh yeah, I don't think we say it like that, but I understand,
[14:16.076 --> 14:21.072] it's pretty much, yeah, in French we say the syndrome of the "page blanche", and, you know, you want to
[14:21.072 --> 14:27.072] write, but you don't know what to write about. It's helping you, like, be more creative
[14:28.028 --> 14:37.024] by suggesting kind of like a long interesting text to it. That's really cool. So I wanted to ask you,
[14:37.024 --> 14:43.000] I feel like you have a really interesting lens on all the different architectures for NLP. Like,
[14:44.044 --> 14:49.016] I guess, are you able to know kind of what the most popular architectures are, and have you seen
[14:49.016 --> 14:56.044] that changing over the last two and a half years? Yeah, yeah, we can see kind of like the
[14:57.008 --> 15:04.028] download, kind of like, volumes of models. So it's super interesting to see, especially when new models
[15:04.028 --> 15:11.064] are coming up, to see if they're successful or not, how many people, kind of like, are using them.
[15:11.064 --> 15:18.092] something that's been super interesting to us is that actually the number one downloaded model on the
[15:18.092 --> 15:26.036] hub is DistilBERT, right? So, like, a model that we distilled from BERT. But there's also a lot
[15:26.036 --> 15:35.064] of variety in terms of usage of models, especially as I felt like over the years, they became in a way
[15:35.064 --> 15:43.064] a bit more specialized, even if there's still kind of like a general pre-trained language models.
[15:43.064 --> 15:52.068] I feel like more and more as each new model came with some sort of optimization that made
[15:52.068 --> 16:02.052] them, like, perform better, either on shorter or longer text, on generation tasks versus
[16:02.052 --> 16:09.088] classification tasks, multilingual versus, like, monolingual. You start to see more and more
[16:09.088 --> 16:18.036] diversity based on what people want to do with it and what kind of strengths and weaknesses
[16:18.036 --> 16:23.032] they value the most, right? A little bit like what I was talking about between, you know,
[16:23.032 --> 16:33.056] PyTorch and TensorFlow: people trying not so much to decide which model is the best, which is kind of
[16:33.056 --> 16:40.012] silly in my opinion, but which model is the best for which task, for which context, and then
[16:40.012 --> 16:46.044] pick the right tool for the task. I guess for some of the listeners who don't have an NLP background,
[16:46.044 --> 16:52.004] could you explain what BERT is and just what it does, and maybe how DistilBERT differs from that?
[16:52.004 --> 16:58.076] Yeah, so the whole kind of like evolution in NLP, it started with a seminal paper called
[16:58.076 --> 17:08.036] "Attention Is All You Need", right? Which was introducing this new architecture for NLP models based
[17:08.036 --> 17:16.052] on transfer learning, and BERT was the first, kind of like, most popular of these new generation of
[17:16.052 --> 17:25.032] models. And the way they work, in a simplistic way, without getting too technical, is that you
[17:26.004 --> 17:34.060] train a model on a lot of text on one task, so for BERT, for example, it's mask filling:
[17:34.060 --> 17:39.096] you give it sentences, you remove a word, you remove the middle of the sentence, for example,
[17:39.096 --> 17:46.028] and then you train the model on predicting this missing word, right? And then you do that on a very
[17:46.028 --> 17:54.044] large corpus of text, usually, you know, a slice of the web, right? And then you get a pre-trained model
[17:54.044 --> 18:03.032] that has some kind of like understanding of text, that you can then fine-tune, and, you know, hence
[18:03.032 --> 18:08.020] the name transfer learning, because you can go from one pre-training task
[18:08.020 --> 18:15.024] to other fine-tuning tasks. You can fine-tune this model for example on classification,
[18:15.024 --> 18:22.020] right, by giving it like a couple of thousand examples of text and classifications, like
[18:22.020 --> 18:27.048] the customer support emails that I was talking about, classified urgent and not urgent.
[18:28.004 --> 18:37.016] Right, and after that, the model is surprisingly good at classifying a new text that you give it
[18:37.016 --> 18:42.068] based on urgency, and it's going to tell you, okay, this message, there's like 90% chance it's
[18:42.068 --> 18:47.088] urgent, based on what it's learned in the pre-training and in the fine-tuning.
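The mask-filling objective described above can be poked at directly with the library's fill-mask pipeline. A hedged sketch follows: the checkpoint name and example sentence are just common illustrative choices, not anything specified in the conversation.

```python
# Sketch: ask a BERT-style model to fill in a masked word, mirroring the
# "remove a word, predict it" pre-training task described above.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")  # assumed checkpoint
for prediction in fill_mask("This support request is extremely [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```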
[18:47.088 --> 18:53.096] So like, for example, with BERT, I guess, you know, you have a model that can fill in, you know,
[18:53.096 --> 18:59.056] missing words, how do you actually turn that into a model that's classifying customer support messages?
[18:59.056 --> 19:05.024] Yeah, with fine-tuning: you fine-tune, kind of like, like adding a layer, you know, you fine-tune
[19:05.024 --> 19:13.040] this model to perform on your own specific task. And that's, kind of like, in a more kind of like long
[19:13.040 --> 19:20.036] term way, I think, a very interesting way of doing machine learning because,
[19:21.016 --> 19:29.080] intuitively, you almost feel like it's the right way to do machine learning. In the sense that
[19:30.084 --> 19:37.016] what we've seen in the past with machine learning, and especially for startups, a lot of them, I've
[19:37.016 --> 19:42.084] kind of like sold this dream of doing machine learning and doing some sort of like a data network
[19:42.084 --> 19:47.072] effect on on machine learning, right, because there's a assumption that you're going to give more
[19:47.072 --> 19:53.024] data to the model and it's going to perform better. And I think that's true, but the challenge
[19:53.088 --> 20:02.076] has always been that you have more data and so your model performs incrementally better, but only on what
[20:02.076 --> 20:10.004] you're able to do already, right? So if you're doing, I know, time series prediction, maybe you have
[20:10.004 --> 20:18.036] like one billion data points, right, and your model performs at 90% accuracy, you add, like, maybe
[20:18.036 --> 20:24.044] 9 billion, 10 billion additional data points and your model is going to perform at, you know,
[20:25.024 --> 20:31.048] 90.5% accuracy, right? And that's great. I mean, that's good improvement, that's something
[20:31.048 --> 20:39.016] you need, but it doesn't give the kind of, you know, increased performance that you're really
[20:39.016 --> 20:45.024] expecting from a typical network effect in the sense that it doesn't make your result like
[20:45.024 --> 20:52.052] 100x, 10x, 100x better than without it. With transfer learning, it's a bit different because
[20:53.024 --> 21:02.020] you not only kind of like improve incrementally the accuracy on one task, you give it more
[21:02.020 --> 21:11.096] ability to solve other tasks. And so you actually not only increase the accuracy, but you increase the
[21:11.096 --> 21:18.028] capabilities of what your model is able to do. And so I won't go into, kind of like, the crazy
[21:18.028 --> 21:26.004] Musk-type kind of like predictions, but if you take, actually, Elon Musk's kind of like
[21:26.004 --> 21:33.072] OpenAI founding, kind of like, uh, story, where he's saying, like, you know, we need to bring the whole
[21:33.072 --> 21:41.016] community together to contribute to something open source for everyone: intuitively, you could think
[21:41.016 --> 21:48.012] that could come with actually transfer learning in the sense that you could envision a world where
[21:48.092 --> 21:56.020] every single company is contributing with their, you know, data sets, with their compute,
[21:56.020 --> 22:02.084] with their weights that machine learning model weights to, you know, build this giant
[22:03.040 --> 22:13.048] kind of like, open source models that would be able to do, you know, 100x more things than, you know,
[22:13.048 --> 22:18.036] what each of these companies could do alone. I don't know if we're going to get there in the
[22:18.036 --> 22:25.032] foreseeable future, but I feel like that's in terms of concepts, that's something interesting to look at
[22:25.032 --> 22:31.008] when you think about transfer learning as opposed to the other techniques of machine learning.
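A minimal, hedged sketch of the fine-tuning step discussed above: take a pre-trained checkpoint and add a small classification head for an "urgent vs. not urgent" email task. The checkpoint name, the two toy examples, and the training settings are all placeholders for illustration, not anything from the interview.

```python
# Sketch: fine-tune a pre-trained model for binary "urgent vs. not urgent" classification.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["My payment failed and I need help right now", "Just writing to say thanks!"]
labels = [1, 0]  # 1 = urgent, 0 = not urgent (toy data)
encodings = tokenizer(texts, truncation=True, padding=True)

class EmailDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts and labels in the format Trainer expects."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="urgency-model", num_train_epochs=1),
    train_dataset=EmailDataset(encodings, labels),
)
trainer.train()  # updates the new classification head on top of the pre-trained body
```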
[22:32.068 --> 22:39.096] I guess, did you have a feeling about OpenAI not releasing the weights for the early GPT models,
[22:39.096 --> 22:48.004] I guess, any of the GPT models? Yeah, so GPT, GPT-2, I think the, uh,
[22:48.004 --> 22:56.036] yeah, versions in between, were open sourced, right, and they're in Transformers, and we have a lot of
[22:56.036 --> 23:04.004] companies using them. There are probably more companies using GPT-2 through Transformers than GPT-3
[23:04.004 --> 23:11.064] today. You know, they're a private company, so I, you know, totally respect their strategy not
[23:12.004 --> 23:19.056] to open source a model that they built. I think they've done an amazing job with GPT-3. It's a
[23:19.056 --> 23:26.004] great model for everything when you want to do text generation. It's really useful. I'm really
[23:26.004 --> 23:34.004] thankful for all the work they've done democratizing the capabilities of NLP, as our goal is to democratize
[23:34.004 --> 23:40.004] NLP. I feel like they've done a lot promoting it into, more like, the startup community, in a way.
[23:40.004 --> 23:46.052] A lot of people realized, with their communication, that you could do so much more than what we've been doing
[23:46.052 --> 23:55.000] so far with NLP, which is great. I think it participated in the development of the ecosystem
[23:55.000 --> 24:03.008] and in putting NLP in the spotlight, which has been really great. We
[24:03.008 --> 24:09.040] see a lot of companies starting to use GPT-3, and obviously it's expensive. It's not really
[24:09.040 --> 24:17.040] extensible. You can't really update it for your own use case. It's hard to build a sort of technological
[24:17.040 --> 24:23.080] competitive advantage when you build on top of a proprietary API from someone else. We see a lot
[24:23.080 --> 24:31.040] of companies using GPT-3, then getting deeper into NLP, and then coming to our tools. The same thing happens,
[24:31.040 --> 24:38.020] I'm sure, the other way around: some people start with our tools, then they decide they want more
[24:38.020 --> 24:50.068] off the shelf, GPT-3, Google NLP services, AWS Comprehend, providing an API for NLP, that have been around
[24:50.068 --> 24:59.024] alongside this company. Everyone is part of the same ecosystem that is growing. That's really exciting.
[25:00.004 --> 25:06.052] Do you feel like there's a difference in the GPT approach versus the BERT approach that you were talking
[25:06.052 --> 25:12.092] about? GPT has been very high profile, and the text generation is really impressive. Do you feel
[25:12.092 --> 25:19.064] like OpenAI is doing something fundamentally different? Yes, so they are both transformer
[25:19.064 --> 25:29.008] models, the same technology, with slightly different architectures. For example, where BERT is doing
[25:29.008 --> 25:36.044] mask filling, GPT is doing language modeling, so next word prediction. It's a bit different. That's
[25:36.044 --> 25:44.020] why the text generation capabilities are so much stronger. It has its limitations too. For example,
[25:44.020 --> 25:49.080] if you want to do classification, you shouldn't do it with GPT. It doesn't make sense at all.
[25:49.080 --> 25:58.020] So, yeah, these are sort of different use cases with slight variations of the architecture.
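As a hedged illustration of the next-word-prediction setup just described, here is a sketch using the openly released GPT-2 checkpoint through the text-generation pipeline; the prompt and generation length are arbitrary.

```python
# Sketch: autoregressive next-word prediction with GPT-2, an open GPT-style model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The contract states that", max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])
```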
[25:58.020 --> 26:08.068] We've had people start reproducing GPT, we've had GPT-2, and a team called EleutherAI, I don't
[26:08.068 --> 26:17.096] know how to pronounce it, released GPT-Neo a few days ago, which is the same architecture as GPT-3,
[26:17.096 --> 26:23.032] with fewer weights for the moment, but they intend to, kind of like, grow the weights.
[26:23.032 --> 26:31.000] I think the size of the model is the equivalent of the smaller GPT-3 that
[26:31.000 --> 26:39.056] OpenAI is providing through an API today. It's interesting to see the power of the open source
[26:39.056 --> 26:47.072] community. One of my fundamental convictions is that in a field like NLP, or machine learning
[26:47.072 --> 26:54.052] in general, the worst position to be in is to compete with the whole science and open source field.
[26:55.080 --> 27:01.040] Just because I've been in this position before, at the first startup I worked for; we were doing machine learning
[27:01.040 --> 27:07.072] for computer vision, back in Paris, I'm French, as you can hear from my accent.
[27:07.072 --> 27:15.008] But so, competing against the science field and the open source field on such a fast moving topic
[27:15.008 --> 27:23.008] is a difficult position to be in, because I think you have hundreds of research labs at larger
[27:23.008 --> 27:32.012] organizations or at universities, each one able to put in as much as, or more than, what you can do at
[27:32.012 --> 27:38.068] a startup. There are so many of them that when you can do just one iteration, you have
[27:38.068 --> 27:47.048] a hundred others doing one iteration too. You can outpace them and be the state of the art
[27:47.048 --> 27:54.052] for a few days. Then someone else, just a few days after you, is catching up, and then you
[27:54.052 --> 28:01.032] cannot keep your lead anymore. So, yeah, we've taken a very different approach: instead of
[28:01.032 --> 28:07.008] trying to compete with open source and with the science field. We're trying more to
[28:08.004 --> 28:15.040] empower it in a way and I think through the open source model, you can do things a bit differently.
[28:15.040 --> 28:21.096] We have taken the inspiration of open source for infrastructure and databases. Companies like
[28:21.096 --> 28:28.052] Elastic and MongoDB have shown that you can, as a startup, empower the community in a way
[28:28.052 --> 28:33.048] and create a thousand times more value than you would by building a proprietary tool.
[28:34.036 --> 28:43.032] That you don't have to capture 100% of the value that you create. That you can be creating,
[28:43.032 --> 28:51.048] like, immense value and just capturing one percent of it to monetize, to make your company sustainable.
[28:51.048 --> 28:59.032] That you can still make a large public company, in the case of MongoDB for example, that both has,
[28:59.032 --> 29:06.020] kind of like, this open source core but at the same time can grow an organization and is sustainable.
[29:06.020 --> 29:10.020] I don't see why it should be different for machine learning. We haven't seen a lot of
[29:10.020 --> 29:16.060] large open source machine learning companies yet. For me it's more a matter of how early the technology is,
[29:16.060 --> 29:23.080] it's just too early to have large open source machine learning companies, because, I mean, five years ago
[29:23.080 --> 29:29.072] nobody was using machine learning, but it's going to come. I think I wouldn't be surprised if,
[29:29.072 --> 29:37.048] like, in five, ten years you'd have like one, two, three, four, five, ten massive open source machine
[29:37.048 --> 29:45.064] learning companies. I guess you know you've had really front row seats to like the cutting edge of
[29:45.064 --> 29:52.028] NLP over the last couple years. Do you feel like the applications have changed with these models
[29:52.028 --> 29:57.008] getting more powerful and useful? Are there things you see people doing now that you wouldn't
[29:57.008 --> 30:03.024] have seen people doing three years ago? Yeah, honestly, I think, out of the five thousand companies
[30:03.024 --> 30:10.028] that are using Transformers, I mean, for the vast, vast majority of it, I mean, it's hard to tell, but we
[30:10.028 --> 30:17.032] see a lot of them that are using Transformers in production, and I would say that, you know, most of them
[30:17.032 --> 30:24.036] were not using NLP in production, you know, five years ago, right. A lot of these are new
[30:25.048 --> 30:33.072] use cases that either weren't possible before, so the companies were just not doing it, or really were
[30:33.072 --> 30:40.068] performed by humans, right. You know, moderation, for example, is a good example of that,
[30:40.068 --> 30:46.052] customer support classification as I was saying you know it's replacing kind of like a very manual
[30:46.052 --> 30:55.016] process. You know, autocomplete, you know, is really big; in Gmail it's been like my biggest productivity
[30:55.016 --> 31:01.048] gain in the past few months, I'm using Gmail autocomplete to write basically, right, just
[31:01.048 --> 31:07.040] half of my emails. Now most of the, you know, search giants are mostly powered by
[31:07.040 --> 31:14.020] NLP and by transformer models, they're now saying that most of their queries are powered
[31:14.020 --> 31:22.028] by transformers, so arguably it's like the most popular consumer product out there. So yeah, I think it's
[31:22.028 --> 31:29.000] changed, it's changing, like, so many products, the way products are built. It's really interesting, and that's
[31:29.000 --> 31:35.080] why also, like, you know, seeing GPT-3 kind of like promoting NLP into the startup world is super
[31:35.080 --> 31:41.080] interesting, because I think it's a real game changer when you have companies starting to build
[31:42.060 --> 31:49.072] products from scratch leveraging NLP, because I think you build, like,
[31:49.072 --> 31:56.052] differently, right, when you start kind of like building from scratch. You know, you can think of
[31:56.052 --> 32:02.052] basically every company today, and it's really fun to think, what if this company
[32:02.052 --> 32:11.072] started today with today's NLP capabilities? And you see that you have so many ideas for them to do
[32:11.072 --> 32:17.000] things differently. And you take, like, you know, DocuSign, right, what if DocuSign, with kind of
[32:17.000 --> 32:23.064] like all of its documents, started today with NLP? Or you think Twitter, you know what I mean. Tell me
[32:23.064 --> 32:29.000] about DocuSign, because what I do with DocuSign is I get like a message and then I click sign and then
[32:29.000 --> 32:34.076] I sign with things, so what would be different about DocuSign if it started with like all the
[32:34.076 --> 32:41.080] technology available today? I don't know, it would give you, like, so much, like, an analysis of the
[32:41.080 --> 32:48.012] contract; there would be a too-long-didn't-read for the contract. First, the contract, you know, like,
[32:48.012 --> 32:55.096] instead of having to read, you know, a five-page-long document, you
[32:55.096 --> 33:04.020] would have like an automatically generated summary of the document, I think with highlights in
[33:04.020 --> 33:12.044] green or red of the interesting parts in the document. You know, like, when you see, oh, there's the big, kind of like,
[33:12.044 --> 33:17.000] money clause, the point that defines, you know, how much money you're gonna make,
[33:17.000 --> 33:25.016] it would be big, green, flashing, right, take care of that. Or when there's a small,
[33:25.016 --> 33:34.076] you know, like a small asterisk that says everything that we wrote before completely does not apply, like,
[33:34.076 --> 33:40.052] it doesn't work in that case, you know, like the small, kind of like, conditions, you would like it big,
[33:40.052 --> 33:46.012] you know, red, flashing, like, be careful here, they're trying to screw you here.
[33:47.072 --> 33:53.008] You know, it would look like that. Okay, okay, that was so fun. What were you saying about Twitter, if it started
[33:53.096 --> 34:02.044] just, like, now? So what could they do? So first, you would do the feed completely
[34:02.044 --> 34:10.068] differently, right. It would not show you tweets because they're popular, or tweets because
[34:10.068 --> 34:17.048] they're, you know, I mean, not popular but controversial; it would show you tweets that,
[34:17.048 --> 34:24.028] you know, you would relate to, you know, and tweets that you would be interested in, based on, you know, what
[34:24.028 --> 34:34.020] you were tweeting before. Hopefully it would be able to, you know, moderate things, it would be better,
[34:34.020 --> 34:41.024] you know, avoid more biases, avoid more, kind of like, you know, violence, inappropriate, you know, racism,
[34:41.024 --> 34:47.096] and kind of like bad, kind of like, behaviors, like that's where they're lacking. I would have wanted,
[34:47.096 --> 34:52.060] obviously, an edit button, but I don't know if NLP would help with that.
[34:56.084 --> 35:02.020] Like, you know, like this, like, famous thing that, for, kind of like, ages, everyone has asked for,
[35:02.060 --> 35:12.020] everyone has been asking for, like, an edit button, and it wouldn't be NLP, but still,
[35:12.020 --> 35:18.052] if they started today, I would add that. Well, what else, did you have any idea of what they would do
[35:18.052 --> 35:25.016] differently with NLP today? Well, honestly, I don't know how you feel about this, but, you know,
[35:25.016 --> 35:30.060] when I look at the text generation technology, the NLP technology, and that was the field I actually
[35:30.060 --> 35:38.068] started in, you know, 15 years ago or more, I almost feel like the thing that's intriguing is the lack
[35:38.068 --> 35:45.040] of applications for how amazing the technology seems to me. Like, I feel like, you know, I remember
[35:45.040 --> 35:49.080] the Turing test was this thing of, like, if you could, you know, converse with, I
[35:49.080 --> 35:53.080] forget exactly the framing, but it's like you're with the computer for like 10 minutes and you can't tell
[35:53.080 --> 35:59.056] if it's a human, you know, maybe we have, like, AGI at that point, and it seemed like that was
[35:59.056 --> 36:06.036] impossible, and now it seems like, you know, it seems like we'll pass it sometime soon. I mean, there's
[36:07.000 --> 36:13.072] variants of it, but I feel more and more like it probably could trick me into
[36:13.072 --> 36:18.060] thinking that I'm talking to a person, you know, whether it's, you know, GPT or another text generation model,
[36:18.060 --> 36:26.020] but I actually feel like I don't engage with totally new NLP applications yet and I kind of wonder why
[36:26.020 --> 36:35.040] that is. Yeah, I mean, I wouldn't agree with you; I think usage of it is really, like, everywhere
[36:35.040 --> 36:43.088] right now. I mean, yeah, there are not a lot of products that don't start to use some NLP,
[36:44.052 --> 36:52.068] right? Maybe it's just more subtle then? Yeah, yeah, maybe it's less in your face, in the sense
[36:52.068 --> 37:01.080] that it hasn't been this big kind of like conversational AI interfaces that that took over in a way
[37:01.080 --> 37:08.052] right, for a very long time it was kind of like the most popular and kind of like mainstream face,
[37:09.008 --> 37:17.056] in a way, of NLP, right, people think NLP for, you know, Siri or Alexa, in a way. And that's true that we haven't
[37:17.056 --> 37:26.092] seen that picking up, right, chatbots haven't, you know, proved to be very, very good yet, and we're not
[37:26.092 --> 37:33.032] there yet in the capabilities, in really kind of like solving real problems. But I think it
[37:34.052 --> 37:41.056] became adopted in a way more subtle way, in a way more kind of like incremental way, compared to
[37:41.056 --> 37:48.020] existing use cases. You're probably using Google every day, and it's true that maybe you don't see much of the
[37:48.020 --> 37:54.060] difference between the search results before and now, but the reality is that, you know, it's the most
[37:54.060 --> 37:59.072] mainstream, most used product of all time, that most people are using every day, and it's powered by
[37:59.072 --> 38:07.088] modern NLP, yeah, by Transformers. But it's not, yeah, it's not as, kind of like, yeah, maybe as
[38:07.088 --> 38:17.032] groundbreaking in terms of, like, experience changes as you could have expected, right. I think one of the
[38:17.032 --> 38:25.048] challenges of NLP is that, because language has been so much of a human topic for so long,
[38:25.048 --> 38:34.076] in a way it carries all these kind of like associations with AI, right, and kind of like AGI, and
[38:34.076 --> 38:41.000] kind of like almost like this machine intelligence. And obviously if you look at, you know, all the
[38:41.000 --> 38:47.056] sci-fi, like Her, you know, you associate that a little bit with NLP, and that's kind of what you could
[38:47.056 --> 38:55.040] have expected from NLP, and the reality has been more kind of like productivity improvements behind the
[38:55.040 --> 39:04.012] scenes that you don't really feel or see that much as a user, it's true. Are you optimistic about
[39:04.012 --> 39:14.020] chat interfaces? I am. I think what most of us got wrong, and I mean, we started by building an AI friend,
[39:14.020 --> 39:19.032] or like a fun conversational AI, at Hugging Face. When we started Hugging Face, as I was saying,
[39:19.032 --> 39:23.048] we were obsessed with NLP and we were like, okay, what's the most challenging problem today?
[39:24.044 --> 39:30.084] Open domain conversational AI, building this kind of like AI that can chat about everything,
[39:30.084 --> 39:37.032] about the last sports game, about, you know, your last, kind of like, relationship, and really talk about
[39:37.032 --> 39:41.072] everything. And we were like, okay, that's the most difficult thing, we're going to do that, and indeed
[39:41.072 --> 39:47.024] it did not work out. So I think what we got wrong, and what most people are getting wrong, is
[39:47.024 --> 39:55.024] really, like, the timing, you know, in the sense that conversation, and especially, like,
[39:55.024 --> 40:02.052] open domain conversation the way we're doing it now, is extremely hard. It's almost kind of like the
[40:02.052 --> 40:11.024] ultimate NLP task, because you need to be able to do so many NLP tasks together at the same time,
[40:11.024 --> 40:16.036] ranking them: you know, I need to be able, when you're talking to me, to extract information, to
[40:16.036 --> 40:22.036] understand and classify your intent, classify the meaning of your sentence, understand the emotion of it,
[40:22.036 --> 40:28.028] right, if your tone is changing, then, you know, it means different things. So I think we're going to
[40:28.028 --> 40:35.016] get to better conversational AI ultimately, I don't know if it's in, you know, five years, if it's in
[40:35.016 --> 40:40.092] 10 years, if it's longer, but I think we're going to get there. It's already solving some kind of like
[40:40.092 --> 40:47.096] more vertical problems, like sometimes customer support chatbots; you know, I think Rasa in the open
[40:47.096 --> 40:55.040] source community is doing a really great job with that. So I think, yeah, we won't get tomorrow to the
[40:55.040 --> 41:01.032] AI that you can chat with about everything, kind of like what we started Hugging Face with, but
[41:01.032 --> 41:07.016] ultimately I think we'll get there and that's when you know in terms of like user experience it's
[41:07.016 --> 41:14.012] going to you're going to realize it's different at that at that time but it's probably going to take
[41:14.012 --> 41:20.068] a much more time than what we are expecting cool well you know we always end with two questions so I'd love
[41:20.068 --> 41:25.088] to get those in the last couple of minutes we have we always ask what's an underrated topic and
[41:25.088 --> 41:29.088] machine learning, or maybe in your case, what's an underrated topic in NLP, like, something
[41:29.088 --> 41:34.020] that you might work on if you didn't have a day job? That's a good question. I mean, something that
[41:34.020 --> 41:43.000] I've been super excited about in the past few weeks is the field of speech, so, like, speech to text,
[41:43.000 --> 41:50.084] text to speech, because I feel like it's been kind of like a little bit like NLP a few years ago,
[41:50.084 --> 41:56.044] it's been, kind of like, a little bit of a boring field, with
[41:56.044 --> 42:02.052] not so many, like, people working on it. And I feel like thanks to a couple of, like, research teams,
[42:02.052 --> 42:08.012] especially the team of Alexis Conneau at FAIR with wav2vec, you're starting to see new
[42:08.012 --> 42:15.056] advances, actually leveraging transformer models, that are bringing kind of like new capabilities. So I'm
[42:15.056 --> 42:20.028] pretty excited about it. I think there's going to be some sort of resurgence thanks to wav2vec,
[42:20.028 --> 42:26.004] and kind of like a leapfrog in terms of quality, not only in English but, what's interesting, is that
[42:26.004 --> 42:34.020] also in other languages. We hosted a couple of speech sprints at Hugging Face with over 300 participants
[42:34.020 --> 42:41.096] who contributed speech-to-text models for almost 100 low resource languages, and so it's been pretty
[42:41.096 --> 42:46.068] cool to see, like, the response of the community. So I think there's going to be a lot of things happening
[42:46.068 --> 42:54.068] in the coming months in speech, which is going to unlock, like, new use cases, because if you think that
[42:54.068 --> 43:01.024] you can combine, you know, speech with NLP, you can start to really, we were talking about
[43:01.024 --> 43:07.088] what if, like, a product was built today: you know, if Zoom was built today with, like, good speech to text and
[43:07.088 --> 43:12.092] NLP, you could do pretty cool stuff too. You know, when I'm saying something, something like,
[43:12.092 --> 43:18.052] cheery, it should be like automatic clapping, because, as you know, everyone is kind of like muted, that's
[43:18.052 --> 43:23.088] a problem with the current Zoom, that everyone is, like, muted, when I say something to cheer, like,
[43:23.088 --> 43:30.060] I'm launching one thing, or, you know, when you say hooray, it should kind of like be able to show
[43:30.060 --> 43:37.024] showers of, like, celebratory emojis or things like that. So yeah, I'm excited for speech; if you
[43:37.024 --> 43:42.060] haven't checked the field lately, you should definitely check it out, there are cool things happening
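A hedged sketch of the speech-to-text direction mentioned above, using the library's automatic-speech-recognition pipeline with a publicly available wav2vec 2.0 checkpoint. The model name and the audio file path are illustrative assumptions, not something from the interview, and decoding the audio file typically requires ffmpeg to be installed.

```python
# Sketch: transcribe an audio clip with a wav2vec 2.0 checkpoint.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
result = asr("meeting_clip.wav")  # path to a local audio file (placeholder)
print(result["text"])
```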
[43:43.024 --> 43:47.096] Very cool. And the final question — and I feel like you're in a unique place to see this —
[43:47.096 --> 43:53.024] is, you know, what's the hardest part, or what are some unexpected challenges, in just getting a model
[43:53.024 --> 43:59.064] from kind of thinking about it to deployed in production? And I guess you have a unique point of view here,
[43:59.064 --> 44:05.016] where you actually have a platform that makes it super easy. Are there still challenges
[44:05.016 --> 44:10.036] when folks use your stuff? Is there more to do, or does it work out of the box? There are still a lot of,
[44:10.036 --> 44:17.024] you know, human challenges to it, I think, in the sense that machine learning models
[44:18.012 --> 44:24.020] do things in a different way than, you know, traditional software engineering, and for a lot of
[44:24.020 --> 44:28.084] companies it's really, really hard to make the transition. For example, the lack of, you know,
[44:29.088 --> 44:38.052] explainability, the fact that it's harder to predict the outcomes of these models and
[44:38.052 --> 44:46.068] kind of tweak them — that is still really hard to understand and adopt for people
[44:46.068 --> 44:52.012] who have spent a career in, you know, software engineering, where you can really define the outcome
[44:52.012 --> 44:57.080] that you want to get. So I think, from what I'm seeing, a lot of the time the human,
[44:57.080 --> 45:04.092] kind of organizational, understanding-of-machine-learning part is the most difficult thing, more
[45:04.092 --> 45:13.008] than the technical aspect of it. On the technical part, I mean, we've been excited to
[45:13.008 --> 45:18.068] bring on larger and larger models, which are still, yeah, difficult to
[45:18.068 --> 45:23.008] run in production. So we've been working a lot with the cloud providers. We announced a
[45:23.008 --> 45:28.028] strategic partnership with AWS not so long ago, but we're also working every day with
[45:28.028 --> 45:34.004] Google Cloud, Azure, and other cloud providers. But yeah, bringing these kind of
[45:35.024 --> 45:41.056] large language models into production, especially at scale, requires a lot of skills and requires
[45:41.056 --> 45:48.020] some work. You can get there — I think Coinbase has a good article, a good
[45:48.020 --> 45:54.084] blog post, on how they used — I think it was DistilBERT, from Transformers —
[45:54.084 --> 46:02.012] for over a billion inferences. When you think about that scale it's kind of "wow," but it's still challenging and still requires a lot of infrastructure work.
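Coinbase's actual pipeline isn't described in the conversation; as a rough illustration of what DistilBERT-style inference looks like with the transformers library, here is a minimal sketch with made-up example texts and a public sentiment checkpoint.

```python
# Minimal DistilBERT inference sketch via the transformers text-classification pipeline.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

texts = ["The transfer completed successfully.", "My deposit never arrived."]
for text, pred in zip(texts, classifier(texts)):
    # Each prediction is a dict with a label and a confidence score.
    print(f"{pred['label']:>8}  {pred['score']:.3f}  {text}")
```

Running this at a billion-inference scale is where the batching, serving, and infrastructure work he mentions comes in; the model call itself is the easy part.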
[46:02.012 --> 46:09.056] Awesome. Thanks for your time — it's been a real pleasure.
[46:09.056 --> 46:14.068] Thank you, thanks for your time. Thanks for listening to another episode of
[46:14.068 --> 46:19.088] Gradient Descent. Doing these interviews is a lot of fun, and it's especially fun for me when I can actually
[46:19.088 --> 46:25.032] hear from the people that are listening to the episodes. So if you wouldn't mind leaving a comment and
[46:25.032 --> 46:29.040] telling me what you think, or starting a conversation, that would inspire me to do more of these
[46:29.040 --> 46:34.020] episodes. And also, if you wouldn't mind liking and subscribing, I'd appreciate that a lot.
52.62142
53.10917
1m 15s
Nov 21 '22 16:52
35yo3v0r
-
Finished
Nov 21 '22 16:52
3011.088000
/content/peter-wang-anaconda-python-and-scientific-computing-zvmycj-b-na.mp3
tiny
[00:00.000 --> 00:06.000] We have to be faced with this concept that technology is not value neutral.
[00:06.000 --> 00:09.040] And if you think about what machine learning really is, it is the application of massive
[00:09.040 --> 00:13.088] amounts of compute, rent a supercomputer in the cloud, kind of massive amounts of compute,
[00:13.088 --> 00:18.016] to massive amounts of data that's ever deeper and creepier than before, because of sensors
[00:18.016 --> 00:22.032] everywhere, to achieve business ends and to optimize business outcomes.
[00:22.032 --> 00:27.016] And we know just how good businesses are at capturing and self-regulating the externalities
[00:27.016 --> 00:28.072] to their business outcomes.
[00:28.072 --> 00:32.036] So just as a human looking at this, I would say, "Wow, I've got a chance to actually
[00:32.036 --> 00:34.012] speak to this practitioner crowd."
[00:34.012 --> 00:38.068] About if you're doing your job well, you'll be forced to drive a lot of conversations
[00:38.068 --> 00:44.044] about ethics and the practice of your thing about what you're doing within your business
[00:44.044 --> 00:47.048] as it goes through this data transformation.
[00:47.048 --> 00:48.080] And you should be ready for that.
[00:48.080 --> 00:50.024] Steel yourself for that.
[00:50.024 --> 00:51.076] Don't punt.
[00:51.076 --> 00:52.076] Don't punt on it.
[00:52.076 --> 00:54.004] We can't afford to punt.
[00:54.004 --> 00:58.036] You're listening to Gradient Descent, a show where we learn about making machine learning models
[00:58.036 --> 01:01.056] work in the real world, with your host, Lucas Biewald.
[01:01.056 --> 01:06.000] Peter Wang is the co-founder, CEO, and creator of Anaconda.
[01:06.000 --> 01:10.088] He's been developing commercial scientific computing and visualization software for over
[01:10.088 --> 01:12.052] 15 years.
[01:12.052 --> 01:16.072] He created the PyData community and conferences, and devotes time and energy to growing the
[01:16.072 --> 01:21.056] Python data science community and teaching Python at conferences around the world.
[01:21.056 --> 01:23.096] Couldn't be more excited to talk to him.
[01:23.096 --> 01:29.040] So maybe for starters — I know of you because of your product, Conda, which I've used for many
[01:29.040 --> 01:30.040] years.
[01:30.040 --> 01:33.080] I have a feeling most of the people listening to this will know what Conda is, but maybe
[01:33.080 --> 01:37.084] could you describe it in your words, just to make sure we're all on the same page here?
[01:37.084 --> 01:45.088] Yeah, so Conda is a package manager that we built as part of the overall Anaconda Python
[01:45.088 --> 01:48.060] distribution.
[01:48.060 --> 01:54.012] It started as a way just to get people package updates of the binary builds that we do, and
[01:54.012 --> 02:00.024] it expanded to then manage virtual environments, so we could have different versions of libraries
[02:00.024 --> 02:07.012] and, in fact, different versions of Python, in user land, on any of the platforms we support.
[02:07.012 --> 02:11.060] And then we also created a space for people to upload their own packages.
[02:11.060 --> 02:15.084] That's the Anaconda.org service.
[02:15.084 --> 02:20.076] From that, a community has grown up called conda-forge, where they make recipes, maintain recipes,
[02:20.076 --> 02:26.088] and upload packages, but lots of other people — like the NVIDIA folks, or like PyTorch — will
[02:26.088 --> 02:31.028] upload their own official builds into the Anaconda.org system.
[02:31.028 --> 02:34.036] So we run all of that infrastructure.
[02:34.036 --> 02:38.016] We pay the bills for the CDN and for all the storage and everything.
[02:38.016 --> 02:44.092] Then we do have a community around the Conda package manager itself, so people making tools
[02:44.092 --> 02:46.056] and extensions for it.
[02:46.056 --> 02:48.056] So that's, in a nutshell, what Conda is.
[02:48.056 --> 02:51.020] So you can think of it as like an RPM or something like that,
[02:51.020 --> 02:56.084] but primarily for data science and numerically oriented computing.
[02:56.084 --> 02:58.020] What's your original background?
[02:58.020 --> 03:03.028] Did you make software before running a successful software company?
[03:03.028 --> 03:07.040] No — so, I've always been programming, pretty much.
[03:07.040 --> 03:09.024] Since I was, I think, eight years old,
[03:09.024 --> 03:11.060] I've been programming something.
[03:11.060 --> 03:17.024] But I ended up going to college for physics, so I graduated with a degree in physics, and I decided
[03:17.024 --> 03:23.000] to join kind of the dot-com boom by going and joining a startup, and I've been in
[03:23.000 --> 03:24.076] software ever since then.
[03:24.076 --> 03:30.032] But I spent a number of years working in consulting, using the scientific Python
[03:30.032 --> 03:32.036] stack, in the 2000s.
[03:32.036 --> 03:36.032] And that's really where I started seeing the possibilities for using Python for a broader set
[03:36.032 --> 03:38.004] of data analysis —
[03:38.004 --> 03:42.068] for a broader set of use cases than just kind of niche scientific and engineering computing
[03:42.068 --> 03:44.084] kinds of use cases.
[03:44.084 --> 03:45.084] Cool.
[03:45.084 --> 03:50.080] Can you explain to me what was going on when you started this project, and what the original
[03:50.080 --> 03:52.084] conception was when you began?
[03:52.084 --> 03:53.084] Yeah, sure.
[03:53.084 --> 03:54.084] The original conception.
[03:54.084 --> 03:56.080] So the company was called Continuum Analytics.
[03:56.080 --> 04:01.044] I started that with my co-founder Travis Oliphant, who is the creator of NumPy and one of the
[04:01.044 --> 04:04.044] co-founders of SciPy.
[04:04.044 --> 04:10.024] We put the company together to promote the use of Python and to advance the state of the art for
[04:10.024 --> 04:14.064] Python for a broader set of data analysis needs.
[04:14.064 --> 04:17.044] So that was the original vision.
[04:17.044 --> 04:20.096] And at that time — this was 2012, when we formed the company —
[04:20.096 --> 04:29.016] Wes McKinney had just really started pushing pandas as a data frame library.
[04:29.016 --> 04:34.040] The Jupyter notebook was relatively new at the time — it was still called the IPython notebook.
[04:34.040 --> 04:39.012] The world was sort of awash in the Hadoop big data craze.
[04:39.012 --> 04:45.036] And what we could see was that once people threw their data into Hadoop, they wanted to do
[04:45.036 --> 04:46.060] bigger analyses.
[04:46.060 --> 04:52.048] They wanted to do broader, more cross-data-set, cross-schema sorts of analyses, and they would
[04:52.048 --> 04:53.048] need tools like Python.
[04:53.048 --> 04:55.048] SQL wasn't going to do it for them.
[04:55.048 --> 04:58.044] And so we were putting this stuff together.
[04:58.044 --> 05:04.028] We were trying to find alternative MapReduce frameworks that were nicer to Python than
[05:04.028 --> 05:05.028] Hadoop
[05:05.028 --> 05:11.028] and the rest of the Apache Java/JVM big data stack, if you will — the JVM world
[05:11.028 --> 05:14.068] does not play with the Python/C++ native world very well.
[05:14.068 --> 05:20.068] So in any case, as we were looking at doing all this stuff, it became clear to me that if people
[05:20.068 --> 05:25.052] couldn't install SciPy and matplotlib and IPython, they were not going to be able to
[05:25.052 --> 05:30.056] install any newfangled compiler tools we built or any newfangled MapReduce framework —
[05:30.056 --> 05:32.036] it was just going to be completely off the table.
[05:32.036 --> 05:37.028] So we started by saying, well, we should probably produce a special collection of packages,
[05:37.028 --> 05:41.012] a distribution of Python, that helps people get started, that includes all of the basic things
[05:41.012 --> 05:44.060] they need, that works on Mac, Windows, and Linux.
[05:44.060 --> 05:45.092] And so that was the basic idea.
[05:45.092 --> 05:46.092] So we built Anaconda.
[05:46.092 --> 05:49.012] I came up with the name because it's Python for big data —
[05:49.012 --> 05:50.052] so it's a big snake, kind of.
[05:50.052 --> 05:52.084] And of course, I don't like snakes that much,
[05:52.084 --> 05:56.084] and Python is of course named after Monty Python, but whatever, we all know that.
[05:56.084 --> 05:59.076] So that's where the name Anaconda came from for that product.
[05:59.076 --> 06:04.040] And that just took off quite well.
[06:04.040 --> 06:08.028] And we've actually since renamed the company from Continuum to Anaconda, because we'd be at conferences
[06:08.028 --> 06:10.008] and they'd say, where are you from?
[06:10.008 --> 06:11.008] Or, what company are you with?
[06:11.008 --> 06:12.048] And I'd say, we're with Continuum.
[06:12.048 --> 06:14.016] And they'd say, OK, yeah, that's nice.
[06:14.016 --> 06:16.052] And we'd say, well, we make this thing called Anaconda.
[06:16.052 --> 06:18.080] And they'd say, oh, we use Anaconda, or, we love Anaconda.
[06:18.080 --> 06:23.028] And so after that happens like a thousand times, you sort of figure out
[06:23.028 --> 06:24.060] the world's telling you something.
[06:24.060 --> 06:25.060] So anyway.
[06:25.060 --> 06:28.016] But anyway, that's the journey.
[06:28.016 --> 06:33.040] And since then, we've continued to push new open source tools and things like that in the
[06:33.040 --> 06:35.008] PyData stack.
[06:35.008 --> 06:36.028] It's incredible,
[06:36.028 --> 06:42.000] the impact that I think you've had — and certainly NumPy and SciPy — in terms of just making
[06:42.000 --> 06:44.028] Python so popular.
[06:44.028 --> 06:47.012] Do you ever regret choosing Python for this?
[06:47.012 --> 06:48.068] Has that been a good choice for you?
[06:48.068 --> 06:49.068] Oh, no, no.
[06:49.068 --> 06:50.092] That was completely intentional.
[06:50.092 --> 06:56.032] I mean, I think people should understand — especially as more software engineers
[06:56.032 --> 06:58.080] move into ML and become ML engineers, right?
[06:58.080 --> 07:00.076] For them, language is just a choice.
[07:00.076 --> 07:05.016] It's like, well, I'm a C++ coder, and I learned some Go, and now I'm doing
[07:05.016 --> 07:06.016] Python.
[07:06.016 --> 07:07.016] It's like, whatever.
[07:07.016 --> 07:08.080] Python's got some warts, and it's got some good things.
[07:08.080 --> 07:14.052] But the thing to recognize is that Travis and I, when we started this, the reason why we wanted
[07:14.052 --> 07:20.068] to push Python was because of the democratization and the access, the accessibility of it.
[07:20.068 --> 07:23.040] When you're a software developer, you learn new languages all the time, because that's part
[07:23.040 --> 07:24.040] of your gig.
[07:24.040 --> 07:27.056] But if you're not a software developer — if you're a subject matter expert or domain expert in
[07:27.056 --> 07:29.068] some other field, maybe you're a geneticist,
[07:29.068 --> 07:32.012] or let's say you're a policy maker, or whoever, right,
[07:32.012 --> 07:34.012] or you're an astrophysicist —
[07:34.012 --> 07:36.028] learning a new programming language is hard.
[07:36.028 --> 07:38.008] You're not really a coder anyway.
[07:38.008 --> 07:43.060] You had to learn some Fortran or C++ or MATLAB in grad school, but otherwise you're not
[07:43.060 --> 07:46.048] doing this on a weekend just because you love it, right?
[07:46.048 --> 07:50.092] So if you learn a language, it's going to stick with you for a while.
[07:50.092 --> 07:55.024] And if we, as people who make languages or who make software tools, can find a language
[07:55.024 --> 07:59.036] that people like to use, that is powerful for them, and that multiple different kinds of people can
[07:59.036 --> 08:01.080] use, that's incredibly powerful.
[08:01.080 --> 08:07.056] So one of the things about Python is that its creator, Guido, before Python, was
[08:07.056 --> 08:11.032] working on a project called Computer Programming for Everyone.
[08:11.032 --> 08:17.012] And some of the ideas that went into Python came from a precursor language called ABC.
[08:17.012 --> 08:21.056] That "readability counts" and that kind of executable-pseudocode thing — the same things
[08:21.056 --> 08:23.016] that make Python hard to optimize, right,
[08:23.016 --> 08:27.052] that are a consternation for the statically-typed-language aficionados —
[08:27.052 --> 08:30.072] those things also make it incredibly accessible to lots of people.
[08:30.072 --> 08:35.016] And when we make these kinds of advanced tools available and accessible to lots of people, what we
[08:35.016 --> 08:38.060] do is we grow the universe of possible innovations.
[08:38.060 --> 08:41.076] So for me, it's very intentional that we chose Python.
[08:41.076 --> 08:45.080] There's, you know, a thousand new languages you could create that are better than Python in
[08:45.080 --> 08:47.012] all these different dimensions.
[08:47.012 --> 08:50.004] But at the end of the day, Python is kind of the language everyone uses, right?
[08:50.004 --> 08:53.012] It is valuable that everyone uses that same language.
[08:53.012 --> 08:58.064] So I have a very, very strong opinion about the fact that we should continue promoting
[08:58.064 --> 09:00.024] its use and growing its use,
[09:00.024 --> 09:03.084] even as I fundamentally believe there must be a better language out there, right?
[09:03.084 --> 09:05.024] That's like the successor to it.
[09:05.024 --> 09:07.024] I have some ideas about that as well.
[09:07.024 --> 09:11.052] Oh, interesting — I'd love to hear about that, because we were talking with one of the fast.ai
[09:11.052 --> 09:14.012] founders, Jeremy Howard,
[09:14.012 --> 09:19.072] and he's written so much Python code, and he was really emphatic, when I was talking to him
[09:19.072 --> 09:25.068] on this same podcast, that Python can't possibly be the future of scientific computing.
[09:25.068 --> 09:26.068] And I was kind of surprised.
[09:26.068 --> 09:31.096] I would say my perspective is definitely a non-expert one, but I really enjoy programming in Python.
[09:31.096 --> 09:35.016] And maybe it's hard for me to really see how things could be better, or maybe, you know,
[09:35.016 --> 09:39.036] I don't have to worry about performance as much as other people.
[09:39.036 --> 09:40.096] But what would your take be?
[09:40.096 --> 09:46.064] Is there any kind of language with less adoption that you think is really intriguing and could kind of replace
[09:46.064 --> 09:49.008] Python, or are there tweaks to Python that you'd like to see?
[09:49.008 --> 09:53.076] How do you think about that?
[09:53.076 --> 09:55.056] So, yes and no.
[09:55.056 --> 10:00.092] There are languages out there that do interesting things —
[10:00.092 --> 10:05.040] things that Python can't quite do, or that Python may never be able to do, right?
[10:05.040 --> 10:10.008] So one of the fastest database systems out there is a thing called kdb.
[10:10.008 --> 10:12.048] The language in it is K.
[10:12.048 --> 10:17.084] You're not going to find any — I mean, it comes from the APL roots, right,
[10:17.084 --> 10:22.084] which are the precursors to the Fortran stuff and then MATLAB and NumPy and all these things.
[10:22.084 --> 10:29.012] So in any kind of ALGOL- and Modula-derived imperative programming language,
[10:29.012 --> 10:35.028] you're not going to match the kind of raw numerical performance that K and kdb can achieve.
[10:35.028 --> 10:42.064] And the creator of K and kdb has a new thing that he's building called Shakti, which is even more
[10:42.064 --> 10:43.064] interesting.
[10:43.064 --> 10:45.052] So there's that kind of lineage of things, right?
[10:45.052 --> 10:51.004] It's sort of like you take the most out-there, amazing bits of Lisp plus Fortran and you get
[10:51.004 --> 10:52.004] something like that.
[10:52.004 --> 10:56.072] And Python is not there, but Python has a lot of the good parts of the ideas there, and expresses
[10:56.072 --> 11:00.096] them in an infix imperative language.
[11:00.096 --> 11:02.092] Then there are things like Julia that are vying for your attention.
[11:02.092 --> 11:03.092] Let me slow you down —
[11:03.092 --> 11:06.064] let me ask you about what you said about K, and other ones like K.
[11:06.064 --> 11:11.064] What's the advantage? Is it that they have the potential to be faster?
[11:11.064 --> 11:14.044] It's more than just faster.
[11:14.044 --> 11:20.004] It's a fast and correct and performant representation of your idea.
[11:20.004 --> 11:24.000] But you have to sort of warp your brain a little bit into thinking in that way.
[11:24.000 --> 11:29.060] So Ken Iverson, the creator of APL, which is kind of the root of all of this stuff —
[11:29.060 --> 11:34.032] he had this idea that notation is a tool of thought.
[11:34.032 --> 11:37.072] So if you really want to help people think better and faster and more correctly at the same
[11:37.072 --> 11:39.084] time, you need better notations.
[11:39.084 --> 11:44.080] And so if you ever go and look at a bit of K, it looks different —
[11:44.080 --> 11:46.008] well, let's put it that way —
[11:46.008 --> 11:52.020] from what you are mostly used to in an R or a Python or even a C++ or C or Java world.
[11:52.020 --> 11:53.080] It's completely different.
[11:53.080 --> 11:56.056] It comes from a different brain space.
[11:56.056 --> 11:59.024] So yeah, interesting.
[11:59.024 --> 12:03.056] But is that just because it's sort of following different conventions,
[12:03.056 --> 12:06.040] or is there something to this perspective?
[12:06.040 --> 12:09.036] Because I feel like every so often — not in many years, but in grad school —
[12:09.036 --> 12:13.084] I used to occasionally run across Fortran, and it would just be like, okay, I'm stopping here.
[12:13.084 --> 12:15.040] Like, I'm not going to go digging through this.
[12:15.040 --> 12:18.024] This feels impenetrable to me.
[12:18.024 --> 12:19.040] But is that my fault?
[12:19.040 --> 12:23.096] Or is there something there that's, like, better about it, I guess,
[12:23.096 --> 12:25.092] in the notation?
[12:25.092 --> 12:29.012] Well, "better" is a big word.
[12:29.012 --> 12:38.012] So I'll back up and talk about the difference between something like K or Forth or J —
[12:38.012 --> 12:45.036] kind of the J, K, Forth, APL family — versus ALGOL or Pascal or C,
[12:45.036 --> 12:50.016] this lineage of fairly imperative, procedural languages.
[12:50.016 --> 12:52.000] At the end of the day, we are programming.
[12:52.000 --> 12:59.036] When we write a program, we're making a balance of three things.
[12:59.036 --> 13:00.036] Right?
[13:00.036 --> 13:01.036] There's the expression itself —
[13:01.036 --> 13:02.072] what is it we're trying to express?
[13:02.072 --> 13:06.068] Then there's the data, the representation of the data.
[13:06.068 --> 13:10.028] And then there's some compute system that's able to compute on that data.
[13:10.028 --> 13:14.084] And so I call this kind of the iron triangle of programming: you've got expressions
[13:14.084 --> 13:16.068] and expressivity, or expressiveness;
[13:16.068 --> 13:20.012] you have data — schemas, data correctness, things like that;
[13:20.012 --> 13:24.088] and then you've got the compute, which is runtime — again, correctness, runtime characteristics.
[13:24.088 --> 13:31.008] And every programming system sits somewhere in the middle of this, like, ternary chart.
[13:31.008 --> 13:32.076] And usually you trade off.
[13:32.076 --> 13:37.068] What usually happens is you collapse one axis onto the other and you have a linear trade-off.
[13:37.068 --> 13:44.012] And most of the post-Niklaus Wirth kind of era is about looking at, okay, you've got data,
[13:44.012 --> 13:48.052] you've got a virtual machine, and you're going to basically load data in and do things to it
[13:48.052 --> 13:51.004] with functions that proceed like this.
[13:51.004 --> 13:56.028] That model is what sort of everyone has in their heads as a programming system, right?
[13:56.028 --> 14:00.048] When you look at something like Forth or K, you're actually coming from a different perspective.
[14:00.048 --> 14:05.004] Forth — I'll throw that in there because, even though you do have an explicit data representation in mind,
[14:05.004 --> 14:09.048] when you write programs in Forth — or if you ever had an HP calculator with reverse Polish notation,
[14:09.048 --> 14:15.048] that's the closest most people will ever get to Forth — you're explicitly manipulating stacks.
[14:15.048 --> 14:20.016] You're explicitly manipulating these things, and you're writing tiny programs that can do a lot.
[14:20.016 --> 14:21.028] It's amazing, right?
[14:21.028 --> 14:24.044] And that's with an explicit stack, explicitly manipulating these kinds of things.
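As a tiny illustration of the explicit stack manipulation being described here (this is an added sketch, not Peter's code), here is how a reverse Polish notation expression gets evaluated the way a Forth system or an HP calculator would:

```python
# Evaluate a whitespace-split RPN expression using an explicit stack.
def eval_rpn(tokens):
    """e.g. ['3', '4', '+', '2', '*'] evaluates (3 + 4) * 2."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b, a = stack.pop(), stack.pop()   # operands come off the stack
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))          # numbers go onto the stack
    return stack.pop()

print(eval_rpn("3 4 + 2 *".split()))  # (3 + 4) * 2 = 14.0
```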
[14:24.044 --> 14:31.044] When you go to something like Lisp or like K, you're writing these conceptual things, the expressions.
[14:31.044 --> 14:34.000] Well, in the case of Lisp, it's a conceptual algorithm.
[14:34.000 --> 14:41.052] In the case of K, it's also an algorithm, but it's an algorithm on parallelizable data structures — on arrays and on vectors.
[14:41.052 --> 14:49.020] And a first-class thing that you can do is change the structure of those data structures.
[14:49.020 --> 14:54.028] You can do fold operators, you can apply in these ways, you can broadcast and collapse and project.
[14:54.028 --> 14:59.040] And all of those are first-class little things you can do inline as you're trying to express something else.
[14:59.040 --> 15:05.052] So you end up with a line of K that's this long that would take you this page of Java to do.
[15:05.052 --> 15:16.052] And by the way, the genius of the K system is that the underlying machine that interprets that — the compiler and then the interpreter system — is incredibly mathematically elegant,
[15:16.052 --> 15:26.052] because there's actually a fundamental algebra that sits at the heart of the stuff, which you can then basically load into — I think the claim is that it loads into L1 icache.
[15:26.052 --> 15:36.052] And so your program just streams through the CPU like a mofo — you're never even hitting L2 — so that's kind of an amazing thing.
[15:36.052 --> 15:43.052] And so I think when you turn around and look at something like Python, which is not that optimized at all — it's a C-based virtual machine —
[15:43.052 --> 15:55.052] when you do NumPy things, you're expressing some of those same ideas, right? Yeah, I was going to say, this reminds me of my experience with NumPy, where, you know, I keep making it tighter and tighter and shorter and shorter
[15:55.052 --> 15:57.052] and more and more elegant.
[15:57.052 --> 16:04.052] And then when I need to debug it, I feel like I often end up just unpacking the whole thing again — and I don't know if that's me being stupid, but that definitely happens.
[16:04.052 --> 16:09.052] Well, it depends on what you're debugging, though, right? Because you can make it compact, and then when you debug it:
[16:09.052 --> 16:15.052] are you debugging an actual bug in the runtime of NumPy itself?
[16:15.052 --> 16:20.052] Are you debugging a performance mismatch with your expectation, relative to how the data structure is laid out in memory?
[16:20.052 --> 16:29.052] Are you debugging an impedance mismatch between your understanding of what NumPy is going to do in each of these steps versus what it actually does? There's a lot of things to debug, so to speak.
[16:29.052 --> 16:37.052] But that's one of the downsides of making really tight NumPy snippets, because I did some of that back in the day, and it was like, ah, this is so great — and then this thing blows up, and it's like,
[16:37.052 --> 16:39.052] oh, crap.
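A small NumPy illustration of the trade-off being discussed (added here; the data is made up): the compact, array-oriented one-liner that expresses the K-ish ideas of broadcasting and reduction, next to the unpacked version you fall back to when debugging.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))

# Compact one-liner: z-score each column, then take row means.
row_means = ((x - x.mean(axis=0)) / x.std(axis=0)).mean(axis=1)

# Unpacked version of the same computation, easier to inspect step by step.
col_mean = x.mean(axis=0)
col_std = x.std(axis=0)
z = (x - col_mean) / col_std          # check shapes / NaNs here when debugging
row_means_debug = z.mean(axis=1)

assert np.allclose(row_means, row_means_debug)
```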
[16:39.052 --> 16:43.052] But wait — I keep pulling you off on all these tangents, and I'm actually really fascinated by
[16:43.052 --> 16:51.052] this conversation. But so you were saying — you were comparing to K, which Jeremy Howard actually did talk about and really, really praised.
[16:51.052 --> 16:59.052] Great. But then what are the other kinds of languages that have interesting pieces that could be useful for scientific computing?
[16:59.052 --> 17:03.052] Yes, well, I think —
[17:03.052 --> 17:12.052] so Jim Gray, the late, great Jim Gray, wrote an amazing paper back in 2005 called "Scientific Data Management in the Coming Decade."
[17:12.052 --> 17:21.052] It was prescient, it was ahead of its time — I mean, well, it was of Jim's time, so he knew it — but he wrote this great paper, and it talked about
[17:21.052 --> 17:29.052] so many different things. It's worth everyone reading. But he talked about how we need to have computational sort of notebooks,
[17:29.052 --> 17:38.052] how we need to have metadata indices over large data that would have to live in data centers, data that we couldn't move anymore — we have to move computing, we have to move ideas to code —
[17:38.052 --> 17:46.052] sorry, move code to data, move ideas to data, all these different things. But one of the things he explores is: why don't scientists use databases?
[17:46.052 --> 17:56.052] Right — databases are the realm of, like, business apps and Oracle nerds. Why don't geneticists and astrophysicists use databases? The closest they get is using HDF5,
[17:56.052 --> 18:08.052] which is really just — OK, it's a file system, great, and it's a way to lay things out in memory so you can compute on them, that's great, you can do out-of-core execution on it — but why don't scientists use databases more?
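To make the HDF5 pattern he's describing concrete, here is a minimal sketch (added here, with an illustrative file name and sizes) of laying data out on disk with h5py and then computing on slices without loading the whole array — the out-of-core style he mentions:

```python
import h5py
import numpy as np

# Write: lay out a large on-disk array and fill part of it.
with h5py.File("simulation.h5", "w") as f:
    dset = f.create_dataset("field", shape=(100_000, 512), dtype="f4")
    dset[:1000] = np.random.rand(1000, 512)

# Read: stream over row blocks instead of reading everything into memory.
with h5py.File("simulation.h5", "r") as f:
    dset = f["field"]
    total = 0.0
    for start in range(0, dset.shape[0], 10_000):
        total += dset[start:start + 10_000].sum()
    print(total)
```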
[18:08.052 --> 18:19.052] And so he looked into this a little bit more. But one of the things I think would really move scientific computing forward is to treat the data side of the problem as being more than just fast arrays.
[18:19.052 --> 18:30.052] Actually, as we have more and more sensor systems, with more and more computational machinery generating additional data sets, which then get transformed into further data sets —
[18:30.052 --> 18:38.052] that entire data provenance pipeline — even as businesses have had to reinvent the enterprise data warehouse to do machine learning on their business data,
[18:38.052 --> 19:03.052] scientific computing has to honestly sit down and face this giant problem it has tried to ignore for a very long time, which is: how do we actually make sense of our data? Not just some, you know, /home/<some grad student's name>/temp/project5/whatever — we've got to actually do this for real. So I think that's one of the ways to move scientific computing forward.
[19:03.052 --> 19:18.052] And so, on top of — alongside — going to K-land and fast-APL-land, it's treating the metadata problem and the data catalog problem, and in fact the schema semantics problem, as first-class problems for scientific computing.
[19:18.052 --> 19:38.052] If you look at what F# did with type providers, building a nice extensible catalog of schemas that was actually part of your coding as you're using data sets — and they did that like ten years ago — that stuff is amazing, right? And that is something that we should make available; that would be a game changer.
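This is not an F# type provider, but as a rough Python sketch of the "schema and metadata as first-class objects" idea — added here for illustration, with entirely hypothetical names, paths, and fields — a catalog entry can carry its schema alongside the data reference:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetEntry:
    name: str
    path: str
    schema: dict          # column name -> dtype string
    description: str = ""
    tags: tuple = ()

# A tiny in-code catalog; in a real system this would be generated or validated.
CATALOG = {
    "trades": DatasetEntry(
        name="trades",
        path="s3://example-bucket/trades.parquet",
        schema={"ts": "datetime64[ns]", "symbol": "str", "price": "float64"},
        description="Tick-level trade records",
        tags=("finance", "raw"),
    ),
}

def describe(entry_name: str) -> None:
    entry = CATALOG[entry_name]
    print(entry.name, entry.path)
    for col, dtype in entry.schema.items():
        print(f"  {col}: {dtype}")

describe("trades")
```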
[19:38.052 --> 19:47.052] I don't know if you saw this thing where the international consortium of geneticists declared they would actually change gene names.
[19:47.052 --> 19:56.052] Did you hear about this? There were gene names like MARCH1 and SEPT1 and things like that, and they changed them because
[19:56.052 --> 20:08.052] practitioners use Excel so much, and when those show up in Excel data, Excel translates them into dates and it just messes them up.
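A small pandas illustration of the failure mode (added here, with made-up data): the Excel problem is automatic type guessing turning gene symbols into dates, which is exactly what declaring explicit string dtypes avoids.

```python
import io
import pandas as pd

csv = io.StringIO("gene,expression\nMARCH1,4.2\nSEPT2,1.7\nTP53,9.9\n")

# Force the gene column to stay a string so nothing gets reinterpreted as a date.
df = pd.read_csv(csv, dtype={"gene": "string"})
print(df["gene"].tolist())  # ['MARCH1', 'SEPT2', 'TP53']
```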
[20:38.052 --> 20:48.052] And then you can see that they have a lot of money — and yeah, if and when we get to a certain point where we have the resources to invest in additional projects...
[21:18.052 --> 21:26.052] So it's been used in a few projects and people have started to pick it up, but it's something I recommend people check out — Intake. Cool, all right, we'll put a link to it.
[21:26.052 --> 21:36.052] So can you give me some examples of who your customers are, and — this is such business speak — what's the value that they get?
[21:36.052 --> 21:48.052] So we have a couple of different things that we sell. For a while now we've been selling an enterprise machine learning platform called Anaconda Enterprise, and that —
[21:48.052 --> 21:55.052] it runs on Kubernetes; IT can stand it up, data scientists log into it, and they have a managed, governed environment.
[21:55.052 --> 22:10.052] And then they have one-click deploy for dashboards, for notebooks and things like that; they can run machine learning models and, you know, have REST endpoints they deploy.
[22:10.052 --> 22:25.052] It's a, yeah, big data science platform thing. There's another thing we sell that is just the package server. A lot of the value that businesses get from us is that they have an actual vendor-backed...
[22:55.052 --> 23:03.052] ...which versions of which libraries you use, right? Which is a really important thing: in an enterprise environment you have data scientists who want the latest and greatest, bleeding-edge everything,
[23:03.052 --> 23:16.052] and then you've got production machines, which you do not want getting the latest and greatest everything — you want to know exactly which version, how many CVEs, which ones are patched, and that's all that runs in production. So this is a package server that gives —
[23:16.052 --> 23:29.052] that gives businesses the ability to do that. So those are primarily our two commercial products, and we will be coming up with some more things later in the year — an individual commercial edition that individual practitioners can buy, things like that.
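As a rough sketch of the governance idea being described — this is not Anaconda's product, just an added illustration with illustrative package pins — you can check what is actually installed in an environment against a pinned manifest before anything ships to production:

```python
from importlib.metadata import version, PackageNotFoundError

# Hypothetical pinned manifest for a production environment.
PINNED = {
    "numpy": "1.24.4",
    "pandas": "2.0.3",
    "scikit-learn": "1.3.0",
}

def audit(pins: dict) -> bool:
    ok = True
    for pkg, expected in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            print(f"MISSING   {pkg} (expected {expected})")
            ok = False
            continue
        if installed != expected:
            print(f"MISMATCH  {pkg}: installed {installed}, pinned {expected}")
            ok = False
    return ok

print("environment OK" if audit(PINNED) else "environment drift detected")
```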
[23:29.052 --> 23:37.052] And you've been doing this a while, right — like, at least a decade? No, not a decade — not an actual decade — from when we started.
[23:37.052 --> 23:53.052] That's, I guess — you know, even that is quite a long time, I think, for this space. I'm curious, when you started, what kinds of customers, or what industries, were using you the most, and how has that changed over the last eight years?
[23:53.052 --> 24:07.052] Yeah, when we started it was very heavily in finance — so hedge funds, investment banks, things like that. There was heavy use of Python there at the time, and
[24:07.052 --> 24:15.052] we were doing a lot of consulting and training — open source consulting, standard sorts of things like that.
[24:15.052 --> 24:28.052] Unlike a lot of... nowadays you see a lot of these venture-backed open source companies that have a product, and it's like, here's our open source FooBar and here's the enterprise FooBar++, right? And then
[24:28.052 --> 24:40.052] Amazon builds a cloud version of it on their open source, and they go public anyway and make tons of money — this is a play that many companies have done, especially around the big data infrastructure projects. It's a pretty popular move.
[24:40.052 --> 24:50.052] We are an open source company that supports an ecosystem of innovation, so there's a lot of stuff out there that we deliver and ship in Anaconda that we ourselves don't write.
[24:50.052 --> 24:58.052] And so that innovation space has changed, and it's gotten sucked into so many different things, so now we've seen —
[24:58.052 --> 25:02.052] everybody. I mean, insurance, oil and gas, logistics,
[25:02.052 --> 25:11.052] DOD and three-letter agencies — just, everybody is using Python to do data analysis and machine learning, so it's just literally everywhere, like sports betting sites,
[25:11.052 --> 25:16.052] the Netflixes and the Ubers of the world — everybody is doing this stuff.
[25:16.052 --> 25:23.052] Now, not all of them are paying us yet — paying customers — but that diversification, well —
[25:23.052 --> 25:32.052] I want to say diversification, but that growth and adoption is what we were hoping to unleash when we started the company, and so it's been really great to see all that happening.
[25:32.052 --> 25:36.052] We couldn't have predicted deep learning; we couldn't have predicted that machine learning would be the thing to take off.
[25:36.052 --> 25:43.052] We were really thinking that it would be more rapid dashboards, around notebooks, around building — here's a data analysis:
[25:43.052 --> 25:49.052] I'm a subject matter expert, and because I can write a little bit of Python code I can now produce a much more
[25:49.052 --> 26:01.052] meaningful, rich, interactive dashboard and control panel for my business processes, or for my, like, whatever — heavy industrial machinery.
[26:01.052 --> 26:04.052] We saw that happening pretty well in the 2000s around rich-client toolsets, as sort of a MATLAB displacer.
[26:04.052 --> 26:09.052] But now, with machine learning on the rise, it's completely flipped Python usage into a different mode — that's,
[26:09.052 --> 26:18.052] as you would know better than most, the dominant conversation around Python. But these other use cases are still there; there are still a lot of people using Python for all these engineering and simulation things.
[26:18.052 --> 26:24.052] So anyway, it's just great to see all this growth and diversification of use cases.
[26:24.052 --> 26:35.052] Is machine learning even the top use case that you see? It certainly feels like the busiest right now, but I always wonder what the reality of the usage volumes is versus
[26:35.052 --> 26:40.052] the aspiration that people get paid for.
[26:40.052 --> 26:42.052] Yeah —
[26:42.052 --> 26:53.052] I think there's a strong disconnect with older businesses. I would say Python has crossed the chasm, right — you know, the technology adoption chasm, Crossing the Chasm.
[26:53.052 --> 26:58.052] Python has crossed the chasm, and on the other side of the chasm, the way this kind of innovative technology lands
[26:58.052 --> 27:08.052] is that you have a lot of buyers who are not as sophisticated about what it is they really want to buy, or what it is they're buying, or how ready they are as a business to adopt what they've bought.
[27:08.052 --> 27:16.052] So you can buy the fanciest Ferrari, but if you have a dirt-track road it's not going to go as fast as if you have an actual smooth paved road.
[27:16.052 --> 27:27.052] So a lot of businesses have this problem where they can buy the hottest, sweetest ML team and tooling, but then their internal data is just a mishmash,
[27:27.052 --> 27:40.052] and so you spend 80% of your time digging that ML team out of the data swamp. That message, I think people are starting to get now. I'd say we've come over into the — what is it, the chasm of... the trough —
[27:40.052 --> 27:46.052] The trough, the trough of — not despair, something, yeah —
[27:46.052 --> 27:51.052] Disillusionment. The trough of disillusionment, that one, right. And so —
[27:51.052 --> 28:01.052] but the truth is, there's an ML hierarchy of needs, just like Maslow's, right? And if you don't have your data stuff together, if you don't understand the domain problem you're trying to solve,
[28:01.052 --> 28:08.052] you have no business even doing data science on it; and if you haven't done data science, there are no models to go and optimize with machine learning, right?
[28:08.052 --> 28:18.052] But if you get all that in place, then machine learning can absolutely deliver on the promise. So I think people try to buy the promise, but most of the people they pay are out there slogging through a bunch of, basically, trying to denormalize data,
[28:18.052 --> 28:21.052] dedupe data, and just do a lot of that kind of stuff.
[28:21.052 --> 28:31.052] But you actually see it? I mean, most of the verticals that you mentioned I think are not the first things that come to mind here in Silicon Valley for ML applications.
[28:31.052 --> 28:37.052] But you actually see, like, insurance doing ML and thinking of it in those terms — just as a specific example?
[28:37.052 --> 28:47.052] Oh, absolutely. The hardcore finance folks are probably the only people I would say lead Silicon Valley in terms of ML — I mean, the hedge funds were there first,
[28:47.052 --> 28:57.052] because they operate in a pure data environment, and the thing about that data environment is everyone else is operating in the same pure data environment, and by the way, it's all zero-sum.
[28:57.052 --> 29:08.052] So if you screw up by a millisecond, you lose millions of dollars — incredibly hard odds, or hard boundary conditions, to be optimizing
[29:08.052 --> 29:15.052] in, right? And I think in Silicon Valley, a lot of it is consumer behavior, a lot of that kind of thing.
[29:15.052 --> 29:23.052] Certainly anything in ad tech and kind of the attention economy — the ML there is fairly low-stakes, right?
[29:23.052 --> 29:34.052] I mean, of course, hundreds of billions of dollars of corporate valuation hang in the balance, but if you screw a little bit of something up, it's like, well, they'll be back tomorrow to do some scrolling, so we'll give them some better content tomorrow.
[29:34.052 --> 29:46.052] But when you're in insurance and these other things — those models, you know, the kinds of diligence that a credit card company has around its models and model integrity, the kinds of actuarial work that go into
[29:46.052 --> 29:54.052] building models at an insurance company — that's real. There's real, hard uncertainty, and if you screw up, that's a hundred-million-dollar screw-up, right?
[29:54.052 --> 30:01.052] So there's real stuff happening there, and they are no lightweights on this stuff; they're doing real things. Yeah.
[30:01.052 --> 30:11.052] Cool. I guess when I've talked to insurance companies, it's felt like there are almost these two separate teams that feel a little bit at odds with each other — like, the
[30:11.052 --> 30:24.052] old-school math guys, the actuaries, who are like, what is this? We've been doing ML forever; this is just a rebranding of the stuff we've always been doing. And then a couple of guys off to the side maybe doing some crazy deep learning projects, where you wonder how connected they are to the
[30:24.052 --> 30:35.052] business. Do you see that same dynamic? Yeah, absolutely. I mean, any organization over, like, 50 people is a complex beast, right — even 50 people can be pretty complex.
[30:35.052 --> 31:01.052] In these larger firms there is definitely a struggle internally as they do this data transformation into the cybernetic era — that's what I've been calling it, the cybernetic era. And for many of them the theory of action is still open-loop, right? It's like, oh, we sell this particular insurance policy and we'll see what comes back five years from now; we'll look at a five-year retroactive performance and then we'll know if the model was correct.
[31:01.052 --> 31:15.052] And those kind of old-guard folks are, you know, a bunch of actuaries writing a bunch of SAS code — that's some old-school stuff — and then there are new people in that space, with access to the data, who have the statistical background and who know they can do way better.
[31:15.052 --> 31:30.052] And so there is a conversation happening. I mean, credit card companies are a great example, right, because there's regulatory pressure, there are old-school models in SAS, there are new people trying to do better credit models, and there are really cutting-edge people doing
[31:30.052 --> 31:37.052] real-time risk, real-time fraud, and all these kinds of things using deep learning, sometimes using all sorts of GPU-based clusters.
[31:37.052 --> 31:47.052] So you see a whole pile of different things within a credit card company that you might not see in Silicon Valley, which is more of a monoculture, because there's less...
[32:17.052 --> 32:29.052] Any time we make a technology choice, we should be very respectful of Conway's law, which is that the technology systems we build — the software systems we build — are a reflection of the communication patterns within
[32:29.052 --> 32:46.052] the teams that built them. That's the third time that's come up this week in interviews. Really? Wow. Yeah — but it hits the ML stuff in a different way, which is that if those different teams speak different languages, then you have two teams; if the same team speaks two different languages, it's two teams, right?
[32:46.052 --> 32:56.052] And we see this actually with people trying to get Python into ML production, where sometimes those production processes are optimized for managing a pile of Java with a bunch of Maven, right, or
[32:56.052 --> 33:05.052] it's like, you have to recode it all in C++ because we only deploy TensorFlow in C++. So there's this kind of thing — when you have a language barrier, you create an organizational barrier too.
[33:32.052 --> 33:36.052] And the question then is, taking a step back, if I'm the manager of this team:
[33:36.052 --> 33:49.052] how much longer do I want to have a team that only knows how to use MATLAB or SAS, when clearly all the papers at ICML or whatever are being published in Python, right? So —
[33:49.052 --> 34:00.052] you've got to sort of make that call if you're the manager. So I would say the answer is yes, but if you're doing that, you should be aware that there's all this innovation happening in different languages,
[34:00.052 --> 34:12.052] and you then want to bring those languages into a hybrid environment. If you say, fine, I'll hybridize — I've got my legacy MATLAB that's never going away, because that's how we model, like, airflow through this turbine system, and I'm not going to redo all that work —
[34:12.052 --> 34:27.052] then I have to build discipline about how to hybridize, how to bring these people forward so they know some Python, bring the Python technology back to be able to couple with the MATLAB, and see myself as having to become an expert in doing that, right?
[34:27.052 --> 34:34.052] So the answer is yes — you can absolutely make a justification for starting new projects and those kinds of things. But generally,
[34:34.052 --> 34:40.052] if you're doing it with teams that already know those languages, I probably wouldn't recommend converting them to a Python team.
[34:40.052 --> 34:48.052] What about — okay, what about R? Like, where does that sit? Is that ever a reasonable choice for a team where you have a green field — or not?
[34:48.052 --> 34:50.052] Yeah, of course.
[34:50.052 --> 34:56.052] Of course — I mean, there are lots of people who do that. And what would be going on that you'd choose to use R versus Python?
[34:56.052 --> 35:13.052] Well, for me, because I'm a Python expert, I would choose Python. So the only reason I would have my team use R is if there's a lot of existing stuff that's in R, or they're all R experts, in which case I'm not going to try to convert them to Python — I'm going to try to make the best go of R with that, right?
[35:13.052 --> 35:26.052] But if there are really new capabilities and things that are only available in, like, a Python bridge to some CPU or GPU stuff, then I would encourage that — I would have to hire some people who are polyglots, who can build that bridge.
[35:26.052 --> 35:42.052] So again, it comes down to the teams. Although I feel like — I don't think you really have the perspective that all languages are created equal, right? And of course, in the real world we have to choose our language based on maybe what libraries are available or what's going to be maintainable. But I'm curious what you
[35:42.052 --> 35:55.052] make of R. I mean, when I was in grad school I used R exclusively and I absolutely loved it, and then I had this experience of seeing pandas and NumPy and just feeling like, this is way better — I just want to switch to this.
[35:55.052 --> 36:03.052] Well, some people take the opposite position from me on that. They would say, I went to R and now I can think like a statistician again and actually
[36:03.052 --> 36:18.052] express what I'm trying to get at — the tidyverse and dplyr and these things are so nice, and ggplot is gorgeous, and all these things. That's true. A lot of the R advocates have good points. There is, I would say, a more...
[37:48.052 --> 38:01.052] I hope I don't offend anyone with this, but I feel like NumPy has improved so much, and SciPy seems to have so many libraries, that anything I would want is, like, not a...
[38:31.052 --> 38:41.052] I think what happens in Python there — and if one were to take a more objective look at all of these... I'm the author of, like, two or three different —
[38:41.052 --> 38:51.052] not the only author, but the originator, let's say, of two or three of the graphing libraries in Python, and now there are, like, several dozen, right?
[38:51.052 --> 38:54.052] What we have here is
[38:54.052 --> 38:57.052] a couple of different things going on. So R —
[38:57.052 --> 39:03.052] R does what R does very well precisely because it was designed by its user community, for that community.
[39:03.052 --> 39:13.052] And also, because of its sort of Lispy heritage, it's able to do some really neat tricks by preserving the transformation pipeline and kind of quoting the expressions, things like that.
[39:13.052 --> 39:24.052] Those give you some really awesome superpowers when you're building — like, just facet by these things, and it just does the right thing, right? Obviously, Hadley also did great work with ggplot too — there's nothing —
[39:24.052 --> 39:27.052] not to say there wasn't hard work involved there, but —
[39:27.052 --> 39:42.052] but then if you want to go and do something additional, if you want to plot outside of some of the things that ggplot is great for, then it's a more impoverished landscape, let's say, right? Go do real-time spectrograms in R —
[39:42.052 --> 39:49.052] I don't know, man. Or if you want to do really large-scale interactive web graphics with all this crazy map data —
[39:49.052 --> 40:02.052] I don't know. Right, so the Python world has always been more multi — there are a lot more Mongols across a bigger plain, and so there are just many different flavors of different things all over the place. Matplotlib was written
[40:02.052 --> 40:25.052] by John Hunter in grad school trying to plot EEG plots, right, and then he moved on to a hedge fund — he was trying to copy what he knew, which was MATLAB, and it does great for that. Actually, if you're an engineering MATLAB user, matplotlib works great — it just fits your brain, right? But most people were not MATLAB people, right? And then likewise —
[40:25.052 --> 40:42.052] you know, you can use tools like Seaborn — they get you some of the way there, but they don't have the support from the language level to encapsulate some of the statistical transformations that would help inform something even better, so it has to sort of include some of those transformations within it, right, and facets and names and things like that.
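As a hedged sketch of the facet-style plotting being contrasted with ggplot (added here, not from the conversation), Seaborn's relplot on its built-in example dataset fans the same scatter out into one panel per category — roughly the ggplot facet_wrap experience:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# The "tips" example dataset ships with Seaborn (downloaded on first use).
tips = sns.load_dataset("tips")

# One call produces a faceted grid: one scatter panel per day, colored by smoker.
sns.relplot(data=tips, x="total_bill", y="tip", col="day", col_wrap=2, hue="smoker")
plt.show()
```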
[40:42.052 --> 40:50.052] So then you go around and you look at some of the interactive plotting systems — whether it's Altair, whether it's Bokeh, or any of these other things, Plotly —
[40:50.052 --> 41:05.052] they're all solving for different parts of the problem, and the coverage of Python usage — the use cases — is just bigger than any one project was able to handle.
[41:05.052 --> 41:17.052] I think there's a more compact set of use cases in R, and so therefore it was possible to get a higher level of coverage in a single project. Does that make sense? Totally — that's really well said.
[41:17.052 --> 41:22.052] And very, very non-judgmental of you.
[41:22.052 --> 41:26.052] Well, I'm about the big tent — Anaconda is all about the big tent, right?
[41:26.052 --> 41:31.052] Yeah, you distribute the packages, so you can't play favorites.
[41:31.052 --> 41:39.052] Okay, well, we always end with two questions that I want to make sure I get in, because I'm curious to hear your thoughts.
[41:39.052 --> 41:50.052] So one question we always ask people — and maybe I should ask this in a more expansive way — is, is there a topic in ML that doesn't get as much attention as you think it should,
[41:50.052 --> 42:05.052] that people should focus on more than they do? And I might expand that for you into all of scientific computing: what's one thing that you think people don't pay as much attention to as its usefulness would suggest?
[42:05.052 --> 42:07.052] I think —
[42:07.052 --> 42:15.052] well, a topic — there are lots of topics. My general thing stems from —
[42:15.052 --> 42:19.052] I come from this place where
[42:19.052 --> 42:33.052] I feel very strongly that ML practitioners, more so than just software coder nerds, are going to run into the ethical implications of their work.
[42:33.052 --> 42:43.052] And even more uncomfortably, they're going to be the ones forcing that conversation in businesses that for a long time maybe have not had to think about that,
[42:43.052 --> 42:58.052] because ML is about engineering the crown jewels of the business models. So you're like, hey, we just figured out that if we buy these two data sets and do this kind of model and reject these kinds of people from our user base, we get this kind of lift — should we do it?
[42:58.052 --> 43:11.052] Well, heck, I'm just a VP of god-knows-what — don't present me with this incredibly difficult trolley problem, don't look at me, I slept through that crap in college. So I think that ML,
[43:11.052 --> 43:15.052] more than any other thing right now —
[43:15.052 --> 43:32.052] we have to be faced with this concept that technology is not value-neutral. And if you think about what machine learning really is, it is the application of massive amounts of compute — rent-a-supercomputer-in-the-cloud kind of massive amounts of compute — to massive amounts of data that's ever deeper and creepier than before, because of sensors everywhere,
[43:32.052 --> 43:40.052] to achieve business ends and to optimize business outcomes. And we know just how good businesses are at capturing and self-regulating the externalities, right, to their business outcomes.
[43:40.052 --> 43:58.052] So just as a human looking at this, I would say, wow, I've got a chance to actually speak to this practitioner crowd about: if you're doing your job well, you'll be forced to drive a lot of conversations about ethics and the practice of your thing, about what you're doing within your business as it goes through this data transformation.
[43:58.052 --> 44:06.052] And you should be ready for that. Steel yourself for that. Don't punt. Don't punt on it. We can't afford to punt.
[44:06.052 --> 44:13.052] Besides steeling yourself for that — which is probably a good verb for it —
[44:13.052 --> 44:24.052] do you have any suggestions on how someone might educate themselves in that? Because I think we have a lot of people listening to this who are in that situation and might be wondering where they could find more resources.
[44:24.052 --> 45:03.052] Do you have any suggestions? Yes. I think there are books that have been written now, especially in the era of the Facebook and information attention economies, the dystopia stuff. There are books by Shoshana Zuboff, there's Cathy O'Neil's Weapons of Math Destruction, and there are books like Christian Rudder's Dataclysm, right? So there are these things you can look at. Arm yourself with knowledge about the anti-patterns of what happens when blindly applied ML goes wrong, and that at least gives you a bit of a war chest, or a quiver of things you can reach for, to say: what we're doing here is exactly like what happened when,
[45:03.052 --> 45:20.052] to just pull one out of a hat, AOL anonymized their user data back in the early 2000s and did that anonymized data release, and this thing happened and somebody got outed. There are all sorts of wonderful examples you can pull from, because we've actually been making a lot of mistakes.
[45:20.052 --> 46:43.052] So one thing is, take the time to read about that stuff. Number two is, go to and attend talks about this quote-unquote soft topic of ethics in ML and fairness and whatnot. I know some of it may seem a bit sermony and preachy, like, hey, I came here for the hardcore conv nets, I didn't come here to listen to somebody drone on about ethics. But at every conference you go to and everything you go and do, spend some time getting educated about the state-of-the-art thinking, because right now people are trying to think about privacy-preserving encryption, differential privacy, some of these things. Those things are coming; they are going to be part of state-of-the-art best practice soon. You should be educated about them, and not only do it because you have to, but know why, because I guarantee you, when you go and scope those into your project, some VP is going to come and say, well, can't you just get rid of that and do it faster? And you have to be able to argue the principles of why you need to do it this way, right? So that would be the one thing I would say. I don't know if it maybe already gets too much press, but it probably doesn't get enough press: if ML practitioners are not going to be just serfs in this, if they actually want to have agency in that conversation, to hold their own ground on what we should do and not have a pile of regret down the road, then now is the time to start getting educated and start asserting yourself more in those internal corporate political discussions.
[46:43.052 --> 46:48.052] Well said.
[46:48.052 --> 46:49.052] Fine, final question.
[46:49.052 --> 46:50.052] All right, final question.
[46:50.052 --> 47:09.052] When you look at companies trying to get stuff into production, what are the surprising bottlenecks that they run into? Like, when somebody's trying to take an ML project from kind of an idea to deployed and working and doing something useful, where do you see people get stuck?
[47:09.052 --> 48:35.052] Well, every part of the process can be troublesome, so I don't know if there's a surprise there at all. I guess, yeah, what surprises me is how many corporate IT shops are still pretty backwards relative to open source. This was surprising to me in 2014, and it's still surprising now how many places will say, well, we don't really do open source, or, here's the open source that we do, it's just these few things. And when they say that, they trot out all the tired old arguments about how can we trust this thing, how can we trust that. The other thing is that there is still a very strong Python allergy, and a lot of lack of awareness of what Python actually is and can do. So there are some companies that are like, well, this is a Java shop, or this is a .NET shop, we really only know how to deploy these ways. We don't deploy Python; you have to recode that, because it's just a language, you can recode it in this other thing, right? Why wouldn't you be able to? And these IT shops don't understand that when you use Python, you're linking into seriously optimized low-level code that a lot of seriously smart people have been working on. There's not an equivalent over in the Java space, and all the data marshalling back and forth is going to cost you tremendously in performance in the Java space, right? These IT shops have not yet understood that, and sadly, a lot of the ML engineers are relatively new and don't know how to articulate that argument. They don't know how to sit there and talk about JVM internals and all these other bits, because that's not their gig, right?
[48:35.052 --> 48:51.052] So I think that's been sort of depressing. It's surprising that this is still an issue, because we do have companies that deploy Python in frontline production stuff to do some of these ML things, and they're fine. And even with those as proof points, there's still kind of this industry resistance.
[48:51.052 --> 49:26.052] Oh, I'm sure. Yeah. I mean, what would you even use for a non-open-source machine learning framework? This shows you how much of the Silicon Valley Kool-Aid I've drunk. But no, I think what ends up happening, to be honest, is they'll buy some vendor thing which still just embeds the same open source machine learning thing. I kid you not, that's literally what they will do sometimes. If you get into corporate IT enough, it gets pretty depressing; the incentives are all messed up there, unfortunately, which is one of the reasons why Silicon Valley runs circles around some of these other companies.
[49:26.052 --> 49:44.052] Yeah, man, we should end it on your ethics answer; this was a surprise. I guess both are kind of worrying in different ways. We have our work cut out for us, that's for sure. Nice, that's a good way of putting it. Thanks so much, that was really fun. Yeah, thank you.
[49:44.052 --> 50:09.052] When we first started making these videos, we didn't know if anyone would be interested or want to see them, but we made them for fun, and we started off by making videos that would teach people, and now we get these great interviews with industry practitioners. I love making this available to the whole world so everyone can watch these things for free. The more feedback you give us, the better stuff we can produce, so please subscribe, leave a comment, and engage with us. We really appreciate it.
[50:09.052 --> 50:19.052] [Music]
66.89784
45.01024
1m
Nov 21 '22 16:51
2mj2v481
-
Finished
Nov 21 '22 16:51
2619.168000
/content/peter-boris-fine-tuning-openai-s-gpt-3-citdnuogk48.mp3
tiny
[00:00.000 --> 00:02.000] [MUSIC]
[00:02.000 --> 00:15.060] >> We have these two camps of users, the researchers and the developers, and the developers keep telling us, hey, I just want one button, I just want the best model to come out. And then a lot of the researchers want to, you know, fiddle more with the parameters.
[00:15.060 --> 00:20.040] And I think we can probably satisfy both for a long time.
[00:20.040 --> 01:00.008] >> You're listening to Gradient Descent, a show about machine learning in the real world, and I'm your host, Lukas Biewald. Today I'm talking with Peter Welinder, longtime friend and currently VP of product and partnerships at OpenAI, running GPT-3 and other things. Before that he was a research lead at OpenAI, where he was one of Weights & Biases's very first customers, and before that, head of machine learning at Dropbox. And I'm also talking with Boris Dayma, a machine learning engineer at Weights & Biases. We're going to talk about GPT-3 and the recently announced integration that GPT-3 did with Weights & Biases. So this should be a lot of fun.
[01:01.036 --> 01:06.072] So Peter, the last time we talked, I think you were working on research at OpenAI.
[01:06.072 --> 01:53.036] And that's most of the time that I've known you. But now I find that you're VP of product and partnerships at OpenAI, and I'm kind of curious what that means and what you're doing day to day. >> Yeah, sure. What I do day to day now is quite different from when I did research. For me, doing research has always been about solving the hardest problems that are out there in order to actually have some sort of impact on the world. So I'm personally much more driven by the end goals of research rather than the research itself. It's really fun to do research, to go down and explore things research-wise, but it's always been with some goal at the end of it.
[01:53.036 --> 02:45.012] And one exciting thing that has happened with GPT-3... A lot of the things that I did when I started at OpenAI were on the robotics side, and with robotics there's still some gap between the stuff you can do in the lab and what you can do in the real world. With GPT-3, when we got our first results, it was kind of clear that we had something that we could start applying to real-world problems rather than just cool demos. When I worked in robotics, what we got at the end was a really cool demo of a robotic hand solving a Rubik's Cube, but it's not like you could start deploying that in everybody's home. Even if it were robust enough to do that, I don't know how useful it would be to solve a Rubik's Cube in a very expensive way. But GPT-3,
[02:45.012 --> 03:09.076] like, we had a language model that you can now apply to solve all kinds of different problems, everything from translation to summarization to things like classification and question answering, so it was a very flexible model. What we set out to do was to start seeing if this was a good enough model to solve real-world problems. And for me, that's just
[03:09.076 --> 03:17.028] a really fun area to focus on. It's like when you have this kind of really powerful new technology that
[03:17.028 --> 03:25.076] has the potential of just changing a lot of things in the way they work, it's all about kind of
[03:25.076 --> 04:14.056] finding the right problems to go after, and then seeing which tools you have in your toolbox to solve those problems. The difference is that when I did research, it was very much about coming up with the right benchmarks and the right ways to measure progress, where there was a goal that was really far out and you kind of needed to come up with these toy ways of evaluating progress. And now it's customers telling us, hey, I'm trying to apply GPT-3 to this use case and it doesn't work, or it's too slow, or something like that. Those problems are much more concrete. So my day to day is much more about building a team that can solve these real-world problems with the technology that we have developed at OpenAI.
[04:14.056 --> 04:38.040] When you look at GPT-3 versus the other approaches to large language models out there, which kind of seem to be a trend, are there key differences that you notice in how it works? Is the tech different somehow? Yeah, that's a good question. I think that what I really like
[04:38.040 --> 05:09.052] about GPT-3, and the main way in my mind that it is different, is that it's just extremely simple. All that GPT-3 does... GPT-3 is a kind of large language model, a big neural network. It's using this transformer architecture that Google introduced a couple of years ago, that has been really popular and is really powering all the different language models these days, and it's starting to make its way into other areas like computer vision as well. But the way GPT-3 is set up
[05:09.052 --> 06:02.016] is very simple. It has some context, which basically means it can look at a history of text. Like if you're reading a book, it can look at the page of text or the paragraph of text, and then it's trying to predict the next word. And that's the way GPT-3 is trained. It's just trained on lots of text from lots of different sources, most of them from the internet, and it's just trained, over and over again, to predict the next word based on some words it's seen. You can start with only a few words, but when we train these models today, we train them on the order of a thousand or a few thousand words. They can look back at those thousand words and then try to predict the next word. So the setup is super, super simple, and you just train it on these huge datasets of text in order to keep on predicting the next word and get really, really good
[06:02.016 --> 07:01.052] at that. And I think the surprising thing with GPT-3 was that if you do that, and then you make the model really, really large, so it has a huge capacity for learning, then it gets really good at a bunch of tasks for which you previously needed specialized models. Like if you wanted to do translation, you would need a specialized translation neural network, or if you wanted to do summarization, similarly you would set up your network in a particular way and then train it only on summarization tasks. And what we found with GPT-3 is that you actually get very close to state-of-the-art performance on a number of these benchmarks that measure things like summarization, translation, question answering, and so on, with a model that has just been trained on the internet, not to do any of those tasks specifically, but just by being able to reproduce text in a similar way to how it has read it.
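To make the "just predict the next word" training setup concrete, here is a tiny, purely illustrative Python sketch. It is not how GPT-3 works internally (GPT-3 is a transformer trained on enormous corpora); it only shows the shape of the task: look at the preceding words, predict the next one, and generate text one word at a time.

```python
from collections import Counter, defaultdict
import random

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training": count which word follows each two-word context in the toy corpus.
context_size = 2
counts = defaultdict(Counter)
for i in range(context_size, len(corpus)):
    context = tuple(corpus[i - context_size:i])
    counts[context][corpus[i]] += 1

def predict_next(words):
    """Sample the next word given the preceding words (the toy 'language model')."""
    options = counts.get(tuple(words[-context_size:]))
    if not options:
        return random.choice(corpus)  # unseen context: fall back to a random word
    candidates, freqs = zip(*options.items())
    return random.choices(candidates, weights=freqs)[0]

# Autoregressive generation: keep appending the predicted next word.
text = ["the", "cat"]
for _ in range(6):
    text.append(predict_next(text))
print(" ".join(text))
```

A large language model replaces the frequency table with a neural network and the two-word context with a context window of thousands of tokens, but the objective is the same.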
[07:01.052 --> 07:43.044] And so, practically though, how do you apply it to, say, a translation task? Like, how do you take predicting the next word and make it do translation? Yeah, that's a great question. So in a lot of those other large language models, there are certain setups where you would take a piece of text and encode it, so you would create some representation in your network, and then you would have sort of a decoder that would take that representation and write some sentence. If you did translation, for example, you would encode the input into some sort of representation, and then you would have a separate piece of your network that took that representation and tried to output what you wanted. So the input might be a sentence in German and the output might be a sentence in English, and it's been trained specifically for that. For GPT-3, to your question,
[07:43.044 --> 08:29.020] what do you do? The simplest way is that you provide a few examples of what translations might look like, in just pure text. You would write "German:" and some sentence in German, and then "English:" and some sentence in English. You could provide only a single one; then we would refer to it as one-shot. Or you could provide a few of these "German:" and "English:" examples, and then you would put in the new sentence that you want to translate. That's called few-shot, where you have a few examples, and the model, just by looking at the pattern of what it's now seeing in its context, can produce a translation. So it's a very simple setup.
[08:29.020 --> 09:22.000] Basically, the way I think about telling GPT-3 what to do is a little bit like how you would actually tell a human to do the same thing. If I'm writing an email to you, say, hey Lukas, I want to translate some sentences, what I would do is just ask you: please translate these sentences. And I would maybe provide a few examples to give you a sense of the tone, like, do I want more formal translations or more casual translations, and you would pick up on the pattern. I would give you a sentence in German, and if you, I don't know, knew German, you would be able to translate it to English. And it turns out, with our latest models, you don't actually even have to provide those examples. You can often just ask the model, just as you would ask a human: hey, translate this sentence for me, or summarize this piece of text. We just found that that's how people wanted to use the models, so we made them work like that. But that's how simple it is. You just tell it what you want to do, and it will do its best.
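As a concrete illustration of the few-shot setup Peter describes, here is a minimal sketch using the classic (pre-1.0) OpenAI Python client of that era. The model name, sentences, and parameter values are placeholders, not details from the interview.

```python
import openai  # classic (pre-1.0) openai package; expects an API key in OPENAI_API_KEY

# A few-shot prompt: two German -> English pairs, then the sentence to translate.
prompt = """German: Guten Morgen.
English: Good morning.

German: Wie geht es dir?
English: How are you?

German: Ich lerne gerne neue Sprachen.
English:"""

response = openai.Completion.create(
    model="text-davinci-002",  # placeholder model name
    prompt=prompt,
    max_tokens=60,
    temperature=0,             # low temperature for a mostly deterministic translation
    stop=["\n\n"],             # stop once the current pair is completed
)
print(response["choices"][0]["text"].strip())
```

The model simply continues the pattern it sees in its context, which is the few-shot behavior described above; with the later instruct-style models you can often drop the examples and just state the instruction.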
[09:22.000 --> 09:32.088] So did you make a concerted effort to train the model on multiple languages? Was it mostly English, or where did the...
[11:02.088 --> 11:08.064] figured out how to make these trade-offs in more optimized ways but yeah like originally we actually wanted
[11:08.064 --> 11:13.076] the opposite: we just wanted it to be really good at English. And is it predicting words, or is it predicting
[11:13.076 --> 12:15.076] one character at a time? Yeah, it's neither of those. It actually predicts something called tokens, which are parts of words, is maybe the way to think about it. The most common English words are captured by a single token. And what a token basically is... I think, for reference, we have about 50,000 of these tokens, and we map them onto sequences of characters, so that a common word like "hi" or "the" ends up being one token, but a more uncommon word, like "insectopede" or something, probably gets broken up into two or three tokens. So it's like word pieces, and that just makes it easier and more efficient for these language models to consume text. In principle, you can actually do it at the character level as well. It just gets very inefficient, but that's probably where the field is moving; eventually it's going to just do it at the character level.
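A quick way to see this word-piece behavior for yourself is OpenAI's open-source tiktoken tokenizer. This is a small sketch; the specific encoding name is just one choice, and the exact splits depend on the tokenizer.

```python
import tiktoken

# A GPT-2/GPT-3-style byte-pair encoding with a vocabulary of roughly 50,000 tokens.
enc = tiktoken.get_encoding("gpt2")

for text in ["the", "hi", "insectopede", "こんにちは"]:
    tokens = enc.encode(text)
    pieces = [enc.decode([t]) for t in tokens]  # individual non-ASCII tokens may not decode to readable pieces
    print(f"{text!r} -> {len(tokens)} token(s): {pieces}")

# Common English words map to a single token; rare words and non-English text
# get split into several pieces, which is why they cost more tokens to process.
```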
[12:15.076 --> 12:33.036] But I would think that might make foreign languages really hard. Like, for example, would Asian languages be impossible then, if they have far more tokens? Or I guess maybe you could argue they've sort of done the tokenization for you by having a larger number of characters that each encode a bigger chunk of meaning.
[12:34.040 --> 13:24.064] Yeah, it is definitely the case that the way you train your tokenizer will have an impact on the performance on different languages. And usually those are two different training steps: you would train your tokenizer on some corpus of data, and then you would separately train your models with that tokenizer on some other datasets. In order to get your models really good at different languages, you need to train the tokenizer over multiple languages as well. And it's definitely more expensive to use other languages, because a German word, say, just ends up being more tokens, because we've trained on much less of it, while English is very efficient, where a lot of words are a single token. So it makes the model both a little bit worse at other languages and more expensive. I see. Could I
[13:24.064 --> 15:01.084] translate something into Japanese? Would that even be possible for GPT-3? Oh yeah. One common use case I remember was Japanese users of ours who really liked to use GPT-3 to translate technical documentation between English and Japanese, because they found GPT-3 was much better at translating technical documentation than Google Translate. This was a while back, so it's possible that Google Translate is better now, but it's probably just a chance thing based on the datasets that we had. I mean, the really cool thing with the translation capabilities of GPT-3 is that we haven't really trained the model on explicit pairs of input and output, translated pieces of text, what you would usually call aligned pieces of text in the literature. It's just seen a lot of Japanese... well, not super much, it's seen a bunch of Japanese, but a whole ton of English. And somehow, through learning how to predict the next word, it's seen enough little pieces of text, blog posts or whatever, where the author is switching between Japanese and English, and maybe doing some translation in some sentences, and somehow it has a representation that's good enough to then generalize to arbitrary translation tasks. For me, that's just kind of magical: just by reading lots of English text, lots of Japanese text, and maybe accidentally finding a few aligned pairs in all of the data, it's able to do translation. That's pretty crazy to me. That is really amazing. Is its performance
[15:02.048 --> 15:41.084] tangibly different from earlier versions of GPT? Like, was there something that happened with GPT-3 where OpenAI thought, okay, we can use this for real-world commercial applications? Was there sort of a performance level that it needed to get above? Yeah, definitely. I think the big difference between GPT-2 and GPT-3 was really that it was trained on more data and it was a bigger model, by two orders of magnitude. I think the original GPT-2 was about 1.5 billion parameters, and GPT-3, the biggest model, was 175 billion parameters, so it went up by about two orders of magnitude. And since it was a much
[15:41.084 --> 16:49.076] bigger model, it also needed more data. The surprising thing is that that's what it took to go from feeling fairly dumb to interact with... like, GPT-2 was kind of cool, but it also felt kind of incredibly stupid a lot of the time. And with GPT-3, it went to being, you know, sometimes just surprisingly good. GPT-3 still does a lot of silly things, but it does the right thing probably 30 to 50% of the time on some tasks, and sometimes even better. Before, you would need to sample and try a task, and maybe once every 20 tries you would see something where you'd go, oh, that looks pretty good. With GPT-3, that started happening every third time, or every second time, or every fifth time, and you're like, oh my god, this is actually... For things like summarizing text, for example, one example we have is summarizing a piece of text in the style of a second grader, and it's just incredible how the model is able to simplify words, get the gist of a piece of text, and so on. Again, it's not perfect, but it's just really good.
[16:49.076 --> 17:36.016] And obviously there are a lot of academic benchmarks you can run these models on, and you can see it just got much better on those academic benchmarks, but it was a whole different feel when you wanted to prototype something. The difference is that now it's just easy to get something that works pretty well, and that's part of why we decided, hey, GPT-2 still seemed a little bit too early, it didn't seem really useful to the same extent, but with GPT-3, for all these tasks we found, okay, it's close enough to the state of the art, even compared to a specialized model, that a clever programmer should be able to apply it to whatever task they have. And that was what we set out to validate with the API. What are some of the use cases that
[17:36.016 --> 18:14.032] you feel really proud of, or are there any that you could point us to where we could go interact with it in a commercial setting somewhere? Yeah, sure. I think some of the areas where we were most pleasantly surprised were copywriting, question answering, and generally creative writing. For copywriting, what happened was that there were a number of companies that started building on top of our platform. Some of these companies are, like, Copysmith was one of the first ones, there's Copy.ai, there's also Jarvis, which recently changed to a different name, and a number of other
[18:14.032 --> 20:28.040] of these companies. And what they did was really clever, because they realized that, as I said, when you use GPT-3 to do some task, it's not perfect, so every now and then you will get something that doesn't really make sense. But take a copywriting task: say you want to write an engaging product description based on some attributes of a product, like a shoe, maybe the type of sole, the color, some other attributes of the shoe, and you want to write something really engaging about that. The problem that you as a human face is that you get into some kind of writer's block, like, where do I even start? What these companies started doing is they took GPT-3 and used it to generate a few starting points, or a few variations of how you could write the description. And what you find is that, more often than not, if you generate five of those examples, one of them will look really good, and you can use that as your starting point. Maybe you just take it as it is, or you make some small tweaks. It's a way to almost aid human creativity, you know? And I think that's just so cool. There were writers who would tell us, hey, I've been trying to write this book for half a year now, I just keep getting stuck with writer's block, and then I started using your playground for GPT-3, and now it took me two weeks to churn out the whole book. When you get stuck, it can create an interesting storyline, and as a creative writer you start exploring that: yeah, okay, I wouldn't have thought of this character going in that direction, but let's explore that. And then it becomes a much more fun, engaging process. So it's almost like, as a human, you now have a brainstorm partner that you can apply to all these different tasks. And I think what I found really cool is seeing a number of companies leveraging that and creating a new experience that you just couldn't do before. So I think that was really exciting. I think question answering is also really cool, but that one was quite unexpected; I don't think we would have predicted it being such a big use case.
[20:28.040 --> 21:00.056] It seems like one of the advantages of GPT-3 is that it works right out of the box, but I can also imagine that for some teams there might be a concern about what to do if something goes wrong. I guess I'm curious, do you typically work with ML teams inside of companies, or is it more engineers who see the benefit as not having to figure out how machine learning works in order to get the benefit of natural language processing? Or do you tend to integrate this with ML teams as part of a bigger ML workflow? Yeah, it's a good question. It's a bit
[21:00.056 --> 23:12.064] of a mix, I would say. We've had multiple machine learning teams who already had their own models; they would have downloaded models online and adapted them for their tasks, and then they find our API and start doing the same thing using our API. It just turns out that you can get much better performance from our models, simply because there isn't an open source version of the biggest, best models that we have, and for a lot of tasks that's what works best. But I think probably the majority of our customers are more in the other camp of just really smart developers. And when I say developers, it's pretty broad: we see everything from programmers and engineers to designers and PMs. A number of people have told us that the OpenAI API was what got them into programming, because they got really good results just in our playground, where you can interact with our models, and they got ideas and started to learn how to code, and they started playing with no-code tools like Bubble.io and stuff like that. It really lowers that barrier. You don't have to become a machine learning expert to get really good results out of these models. You just have to be good at iterating and figuring out how to write the instructions to the model. It's a little bit like everybody becomes a manager: you have to give really good instructions to your employee if you want them to do the task the way you want it done. It's very similar with these models: if you under-specify the task, you're going to get very high variance in the outputs, but if you get really good at specifying it and writing a few examples, then you get really good results. And that's not a machine learning skill; it's almost more of a task-specification, management kind of skill. So I feel like a lot of people can pick it up really, really quickly. I think that's what I'm really excited about, just seeing so many people get access to these models that previously seemed like you had to have a PhD in machine learning to work with.
[23:12.064 --> 23:24.072] I feel like I've heard people talk about a new role called prompt engineer, which might be related to this, figuring out how to prompt GPT-3 to get it to do what you want it to do.
[23:25.044 --> 24:24.008] So this one is interesting, because early on, when we had the first version of the API, we had a really smart guy who is a world-renowned author but also kind of a programmer, and, you know, he was one of the early users of the API, and he kind of got the internal name "the prompt whisperer", or "the GPT whisperer": he really knew how to craft the prompts to get the best results. And since the model has been trained on the internet, you kind of need to put your mind into how text on the internet would start. So if you wanted a really good recipe, you had to start writing in the tone of a recipe book or a food blog post or something like that. It's not like you could just ask the model to do what you wanted it to do. So I think initially there was a big piece to that: you really had to be good at understanding the intricacies of GPT-3 and designing really good prompts.
[24:24.008 --> 25:13.060] But in the year and a half since we launched, we saw people were wrestling with this a lot, so we developed a new set of models we call InstructGPT, which actually just last week became the default in our API. The reason they're called InstructGPT is because you just provide instructions. So I would say prompt design is a lot less of a thing now: you can just tell the model what you want it to do and provide a few examples. There's still a little bit around how the formatting of your examples might matter and so on; it's not like it's completely robust to that, sometimes a bit of tweaking matters. But I would say it's less of a thing now than it was a year ago, and my hope is that it becomes less and less of a thing,
[25:13.060 --> 25:27.076] and it becomes much more, almost, interactive. And you've also recently released the ability to fine-tune the models. What's the thinking there, and where is that useful? The surprising thing with GPT-3
[25:27.076 --> 26:21.068] was that you got really good results zero-shot, where you provided zero examples, just instructions, like, hey, translate this sentence from German to English; or with few-shot examples, where you provide a few pairs of German and English, and with just a few-shot examples you could get surprisingly good results. But what that meant in practice is that the accuracy is very task dependent. For some tasks, maybe 30% of the time we got an output that was acceptable to go into a product, and for other, simpler tasks you would get that maybe 70% of the time. When it's not good every time, you have to be very clever in the way you expose it in a product, and that's why it worked well for the copywriting companies, for example: you could just provide a few examples and you knew that
[26:21.068 --> 27:32.056] at least one of them would be good, and that's all the user needs. But with fine-tuning, what you can do is basically customize the model: you can provide more examples of the inputs and outputs you want. If you want to do translation, or say you want to summarize articles, you can provide a few hundred examples of articles that have human-written summaries, and you can actually update GPT-3 to do much better at that task. You couldn't put all those examples in your prompt, because the prompt has limited space, but with fine-tuning you're working these examples into the connections of the neural network, into the weights of the neural network, so in some way you have an infinite prompt: you can provide as many examples as you want. The more examples, the longer it will take to fine-tune and the more costly it will be, but fine-tuning is pretty easy: that concept of taking a bunch of input and output examples, working them into the model, and getting out a new version of the model that is really good at the task for which you provided the examples. It turns out that with only a few hundred examples, or even around a hundred examples, you can get significant boosts
[27:32.056 --> 28:23.004] in accuracy. We have a number of customers that have used it. Keeper Tax, for example, is analyzing transactions to find tax write-offs and things like that, so what they're doing is extracting the relevant pieces of text and classifying them. They fine-tuned models and got much, much better results with the fine-tuned models. And we've seen that over and over again with a number of our customers: they can get really good results that are often good enough for a prototype, but then, in order to get high enough accuracy to put it in production, which is usually more than 90%, or 95%, or 99%, fine-tuning on some dataset that they have or that they put together gets them all the way. So that has enabled many more applications than you could do before, and we just made it very simple to do this kind of fine-tuning.
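For readers who want to see what that looked like in practice, here is a rough sketch of the fine-tuning workflow on the 2022-era OpenAI API, using the classic (pre-1.0) Python bindings. The file name, base model, and example contents are illustrative assumptions, not details from the interview, and the API has since changed, so treat this as a sketch rather than a reference.

```python
import json
import openai  # classic (pre-1.0) openai package; API key in OPENAI_API_KEY

# 1. Prepare prompt/completion pairs as JSONL (hypothetical file and contents).
examples = [
    {"prompt": "Article: ...article text...\n\nSummary:",
     "completion": " ...human-written summary..."},
    # ...ideally a few hundred of these...
]
with open("summaries.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# 2. Upload the file and start a fine-tune against a base model (model name illustrative).
upload = openai.File.create(file=open("summaries.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTune.create(training_file=upload["id"], model="curie")

# 3. Once the job finishes, the resulting model is used like any other model,
#    paying per token at inference time rather than per GPU hour:
# openai.Completion.create(model=job["fine_tuned_model"],
#                          prompt="Article: ...\n\nSummary:")
```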
[28:23.004 --> 29:03.084] Cool. And, you know, I have to ask you about the Weights & Biases integration. I mean, we're so excited about it. I don't know if people listening would know that you've used Weights & Biases from the very early days and provided tons of incredibly useful feedback that's now in the product, but I was curious how you thought about how that integration might be useful for users of GPT-3. So I think this comes from my background with Weights & Biases: I was one of the first users, and it just improved my research workflow so much that I'm a big Weights & Biases spokesperson. Basically, what it does, right, is that it allows you to track your experiments
[29:03.084 --> 30:40.032] in a really lightweight way. As you're training your models, you get all the stats. Anybody who is training machine learning models knows that you have to look at a bunch of curves as you're doing your training to make sure the models are learning in the way that you want, and a lot of the work you do as a machine learning engineer is that sort of iteration on your models, seeing if you can improve your results, and a lot of that is looking at those learning curves and so on. And it's really good, because Weights & Biases provides you with this history of the experiments you've run, so you can compare experiments, track your progress, share it with your team, and so on. What we did is basically make an integration so that as you fine-tune your GPT models via our API, all your experiments, all your training runs, show up in the Weights & Biases interface. So you get that same convenience, but now for things that are training on our clusters. You can see, as the fine-tuning process is happening, the model updating its weights as each new iteration goes through the data, so you can watch your metrics improve. You can also vary a number of different parameters, so it lets you iterate, try out different parameters, and see your progress. So, yeah, it's just much more delightful to train your models that way, to have that one place where you can go and look at your results in an ongoing way. That's why it's a super exciting integration for us: you can keep track of all your fine-tunes in a much better way than with our command line interface, which is not at all as pretty as the Weights & Biases UI for tracking things.
[30:40.032 --> 31:56.040] So, Boris, you actually did the integration, and you said it was one line, is that right? I mean, my question for you is more, you know, how you thought about how it might be used, but I'm curious, was it really a one-line integration? I mean, there are a few more lines in the code, but for the user it's just one line to type, like "openai wandb sync", and you can automatically sync all these runs to a dashboard. The idea was that there are a lot of people who use the API who are not ML engineers, so you don't want them to have to learn, like, what am I supposed to log, or how do I keep track of that. And the OpenAI API is so convenient: when you want to train a model, you just pass a file that is your dataset, it cleans up the dataset, and then you pass one command and it fine-tunes everything. So the idea was to keep that same simplicity: you just type that one command, and then all the magic happens behind the scenes, and you have all your visuals and you can compare your models and see, is it worth giving more training samples, how much did my model improve from that, what is the effect of tweaking that little parameter here, and what dataset did I have when I trained that model. So it's about trying to make it as easy as possible for users to benefit from all the features, even when they don't necessarily know Weights & Biases initially.
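The command Boris mentions is the W&B sync entry point that shipped with the openai package's wandb integration at the time. A minimal sketch, shown from Python via subprocess so it stays in one language (normally you would simply run the command in a terminal); the behavior described in the comments is an assumption based on this conversation, so check the integration docs for the current form.

```python
import subprocess

# After fine-tune jobs have been created with the OpenAI API, one command pulls them
# into a Weights & Biases dashboard (requires both the openai CLI and wandb installed).
subprocess.run(["openai", "wandb", "sync"], check=True)

# Each fine-tune then appears as a W&B run with its training curves, hyperparameters,
# and the training file it used, so different fine-tunes can be compared side by side.
```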
[31:56.040 --> 32:21.084] And I guess, for both of you, what are the parameters that you can actually tweak? Because the way you describe it, it sounds to me like there might not be any parameters. How do the parameters get involved here? So before I answer the question, one thing Boris said that really stands out to me, and by the way, why I really like this integration generally, is that there is this concept of just making these kinds of advanced things very simple. And I think
[32:22.072 --> 33:02.064] I still remember when you, Shawn, and Chris did the first Weights & Biases demo, and it was basically just "import wandb" and, you know, one line to start logging an experiment. I think that philosophy of making it super simple to get going is something we have tried to do in our API as well, where it's "import openai" and then a single API call, in Python or JavaScript, gets you using GPT-3 and creating completions and stuff. I really like that kind of simplicity, and that's what we tried to do with this integration.
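For context, the "import wandb and start logging" pattern Peter is describing looks roughly like this; it is a generic sketch, not the specific demo he mentions, and the project name and metrics are made up.

```python
import wandb

# Start a run; config records the hyperparameters you want to compare across runs.
wandb.init(project="my-experiments", config={"learning_rate": 0.01, "epochs": 5})

for epoch in range(5):
    loss = 1.0 / (epoch + 1)  # stand-in for a real training loss
    wandb.log({"epoch": epoch, "loss": loss})

wandb.finish()
```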
[33:02.064 --> 34:02.096] But to your question about parameters: we've tried to make this quite simple in our API, and we try to make the defaults very, very good, so generally you can get really good results with fine-tuning without fiddling much with the parameters at all. But some of them can make more of a difference. You can set, for example, the learning rate, which is how much you update the weights with each learning step. You can set things like how many passes you want to make through the data; it turns out that if you go through the data too many times, you're going to overfit on your dataset. These GPT models being really big, you often only need on the order of two to five iterations through your data to get really good results, and if you go further than that, you sometimes overfit. There are more advanced parameters as well, but I'd say playing a bit with the number of epochs you want to train for and the learning rate gets you 90% of the way there, and if you start fiddling with other parameters, it's not going to give you that much more.
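As a sketch of what tweaking those two knobs looked like on the legacy fine-tuning API, again with the classic (pre-1.0) Python bindings; the file ID, base model, and values are illustrative assumptions rather than recommendations.

```python
import openai  # classic (pre-1.0) openai package

# Same kind of fine-tune as before, but overriding the two parameters Peter calls out:
# the number of passes over the data and the learning rate multiplier.
job = openai.FineTune.create(
    training_file="file-abc123",   # placeholder uploaded-file ID
    model="curie",                 # smaller base model; a larger one trades speed and cost for accuracy
    n_epochs=4,                    # 2-5 passes is usually enough for these large models
    learning_rate_multiplier=0.1,  # how aggressively the weights are updated per step
)
print(job["id"])
```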
[34:03.068 --> 34:45.076] Was part of the thinking of leaving the parameters in just to give people the joy of messing with parameters? I think, honestly, you know, I would love it if it were completely automatic. That said, we do have a number of more research-oriented customers who really do like the fiddling, so I think it would be hard for us to remove it. But it's like we have these two camps of users, the researchers and the developers, and the developers are telling us, hey, I just want one button, I just want the best model to come out, and then a lot of the researchers want to, you know, fiddle more with the parameters, and I think we can probably satisfy both for a long time. Well, I certainly know which category you put yourself in,
[34:45.076 --> 36:01.052] because you make some amazing, beautiful demos, and I also know that you love to tweak parameters. I'm curious about your experience playing with the GPT-3 models. I definitely like having good defaults, because initially you don't really know what you should change, and let's say you choose the wrong parameter and nothing works; that wouldn't be a nice experience. So I like that if you don't choose anything, it's already going to be pretty good. Then I really like to tweak the parameters to see, okay, what would be the effect, and try to build intuition. In addition to the parameters that Peter mentioned, there are two that interest me a lot. You can decide which model you fine-tune: there are models of different sizes, and if you use a larger model, maybe your API calls are going to be a bit slower, but your accuracy will be better, and maybe sometimes you need that and sometimes you don't, so I like to see the effect of which model I use. And I also like to see the effect of how many training samples I give, like 20 samples versus 100 or 200, because that gives you an idea of how much the model is going to get better as you develop a larger dataset. So those are the kinds of parameters I like to play with, to see what the predictions look like based on them.
[36:02.080 --> 36:42.024] Yeah, that last one, I think, is actually super important. It's one of the most common pieces of advice we give people over and over again: start with a small set of examples, then double it, and see how much of an improvement you get. Usually, if you double your amount of training data, you tend to see some linear improvement in your error rate. So if you have a 10% error rate, then you double your training data, you're going to get down to maybe an 8% error rate, and then you double it again and get down to a 6% error rate. If you can start seeing that trend, then you can suddenly get a sense of how much it would actually cost you, in terms of labeling more data and so on, to get the result that you want.
[36:42.024 --> 38:23.084] So it's a very powerful thing to do. Are the results of training these models reproducible? How much variability is there each time you fine-tune? Would you get the same model if you fine-tuned on the same data two different times? I think so. In principle, you can set it up to be quite reproducible, if you basically train it on the same data. What you do when you train is, on each training iteration you have a batch of data, a number of examples; you can actually set the batch size, how many samples per update you want, and I think it's set to 32 or something like that by default. When you do that, you also want to shuffle the data, so you take a random sample of your training data. As long as you keep those randomizations consistent between your training runs, you're essentially going to get the same model at the end of it; it's going to be fairly reproducible. The only caveat is that in practice, and this is true even for inference, we have a parameter called temperature, where you can set the variability in the output: higher temperature means more variability. And even if you set it to zero, there's no real guarantee that you're going to get completely deterministic output, because there's enough noise and a little weirdness with floating point arithmetic and so on, and these issues with really big models make it very hard to guarantee complete determinism. So we get people asking about that a lot, and the answer is always, well, unfortunately we can't provide that, but you can get something that's fairly close, and you should just make your experiments robust enough that you don't care too much about the determinism.
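The temperature knob Peter mentions is just a sampling parameter on the completion call. A small sketch with the classic (pre-1.0) Python client; the model name and prompt are placeholders.

```python
import openai  # classic (pre-1.0) openai package

prompt = "Write a one-sentence product description for a waterproof hiking boot."

for temperature in (0.0, 0.7):
    response = openai.Completion.create(
        model="text-davinci-002",  # placeholder model name
        prompt=prompt,
        max_tokens=40,
        temperature=temperature,   # 0 -> nearly deterministic; higher -> more varied output
    )
    print(temperature, response["choices"][0]["text"].strip())

# As Peter notes, even temperature 0 is only nearly deterministic for very large models,
# because of floating point and serving-side nondeterminism.
```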
[38:24.088 --> 38:32.048] I would think operationally having everyone have their own fine tuned model would be much more of an
[38:32.048 --> 38:38.096] infrastructure challenge than everybody using the API that hits the same model has that been a
[38:39.052 --> 38:45.084] big undertaking to allow that to happen like do you have to like swap in and out the different models
[38:45.084 --> 38:52.080] as people start to use them? Yeah, for sure. When we started out, the way we did
[38:52.080 --> 38:59.020] fine-tuning was basically, you know, you almost had to rent a set of GPUs that the models ran on,
[38:59.020 --> 39:04.048] and even for some of the absolutely earliest fine-tuning customers we essentially
[39:04.048 --> 39:09.060] charged by GPU hour to some extent, like per hour, for how much they were using the models. And you know,
[39:09.060 --> 39:14.024] even from the very beginning, I think within six months of launching the API we had a few select
[39:14.024 --> 39:18.040] customers who had fine-tuned models and stuff like that, and that sort of worked. But the
[39:18.040 --> 39:23.084] problem with that is, if you're trying something new, GPU hours are expensive, so you don't
[39:23.084 --> 39:31.028] really want to pay to reserve a GPU; even a fraction of an hour just adds up really, really quickly.
[39:32.064 --> 39:38.032] So we just set the goal of saying, well, as soon as you have fine-tuned your model you should
[39:38.032 --> 39:43.092] immediately be able to just use that model, and you should just have to pay for basically the tokens
[39:43.092 --> 39:49.012] that go into it at inference time, like whatever you put in your prompt. And so that was definitely a
[39:49.012 --> 39:54.088] huge engineering challenge to make that experience really great: you just kick off your fine-
[39:54.088 --> 39:59.052] tune, and when it's done, you know, you get a fine-tuned model name, and now you can use that model in the
[39:59.052 --> 40:03.076] API to just get a result immediately, and you're not going to be charged by the hour or whatever, you're just
[40:03.076 --> 40:07.028] going to be charged the same way, by the token. And so that was really tricky.
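The "fine-tune it, then just call it and pay per token" flow he describes looked roughly like this with the OpenAI Python client of that era; the model identifier below is a placeholder for whatever name your fine-tune job returns.

    # Sketch of calling a fine-tuned model through the API (openai 0.x-style client).
    # The model name is a placeholder; billing is per token in the prompt and
    # completion, not per reserved GPU hour.
    import openai

    openai.api_key = "YOUR_API_KEY"

    response = openai.Completion.create(
        model="curie:ft-your-org-2022-01-01-00-00-00",  # placeholder fine-tuned model id
        prompt="Classify the sentiment of this review: the food was great!",
        max_tokens=5,
        temperature=0,  # even at 0, output is only mostly deterministic, as noted earlier
    )
    print(response["choices"][0]["text"])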
[40:08.032 --> 40:13.044] We have an amazing engineering team at OpenAI that has kind of figured out, you know, a lot
[40:13.044 --> 40:18.096] of tricks around balancing where these models end up and caching them in the right way, and so we're able
[40:18.096 --> 40:24.048] to create a great experience around that. I'm curious if you fine-tune the
[40:24.048 --> 40:30.032] entire model or you fine-tune just part of it to make it more efficient. Yeah, you can
[40:30.032 --> 40:34.088] imagine there's just lots of tricks that we're using to make this happen, but we're
[40:34.088 --> 40:40.016] constantly trying to figure out new ways of doing it, because there are challenges: if you
[40:40.016 --> 40:45.044] want to fine-tune a whole 175 billion parameter model it can get really expensive and hard and so on,
[40:45.044 --> 40:51.076] and then there are tricks you can do to, you know, make it much faster. Do you feel like the thing
[40:53.004 --> 41:00.048] between you and everyone using GPT-3 for natural language tasks is more the quality and performance
[41:00.048 --> 41:07.092] of the model itself, or is it something else, is it something about like integration or monitoring
[41:07.092 --> 41:18.088] in production or something like that? I think definitely the key things we focused on when we built the
[41:18.088 --> 41:26.008] API was, you know, what matters the most is really the capability of the models, and then number
[41:26.008 --> 41:30.096] two is you need to have fast inference. Before we created our API for large
[41:30.096 --> 41:34.072] language models nobody cared about inference; everybody cared just about how quickly you can train
[41:34.072 --> 41:41.028] them, because that's what mattered, you know, so you can get your benchmark results at the end of the day. So we did
[41:41.028 --> 41:47.060] just a ton of engineering to make inference super fast. I remember over the course of the first few
[41:47.060 --> 41:53.084] months of us getting the first prototype of the API to customers starting to use it, we increased the
[41:53.084 --> 41:58.056] inference speed like 200-fold or something like that; there was lots of effort that was done to make that
[41:58.056 --> 42:02.080] super fast. And then the third thing is things around safety:
[42:03.092 --> 42:09.092] one of the reasons we trained the instruct models is that we saw that sometimes you can get
[42:09.092 --> 42:16.056] surprising outputs of models that you don't expect. Like, for example, you might write a very innocent
[42:16.056 --> 42:22.040] sentence and it might turn very dark for some reason, or you might get some kind of more biased outputs
[42:22.040 --> 42:31.060] in different ways. With our instruct models, by default they behave in a much more expected way,
[42:31.060 --> 42:36.080] but you can also specify the behavior in a much better way. So I think, you know,
[42:36.080 --> 42:41.084] safety and capability kind of come hand in hand; it just becomes a better product when you can
[42:41.084 --> 42:48.024] control it better. Those are definitely the things we have focused on, and I think we're
[42:48.024 --> 42:55.028] doing much better on them than alternatives that are out there. But there's also another thing that we have
[42:55.028 --> 43:00.072] kind of put a focus on, which is just making it really simple to use: the fact that you don't have to load
[43:00.072 --> 43:06.064] up models, that you can just call a fine-tuned model, that it's just a single line of Python to
[43:06.064 --> 43:13.028] call the API, that's also been really central to us. We just want this to be easy to use
[43:13.028 --> 43:18.016] by everyone. Awesome, well thank you very much, it's really nice to talk to you and congratulations
[43:18.016 --> 43:24.072] on making such a successful product thank you if you're enjoying these interviews and you want to learn more
[43:25.036 --> 43:31.028] please click on the link to the show notes in the description where you can find links to all the papers
[43:31.028 --> 43:36.096] that are mentioned supplemental material and a transcription that we work really hard to produce so check it out
[43:36.096 --> 43:46.096] [Music]
53.19625
49.23595
1m 6s
Nov 21 '22 16:50
6kxx45yu
-
Finished
Nov 21 '22 16:50
3069.288000
/content/jeremy-howard-the-story-of-fast.ai-and-why-python-is-not-the-future-of-ml-t2v2kf2gnni.mp3
tiny
[00:00.000 --> 00:05.036] [MUSIC]
[00:05.036 --> 00:09.036] You're listening to Gradient Descent, a show where we learn about making machine learning
[00:09.036 --> 00:10.096] models work in the real world.
[00:10.096 --> 00:13.004] I'm your host, Lucas Biewald.
[00:13.004 --> 00:19.028] Jeremy Howard created the Fast AI course, which is maybe the most popular course to learn
[00:19.028 --> 00:21.084] machine learning and there are a lot out there.
[00:21.084 --> 00:26.008] He's also the author of the book Deep Learning for Coders with fastai and PyTorch,
[00:26.008 --> 00:31.068] and in that process, he made the Fast AI library which lots of people use independently
[00:31.068 --> 00:33.092] to write deep learning code.
[00:33.092 --> 00:39.028] Before that, he was the CEO and co-founder of Enlitic, an exciting startup that applies
[00:39.028 --> 00:41.076] deep learning to health care applications.
[00:41.076 --> 00:46.096] And before that, he was the president of Kaggle, one of the most exciting, earliest machine learning
[00:46.096 --> 00:47.076] companies.
[00:47.076 --> 00:49.044] I'm super excited to talk to him.
[00:49.044 --> 00:52.088] So, Jeremy, it's nice to talk to you, and in preparing the questions, I kind of realized that
[00:54.008 --> 00:58.048] every time I've talked to you, there have been a few gems that I've remembered that I would never
[00:58.048 --> 00:59.052] think to ask about.
[00:59.052 --> 01:02.072] Like, one time you told me about how you learned Chinese and another time you gave me
[01:03.036 --> 01:08.056] dad parenting advice, like very specific advice that's been actually super helpful.
[01:08.056 --> 01:10.000] So, it's kind of funny, putting it in.
[01:10.000 --> 01:10.064] [LAUGHTER]
[01:10.064 --> 01:13.068] Tell me what dad parenting advice worked out.
[01:13.068 --> 01:20.016] Well, what you told me was when you change diapers, use a blow dryer to turn a really frustrating
[01:20.016 --> 01:20.080] experience
[01:20.080 --> 01:24.000] into a really enjoyable experience, and it's like such good advice.
[01:24.000 --> 01:27.028] I don't know how you, I guess I can imagine how you thought of it, but it's...
[01:27.028 --> 01:27.092] Yeah.
[01:27.092 --> 01:30.040] Yeah, no, they love the whooshing sound, they love the warmth.
[01:30.040 --> 01:34.040] I'm kind of obsessed about dad things, so I'm always happy to talk about dad things.
[01:34.040 --> 01:35.020] That is this podcast.
[01:35.020 --> 01:36.016] Can we start with that?
[01:36.016 --> 01:39.060] Now that my dad is able to solve the ability, any suggestions for his series?
[01:39.060 --> 01:41.060] Oh, I'm going to say one so soon.
[01:41.060 --> 01:44.040] You know, it's like the same with any kind of learning.
[01:44.040 --> 01:45.084] It's all about consistency.
[01:45.084 --> 01:50.016] So, I think the main thing we did right with Claire, who's just, you know, this delightful
[01:50.016 --> 01:52.048] child now, is we were just super consistent.
[01:52.048 --> 01:57.076] Like if we said, like, you can't have X unless you do Y, we would never then,
[01:57.076 --> 01:59.036] you know, give her X if she didn't do Y.
[01:59.036 --> 02:02.096] And if we said, if you want to take your scooter down to the bottom of the road,
[02:02.096 --> 02:04.072] you have to carry it back up again.
[02:04.072 --> 02:09.076] We read these great books that were saying, like, if you're not consistent, it becomes like
[02:09.076 --> 02:11.084] this thing, like, it's like a gambler.
[02:11.084 --> 02:16.040] It's like sometimes you get the thing you want, so you just have to keep trying.
[02:16.040 --> 02:21.012] So that's my number one piece of advice. It's the same with, like, teaching machine learning.
[02:21.012 --> 02:25.052] We always tell people that tenacity is the most important thing for students,
[02:25.052 --> 02:28.040] like, stick with it, do it every day.
[02:28.040 --> 02:33.036] I guess just in the spirit of questions, I'm genuinely curious about, you know, you've built this
[02:34.040 --> 02:38.088] kind of amazing framework and sort of teaching thing that I think is maybe the most popular
[02:38.088 --> 02:42.048] and most appreciated framework. I was wondering if you could, you could start by telling
[02:42.048 --> 02:47.020] me the story of what inspired you to do that and what was the kind of journey to making, you know,
[02:47.020 --> 02:50.096] fast AI, the curriculum and fast AI, the ML framework.
[02:50.096 --> 02:56.040] So it was something that my wife Rachel and I started together.
[02:58.008 --> 03:02.064] And so Rachel has a math PhD, super technical background,
[03:04.064 --> 03:09.020] early data scientist and engineer at Uber. I, you know, I
[03:10.000 --> 03:15.068] just scraped by with a philosophy undergrad and have no technical background. But, you know, from both of
[03:15.068 --> 03:21.068] our different directions, we both had this frustration that like, neural networks in 2012,
[03:22.064 --> 03:27.060] super important, clearly going to change the world, but super inaccessible.
[03:28.016 --> 03:31.052] And you know, so we would go to meetups and try to figure out, like, how do we,
[03:32.056 --> 03:36.096] like, I knew the basic idea, I'd coded neural networks 20 years ago, but,
[03:36.096 --> 03:43.052] how do you make them really good? There wasn't any kind of open source software at the time
[03:43.052 --> 03:47.084] for running on GPUs, you know, Dan Ciresan and Jürgen Schmidhuber's thing was available,
[03:47.084 --> 03:52.056] but you had to pay for it. There was no source code. And we just thought, oh, we've got to change
[03:52.056 --> 04:01.004] this, because the history of technological leaps has been that they generally increase
[04:01.004 --> 04:07.052] inequality, because the people with resources can access the new technology, and then that leads to kind of
[04:07.052 --> 04:13.084] societal upheaval and a lot of unhappiness. So we thought, well, we should just do what we can. So we thought,
[04:15.012 --> 04:22.008] how are we going to fix this? And so basically the goal was, and still is, be able to use
[04:22.008 --> 04:27.044] deep learning without requiring any code so that, you know, because the vast majority of the world
[04:27.044 --> 04:33.028] can't code. We kind of thought, to get there, we should first of all see what exists right now,
[04:34.080 --> 04:40.008] learn how to use it as best as we can ourselves, teach people how to best use it as we can,
[04:40.008 --> 04:44.088] and then make it better, which requires doing research and then turning that into software and then
[04:44.088 --> 04:50.088] changing the course to teach the hopefully slightly easier version and repeat that again and again
[04:50.088 --> 04:59.012] for a few years. And so that's kind of been the process. That's interesting. Do you worry that
[05:00.008 --> 05:04.032] this stuff you're teaching, you're sort of trying to make it obsolete, right? Because you're trying
[05:04.032 --> 05:08.032] to build higher level abstractions. I think one of the things that people really appreciate about
[05:08.032 --> 05:14.008] your courses, the sort of really clear in depth explanations of how these things work. Do you think
[05:14.008 --> 05:18.096] that that's eventually going to be not necessary? How do you think about that? Yeah, to some extent.
[05:18.096 --> 05:28.072] I mean, so if you look at the the new book and the new course, the chapter one starts with like
[05:28.072 --> 05:34.080] really really foundational stuff around like what is a machine learning algorithm? What do we
[05:34.080 --> 05:39.036] mean to learn an algorithm? What's the difference between traditional programming and machine learning
[05:39.036 --> 05:46.096] to solve the same problem? And those kinds of basic foundations, I think, are always going to
[05:46.096 --> 05:52.008] be useful, even at the point where you're not using any code. I feel like even right now if somebody's
[05:53.004 --> 05:59.020] using, like, Platform.ai or some kind of code-free framework, you still need to understand these basics
[05:59.020 --> 06:05.092] of, like, okay, an algorithm can only learn based on the data you provide, you know, it's generally
[06:05.092 --> 06:13.004] not going to be able to extrapolate to patterns it's not seen yet. Stuff like that. But yeah,
[06:13.004 --> 06:19.084] I mean, we have so far released two new courses every year. You know, a part one and a part two every
[06:19.084 --> 06:25.076] year because every year it's totally out of date. And we always say to our students at the start of part one
[06:25.076 --> 06:31.084] look, you know, none of the details you're learning are going to be even of use in a year or two's time.
[06:32.064 --> 06:38.008] You know, when we were doing Theano, and then TensorFlow and Keras,
[06:38.008 --> 06:43.060] you know, and then plain PyTorch, we'd say, look, don't worry too much about the software we're using,
[06:43.060 --> 06:51.028] because none of it's still going to be any good, you know, it's all changing rapidly, you know, faster than
[06:51.028 --> 06:59.092] JavaScript frameworks. But the concepts are important and yeah, you can pick up a new library in,
[07:00.064 --> 07:09.052] I don't know, a week, I guess. Do you, it seems like you've thought pretty deeply about learning, both,
[07:09.052 --> 07:15.004] you know, human learning and machine learning. Had you or Rachel had practice
[07:15.044 --> 07:21.012] teaching before? Was this kind of your first teaching experience? You know, I've actually had a lot of
[07:21.012 --> 07:28.024] practice teaching of this kind, but in this really informal way, partly it's because I don't have
[07:28.024 --> 07:35.012] a technical educational background myself. So I found it very easy to empathize with people who don't know
[07:35.012 --> 07:40.080] what's going on because I don't know what's going on. And so way back when I was doing management consulting,
[07:40.080 --> 07:48.064] you know, 25 years ago, I was always using data driven approaches rather than expertise and
[07:48.064 --> 07:54.016] interview driven approaches to solve problems because I didn't have any expertise and I couldn't
[07:54.016 --> 07:59.028] really interview people because nobody took me seriously because I was too young. And so then I would, like,
[07:59.084 --> 08:05.084] have to explain to my client and to the engagement manager like well, I solved this problem using
[08:05.084 --> 08:13.060] this thing called linear programming or multiple regression or a database or whatever. And yeah,
[08:13.060 --> 08:18.000] what I found was I very, I wouldn't say very quickly, but within a couple of years in consulting,
[08:18.000 --> 08:23.092] I started finding myself like running training programs for what we would today call data
[08:23.092 --> 08:31.092] science, but 20 something years before we were using that word. Yeah, basically teaching our client and
[08:31.092 --> 08:37.052] you know, so when I was at A.T. Kearney, I ran a course for the whole company, basically, that every
[08:39.004 --> 08:44.040] associate and BA had to do, in what we would today call data science, you know, a bit of SQL,
[08:44.040 --> 08:50.080] a bit of regression, a bit of spreadsheets, a bit of Monte Carlo. So yeah, I've actually
[08:51.068 --> 08:59.068] done quite a lot of that now you mentioned it and certainly Rachel also, but for her on pure math,
[09:00.040 --> 09:06.040] you know, so she ran some courses at Duke University and stuff for post grads. So yeah,
[09:06.040 --> 09:12.040] I guess we both had some some practice and we were pretty passionate about it. So we also study
[09:13.076 --> 09:23.044] the literature of how to teach a lot, which most teachers weirdly enough don't. So that's good.
[09:24.008 --> 09:30.000] Do you feel like um, there are things that you feel like uniquely proud of in your teaching or
[09:30.000 --> 09:36.000] like things that you're doing particularly well compared to, um, you know, other classes that people might take?
[09:36.056 --> 09:41.092] Yeah, I mean, I wouldn't say unique because there's always other people doing good stuff, you know,
[09:41.092 --> 09:50.056] I think we're notable for two things in particular. One is code first and the other is top down.
[09:51.020 --> 09:58.000] So, you know, I make a very conscious decision and kind of everything I do to focus on myself as the
[09:58.000 --> 10:05.084] audience. I'm not a good mathematician, you know, I'm like, I'm capable nowadays, but it's not
[10:05.084 --> 10:10.008] something that's really in my background and doesn't come naturally to me for me, the best
[10:10.096 --> 10:18.032] explanation of a technical thing is like an example in some code that I can run debug, look at the
[10:18.032 --> 10:25.020] intermediate inputs and outputs. So I make a conscious decision in my teaching to teach to people
[10:25.020 --> 10:34.024] who are like me. And although most people at kind of graduate level in technical degrees are not
[10:34.024 --> 10:39.092] like me, they've all done a lot of math. Most people that are interested in this material are like
[10:39.092 --> 10:45.076] me, they're people who don't have graduate degrees and they're really underrepresented in the teaching
[10:45.076 --> 10:51.028] group because like nearly all teachers are academics and so they can't empathize with people who
[10:52.032 --> 11:01.028] don't love Greek letters, you know, and integrals and stuff. So yeah, so I always explain
[11:02.008 --> 11:09.020] things by showing code examples and then the other is top down, which is, again, the vast majority
[11:09.020 --> 11:15.092] of humans, not necessarily the vast majority of people who have spent a long time in technical degrees
[11:15.092 --> 11:21.028] and made it all the way to being professors, but most regular people learn much better when they have
[11:21.028 --> 11:27.020] context. Why are you learning this? What's an example of it being applied? You know, what are some
[11:27.020 --> 11:33.020] of the pros and cons of using this approach before you start talking about the details of how
[11:33.020 --> 11:40.032] it's put together. So this is really hard to do, but we try to make it so that every time we introduce
[11:40.032 --> 11:47.076] a topic, it's because we need to show it in order to explain something else or in order to improve
[11:47.076 --> 11:53.060] something else. And this is so hard because obviously everything I'm teaching is stuff that I know
[11:53.060 --> 11:59.076] really well. And so it's really easy for me to just say like, okay, you start here and you build on
[11:59.076 --> 12:05.036] this and you build on this and you build on this and here you are. And that's just a natural way to try to
[12:05.036 --> 12:11.036] teach something but it's not the natural way to learn it. So I don't think people realize how difficult
[12:11.036 --> 12:18.088] top down teaching is, but people say they really appreciate it. Yeah, they do seem to really appreciate it.
[12:18.088 --> 12:23.004] I haven't taken a lot of Rachel's classes directly, but do you think
[12:23.004 --> 12:27.036] Rachel takes the same approach as you? Because it sounds like she has a pretty different background.
[12:27.036 --> 12:32.088] Yeah, she does have a different background, but she certainly has the same approach because we've
[12:32.088 --> 12:40.072] talked about it and we both kind of jump on each other to say like, hey, you know, because we kind of
[12:40.072 --> 12:45.020] do a lot of development together or we did before she got onto the data ethics stuff more.
[12:46.056 --> 12:50.056] And sometimes, you know, I'll say to her, like, hey, that seems pretty bottom-up, don't you think?
[12:50.056 --> 12:56.048] And she'll jump in like, oh, yeah, it is. Damn it, I've started again, you know. So we both know it's
[12:57.036 --> 13:00.088] important and we both try really hard to do it, but we don't always succeed.
[13:02.000 --> 13:06.088] And can you tell me about the library that you've built, like how that came about? Do you think it was
[13:06.088 --> 13:12.056] necessary to do it to teach the way you wanted to? Well, remember, the purpose of this
[13:12.056 --> 13:17.068] is not teaching, so we want there to be no teaching, or at least minimal
[13:17.068 --> 13:23.044] teaching. The goal is that there should be no code, and it should be something you can pick up in half an hour
[13:23.044 --> 13:33.028] and get going. So the fact that we have to teach what ends up being about 140 hours of work is
[13:33.028 --> 13:40.072] a failure, you know, we're still failing. And so the only way to fix that is to create software,
[13:41.044 --> 13:49.020] which makes everything dramatically easier. So really the software is, that's actually the thing,
[13:49.020 --> 13:56.096] that's actually our goal. But we can't get there until, you know, we first of all teach people
[13:57.052 --> 14:03.084] to use whatever it exists and to do the research to figure out like, well, why is it still hard,
[14:03.084 --> 14:08.040] why is it still too slow, why does it still take too much compute, why does it still take too much
[14:08.040 --> 14:14.048] data like what are all the things that limit accessibility through the research to try and improve each of
[14:14.048 --> 14:20.016] those things a little bit? Okay, how can we kind of embed that into software? Yeah, the software
[14:20.016 --> 14:26.008] is kind of the end result of this, I mean, it's still a loop, but eventually hopefully it'll all be
[14:26.096 --> 14:33.068] in the software. And I guess we've gotten to a point now where we feel like we understood some of the
[14:34.040 --> 14:40.096] key missing things in deep learning libraries. We're still a long way away from being no code,
[14:40.096 --> 14:47.012] but at least we saw things like, oh, you know, basic object-oriented design is
[14:47.012 --> 14:54.024] largely impossible because tensors don't have any kind of semantic types. So let's add that and see
[14:54.024 --> 14:59.084] where it takes us. You know, kind of stuff like that, we really tried to get back to the foundations.
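fastai's answer to this was tensor subclasses like TensorImage; the snippet below is not fastai's implementation, just a toy illustration of the idea that the same underlying tensor can carry a semantic type that downstream code can dispatch and display on.

    # Toy sketch of semantic types on tensors (not fastai's real code): the data is
    # still a plain torch.Tensor, but it now knows what it represents, so display
    # and transform logic can be chosen based on its type.
    import torch

    class TensorImage(torch.Tensor):
        def show(self):
            print(f"would render a {tuple(self.shape)} image")

    class TensorMask(torch.Tensor):
        def show(self):
            print(f"would render a {tuple(self.shape)} segmentation mask")

    raw = torch.rand(3, 224, 224)
    img = raw.as_subclass(TensorImage)    # same storage, now carries a semantic type

    img.show()                            # would render a (3, 224, 224) image
    print(isinstance(img, torch.Tensor))  # True: still behaves like a normal tensor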
[14:59.084 --> 15:05.004] So that was a good one. Are there any others that come to mind? Yeah, I mean,
[15:07.004 --> 15:14.048] you know, I mean, dispatch is a key one. So the fact that the kind of Julia-style dispatch is not
[15:14.048 --> 15:21.044] built into Python, so function dispatch on argument types, we kind of felt like we had to fix that,
[15:21.044 --> 15:28.048] because really for data science, the kind of data you have impacts what has to happen.
[15:28.048 --> 15:39.020] And so if you say rotate, then depending on whether it's a 3D CT scan or an image or a point
[15:39.020 --> 15:47.052] cloud or a set of key points for human pose, rotate semantically remains the same thing, but requires
[15:47.052 --> 15:52.064] different implementations. So yeah, we built this kind of,
[15:53.044 --> 16:01.028] Julia-inspired type dispatch system. Also, like, realizing that, again, it's really all about
[16:01.028 --> 16:08.040] types, I guess: when you have semantic types, they need to go all the way in and out, by which I mean,
[16:08.040 --> 16:16.096] you put an image in, it's a Pillow image object, it needs to come all the way out the other
[16:16.096 --> 16:25.012] side as, you know, an image tensor, go into your model, the model then needs to produce an image, you know,
[16:26.032 --> 16:32.048] an image tensor or a category type or whatever. And then that needs to come out all the way the other
[16:32.048 --> 16:38.008] side to be able to be displayed on your screen correctly. So we had to make sure that the entire
[16:38.008 --> 16:48.024] transformation pipeline was reversible, so we had to set up a new system of reversible composable transforms.
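To make the rotate example above concrete: the standard library's singledispatch (first argument only) is enough to sketch the idea, though fastai built its own system that dispatches on multiple argument types; the classes below are hypothetical stand-ins.

    # Sketch of type-based dispatch for a transform like `rotate`: one name, one
    # meaning, different implementations per data type. Uses functools.singledispatch
    # for illustration; fastai's own dispatcher handles multiple arguments.
    from functools import singledispatch

    class Image: ...
    class CTScan: ...
    class PointCloud: ...

    @singledispatch
    def rotate(x, degrees):
        raise NotImplementedError(f"no rotate for {type(x).__name__}")

    @rotate.register
    def _(x: Image, degrees):
        return f"rotate 2D pixels by {degrees} degrees"

    @rotate.register
    def _(x: CTScan, degrees):
        return f"rotate the 3D volume by {degrees} degrees about an axis"

    @rotate.register
    def _(x: PointCloud, degrees):
        return f"apply a {degrees}-degree rotation matrix to every point"

    print(rotate(Image(), 30))
    print(rotate(CTScan(), 30))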
[16:50.016 --> 16:56.008] So this stuff is all like, as much as possible, we try to hide it behind the scenes, but without
[16:56.008 --> 17:04.040] these things, our eventual goal of no code would be impossible because, you know, you would have to
[17:04.040 --> 17:11.012] tell the computer, like, oh, this tensor that's come out actually represents, you know, three bounding boxes
[17:11.012 --> 17:18.088] along with associated categories, you know, and describe how to display it and stuff. So it's all
[17:18.088 --> 17:25.092] pretty foundational to both making the process of coding easy and then down the track over the next
[17:25.092 --> 17:32.016] couple of years, you know, removing the need for the code entirely. And what, um, was
[17:32.016 --> 17:39.036] the big goal behind releasing a V2 of the library? That was kind of a bold choice, right, to
[17:39.036 --> 17:45.060] just make it a complete rewrite? Yeah, you know, I'm a big fan of the
[17:46.080 --> 17:53.052] second system, you know, kind of the opposite of Joel Spolsky. You know, I love rewriting
[17:53.052 --> 17:58.032] things. I mean, I'm no Arthur Whitney, but you know, Arthur Whitney, who created K and KDB,
[17:59.092 --> 18:05.052] every version he rewrites the entire thing from scratch, and he's done many versions now.
[18:06.096 --> 18:10.064] But that's, that's, I really like that as a general approach, which is like,
[18:11.092 --> 18:19.004] if I haven't learned so much that my previous version seems like ridiculously naive and pathetic,
[18:19.004 --> 18:25.012] then I'm, I'm not moving forwards, you know, so I do find every year I look back at any code I've
[18:25.012 --> 18:30.056] gotten think like, oh, that could be so much better. And then you re-write it from scratch, I did the
[18:30.056 --> 18:38.088] same thing with the book, you know, I re-wrote every chapter from scratch a second time. So it's partly
[18:38.088 --> 18:44.040] that, and it's partly also just that it took a few years to get to a point where I felt like,
[18:46.032 --> 18:51.020] I, I, I actually had some solid understanding of what was needed, you know, the kind of things
[18:51.020 --> 18:58.048] I just described. And a lot of it came from, like, a lot of conversations with Chris Lattner,
[18:58.048 --> 19:03.092] the inventor of Swift and LLVM. So when we taught together,
[19:05.052 --> 19:10.040] it was great sitting with him and talking about, like, porting fastai to Swift and, like,
[19:11.036 --> 19:18.064] the type system in Swift, and then working with Alexis Gallagher, who's, like, maybe the world's
[19:18.064 --> 19:26.056] foremost expert on Swift's value type system, and he helped us build a new data block API for Swift.
[19:27.020 --> 19:31.092] And so kind of through that process as well, it made me realize like, yeah, you know, this is,
[19:34.088 --> 19:43.004] this is actually a real lasting idea. And I should mention it goes back to the very idea of the
[19:43.004 --> 19:50.088] data block API, which actually goes back to fastai version one, which is this idea that,
[19:50.088 --> 19:56.064] and again, it's kind of based on really thinking carefully about the foundations, which is rather
[19:56.064 --> 20:03.044] than have a library where every possible combination of inputs and outputs ends up being this
[20:03.044 --> 20:10.008] totally different class, you know, with a different API and different ideas, let's have some types that
[20:10.008 --> 20:16.064] represent, that could be either an input or an output and then let's figure out the actual steps you need.
[20:16.064 --> 20:21.068] It's like, okay, you know, how do you figure out what the input items are, how do you figure out what the
[20:21.068 --> 20:27.020] output items are, how do you figure out how to split out the validation set, how do you figure out how to get the labels?
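Those questions map almost one-to-one onto the arguments of fastai v2's DataBlock API; the path and labelling choices below are placeholders, but the argument names are the real ones.

    # Each DataBlock argument answers one of the questions above. The dataset path
    # and labelling function are placeholders for whatever your data looks like.
    from fastai.vision.all import (DataBlock, ImageBlock, CategoryBlock,
                                   get_image_files, RandomSplitter, parent_label, Resize)

    block = DataBlock(
        blocks=(ImageBlock, CategoryBlock),      # what are the input and output types?
        get_items=get_image_files,               # how do you find the input items?
        splitter=RandomSplitter(valid_pct=0.2),  # how do you split out a validation set?
        get_y=parent_label,                      # how do you get the labels?
        item_tfms=Resize(224),
    )
    dls = block.dataloaders("path/to/images")    # placeholder path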
[20:28.096 --> 20:34.056] So again, these things are just like, yeah, we, you know, came to them by stepping back and saying,
[20:36.008 --> 20:42.096] what is actually, foundationally, going on here, and let's do it properly, you know. So fastai
[20:42.096 --> 20:51.020] v2 is really our first time where we just stepped back and, you know, literally I said,
[20:52.024 --> 20:55.044] you know, so Sylvain and I worked on it, and I said to Sylvain, like, we're not going to
[20:56.096 --> 21:02.000] push out any piece of this until it's the absolute best we can make it, you know, right now,
[21:03.052 --> 21:07.068] which, you know, drove Sylvain a bit crazy sometimes, like the
[21:08.040 --> 21:12.072] transforms API, I think I went through like 27 rewrites,
[21:14.096 --> 21:19.020] but you know, I kept thinking like, no, this is not good enough, this is not good enough, you know,
[21:20.056 --> 21:23.068] until eventually it's like, okay, this is, this is actually good now.
[21:23.068 --> 21:28.056] So is the hardest part the external APIs, then? Because that does seem like it would be really
[21:28.056 --> 21:35.028] tricky to make that, I mean, that seems like an endless task, to make these APIs clear enough and
[21:36.000 --> 21:41.028] organized. Well, I never think of them as external APIs; to me, they're always internal
[21:41.028 --> 21:45.084] APIs, they're what I... Because you want to make a bigger system. Yeah, they're what I'm
[21:45.084 --> 21:50.064] building the rest of the software with, exactly. And, you know, we went all the way back to, like,
[21:50.064 --> 21:55.036] thinking like, well, how do we even write software? You know, I've always been a huge fan of
[21:55.036 --> 22:00.056] the idea of literate programming, but never found anything that made it work. And you know, we've been
[22:00.056 --> 22:08.056] big proponents of Jupyter notebooks forever, and it was always upsetting to me that I had this, like,
[22:08.056 --> 22:15.060] Jupyter world that I loved being in, and this, like, IDE world, where I didn't have the same ability to
[22:18.000 --> 22:24.064] explore in a documented, reproducible way and incorporate that exploration and explanation
[22:24.064 --> 22:30.064] into the code as I wrote. So yeah, we went all the way back and said, like, oh, I wonder if there's
[22:30.064 --> 22:38.048] a way to actually use Jupyter notebooks to create an integrated system of documentation and code and
[22:38.048 --> 22:47.060] tests and exploration, it turns out the answer was yes. So yeah, it's really like just going,
[22:47.060 --> 22:53.084] going right back at every point that I kind of felt like I'm less than entirely happy with the
[22:53.084 --> 22:58.024] way I'm doing something right now, it's like to say, okay, can we fix that? Can we make it better?
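What came out of that exploration became nbdev; the snippet below is not nbdev itself, just a toy illustration of the core move, which is that a notebook is plain JSON, so cells tagged with a marker can be exported as library source while the prose, examples, and tests stay in the notebook.

    # Toy illustration of notebook-as-source (NOT nbdev): pull code cells whose first
    # line carries an export marker out of the .ipynb JSON and write them to a module.
    # The marker and filenames here are made up for the example.
    import json

    EXPORT_MARK = "# export"

    def export_notebook(ipynb_path, py_path):
        nb = json.load(open(ipynb_path))
        exported = []
        for cell in nb["cells"]:
            if cell["cell_type"] != "code":
                continue
            src = "".join(cell["source"])
            if src.lstrip().startswith(EXPORT_MARK):
                exported.append(src)
        with open(py_path, "w") as f:
            f.write("\n\n".join(exported))

    # export_notebook("00_core.ipynb", "core.py")   # placeholder filenames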
[22:59.028 --> 23:06.024] And Python really helped there, right? Because Python is so hackable, you know, the whole, the fact
[23:06.024 --> 23:12.000] that you can actually go into the meta object system and change how type dispatch works and change how
[23:12.000 --> 23:19.052] inheritance works. So, like, our type dispatch system has its own inheritance implementation built into
[23:19.052 --> 23:27.084] it. Yeah, it's amazing you can do that. Wow, why? Because the type dispatch system needs to
[23:27.084 --> 23:35.012] understand inheritance when it comes to how do I decide if you call a function on A and B that,
[23:35.012 --> 23:44.064] you know, on types A and B. And there's something registered for that function, which has some super
[23:44.064 --> 23:49.084] class of A and some higher super class of B and something else with a slightly different combination. How do you
[23:49.084 --> 23:57.068] decide which one matches, you know? So in the first version of it, I ignored inheritance entirely and
[23:57.068 --> 24:05.060] it would only dispatch if the types exactly matched or one of the types was None. But then later
[24:05.060 --> 24:11.060] on, I added, yeah, I added inheritance. So now you've got this nice combination of
[24:11.060 --> 24:17.068] multiple dispatch and inheritance, which is really convenient. Can you give me some examples of how
[24:17.068 --> 24:22.008] the inheritance works with your types? Because I would think it could get kind of tricky, like what's
[24:22.008 --> 24:27.004] even inheriting from what, and the types that just quickly come to mind for me: like, if you have
[24:27.004 --> 24:32.064] an image with multiple bounding boxes, would that inherit from just a raw image? Or... yeah,
[24:32.064 --> 24:40.000] so generally those kinds of things will compose. You know, I don't think we ever use multiple inheritance;
[24:41.044 --> 24:47.004] I try to stay away from it, I always find it a bit hairy. So these things tend to be a lot more
[24:47.004 --> 24:56.064] functional. So, you know, a black and white image inherits from image. And I think a DICOM
[24:56.064 --> 25:02.048] image, which is a medical image, also inherits from image. And then there are transforms with the type
[25:02.048 --> 25:07.028] signatures which will take an image, and then there will be others which will take a DICOM image.
[25:07.028 --> 25:14.024] And so if you call something with a DICOM image for which there isn't a registered function
[25:14.024 --> 25:17.036] that takes a DICOM image, but there is one that takes an image, it'll call the image one.
[25:18.064 --> 25:24.096] And then we kind of use it in other ways, where there'll be, kind of,
[25:27.044 --> 25:33.052] we use a lot of duck typing, so there'll be, like, a call to .method, and .method can be implemented
[25:33.052 --> 25:40.008] differently in the various image subclasses. And the other thing you can do with a type
[25:40.008 --> 25:46.056] dispatch system is you can use a tuple of types, which means that that function argument can be any
[25:46.056 --> 25:50.096] of those types, so you can kind of create union types on the fly, which is pretty convenient too.
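A toy version of that lookup rule, not fastai's implementation: registrations are keyed by type, lookup walks the argument's method resolution order so a DICOM image falls back to the plain image implementation, and a tuple of types acts like an ad-hoc union.

    # Toy dispatcher that understands inheritance. Lookup tries the argument's own
    # class first, then its superclasses (via the MRO); registering a tuple of types
    # is effectively a union type. Not fastai's code, just the idea.
    class Image: ...
    class DicomImage(Image): ...        # a medical image is still an image
    class PointCloud: ...

    _registry = {}

    def register(*types):
        def deco(fn):
            for t in types:             # a tuple of types registers all of them
                _registry[t] = fn
            return fn
        return deco

    def show(x):
        for cls in type(x).__mro__:     # most specific class first, then superclasses
            if cls in _registry:
                return _registry[cls](x)
        raise TypeError(f"no show() registered for {type(x).__name__}")

    @register(Image)
    def _(x): return "render as a normal image"

    @register(PointCloud)
    def _(x): return "render as a 3D scatter plot"

    print(show(DicomImage()))           # no DICOM-specific version, falls back to Image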
[25:52.016 --> 25:56.096] Are there parts of the V2 that you're still not happy with? Or were you really able to
[25:56.096 --> 26:02.024] realize that vision? There are some parts, yeah. Partly that happened kind of
[26:02.024 --> 26:14.000] across COVID, and, you know, I unfortunately found myself the kind of face of the global masks movement,
[26:14.080 --> 26:20.016] which didn't leave much room for more interesting things like deep learning. So some of the things that
[26:20.016 --> 26:27.036] we kind of added in towards the end, like some of the stuff around inference is still a little
[26:28.088 --> 26:37.060] possibly a little clunky. But you know it's only some little pieces, like I mean on the whole
[26:37.060 --> 26:43.004] inference is pretty good, but for example I didn't really look at all at how things would work
[26:43.004 --> 26:50.040] with ONNX, for example, so kind of mobile or highly scalable serving.
[26:54.080 --> 27:01.020] Also the training loop needs to be a little bit more flexible to handle things like the hugging face
[27:01.020 --> 27:05.044] transformers API, which makes different assumptions that don't quite fit our assumptions.
[27:07.004 --> 27:14.096] TPU training because of the way it runs on this separate machine that you don't have access to,
[27:14.096 --> 27:20.072] you kind of have to find ways to do things that can accept really high latency.
[27:20.072 --> 27:29.020] And so like for TPU we kind of, it's particularly important because we built a whole new computer
[27:29.020 --> 27:35.068] vision library that runs on the GPU or runs in PyTorch, you know, which generally is targeting the GPU.
[27:36.024 --> 27:44.016] And PyTorch has pretty good GPU launch latency along with a good Nvidia driver. So we can do a lot of
[27:45.044 --> 27:55.028] stuff on the GPU around transformations and stuff. That all breaks down with TPU because like every time
[27:55.028 --> 28:01.084] you do another thing on the TPU you have to go through that whole nasty latency. So yeah there's a
[28:01.084 --> 28:07.092] few little things like that that need to be improved. So it's important to you that your library
[28:07.092 --> 28:14.088] is used widely outside of a learning context, like is one of your goals to make it kind of widespread
[28:14.088 --> 28:20.048] in production systems? Yeah, yeah, yeah. I mean, because the learning context hopefully goes away eventually.
[28:20.048 --> 28:25.068] Hopefully there will be no fast AI course and it'll just be software. So if people are only using
[28:25.068 --> 28:32.008] our software in a learning context, why would it be used at all? Yeah, we want it used everywhere, or something
[28:32.008 --> 28:35.092] like it. I mean I don't care whether it's fast AI or if somebody else comes along and creates something
[28:35.092 --> 28:42.008] better. We just want to make sure that deep learning is accessible. That's super important. And
[28:42.072 --> 28:49.084] the funny thing is because deep learning is so new and it kind of appeared so quickly a lot of the
[28:49.084 --> 28:55.084] decision makers even commercially are people that are highly academic and the whole kind of academic
[28:56.040 --> 29:02.096] ecosystem is really important, much more so than in any other field I've ever been in.
[29:04.096 --> 29:10.056] So one of the things we need to do, too, is make sure that researchers are using fastai. So we try
[29:10.056 --> 29:17.052] to make it very research friendly, and that's one of the key focuses
[29:17.052 --> 29:22.096] really at the moment. Does that... I mean, I would think, just naively, that making something
[29:22.096 --> 29:30.000] research friendly would involve the opposite of making it, like, a single clean API, or, like,
[29:30.056 --> 29:35.044] abstracting away all the details. I would think researchers would want to really tinker with the
[29:35.044 --> 29:44.072] low level assumptions. Yeah well that's why you need a layered API because the first thing to
[29:44.072 --> 29:49.044] realize is it's getting to the point now, or maybe it's at the point now, where most researchers
[29:49.044 --> 29:53.084] doing research with deep learning are not deep learning researchers. You know, they're
[29:54.064 --> 30:02.016] proteomics researchers or genomics researchers or animal husbandry researchers or whatever, you know, or
[30:02.016 --> 30:08.088] astrophysics. Animal husbandry? I was the keynote speaker a couple of years ago at the major
[30:08.088 --> 30:15.052] international animal husbandry congress. So there's a lot more of it than you'd think. I got a nice trip to Auckland with
[30:15.052 --> 30:22.072] the family. It was very pleasant. In fact, Hadley Wickham's father organized it and he invited me.
[30:22.072 --> 30:28.016] Well, I'm sorry I cut you off, you were making a serious point.
[30:28.016 --> 30:34.032] I didn't know that you were so into animal husbandry. I love
[30:34.032 --> 30:38.080] the unusual use cases of deep learning, definitely it's something I collect, but I hadn't heard
[30:38.080 --> 30:47.012] that one. Yeah, so, sorry, where were we? We were talking about... oh yeah, researchers.
[30:47.012 --> 30:52.000] So you're doing research into a thing, right? So like I don't know maybe it's like you're trying
[30:52.000 --> 31:00.072] to find a better way to do gradient accumulation for FP16 training, or maybe you're trying a
[31:00.072 --> 31:08.008] new activation function, or maybe you're trying to find out whether this different way of handling
[31:08.008 --> 31:14.056] four-channel input works well for hyperspectral satellite imagery or whatever. So the idea is to
[31:14.056 --> 31:20.064] let you focus on that thing and not all the other things but then you want all the other things to
[31:20.064 --> 31:27.028] be done as well as possible because if you do a shitty job of all the other things then you might say like
[31:27.028 --> 31:32.008] oh my activation function is actually really good but then somebody else might notice it like oh no it
[31:32.008 --> 31:39.044] was just just doing a crappy version of data augmentation effectively so if we add dropout then
[31:39.044 --> 31:48.032] your thing doesn't help anymore. So with a layered API you can use the high level easy bits with
[31:48.032 --> 31:54.088] like all the defaults that work nicely together and then you just pick the bit that you want and
[31:54.088 --> 32:00.072] delve in as deep as you like. So there are kind of really four key layers in our API. So maybe you're
[32:00.072 --> 32:05.036] going to create a new data block, or maybe you're going to create a new transform, or maybe you're
[32:05.036 --> 32:12.040] going to create a new callback.
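As one example of that callback layer, a custom fastai v2 callback looks roughly like this; the event names (before_fit, after_batch) follow the fastai callback API as I understand it, so treat the details as approximate rather than authoritative.

    # Sketch of a fastai callback: hook into the training loop without rewriting it.
    # Event names and attribute access follow fastai v2's conventions as I recall
    # them; check the fastai docs for the definitive list of events.
    from fastai.vision.all import Callback

    class LogBatchLoss(Callback):
        "Record the training loss after every batch."
        def before_fit(self):
            self.batch_losses = []
        def after_batch(self):
            if self.training:                       # skip validation batches
                self.batch_losses.append(self.learn.loss.item())

    # Hypothetical usage: learn = cnn_learner(dls, resnet34, cbs=LogBatchLoss())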
[32:12.040 --> 32:18.064] So the thing about fastai is it's actually far more hackable than, say, Keras, to take what I'm very familiar with. So with Keras you kind of have this
[32:20.064 --> 32:26.040] pretty well-defined transformation pipeline, or tf.data if you're using that, a pretty well-defined set of
[32:27.092 --> 32:33.084] atomic units you can use, and if you want to customize them you kind of hit a wall, you know, often
[32:33.084 --> 32:40.064] it requires going and creating a new TF op in C++ or something. So it really helps using PyTorch:
[32:40.064 --> 32:45.068] they kind of provide these really nice low latency primitives and then we build out everything out of
[32:45.068 --> 32:50.056] those latency primitives and we kind of gradually layer the API's on top of each other and we make sure
[32:50.056 --> 32:55.044] that they're very well documented all the way down so you don't kind of get to a point where it's like
[32:56.008 --> 33:02.096] oh you're now in the internal API good luck it's like nope it's all external API and it's all documented
[33:02.096 --> 33:10.008] and it all has tests and it all has examples and it all has explanations so you can put put your research
[33:10.008 --> 33:16.080] in at the point that you need it I think but I guess when you talk about academics then or researchers
[33:16.080 --> 33:23.044] sorry not academics but you're imagining like actual machine learning researchers researching on
[33:23.044 --> 33:28.088] machine learning itself versus like an animal husbandry researcher who needs an application of machine learning
[33:28.088 --> 33:37.036] I guess both yeah both and so I mean it's a much easier for me to understand the needs of
[33:37.036 --> 33:44.072] ML researchers because that's what I do and that's who I generally hang out with but there's a lot of
[33:44.072 --> 33:50.064] overlap like I found back in the days when we had conferences that you could go to you know as I walked
[33:50.064 --> 33:56.008] around NeurIPS, a lot of people would come up to me and say, like, oh, I just gave this talk, I just gave
[33:56.008 --> 34:01.052] this poster presentation, and three years ago I was a fast.ai student, before that I was a
[34:01.052 --> 34:10.016] meteorologist or astrophysicist or neuroscientist or whatever and you know I used your course to understand
[34:10.016 --> 34:16.048] the subject and then I used your software and then I brought in these ideas from astrophysics or
[34:16.048 --> 34:21.004] neuroscience or whatever, and now I'm presenting them at NeurIPS. And so there's this kind of, like, this
[34:21.092 --> 34:27.044] yeah really interesting overlap now between the worlds of ML research and domain expertise
[34:27.044 --> 34:37.052] in that increasingly domain experts are becoming you know pretty well noted and well respected ML researchers
[34:37.052 --> 34:42.096] as well, because you kind of have to be, you know. Like, if you want to do a real kick-ass job of
[34:42.096 --> 34:49.060] medical imaging for instance there's still a lot of foundational questions you have to answer about like
[34:49.060 --> 34:58.008] how do you actually deal with large 3D volumes, you know. These things are still not solved, and
[34:58.008 --> 35:03.004] so you do have to become a really good deep learning researcher as well you know I think one of the
[35:03.004 --> 35:07.092] things that that I always worry about for myself is kind of you know getting out of date like I
[35:07.092 --> 35:13.068] remember being in my early 20s and looking at some of the you know the tenured professors that were
[35:13.068 --> 35:18.048] my age now and thinking, boy, you know, they've just not stayed current with the state of machine
[35:18.048 --> 35:23.084] learning. And then, you know, I started a company and I kind of, you know, realized that, you know, I actually
[35:23.084 --> 35:28.064] wasn't staying, you know, up to date myself, and, you know, kind of often stuck in, like, older techniques
[35:28.064 --> 35:32.040] that I was more comfortable with, like languages I was more comfortable with. And yeah, I feel like one of the
[35:32.040 --> 35:37.060] things that you do just phenomenally well from at least from the outside is staying kind of really current
[35:37.060 --> 35:42.072] and on top of stuff. Yeah, what can you tell me about how you do that? Because, I mean, I have to say, I really
[35:42.072 --> 35:49.052] admired what you did with moving away from your world of crowdsourcing into deep learning, and I think
[35:49.052 --> 35:55.052] you took like a year or so just to figure it out right not many people do that you know and I think a lot
[35:55.052 --> 36:02.080] of people assume they can't because if you get to I don't know your mid 30s or whatever and you
[36:02.080 --> 36:10.024] haven't learned a significant new domain for the last decade you could easily believe that you're
[36:10.024 --> 36:16.032] not capable of doing so. So I think you just have to do what you did, which is just to decide to do it.
[36:16.032 --> 36:24.024] I mean for me I took a rather extreme decision when I was 18 which was to make sure I spent half
[36:24.024 --> 36:31.044] of every day learning or practicing something new for the rest of my life which I stuck to certainly
[36:31.044 --> 36:41.084] on average. Nowadays it's, yeah, nowadays it's more like 80 percent. Yeah, I mean, so for me,
[36:42.088 --> 36:47.020] I mean, it's weird, my brain still tells me I won't be able to understand this new thing, because I'll
[36:47.020 --> 36:50.088] start reading something and I don't understand it straight away, and my brain's like, okay, this is too
[36:50.088 --> 36:57.092] hard for you, and you kind of have to push through that. But yeah, for me, I kind of had this realization,
[36:58.096 --> 37:06.072] you know, as a teenager, that learning new skills is this highly leveraged activity, and so I kind of
[37:06.072 --> 37:11.036] hypothesized that if you keep doing it for your whole life, like, I noticed nobody did, or nobody
[37:11.036 --> 37:16.096] I knew did, and I thought, well, if you did, wouldn't you get these kind of, like, exponential returns?
[37:17.092 --> 37:23.004] And so I thought I should try to do that. So that's kind of been my approach.
[37:23.004 --> 37:30.008] Well, so you reasoned your way into that choice, that's amazing. Is it, like, um, do you have to kind
[37:30.008 --> 37:37.004] of fight your immediate instincts to do that, or is it kind of a pleasure? My instincts are fine;
[37:37.004 --> 37:44.048] what I do have to do is fight, well, not anymore, not now that I work with my wife and,
[37:44.048 --> 37:48.096] you know, I'm working with Sylvain, who's super understanding and understood me in a similar way. But
[37:48.096 --> 37:56.072] for nearly all my working life, it was fighting, or at least dealing with, the people around me. Because if somebody
[37:56.072 --> 38:02.072] like particularly when you're the boss and you're like okay we urgently need to do x and somebody can
[38:02.072 --> 38:08.024] clearly see that, like, why the fuck are you using Julia for the first time to do x? We don't even
[38:08.024 --> 38:12.008] know Julia, you could have had it done already if you'd just used Perl or Python or some shit that you
[38:12.008 --> 38:24.000] already knew. And I'm like, well, you know, I just wanted to learn Julia. So yeah, it's like, it drives people around
[38:24.000 --> 38:32.024] me crazy that I'm working with because everybody's busy and it's hard to in the moment
[38:32.024 --> 38:37.012] appreciate that like okay this moment isn't actually more important than every other moment
[38:37.012 --> 38:43.020] for the rest of your life and so if you don't spend time now getting better at your skills then the
[38:43.020 --> 38:47.004] rest of your life you're going to be a little bit slower and a little bit less capable and a little bit
[38:47.004 --> 38:52.072] less knowledgeable so that's the hard bit it also sounds to me like just from the examples that you've
[38:52.072 --> 38:58.008] given that you have a real bias to learning by doing is it is that right like do you also like
[38:58.008 --> 39:02.048] kind of read papers and synthesize that in a different way yeah but if I read a paper
[39:03.036 --> 39:06.008] I only read it until I get to the point where I decide whether it's something I want to
[39:06.080 --> 39:17.036] implement or not or that there's some idea that I want to take away from it to implement yeah so I like I
[39:18.080 --> 39:24.072] find doing things I don't know I'm a very intuitive person so I find doing things and experimenting
[39:24.072 --> 39:31.044] a lot I kind of get a sense of how things kind of fit together I really like the way Richard
[39:31.044 --> 39:38.008] Feynman talked about his research and his understanding of papers was that he always thinks about
[39:38.008 --> 39:43.076] a physical analogy every time he reads a paper and he doesn't go any further on a paper until he has
[39:43.076 --> 39:49.020] a physical analogy in mind, and then he always found that he could spot the errors in papers straight away
[39:49.020 --> 39:54.000] by recognizing where the physical analogy would break down. So I'm kind of like that, I'm always
[39:54.000 --> 40:00.088] looking for the context and understanding of what it's for and then try to implement it.
[40:03.028 --> 40:08.008] So should we expect the next version of fastai to be in a new language? Have you ever thought of moving away from Python?
[40:09.028 --> 40:14.040] Oh, I mean, obviously I have, because I looked at Swift, you know. And sadly, you know, Chris
[40:14.040 --> 40:21.044] Lattner left Google, so I don't know, you know, they've got some good folks still there, maybe they'll
[40:21.044 --> 40:29.068] make something great of it but you know I tend to kind of follow people like you know people who have
[40:29.068 --> 40:34.056] been successful many times and Chris is one of those people so yeah I mean what's next I don't know like
[40:34.056 --> 40:42.056] it's certainly like, Python is not the future of machine learning, it can't be. You know, it's so
[40:42.056 --> 40:51.020] nicely hackable but it's so frustrating to work with a language where you can't do anything fast enough
[40:51.020 --> 41:01.004] unless you, you know, call out to some external CUDA or C code, and you can't run anything in parallel
[41:01.004 --> 41:06.048] unless you, like, spin up a whole other process. I find working with Python there's just so much
[41:06.048 --> 41:16.008] overhead in my brain to try to get it to work fast enough it's obviously fine for a lot of things but
[41:16.008 --> 41:21.060] not really in the deep learning world or not really in the machine learning world so like I really hope that
[41:22.072 --> 41:27.076] Julia is really successful because like there's a language with a nicely designed type system and
[41:27.076 --> 41:34.008] nicely designed dispatch system, and most importantly it's kind of Julia all the way down, so you can
[41:34.008 --> 41:43.060] get in and write your GPU kernel in Julia, or you can, you know... all the basic stuff is implemented in
[41:43.060 --> 41:49.004] Julia all the way down until you hit the LLVM. So, this is a very simple question: is Julia kind of like
[41:49.004 --> 41:54.096] MATLAB? Is that what I should be thinking? It was designed to be something that MATLAB people
[41:55.060 --> 42:08.088] could use? But, um, no, it's more like, I don't know, like Common Lisp meets MATLAB meets Python. So
[42:08.088 --> 42:18.096] that sounds a little bit like R, maybe. Um, you see, R has some nice ideas, but, um, you know, the R
[42:20.016 --> 42:27.052] object systems, I mean, (a) there's too many of them, and (b) they're all such a hack, and then (c),
[42:27.052 --> 42:32.000] because it's so dynamic, it's very slow, so again you have to implement everything in
[42:32.000 --> 42:37.044] something that's not R, and R just becomes a glue language on top of it. I mean, I spent so many years
[42:37.044 --> 42:44.072] writing R, and it's certainly better than what came before, but I never enjoyed it. So, Julia is a compiled
[42:44.072 --> 42:54.088] language and it's got a rich type system and it's entirely based on function dispatch using the
[42:54.088 --> 43:01.092] type system it's got a very strong kind of meta programming approach so that's why you can write
[43:01.092 --> 43:09.020] your CUDA kernel in Julia for example you know it's got an auto grad again it's written in Julia
[43:10.008 --> 43:18.048] so it's got a lot of nice features but unfortunately it hasn't really got the corporate
[43:19.044 --> 43:29.012] buy-in yet so it's highly reliant on a kind of this core group of super smart people that started it
[43:29.012 --> 43:34.056] and they're run Julia computing which doesn't seem to have a business model as far as I can tell other
[43:34.056 --> 43:42.032] than get getting funding from VCs which works for a while but at some point stops
[43:45.012 --> 43:49.084] I guess, what is it... yes, I know... what is the fast.ai business model? Is there a business model?
[43:49.084 --> 43:54.080] The fast.ai business model is that I take money out of my bank account to pay for things I need, and
[43:55.092 --> 44:01.092] that's about it. Well you know we always end with two questions I want to make sure we have time for that
[44:01.092 --> 44:07.092] to have a little bit of consistency here and the first one is you know when you when you look at
[44:07.092 --> 44:14.032] the different topics and you know kind of machine learning broadly defined certain topics that you think
[44:14.032 --> 44:18.072] that people should pay a lot more attention to than they generally are paying attention to.
[44:19.076 --> 44:25.076] Yes and I think it's the world of deep learning outside of the area that you're familiar
[44:25.076 --> 44:35.052] with. So for example when I got started in NLP I was shocked to discover that nobody I spoke to in the
[44:35.052 --> 44:40.056] world of NLP had any familiarity with the last three or four years of development in computer vision
[44:41.076 --> 44:46.080] the idea of like transfer learning for example and how incredibly flexible it was.
[44:48.080 --> 44:55.036] So that's what led to ULMFiT, which in turn led to GPT, which in turn led to GPT-2.
[44:55.036 --> 45:00.072] And before ULMFiT happened, every NLP researcher I spoke to, I'd ask, like, what do you think about
[45:00.072 --> 45:06.032] this idea of, like, super massive transfer learning from language models? And everybody I spoke to in
[45:06.032 --> 45:10.064] NLP said, that's a stupid idea, and everybody I spoke to in computer vision said, yes, of course,
[45:10.064 --> 45:17.020] I'm sure everybody does that already. So yeah I think in general people are way too specialized in
[45:17.020 --> 45:22.064] deep learning and there's a lot of good ideas in other parts of it.
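For concreteness, here is a minimal sketch in Python with PyTorch and torchvision of the transfer-learning recipe he describes, shown in its computer-vision form; ULMFiT applies the same pattern to a pretrained language model. NUM_CLASSES, the training data, and the train_step helper are illustrative assumptions, not anything discussed in the episode.

```python
# A minimal transfer-learning sketch, assuming torch and torchvision
# (>= 0.13 for the weights= API) are installed. This is the general
# recipe, not the specific ULMFiT implementation.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # assumption: replace with your own label count

# 1. Start from a model pretrained on a large generic dataset (ImageNet here).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# 2. Freeze the pretrained body so only the new head is trained at first.
for param in model.parameters():
    param.requires_grad = False

# 3. Replace the final classification layer with one sized for the new task.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# 4. Train only the new head; later you can unfreeze the body and fine-tune
#    it at a lower learning rate (the gradual-unfreezing idea ULMFiT uses).
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One optimization step on a batch of labeled images."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key move is reusing a model pretrained on a large generic dataset and training only a small new head for the target task, then optionally unfreezing and fine-tuning the rest at lower learning rates.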
[45:23.084 --> 45:30.008] Interesting, cool. And then our final question we always ask, and I kind of wonder, you'll have an interesting
[45:30.008 --> 45:35.020] perspective on this: you know, typically we're talking to people that are trying to, you know,
[45:35.020 --> 45:39.084] use a machine learning model for some purpose, like animal husbandry, but you've seen this
[45:39.084 --> 45:45.060] wide range of applications. And when you look across the things that you've seen
[45:45.060 --> 45:51.028] kind of go from, like, ideation to, like, deployed things that are working and useful, where do you see the
[45:51.028 --> 45:58.088] biggest bottleneck? I mean, the projects I've been involved in throughout my life around machine learning
[45:58.088 --> 46:04.072] have always been successfully deployed, you know, so I kind of get frustrated with all these people
[46:04.072 --> 46:08.000] who tell me that machine learning is just this abstract thing that no one's actually using.
[46:09.004 --> 46:13.052] I think a big part of the problem is there's kind of people that understand
[46:14.088 --> 46:20.016] business and logistics and process management, and there's kind of people that understand AI and algorithms
[46:20.016 --> 46:26.088] and data, and there's not much connectivity between the two. So, like, I spent 10 years working as a management
[46:26.088 --> 46:33.076] consultant, so my life was logistics and business processes and HR and all that stuff, you know.
[46:33.076 --> 46:38.008] It's kind of hard to picture you as a management consultant, somehow.
[46:40.008 --> 46:46.080] I tried to fake it as best as I could, for sure. I've noticed a lot of people in the kind of machine learning
[46:46.080 --> 46:57.004] world really underappreciate the complexity of dealing with constraints and finding opportunities and
[46:57.004 --> 47:03.004] disaggregating value chains, or they'll do the opposite and decide this stuff is so hard that it's
[47:03.004 --> 47:08.056] impossible, without realizing there are, like, you know, large groups of people around the world who have spent
[47:08.056 --> 47:14.048] their lives studying these questions and finding solutions to them. So I think in general I'd love to see
[47:14.048 --> 47:24.008] better cross-disciplinary teams, and more people on kind of the MBA side developing kind of
[47:24.008 --> 47:30.016] AI skills, and more people on the AI side kind of developing an understanding of business and teams.
[47:30.080 --> 47:35.084] Well, I mean, I guess you have, like, this broad view, you know, from your background, and you've
[47:35.084 --> 47:40.040] watched these ML products kind of get deployed in the field. So I guess, like, maybe the question is
[47:40.040 --> 47:46.008] more like, what were the points that sort of surprised you with their level of difficulty just to kind
[47:46.008 --> 47:52.016] of move through? Like, did you have, like, mishaps where you, you know, you thought the model was
[47:52.016 --> 47:56.096] working and then, when it was deployed into production, it didn't, you know, it didn't work as well as
[47:56.096 --> 48:03.052] you were hoping or thought it would? No, not at all. I don't know, that sounds weird, but it's just,
[48:05.004 --> 48:11.036] you know, even a small amount of background in, like, doing the actual work that the thing you're building is
[48:11.036 --> 48:18.064] meant to be integrating with helps. You know, I spent ten years working on an insurance pricing business
[48:19.036 --> 48:25.004] entirely based on operations research and machine learning, but before that, you know, the last
[48:25.004 --> 48:29.020] four or five years of my management consulting career were nearly entirely in insurance. So,
[48:30.088 --> 48:36.008] you know, it's not very surprising that that happens: I know the people, I know the processes.
[48:37.052 --> 48:43.036] And so that's why I think, like, I would much rather see, I don't know, like, if somebody's going to do
[48:44.040 --> 48:50.016] a paralegal AI business, I'd much rather see a paralegal do it than an AI person. Or if they're going to do,
[48:50.016 --> 48:57.012] like, you know, an HR recruiting AI business, I'd much rather see someone with an HR recruiting background
[48:57.068 --> 49:03.092] do it. Like, it's super difficult; there's just no way to understand an industry really
[49:03.092 --> 49:11.012] well without doing that industry for a few years. Well, so, like, you know, because
[49:11.012 --> 49:14.040] I know some of these people, and I get this question all the time, let me channel a question that I'm
[49:14.040 --> 49:19.020] sure is in people's heads watching this: so if you are that, you know, paralegal who's starting
[49:19.020 --> 49:24.024] a, you know, paralegal-AI-enabled business, how would you do the AI part?
[49:26.016 --> 49:32.064] Well, obviously I would take the fast.ai courses. I mean, seriously, I would make sure I was good at
[49:32.064 --> 49:41.060] coding, you know; I'd spend a year working on coding. And yeah, I mean, the fast.ai courses are absolutely
[49:41.060 --> 49:50.096] designed for you. And I would be careful of bringing on a so-called AI expert until you've
[49:51.068 --> 49:57.020] had a go at doing it all yourself, because I've found, like, most people in that situation, for obvious reasons,
[49:57.020 --> 50:03.052] are pretty intimidated by the AI world and kind of a bit humbled by it, you know, a bit overwhelmed by it,
[50:03.052 --> 50:08.088] and they bring on, you know, a self-described expert. They have no ability to judge the expertise of
[50:08.088 --> 50:15.052] that person, so they end up bringing on somebody who's just good at projecting confidence, which is probably
[50:15.052 --> 50:24.000] negatively correlated with actual effectiveness. So I'd tell you, do it yourself for a year, build the
[50:24.000 --> 50:29.092] best stuff you can. I do find a lot of fast.ai alumni with kind of domain-expert backgrounds
[50:31.004 --> 50:37.020] are shocked when they then get involved in the world of AI experts, and they find they're much better at
[50:37.020 --> 50:44.008] training models that actually predict things correctly than the modeling experts are. I'm sure you
[50:44.008 --> 50:49.012] have had that experience, as somebody who, you know, like me, doesn't have a technical background in this area.
[50:50.032 --> 50:54.016] Yeah. Well, thank you so much, this is, this is super fun and educational for me.
[50:54.096 --> 51:06.016] Thank you very much for having me. Yeah, my pleasure.