The story of fast.ai & why Python is not the future of ML with Jeremy Howard
Jeremy shares his experiences in learning, teaching, developing, and making deep learning more accessible.

Jeremy Howard is a founding researcher at fast.ai, a research institute dedicated to making Deep Learning more accessible. Previously, he was the CEO and Founder at Enlitic, an advanced machine learning company in San Francisco, California.

Howard is a faculty member at Singularity University, where he teaches data science. He is also a Young Global Leader with the World Economic Forum, and spoke at the World Economic Forum Annual Meeting 2014 on "Jobs For The Machines."

Howard advised Khosla Ventures as their Data Strategist, identifying the biggest opportunities for investing in data-driven startups and mentoring their portfolio companies to build data-driven businesses. Howard was the founding CEO of two successful Australian startups, FastMail and Optimal Decisions Group. Before that, he spent eight years in management consulting, at McKinsey & Company and AT Kearney.

The business impact of deep learning

De-identification Methods for Open Health Data



0:00 Introduction

0:52 Dad things

2:40 The story of fast.ai

4:57 How the courses have evolved over time

9:24 Jeremy’s top down approach to teaching

13:02 From the course to the library

15:08 Designing V2 of the library from the ground up

21:44 The ingenious type dispatch system that powers fast.ai

25:52 Were you able to realize the vision behind v2 of the library

28:05 Is it important to you that fast.ai is used by everyone in the world, beyond the context of learning

29:37 Real world applications of fast.ai, including animal husbandry

35:08 Staying ahead of the new developments in the field

38:50 A bias towards learning by doing

40:02 What’s next for fast.ai

40:35 Python is not the future of Machine Learning

43:58 One underrated aspect of machine learning

45:25 Biggest challenge of machine learning in the real world

Lukas: You're listening to Gradient Dissent, a show where we learn about making machine learning models work in the real world. I'm your host Lukas Biewald. Jeremy Howard created the fast.ai course, which is maybe the most popular course to learn machine learning, and there are a lot out there. He's also the author of the book Deep Learning for Coders with fastai and PyTorch, and in that process he made the fastai library, which lots of people use independently to write deep learning. Before that, he was the CEO and co-founder of Enlitic, an exciting startup that applies deep learning to health care applications. And before that, he was the president of Kaggle, one of the most exciting early machine learning companies. I'm super excited to talk to him. So Jeremy, it's nice to talk to you. And in preparing the questions, I realized that every time I've talked to you there have been a few gems that I've remembered that I would never think to ask about. Like one time you told me about how you learned Chinese and another time you gave me Dad parenting advice, very specific advice, and it's been actually super helpful.

Jeremy: Oh great. Tell me what Dad parenting advice worked out?

Lukas: Well, what you told me was when you change diapers, use a blow dryer to change a really frustrating experience to a really joyful experience and it's like such good advice. I don't know how you.. I guess I can imagine how you thought of it, but it's...

Jeremy: Yeah, yeah, I know they love the whooshing sound, they love the warmth. I'm kind of obsessed about Dad things. So I'm always happy to talk about Dad things. That is this podcast.

Lukas: Can we start with that? Now that my daughter is eight months old. Do you have any suggestions for her?

Jeremy: Oh my goodness! Eight months old. You know, it's like the same with any kind of learning. It's all about consistency. So I think the main thing we did right with Claire, who's this delightful child now, is we were just super consistent. Like if we said you can't have X unless you do Y, we would never give her X if she didn't do Y. If you want to take your scooter down to the bottom of the road, you have to carry it back up again. We read this great book that was saying if you're not consistent, it becomes like this thing, it's like a gambler. It's like sometimes you get the thing you want, so you just have to keep trying. So that's my number one piece of advice. It's the same with teaching machine learning. We always tell people that tenacity is the most important thing for students. To stick with it, do it every day.

Lukas: I guess just in the spirit of questions I'm genuinely curious about, you know, you've built this amazing framework and teaching thing that I think is maybe the most popular and most appreciated framework. I was wondering if you could start by telling me the story of what inspired you to do that and what was the journey to making fast.ai, both the curriculum and the ML framework.

Jeremy: So it was something that my wife Rachel and I started together. Rachel has a math PhD, super technical background, early data scientist and engineer at Uber. I don't. I just scraped by with a philosophy undergrad and have no technical background. But from both of our different directions, we both had this frustration that neural networks in 2012 were super important, clearly going to change the world, but super inaccessible, and so we would go to meetups and try to figure out like how do we... Like I knew the basic idea, I'd coded neural networks 20 years ago, but how do you make them really good? There wasn't any open source software at the time for running on GPUs. You know, Dan Cireșan's thing was available, but you had to pay for it. There was no source code and we just thought, oh, we've got to change this, because the history of technology leaps has been that it generally increases inequality because the people with resources can access the new technology and then that leads to societal upheaval and a lot of unhappiness. So we thought, well, we should just do what we can. So we thought how are we going to fix this? Basically the goal was, and still is, to be able to use deep learning without requiring any code, because the vast majority of the world can't code. We kind of thought, well, to get there, we should, first of all, see what exists right now, learn how to use it as best as we can ourselves, teach people how to use it as best as we can, and then make it better, which requires doing research and then turning that into software and then changing the course to teach the hopefully slightly easier version, and repeat that again and again for a few years. And so we're kind of in that process.

Lukas: That's so interesting. Do you worry that the stuff you're teaching, you're sort of trying to make it obsolete, right? Because you're trying to build higher level abstractions? Like I think one of the things that people really appreciate your course is that it's really clear, in-depth explanations of how these things work. Do you think that that's eventually going to be not necessary or how do you think about that?

Jeremy: Yeah, to some extent. I mean, so if you look at the new book and the new course, chapter one starts with really, really foundational stuff around what is a machine learning algorithm? What do we mean to learn an algorithm? What's the difference between traditional programming and machine learning to solve the same problem? And those kinds of basic foundations I think will always be useful, even at the point you're not using any code. I feel like even right now, if somebody is using like PlatformAI or some kind of code-free framework, you still need to understand these basics: an algorithm can only learn based on the data you provide, it's generally not going to be able to extrapolate to patterns it's not seen yet, stuff like that. Um, but yeah, I mean, we have so far released two new courses every year, you know, a part one and part two every year, because every year it's totally out of date. And we always say to our students at the start of part one, look, you know, none of the details you're learning are going to be of any use in a year or two's time. There was a time when we were doing Theano, and then TensorFlow and Keras, and then plain PyTorch. We always say, look, don't worry too much about the software we're using because none of it's still any good, you know, it's all changing rapidly, you know, faster than JavaScript frameworks, but the concepts are important and yeah, you can pick up a new library in, I don't know, a weekend, I guess.

Lukas: It seems like you've thought pretty deeply about learning both human learning and machine learning. Had you and Rachel had practice teaching before or was this your first teaching experience?

Jeremy: Um, you know, I've actually had a lot of practice teaching of this kind, but in this really informal way, partly because I don't have a technical educational background myself. So I found it very easy to empathize with people who don't know what's going on because I don't know what's going on. And so way back when I was doing management consulting twenty-five years ago, I was always using data-driven approaches rather than expertise- and interview-driven approaches to solve problems, because I didn't have any expertise and I couldn't really interview people because nobody took me seriously because I was too young. And so then I would have to explain to my client and to the engagement manager, well, I solved this problem using this thing called linear programming or multiple regression or a database or whatever. And within a couple of years in consulting, I started finding myself running training programs for what we would today call data science, but 20-something years ago we weren't using that word. Yeah, basically teaching our clients and, you know... So when I was at A.T. Kearney, I ran a course for the whole company, basically, that every associate MBA had to do in what we would today call data science, you know, a bit of SQL, a bit of regression, bit of spreadsheets, bit of Monte Carlo. So, yeah, I've actually done quite a lot of that now you mention it, and certainly Rachel also. But for her it was pure math; she ran some courses at Duke University and stuff for postgrads. So, yeah, I guess we both had some practice and we were pretty passionate about it. We also study the literature of how to teach a lot, which most teachers, weirdly enough, don't. So that's good.

Lukas: Do you feel like there are things that you feel uniquely proud of in your teaching or the things that you're doing particularly well compared to other classes that people might take?

Jeremy: Yeah, I mean, I wouldn't say unique because there's always other people doing good stuff, you know. I think we're notable for two things in particular. One is code first and the other is top down. So I make a very conscious decision in everything I do to focus on myself as the audience. I'm not a good mathematician, I'm capable nowadays but it's not something that's really in my background and doesn't come naturally to me. For me, the best explanation of a technical thing is like an example in some code that I can run, debug, look at the intermediate inputs and outputs. So I make a conscious decision in my teaching to teach to people who are like me. Although most people at kind of graduate level in technical degrees are not like me, they've all done a lot of math. Most people that are interested in this material are like me, they're people who don't have graduate degrees, and they're really underrepresented in the teaching group because nearly all teachers are academics and so they can't empathize with people who don't love Greek letters, you know, and integrals and stuff. So I always explain things by showing code examples. And then the other is top down, which is, again, the vast majority of humans, not necessarily the vast majority of people who have spent a long time in technical degrees and made it all the way to being professors, but most regular people learn much better when they have context. Why are you learning this? What's an example of it being applied, you know? What are some of the pros and cons of using this approach, before you start talking about the details of how to put it all together? So this is really hard to do, but we try to make it so that every time we introduce a topic, it's because we need to show it in order to explain something else or in order to improve something else. And this is so hard because obviously everything I'm teaching is stuff that I know really well. And so it's really easy for me to just say, OK, you start here and you build on this and you build on this and you build on this and here you are, and that's just the natural way to try to teach something, but it's not the natural way to learn it. I don't think people realize how difficult top down teaching is, but people tend to really appreciate it.

Lukas: Yeah, they do seem to really appreciate it. Do you think, I really would've loved to talk to Rachel about this directly, but do you think Rachel has the same approach as you? Because it sounds like she has a pretty different background.

Jeremy: Yeah, she does have a different background, but she certainly has the same approach because we've talked about it. And we both kind of jump on each other to say, hey, because we do a lot of development together or we did before she got onto the data ethics stuff more; and sometimes, you know, I'll say to her, hey, that seems pretty bottom up, don't you think? And she'd be like, oh damn it is and she'd be like start again, you know. So we both know it's important and we both try really hard to do it, but we don't always succeed.

Lukas: And can you tell me about the library that you built, like how that came about? Do you think it was necessary to do it to teach the way you wanted to?

Jeremy: Well, it's not... Remember, the purpose of this is not teaching. So we want there to be no teaching, or minimal teaching. The goal is that there should be no code and it should be something you can pick up in half an hour and get going. So the fact that we have to teach what ends up being about one hundred and forty hours of work is a failure. You know, we're still failing and so the only way to fix that is to create software which makes everything dramatically easier. So really the software is actually our goal, but we can't get there until we, first of all, teach people to use what already exists and do the research to figure out, well, why is it still hard? Why is it still too slow? Why does it still take too much compute? Why does it still take too much data? What are all the things that limit accessibility? Do the research to try and improve each of those things a little bit. How can we embed that into software? Yeah, the software is kind of the end result of this. I mean, it's still a loop, but eventually, hopefully it'll all be in the software. And I guess we've gotten to a point now where we feel like we understood some of the key missing things in deep learning libraries at least. We're still a long way away from being no code, but we at least saw things like, oh, you know, basic object-oriented design is largely impossible because tensors don't have any kind of semantic types, so let's add that and see where it takes us, you know, stuff like that. We really tried to get back to the foundations.

Lukas: Were there any other ones? That was a good one. Any others that come to mind?

Jeremy: Yeah. I mean, you know, I mean, dispatch is a key one. So the fact that Julia-style dispatch is not built into Python, so function dispatch on typed arguments, we kind of felt like we had to fix that, because really in data science the kind of data you have impacts what has to happen, and so if you say "rotate", then depending on whether it's a 3D CT scan, or an image, or a point cloud, or a set of key points for a human pose, "rotate" semantically means the same thing, but requires different implementations. So, yeah, we built this Julia-inspired type dispatch system. Also realizing that, again, it's really all about types I guess, when you have semantic types, they need to go all the way in and out, by which I mean you put an image in, it's a Pillow image object, and it needs to come all the way out the other side as an image tensor. It goes into your model, the model then needs to produce an image tensor or a category type or whatever, and then that needs to come out all the way on the other side to be able to be displayed on your screen correctly. So we had to make sure that the entire transformation pipeline was reversible. So we had to set up a new system of reversible composable transforms. So this stuff is all, as much as possible we try to hide it behind the scenes, but without these things our eventual goal of no code would be impossible, because you would have to tell the computer, oh, this tensor that's come out actually represents three bounding boxes along with associated categories, you know, and describe how to display it and stuff. So it's all pretty foundational to both making the process of coding easy and then, down the track over the next couple of years, you know, removing the need for the code entirely.
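
The idea of reversible, composable transforms can be sketched in a few lines of plain Python. This is only an illustration of the concept Jeremy describes, not the library's actual implementation: the class names `Transform`, `Categorize`, `OneHot` and `Pipeline`, and the `encode`/`decode` method names, are all hypothetical and chosen for this example.

```python
class Transform:
    """Hypothetical base class: encode goes toward the model, decode comes back."""
    def encode(self, x):
        return x

    def decode(self, x):
        return x


class Categorize(Transform):
    """Maps a label string to an integer index, and back again."""
    def __init__(self, vocab):
        self.vocab = list(vocab)
        self.index = {label: i for i, label in enumerate(self.vocab)}

    def encode(self, label):
        return self.index[label]

    def decode(self, i):
        return self.vocab[i]


class OneHot(Transform):
    """Maps an index to a one-hot list; decoding picks the largest entry."""
    def __init__(self, n):
        self.n = n

    def encode(self, i):
        return [1.0 if j == i else 0.0 for j in range(self.n)]

    def decode(self, vec):
        return max(range(self.n), key=lambda j: vec[j])


class Pipeline:
    """Composes transforms; decode runs them in reverse order."""
    def __init__(self, *tfms):
        self.tfms = tfms

    def encode(self, x):
        for t in self.tfms:
            x = t.encode(x)
        return x

    def decode(self, x):
        for t in reversed(self.tfms):
            x = t.decode(x)
        return x


labels = Pipeline(Categorize(["cat", "dog", "fish"]), OneHot(3))
encoded = labels.encode("dog")               # what the model would train on
decoded = labels.decode([0.1, 0.7, 0.2])     # a made-up model output, turned back into a label
```

Because every step knows how to undo itself, something that looks like a raw model output can be turned back into a value a person can read, without the caller describing what the numbers mean.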

Lukas: What was the big goal behind releasing a V2 of the library? That was kind of a bold choice, right? To just make a complete rewrite.

Jeremy: Yeah, I'm a big fan of the second system. Kind of the opposite of Joel Spolsky. I love rewriting. I'm no Arthur Whitney, but, you know, Arthur Whitney, who created K and kdb, every version, he rewrites the entire thing from scratch and he's done many versions. I really like that as a general approach, which is if I haven't learnt so much that my previous version seems like ridiculously naive and pathetic, then I'm not moving forward. So I do find every year I look back at any code I've got and think like, oh, that could be so much better. And then you rewrite it from scratch. And I did the same thing with the book. I rewrote every chapter from scratch a second time. So it's partly that and it's partly also just that it took a few years to get to a point where I felt like I actually had some solid understanding of what was needed, the kind of things I just described. And a lot of it came from a lot of conversations with Chris Lattner, the inventor of Swift and LLVM. So when we taught together, it was great sitting with him and talking about porting to Swift and the type system of Swift, and then working with Alexis Gallagher, who's maybe the world's foremost expert on Swift's value type system, and he helped us build a new data block API for Swift. And so through that process as well, it made me realize, yeah, this is actually a real lasting idea. And actually, I should mention, it goes back to the very idea of the data block API, which actually goes back to version one, which is this idea, and again, it's kind of based on really thinking carefully about the foundations, which is rather than have a library in which every possible combination of inputs and outputs ends up being this totally different class with a different API and different ideas, let's have some types that could be either an input or an output and then let's figure out the actual steps you need. It's like, OK, how do you figure out what the input items are? How do you figure out what the output items are? How do you figure out how to get out the validation set? How do you figure out how to get the labels? So, again, these things are just like, yeah, we came to them by stepping back and saying, what is actually foundationally going on here, and let's do it properly, you know. So fastai 2 is really our first time where we just stepped back and literally, Sylvain and I worked on it, and I said to Sylvain, like, we're not going to push out any piece of this until it's the absolute best we can make it right now. Which I know Sylvain kind of thought I was a bit crazy sometimes. Like the transforms API, I think I went through like twenty-seven rewrites. But, you know, I kept thinking no, this is not good enough. No, this is not good enough, you know. Until eventually it's like, OK, this is actually good now.
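
The questions Jeremy lists (what are the items, how do you get the input, how do you get the label, how do you get the validation set) map naturally onto a small set of pluggable functions. The sketch below is a hypothetical, plain-Python illustration of that design idea; `MiniDataBlock` and its field names are invented here and are not the library's API, and the file paths are made-up sample data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MiniDataBlock:
    """Hypothetical sketch: one pluggable function per question."""
    get_items: Callable   # how do you find the items?
    get_x: Callable       # how do you get the input from an item?
    get_y: Callable       # how do you get the label from an item?
    is_valid: Callable    # does this item belong to the validation set?

    def datasets(self, source):
        train, valid = [], []
        for item in self.get_items(source):
            pair = (self.get_x(item), self.get_y(item))
            (valid if self.is_valid(item) else train).append(pair)
        return train, valid


# Made-up file listing standing in for a folder of images.
files = ["train/cat/1.jpg", "train/dog/2.jpg", "valid/cat/3.jpg", "notes.txt"]

block = MiniDataBlock(
    get_items=lambda src: [f for f in src if f.endswith(".jpg")],
    get_x=lambda f: f,                      # the path itself is the input here
    get_y=lambda f: f.split("/")[1],        # the parent folder name is the label
    is_valid=lambda f: f.startswith("valid/"),
)
train, valid = block.datasets(files)
```

Swapping any one function (say, reading labels from a CSV instead of the folder name) leaves the other steps untouched, which is the point of separating the steps rather than writing one class per input/output combination.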

Lukas: So is the hardest part the external APIs then? Because that does seem like it'd be really tricky to make that... I mean, that seems like an endless task to make these APIs clear enough and organized.

Jeremy: Well, I never think of them as external APIs. To me, they're always internal APIs.

Lukas: Because you want to make a bigger system.

Jeremy: Yeah, what am I building the rest of the software with? Exactly. And, you know, we went all the way back to like thinking, well, how do we even write software? You know, I've always been a huge fan of the idea of literate programming, but never found anything that made it work. And, you know, we've been big proponents of Jupyter Notebook forever. And it was always upsetting to me that I had this Jupyter world that I loved being in and this IDE world, in which I didn't have the same ability to explore in a documented, reproducible way and incorporate that exploration and explanation into the code as I wrote. So, yeah, we went all the way back and said, oh, I wonder if there's a way to actually use Jupyter Notebooks to create an integrated system of documentation and code and tests and exploration. It turns out the answer was yes. So, yeah, it's really like just going right back, at every point that I kind of felt like I'm less than entirely happy with the way I'm doing something right now, to say, OK, can we fix that? Can we make it better? And Python really helped there, right? Because Python is so hackable, you know, the fact that you can actually go into the meta-object system and change how type dispatch works and change how inheritance works, so like our type dispatch system has its own inheritance implementation built into it, it's... Yeah, it's amazing you can do that.

Lukas: Wow. Why?

Jeremy: Because the type dispatch system needs to understand inheritance when it comes to, how do I decide, if you call a function on types A and B and there's something registered for that function which has some superclass of A and some higher superclass of B, and something else with a slightly different combination, how do you decide which one matches, you know? So in the first version of it, I ignored inheritance entirely and it would only dispatch if the types exactly matched or one of the types was None. But then later on I added inheritance, so now you've got this nice combination of multiple dispatch and inheritance, which is really convenient.
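
To make the matching problem concrete, here is a minimal sketch of two-argument dispatch that takes inheritance into account. It is an assumption-laden toy, not the library's algorithm: the `Dispatcher` class is hypothetical, "closeness" is measured as the position of the registered type in each argument's method resolution order, and ties go to whichever function was registered first.

```python
class Dispatcher:
    """Hypothetical two-argument dispatcher that prefers the most specific match."""
    def __init__(self):
        self.registry = {}  # (type_a, type_b) -> function

    def register(self, type_a, type_b):
        def decorator(fn):
            self.registry[(type_a, type_b)] = fn
            return fn
        return decorator

    @staticmethod
    def _distance(cls, target):
        # Position of target in cls's MRO: 0 is an exact match,
        # larger means a more distant ancestor, None means unrelated.
        mro = cls.__mro__
        return mro.index(target) if target in mro else None

    def __call__(self, a, b):
        best, best_cost = None, None
        for (ta, tb), fn in self.registry.items():
            da = self._distance(type(a), ta)
            db = self._distance(type(b), tb)
            if da is None or db is None:
                continue  # this registration does not apply
            cost = da + db
            if best_cost is None or cost < best_cost:
                best, best_cost = fn, cost
        if best is None:
            raise TypeError(f"no match for {type(a).__name__}, {type(b).__name__}")
        return best(a, b)


class Image: pass
class DicomImage(Image): pass

combine = Dispatcher()

@combine.register(Image, Image)
def _(a, b):
    return "image + image"

@combine.register(DicomImage, Image)
def _(a, b):
    return "dicom + image"

specific = combine(DicomImage(), DicomImage())  # closest registration wins
general = combine(Image(), DicomImage())        # only the general one applies
```

A real system has to make more careful choices about ambiguous cases; this sketch simply picks the lowest total distance.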

Lukas: Can you give me some examples of how the inheritance works with your types? Because I would think it could get kind of tricky what's even inheriting from what. The types that just quickly come to mind for me, if you have an image of multiple bounding boxes, would that inherit from just a raw image?

Jeremy: Yeah. So generally those kind of things will compose, you know. So I don't think we ever use multiple inheritance. I try to stay away from it, I've always found it a bit hairy. So instead things tend to be a lot more functional. So, you know, a black and white image inherits from image, and I think a DICOM image, which is a medical image, also inherits from image. And then there are transforms with type signatures which will take an image, and then there will be others which will take a DICOM image, and so if you call something with a DICOM image for which there isn't a registered function that takes a DICOM image, but there is one that takes an image, it'll call the image one. And then we use a well there in certain ways where, you know, they'll be a kind of... We use a lot of duck typing, so there'll be like a cold dart method and dart method can be implemented differently in the various image subclasses. And the other thing you can do with that type dispatch system is you can use a tuple of types, which means that that function argument can be any of those types. So you can create union types on the fly, which is pretty convenient too.
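
Python's standard library already offers single dispatch on the first argument, which is enough to show both behaviours Jeremy mentions: falling back to a parent class's implementation, and letting one implementation serve several types. The sketch uses `functools.singledispatch`, not the library's own dispatcher, and the class names are placeholders invented for this example.

```python
from functools import singledispatch


class Image: ...
class BWImage(Image): ...
class DicomImage(Image): ...
class PointCloud: ...
class KeyPoints: ...


@singledispatch
def rotate(x, degrees):
    # Fallback when nothing is registered for this type or its parents.
    raise NotImplementedError(f"no rotate for {type(x).__name__}")


@rotate.register(Image)
def _(x, degrees):
    return f"rotate pixels by {degrees}"


# Stacking register() calls makes one implementation serve several types,
# which plays the role of a union type here.
@rotate.register(PointCloud)
@rotate.register(KeyPoints)
def _(x, degrees):
    return f"rotate coordinates by {degrees}"


dicom_result = rotate(DicomImage(), 90)   # nothing registered for DicomImage: uses Image
points_result = rotate(KeyPoints(), 90)
```

The standard-library version only looks at the first argument; dispatching on several arguments at once needs a custom registry.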

Lukas: Are there parts in the V2 that you're still not happy with, are we really able to realize the vision of...?

Jeremy: There are still some parts, yeah. Partly that happened because of Covid. Unfortunately, I found myself the face of the global masks movement, which didn't leave much room for more interesting things like deep learning. So some of the things that we kind of added in towards the end, like some of the stuff around inference, is still a little, possibly a little clunky. But it's only some little pieces, I mean, on the whole inference is pretty good. For example, I didn't really look at all at how things would work with ONNX, for example, for some kind of mobile or highly scalable serving. Also, the training loop needs to be a little bit more flexible to handle things like the Hugging Face Transformers API, which makes different assumptions that don't quite fit our assumptions. TPU training, because of the way it runs on this separate machine that you don't have access to, you have to find ways to do things that have exceptionally high latency. And so for TPU we kind of... It's particularly important because we built a whole new computer vision library that runs on the GPU, or runs in PyTorch, which generally is targeting the GPU. And PyTorch has a pretty good GPU launch latency, along with a good NVIDIA driver. So we can do a lot of stuff on the GPU around transformations and stuff. That all breaks down with TPU because every time you do another thing on the TPU, you have to go through that whole nasty latency. So, yeah, there's a few little things like that that need to be improved.

Lukas: Is it important to you that your library is used widely outside of a learning context? Is it one of your goals to make it widespread in production systems?

Jeremy: Yeah. I mean, because the learning context hopefully goes away eventually. Hopefully there will be no course and it'll just be software. So if people are only using the software in a learning context, it won't be used at all. We want it to be used everywhere, or something like it. I mean I don't care whether it's fastai or somebody else comes along and creates something better. We just want to make sure that deep learning is accessible, that's super important. The funny thing is because deep learning is so new and it kind of appeared so quickly, a lot of the decision makers, even commercially, are people that are highly academic, and the whole academic ecosystem is really important, much more so than in any other field I've ever been in. So one of the things we need to do is make sure that researchers are using fastai, and we're researchers, too, so we try to make it very researcher-friendly, and that's one of the key focuses really at the moment.

Lukas: I mean, I would think just naively, like making something research-friendly would involve the opposite of making it like a single clean API, or like abstracting away all the details. I would think researchers would want to really tinker with the low level assumptions.

Jeremy: Yeah, well, that's why you need a layered API, because the first thing to realize is it's getting to the point now, or it's at the point where most researchers doing research with deep learning are not deep learning researchers. They're proteomics researchers or genomics researchers or animal husbandry researchers or whatever, you know, or astrophysics...

Lukas: But you have not heard that.

Jeremy: I was the keynote speaker a couple of years ago at the major international animal husbandry congress. I got a nice trip to Auckland with the family. That was very pleasant. In fact, Hadley Wickham's father organized it and he invited me.

Lukas: Well, I'm sorry to cut you off. You're making an interesting point that I interrupted for no reason. [laughs]

Jeremy: I didn't know that you were so ignorant about animal husbandry. Lukas, I'm disgusted, dude. [laughs]

Lukas: I love all the unusual use cases that people raise. It's definitely something I collect but that's... I've not heard that one.

Jeremy: Yeah. Sorry, where were we? We were talking about... Oh yeah. Researchers. So you're doing research into a thing, right? So like, I don't know, maybe it's like you're trying to find a better way to do gradient accumulation for FP16 training. Or maybe you're trying a new activation function, or maybe you're trying to find out whether this different way of handling four-channel input works well for hyperspectral satellite imagery or whatever. And so the idea is to let you focus on that thing and not all the other things, but then you want all the other things to be done as well as possible, because if you do a shitty job of all the other things, then you might say, oh, my activation function's actually really good, but then somebody else might notice that, like, oh no, it was just doing a crappy version of data augmentation effectively. So if we add dropout then your thing doesn't help anymore. So with a layered API, you can use the high-level easiest bits with like all the defaults that work nicely together, and then you just pick the bit that you want and delve in as deep as you like. So there's kind of really four key layers in our API, so maybe you'll go in and create a new data block, or maybe you'll go in and create a new transform, or maybe you'll go in and create a new callback. So like the thing about fastai is it's actually far more hackable than, say, Keras, being a tech that I'm very familiar with. So like with Keras, you have this pretty well-defined transformation pipeline or, if you're using that, a pretty well-defined set of atomic units you can use, and if you want to customize them, you're out of luck. It often requires going and writing a new TF op in C++ or something. So it really helps using PyTorch. They kind of provide these really nice low-latency primitives and then we build everything out of those low-latency primitives, and we kind of gradually layer the APIs on top of each other and we make sure that they're very well documented all the way down. So you don't get to a point where it's like, oh, you're now on the internal API, good luck. No, it's all external API and it's all documented and it all has tests and it all has examples, it all has explanations. So you can put your research in at the point that you really need it.
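
One way to picture "put your research in at the point that you really need it" is a training loop that exposes hooks, so a new idea lives in a small callback while everything else stays at its defaults. The sketch below is a toy under stated assumptions: the `Callback`, `Recorder` and `StopBelow` classes, the hook names and the `fit` function are all hypothetical, and the "model" is a single weight fitted to y = 3x by plain gradient descent.

```python
class Callback:
    """Hypothetical hook interface; subclasses override only what they need."""
    def before_fit(self, state): pass
    def after_batch(self, state): pass


class Recorder(Callback):
    """Keeps the loss of every batch."""
    def before_fit(self, state):
        self.losses = []

    def after_batch(self, state):
        self.losses.append(state["loss"])


class StopBelow(Callback):
    """Asks the loop to stop once the loss drops below a threshold."""
    def __init__(self, threshold):
        self.threshold = threshold

    def after_batch(self, state):
        if state["loss"] < self.threshold:
            state["stop"] = True


def fit(data, epochs, lr, callbacks=()):
    """Fits y = w * x with squared error, calling the hooks along the way."""
    state = {"w": 0.0, "loss": None, "stop": False}
    for cb in callbacks:
        cb.before_fit(state)
    for _ in range(epochs):
        for x, y in data:
            err = state["w"] * x - y
            state["loss"] = err ** 2
            state["w"] -= lr * 2 * err * x   # gradient of err**2 with respect to w
            for cb in callbacks:
                cb.after_batch(state)
            if state["stop"]:
                return state
    return state


data = [(1.0, 3.0), (2.0, 6.0)]
recorder = Recorder()
result = fit(data, epochs=20, lr=0.1, callbacks=[recorder])

early = Recorder()
stopped = fit(data, epochs=20, lr=0.1, callbacks=[early, StopBelow(1e-6)])
```

The loop itself never changes; recording, early stopping, or an experimental tweak are each added or removed by editing the list of callbacks.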

Lukas: I see. I guess when you talk about academics then, or researchers sorry not academics, you're imagining actual machine learning researchers researching on machine learning itself versus an animal husbandry researcher who needs an application of machine learning, speaking to both.

Jeremy: Yeah, both. It's much easier for me to understand the needs of ML researchers because that's what I do and that's who I generally hang out with. But there's a lot of overlap; like I found back in the days when we had conferences that you could go to, you know, as I walked around NeurIPS, a lot of people would come up to me and say, oh, I just gave this talk, I just gave this poster presentation, and three years ago I was a student. Before that, I was a meteorologist or an astrophysicist or neuroscientist or whatever. And I used your course to understand the subject and then I used your software and then I brought in these ideas from astrophysics or neuroscience or whatever. And now here I am presenting them at NeurIPS. And so there's kind of like this really interesting overlap now between the worlds of ML research and domain expertise, in that increasingly domain experts are becoming pretty well loaded and well respected ML researchers as well, because you kind of have to be, you know. Like if you want to do a real kickass job of medical imaging, for instance, there's still a lot of foundational questions you have to answer about how do you actually deal with large 3D volumes? These things are not solved, and so you do have to become a really good deep learning researcher as well.

Lukas: I think one of the things that I always worry about for myself is getting out of date. Like I remember being in my early 20s and looking at some of the tenured professors that were my age now and thinking, boy, you know, they have just not stayed current in the state of machine learning. And then, you know, I started a company and I realized that I actually wasn't staying up to date myself and, you know, was kind of often stuck in, like, older techniques that I was more comfortable with, languages I was more comfortable with. And I feel like one of the things that you do just phenomenally well, at least from the outside, is staying really current and on top of stuff. I wonder if you have any thoughts on how you do that.

Jeremy: Well, I mean, I got to say, I really admired what you did with moving away from your world of crowdsourcing into deep learning and I think you took, like, a year or so just to figure it out, right? Not many people do that, you know. And I think a lot of people assume they can't, because if you get to, I don't know, your mid-30s or whatever and you haven't learnt a significant new domain for the last decade, you could easily believe that you're not capable of doing so. So I think you have to do what you do, which is just to decide to do it. I mean, for me, I took a rather extreme decision when I was 18, which was to make sure I spent half of every day learning or practicing something new for the rest of my life, which I've stuck to certainly on average. Nowadays it's more like 80 percent. I mean, it's weird. My brain still tells me I won't be able to understand this new thing because I start reading something that I don't understand straight away and my brain is like, OK, this is too hard for you and so you kind of have to push through that. But for me, I kind of had this realization as a teenager that learning new skills is this high leverage activity, and so I hypothesized that if you keep doing it for your whole life, like I noticed, nobody did, like nobody I knew did. And I thought, well, if you did, wouldn't you get these exponential returns? And so I thought I should try to do that. So that's kind of been my approach.

Lukas: How you reasoned your way into that space, that's amazing. Do you have to fight your immediate instincts to do that? Or is it a pleasure?

Jeremy: My instincts are fine now. What I do have to do is to fight, but not anymore. Not now that I work with my wife. And, you know, I'm working with Sylvain, who's super understanding and approaches things in a similar way. But for nearly all of my working life I was fighting, or at least dealing with, the people around me. Because, particularly when you're the boss, and you're like, OK, we urgently need to do X, somebody can clearly say, why the fuck are you using Julia for the first time to do X? We don't even know Julia. You could have had it done already if you'd just used Perl or Python or some shit that you already knew. And I'm like, well, you know, I just wanted to learn Julia. So, yeah, it drives the people around me that I'm working with crazy, because everybody's busy and it's hard to, in the moment, appreciate that. Like, OK, this moment isn't actually more important than every other moment for the rest of your life, and so if you don't spend time now getting better at your skills, then for the rest of your life you're going to be a little bit slower and a little bit less capable and a little bit less knowledgeable. So that's the hard bit.

Lukas: It also sounds to me like just from the examples that you've given, that you have a real bias to learning by doing. Is that right? Like, do you also kind of read papers and synthesize that in a different way?

Jeremy: Yeah, if I read a paper, I only read it until I get to the point where I decide it's something I want to implement or not, or that there's some idea that I want to take away from it to implement. So I find doing things... I don't know, I'm a very intuitive person, so I find by doing things and experimenting a lot, I get a sense of how things kind of fit together. I really like the way Richard Feynman talked about his research and his understanding of papers, which was that he always thinks about a physical analogy every time he reads a paper and he doesn't go any further on a paper until he has a physical analogy in mind. And then he always found that he could spot the errors in papers straight away by recognizing that the physical analogy would break down. So I'm kind of a bit like that. I'm always looking for the context and the understanding of what it's for and then try to implement it.

Lukas: I see, so should we expect the next version of the library to be in a new language? Have you thought about moving away from Python?

Jeremy: Oh, I mean, obviously I have, because I looked at Swift and sadly, you know, Chris Lattner left Google so I don't know... If they've got some good folks still there, maybe they'll make something great of it, but you know, I tend to kind of follow people. People who have been successful many times, and Chris is one of those people, so, yeah, I mean, what's next? I don't know. Like it's certainly... Like Python is not the future of machine learning. It can't be. You know, it's so nicely hackable, but it's so frustrating to work with a language where you can't do anything fast enough unless you call out to some external code or C code, and you can't run anything in parallel unless you put in a whole other process. Like I find working with Python, there's just so much overhead in my brain to try to get it to work fast enough. It's obviously fine for a lot of things, but not really in the deep learning world or not really in the machine learning world. So, like, I really hope that Julia is really successful because there's a language with a nicely designed type system and a nicely designed dispatch system and, most importantly, it's Julia all the way down, so you can get in and write your GPU kernel in Julia, or all the basic stuff is implemented in Julia all the way down until you hit the LLVM.
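
The point about needing to "call out to some external code or C code" can be seen even inside the standard library. The sketch below computes the same dot product twice: once with an interpreted Python loop, and once by handing the loop to `sum`, `map` and `operator.mul`, which are implemented in C in CPython. The function names `dot_pure`, `dot_builtin` and `time_both` are illustrative only, and the timings are machine-dependent, so no particular speed-up is claimed here.

```python
import operator
import timeit


def dot_pure(xs, ys):
    """Dot product with an explicit Python-level loop (every step is interpreted)."""
    total = 0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_builtin(xs, ys):
    """Same result, but the loop runs inside C-implemented built-ins."""
    return sum(map(operator.mul, xs, ys))


def time_both(xs, ys, number=200):
    """Return (pure_seconds, builtin_seconds) for `number` repetitions of each."""
    pure = timeit.timeit(lambda: dot_pure(xs, ys), number=number)
    builtin = timeit.timeit(lambda: dot_builtin(xs, ys), number=number)
    return pure, builtin


xs = list(range(1000))
ys = list(range(1000))
result_pure = dot_pure(xs, ys)
result_builtin = dot_builtin(xs, ys)
```

Calling `time_both(xs, ys)` on a typical CPython build usually shows the built-in version ahead, which is the pattern Jeremy is describing: the fast path is the one that leaves Python.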

Lukas: Sorry this is an embarrassing question. Julia is like Matlab, is that what I should be thinking?

Jeremy: It was designed to be something that Matlab people could use but no, it's more like Common Lisp meets Matlab meets Python.

Lukas: That's a little bit like R, maybe.

Jeremy: R had some nice ideas, but the R object systems: a) there's too many of them, b) they're all such a hack, and then c) because it's so dynamic, it's very slow. So again, you have to implement everything in something that's not R and R just becomes a glue language on top of it. I mean, I spent so, so many years writing R and then suddenly you get what you came in for but I never enjoyed it. So Julia is a compiled language and it's got a rich type system and it's entirely based on function dispatch using the type system. It's got a very strong kind of metaprogramming approach. So that's why you can write your CUDA kernel in Julia, for example. It's got an autograd, again, that's written in Julia. It's got a lot of nice features, but unfortunately, it hasn't really got the corporate buy-in yet, so it's highly reliant on this core group of super smart people that started it and now run Julia Computing, which doesn't seem to have a business model, as far as I can tell, other than to keep getting funding from VCs, which works for a while, but at some point it stops.
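
For readers unfamiliar with "function dispatch using the type system", here is a rough analogy in Python. Julia does this natively and compiles specialized code for each method; the sketch below only imitates the lookup at runtime. The class `MultiFunction` and the function `combine` are hypothetical, and the specificity rule (smallest total distance in each argument's method resolution order) is a simplifying assumption that only works for ordinary concrete classes.

```python
class MultiFunction:
    """Toy multiple dispatch: pick an implementation from the types of all arguments."""

    def __init__(self, name):
        self.name = name
        self.methods = []  # list of (type signature tuple, implementation)

    def register(self, *types):
        def decorator(fn):
            self.methods.append((types, fn))
            return fn
        return decorator

    def __call__(self, *args):
        # Keep only the signatures that every argument is an instance of.
        candidates = [
            (sig, fn) for sig, fn in self.methods
            if len(sig) == len(args)
            and all(isinstance(a, t) for a, t in zip(args, sig))
        ]
        if not candidates:
            names = ", ".join(type(a).__name__ for a in args)
            raise TypeError(f"no method of {self.name} matches ({names})")

        # Most specific match: smallest total distance along each argument's MRO.
        def distance(sig):
            return sum(type(a).__mro__.index(t) for a, t in zip(args, sig))

        _, best = min(candidates, key=lambda c: distance(c[0]))
        return best(*args)


combine = MultiFunction("combine")


@combine.register(int, int)
def _(a, b):
    return a + b


@combine.register(str, str)
def _(a, b):
    return a + " " + b


@combine.register(object, object)
def _(a, b):
    return (a, b)  # generic fallback for any other pair of types
```

With these definitions, `combine(1, 2)` uses the integer method, `combine("a", "b")` uses the string method, and `combine(1, "b")` falls back to the generic one. Python's standard library offers `functools.singledispatch`, but that dispatches on the first argument only.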

Lukas: What is the business model? Is there a business model?

Jeremy: The business model is that I take money out of my bank account to pay for things I need and that's about it.

Lukas: Awesome. Well, you know, we always end with two questions, and I want to make sure we have time for that, to have a little bit of consistency here. And the first one is, when you look at the different topics in machine learning, broadly defined, is there a topic that you think people should pay a lot more attention to than they generally are paying attention to?

Jeremy: Yes, and I think it's the world of deep learning outside of the area that you're familiar with. So, for example, when I got started in NLP, I was shocked to discover that nobody I spoke to in the world of NLP had any familiarity with the last three or four years of development in computer vision. The idea of transfer learning, for example, and how incredibly flexible it was. So that's what led to ULMFiT, which in turn led to GPT, which in turn led to GPT-2. And before ULMFiT happened, to every NLP researcher I spoke to, I said, what do you think about this idea of super massive transfer learning from language models? Everybody I spoke to in NLP said that's a stupid idea, and everybody I spoke to in computer vision said, yes, of course, I'm sure everybody does that already. So, yeah, I think in general, people are way too specialized in deep learning and there's a lot of good ideas in other parts of it.

Lukas: Interesting, cool. And then our final question we always ask, and I kind of wonder, you have an interesting perspective on this. You know, typically we're talking to people that are trying to use machine learning models for some purpose, like animal husbandry. But you've sort of seen this wide range of applications. When you look across the things that you've seen go from, like, ideation to a deployed thing that's working and useful, where do you see the biggest bottleneck?

Jeremy: I mean, the projects I've been involved in throughout my life around machine learning have always been successfully deployed, you know, so I kind of get frustrated with all these people who tell me that machine learning is just this abstract thing that no one's actually using. I think a big part of the problem is there's people that understand business and logistics and process management and there's people that understand AI and algorithms and data, and there's not much connectivity between the two. I spent 10 years working as a management consultant so my life was logistics and business processes and HR and all that stuff, you know.

Lukas: It's kind of hard to picture you as a management consultant. It comes across as surprising.

Jeremy: I tried to fake it as best as I could for sure. I've noticed a lot of people in the machine learning world really underappreciate the complexity of dealing with constraints and finding opportunities and just aggregating value chains. Or they'll do the opposite, they'll just assume it's so hard that it's impossible without realizing there's large groups of people around the world who spend their lives studying these questions and finding solutions to them. So I think in general, I'd love to see better cross disciplinary teams and more people on the MBA side developing AI skills and more people on the AI side developing an understanding of business and teams.

Lukas: I guess you have a broad view, you know, from your background, and you've watched these ML projects get deployed in these fields. I guess, like, maybe the question is, were there points that surprised you with their level of difficulty just to move through it? Like, did you have mishaps where you thought the model was working and then when it was deployed into production, it didn't work as well as you were hoping or thought it would?

Jeremy: No, not at all. I know that sounds weird, but it's just, you know, even a small amount of background in doing the actual work that the thing you're building is meant to be integrating with helps. You know, I spent 10 years, 8 years, working on an insurance pricing business entirely based on operations research and machine learning. But before that, the last four or five years of my management consulting career was nearly entirely in insurance. You know, there's not much very surprising that happens. I know the people, I know the processes. And that's why I think, if somebody is going to do a paralegal AI business, I'd much rather see a paralegal do it than an AI person do it. Or if they're going to do, like, you know, an HR recruiting AI business, I'd much rather see someone with an HR recruiting background do it. Like it's super difficult. Like there's just no way to understand an industry really well without doing that industry for a few years, I think.

Lukas: Because I know some of these people and I get this question all the time, I'll channel a question that I'm sure is in people's heads watching this. So if you are that paralegal who's starting a paralegal AI-enabled business, how would you do the AI part?

Jeremy: Well, obviously, I would take the courses. I would, I mean, seriously, I would make sure I was good at coding, you know. I'd spend a year working on coding and yeah, I mean, the courses are absolutely designed for you, and I would be careful of bringing on so-called AI experts until you've had a go at doing it all yourself, because I found that most people in that situation, for obvious reasons, feel pretty intimidated by the AI world and kind of a bit humbled by it, a bit overwhelmed by it. And they'll bring on a self-described expert, and they have no ability to judge the expertise of that person. So they end up bringing on somebody who's just good at projecting confidence, which is probably negatively correlated with actual effectiveness. So I'd say do it. Do it yourself. For a year, build the best stuff you can. I do find a lot of people with domain backgrounds, domain experts, are shocked when they then get involved in the world of AI experts and find they're much better at training models that actually predict things correctly than the modeling experts are. I'm sure you've had that experience as somebody who, you know, like me doesn't have a technical background in this area.

Lukas: Yeah. Well, thank you so much. This was super fun and educational for me.

Jeremy: Thank you very much for having me.

Lukas: My pleasure.