
Will Falcon — Making Lightning the Apple of ML

Will explores Lightning's journey from undergrad project to Series B startup
Created on September 13 | Last edited on September 19


About this episode

Will Falcon is the CEO and co-founder of Lightning AI, a platform that enables users to quickly build and publish ML models.
In this episode, Will explains how Lightning addresses the challenges of a fragmented AI ecosystem and reveals which framework PyTorch Lightning was originally built upon (hint: not PyTorch!). He also shares lessons he took from his experience serving in the military and offers a recommendation to veterans who want to work in tech.

Timestamps

00:00 Intro
44:35 Outro

Transcript

Note: Transcriptions are provided by a third-party service, and may contain some inaccuracies. Please submit any corrections to riley@wandb.com. Thank you!

Intro

Will:
Users are always going to tell you incremental things. They're always going to tell you they want this better. They're never going to tell you they want the iPhone. They're always going to tell you, "Can you make my Blackberry keyboard slide out instead," or whatever. Those inputs are going to usually improve the product, but they're not going to help you create a leapfrog product, right?
Lukas:
You're listening to Gradient Dissent, a show about machine learning in the real world. And I'm your host Lukas Biewald.
William Falcon started his career training to be a Navy SEAL before becoming an iOS developer and eventually the CEO of Lightning AI, which makes PyTorch Lightning, a very successful ML framework, and Lightning AI, an awesome website that calls itself the OS for machine learning, which we're going to talk a lot about today. This is a super fun conversation and I hope you enjoy it.
I thought it might be fun to start with your background. We don't have a lot of people that went through Navy SEAL training on this podcast. So, could you tell us a little bit of your story on how you came to found Lightning?

From SEAL training to FAIR

Will:
Yeah, sure. So, I'm originally from Venezuela. I don't know if people know that. I was actually born and raised there, so English is my second language, which is why you'll hear me slip up on a few things today. Code does not care what language you speak, which is great.
I moved here when I was in my teens, and eventually I joined the US military and went through SEAL training, BUD/S. I was there for a few years. If anyone knows BUD/S, I was in classes 272 and 277, which is great. And I came out injured, actually, and I basically got stashed in one of the SEAL teams that does a lot of intelligence work. It's a very interesting team. I also happen to speak Arabic, just for fun, I guess. And so, there was a lot of cool stuff that we were doing there.
And when it was time for me to go back into training — this is when we pulled out of Iraq in 2012 or 2013 — the Navy gave me an option to leave or become a pilot or something, and I chose to leave. Maybe if I'd seen Top Gun, I would've stayed as a pilot potentially, but it was a great time. And we did a lot of good work there and very happy about the time. I think it really set me up for success for everything I did afterwards. I didn't care about school until I left the military, turns out.
Lukas:
And then how did you get into machine learning?
Will:
I was at Columbia doing my undergrad — around 2013, I want to say — and people started telling me about this "machine learning" thing. I wasn't super into math or any of this stuff back then.
I started my degree in computer science. And for some reason, the CS part was fun, but it wasn't the most interesting part. I really gravitated towards math at some point. And I think if you were doing anything with statistics or math in 2013 and you were touching code, it was impossible not to run into SVMs, random forests, and all this stuff.
I remember taking my first neural networks class and they were like, "Yeah, you got this image" — and we've all seen this MNIST thing that Yann [LeCun] put together back in the day with the carousel music — and I was like, I don't know why this is useful. I don't see the value of this. And then many, many years later, I ended up working with Yann as one of my Ph.D. advisors.
So at some point in my undergrad, I went into finance because it was interesting. And I went there to try to use deep learning on the trading floor. And finance today is maybe not so allergic to deep learning anymore, but back then it was, because of all the observability problems. So, I didn't love that, and so I went back to school, got into computational neuroscience, and that's really where I learned about deep learning and got really into machine learning.
So, really the science is trying to decode neural activity and trying to understand how the brain works. So I still care a lot about that. And a lot of my drive is really the pursuit of science, but I find that a lot of the tools are really limiting to enable science to advance and do what it needs to do.

Stress-testing Lightning

Lukas:
But then what were you seeing when you started Lightning? What was the problem you were setting out to solve in the very beginning of it?
Will:
When I started Lightning, I was still in undergrad — this is around 2015. I was doing my research, and I wasn't building Lightning for Lightning's sake or anything like that. It was just my research code that I had internally. And what I was trying to optimize for was: how do I try ideas as quickly as possible without having to rewrite the code over and over again, but in a way that doesn't limit me?
Because, as a researcher, the worst thing you can do is adopt something and spend six months going through research, and then suddenly in the last few months you're blocked and you're like, "Oh my God, I have to rewrite everything," and then it discredits all your results. So flexibility was the number one thing that I cared about. So, that's a lot of what I was solving. And over the years, really... I didn't open source it until 2019, so it took about four or five years to get there.
What I did during that time was just try so many different ideas. My first research was neuroscience, and a lot of that was using GANs and VAEs. Then after that, I moved into NLP when I started my Ph.D. [Kyunghyun] Cho is one of the main authors on the seq2seq and attention papers.
So my first thing was to implement attention from scratch and a seq2seq network and all this stuff, and learn it, which is very rough if you guys have ever tried this. It's not trivial. I know Lukas has implemented this a bunch of times.
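For readers who haven't tried it: the core of a single attention step boils down to scoring a query against the encoder states and taking a weighted average. Here is a toy sketch in plain Python of scaled dot-product attention (the seq2seq-era attention he implemented was additive, but the idea is the same; all names here are illustrative):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(query, keys, values):
    """One decoder step of scaled dot-product attention.

    query:  current decoder state (list of floats)
    keys:   one vector per encoder position
    values: one vector per encoder position
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    weights = softmax(scores)
    # context vector: attention-weighted sum of the values
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights

# toy example: the query matches the first key most closely,
# so the first encoder position gets most of the attention mass
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
context, weights = attention([1.0, 0.0], keys, values)
print(weights[0] > weights[1])  # True
```

In a full seq2seq model this step runs once per decoder position, with the keys and values produced by the encoder.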
Lukas:
I've tried to do it once and I agree with you, it's non-trivial. Maybe it's not quite as daunting as it seems at first. I don't know. I guess there were probably fewer resources when you did it.
Will:
Yeah. I mean, back then, you were writing everything yourself. Nowadays, there are attention heads and all this stuff you can plug in. But back then, you were calculating your own stuff. And then PyTorch didn't support certain things. You'd get blocked, and it was really confusing. So it was rough.
And then we took that and started working on complex-valued networks. So Cho also introduced GRU units. So we started working on complex GRUs, and the idea there was to help keep gradients from exploding or zeroing out. And so, complex numbers can help you do that, especially for audio, with some normalization techniques and all that. But complex numbers are not something that PyTorch really supported until a year ago.
So little old Ph.D. me, I'm sitting there and I'm like, "Okay. I have to implement this whole complex number library," which I did and it's open source. It's super slow, don't use that. Use the PyTorch one— it's better now. But it's being willing to do what it takes, I guess, to get the thing done.
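The intuition behind complex weights and stable gradients can be shown with Python's built-in complex type (a toy illustration, not his library): a real weight away from 1.0 compounds into explosion or decay over many steps, while multiplying by a unit-modulus complex number only rotates the value, leaving its magnitude untouched.

```python
import cmath

steps = 100
h_real = 1.0
h_complex = complex(1.0, 0.0)

w_real = 1.1                   # |w| > 1: repeated multiplication explodes
w_unit = cmath.exp(1j * 0.3)   # e^{i*0.3}: |w| == 1, a pure rotation

# simulate a simple recurrence h <- w * h for 100 steps
for _ in range(steps):
    h_real *= w_real
    h_complex *= w_unit

print(h_real)          # ~13780.6: magnitude blew up
print(abs(h_complex))  # ~1.0: magnitude preserved across all 100 steps
```

The same effect in reverse (w_real = 0.9) gives vanishing values, which is the "zeroing out" he mentions.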
But through all those learnings, eventually I ended up in computer vision and self-supervised research. I think if you work with Yann, there's no way you don't do self-supervised learning at some point. So I fell into it. This is 2019, I think. Before it blew up. Well, before the world found out about it; people had been doing this for many years.
All of that stress-tested Lightning. And so, it was pretty flexible by the time it got open sourced. I knew you could do a lot of this stuff. And then when I joined FAIR, it was a lot of, "Oh, can we use it for this or that?" I'm like, "Yes, of course you can. Let me show you how." And it just took forever to explain all the possible ways they could use it.
Today, I think it's obvious that it can work for pretty much anything, but it wasn't back then. And we still learn as we go. Sometimes someone finds that it's not flexible for something, and we fix it and we move on. But it's a long process. It's taken a lot of years to get here.

Choosing PyTorch over TensorFlow and other frameworks

Lukas:
So when you go back to 2015, was PyTorch actually in use at that time? It was just Torch, right? I'm trying to remember what years these things came out. But certainly an unusual choice to build on top of PyTorch in 2015, if that's even possible. How did that happen?
Will:
Well, so my original version wasn't on top of PyTorch. I had actually started with Theano. So basically what happened: I was using Theano and sklearn mostly. I think I did what everyone does, where they take the model and add the .fit to it. And then you start building off of that. And so, that was my original version, and that was Theano. Have you worked with Theano? I don't know when you started, Lukas.
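The "add .fit to the model" pattern he describes is the classic sklearn-style estimator interface. A minimal toy version (hypothetical class, illustrating the interface rather than any real model):

```python
class MeanRegressor:
    """A toy sklearn-style estimator: the logic lives behind .fit/.predict.

    The point is the interface, not the model. Here "training" just
    memorizes the mean of the targets.
    """

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows chaining, as sklearn does

    def predict(self, X):
        return [self.mean_ for _ in X]

# usage: fit on training data, then predict on new inputs
model = MeanRegressor().fit([[1], [2], [3]], [2.0, 4.0, 6.0])
print(model.predict([[10]]))  # [4.0]
```

Wrapping any model behind this two-method surface is what lets you swap architectures without rewriting the surrounding training code.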
Lukas:
I think I might have touched Theano, but very little. I think I was using Keras on top of Theano, if that dates me.
Will:
Yeah. No, for sure. I got really annoyed at it. I think it was great for showing proofs of concept, for sure. So I started using Keras immediately, and I think that helped me unblock a lot of stuff. But at some point, you end up running into limitations. And I'm sure that's changed, but back then it was true. And so, that happened, and that's when I was like, "Fine. I guess I have to go and get into TensorFlow." I was trying to avoid it.
My first version actually was built on top of TensorFlow. But the second that PyTorch came out, which was a few years later, I rewrote it all in PyTorch — mostly because it just felt more mathematical: I could see the math, it was easier. Whereas in TensorFlow, you had this duplicate layer where it was a metalanguage on top of the thing — which again, has changed since then, but back then, that's kind of the world we lived in.
So, it was very experimental. Torch back then was very hard to work with. Oh, sorry, it was easy to use, but installing it and things like that was really difficult.
Lukas:
That's really interesting. So were you at all inspired by the way Keras did things? Or do you feel like your Lightning was in contrast to parts of Keras? How did you think about that? Because I feel like Lightning plays a similar role to PyTorch as Keras plays to TensorFlow. Do you feel like that's too simple or wrong?
Will:
Yeah. I mean, when I first released Lightning and we put it on the Torch forum, I called it "the Keras for PyTorch," because at a high level it looked like it. But it really wasn't. So I may be the cause of this confusion, unfortunately. But like I just said: I used Theano, I used Keras, I used TensorFlow, I used sklearn. So a lot of my inspiration obviously comes from all of these things.
Before I got into machine learning, though, I was an iPhone developer. I worked on iOS for a long time. And so, a lot of these ideas that people bring in as callbacks and all these things are actually ideas that have been around in Objective-C since the '70s and '80s. So, if you work on mobile or web, you've been exposed to these ideas.
I would say a lot of my inspiration — the API simplicity, the .fit kind of thing — most likely came from sklearn. And then a lot of the callback things... I was actually very opposed to callbacks. It turns out a lot of the hook names — even if you see the way I've named things — a lot of them are inspired by Objective-C and those super long names. You told me you started with Objective-C, so I'm sure you know what I'm talking about. It's like super long syntax names.
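The hook-based callback pattern he's describing, with long self-documenting names that say exactly when each hook fires, looks roughly like this (a toy sketch in plain Python, not Lightning's actual callback API):

```python
class Callback:
    # Hook names spell out exactly when they run, Objective-C style.
    def on_train_start(self, trainer): ...
    def on_train_epoch_end(self, trainer, epoch): ...

class PrintingCallback(Callback):
    def on_train_start(self, trainer):
        trainer.log.append("train started")

    def on_train_epoch_end(self, trainer, epoch):
        trainer.log.append(f"epoch {epoch} ended")

class Trainer:
    def __init__(self, callbacks):
        self.callbacks = callbacks
        self.log = []

    def fit(self, epochs):
        # the loop owns *when* hooks fire; callbacks own *what* happens
        for cb in self.callbacks:
            cb.on_train_start(self)
        for epoch in range(epochs):
            # ... actual training work would go here ...
            for cb in self.callbacks:
                cb.on_train_epoch_end(self, epoch)

trainer = Trainer(callbacks=[PrintingCallback()])
trainer.fit(epochs=2)
print(trainer.log)  # ['train started', 'epoch 0 ended', 'epoch 1 ended']
```

The verbosity he defends pays off here: reading `on_train_epoch_end` tells you when the hook runs without consulting any docs.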
Lukas:
I'm a little surprised you like Objective-C. I feel like most people hate it. And I think one of the reasons people tend to hate Objective-C is the verbosity, but it sounds like you see the sense in it.
Will:
Yeah. I mean, the verbosity makes it so I don't have to think about it. I hate when names are so short that you're like, "What do you mean by this?" Objective-C is like, "You did load on this and that and that." You're like, "That makes sense. I read this whole thing."
I think all of them did inspire me. Something I really liked about Keras was the feedback that you get. The summary tables and all of that are inspired by Keras as well. So I would say it's a combination of a lot of things, but most of the things that I've really thought about are rooted in that Objective-C world and that iOS world. And in fact, if you look at Lightning apps now — the new abstractions that we put into Lightning — a lot of them are similar to that. So, they have a lot more elements of that.
I think over the years things have evolved. But now, I think Lightning's taken on its own soul and become its own thing, and it's started to become its own paradigm that I hope does become a standard in the industry. I hope that it does inspire a lot of other people — especially in their APIs and how they write things — because I do think it works at scale. So I'm not offended if people grab the APIs and do something with them, because it means that at the very least we standardize ML, which is a win for everyone.

Components of the Lightning platform

Lukas:
What's a part of the Lightning API that you feel super proud of that you feel like was different than what was around when you built it?
Will:
I would say the main two things in Lightning are the LightningModule and the Trainer. I think those are the two that everyone uses, and together they allow you to abstract most of it away. And so, I think that's really what I'm proud of. The trainer, really, has changed a lot, and it's starting to become a standard across many other things outside of Lightning, because it is a good API. I think it's just the simplicity of it: the ability to see what's happening, change things, and just see magic happen.
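The split he describes, where the module defines what happens at each step and the trainer owns the loop, can be sketched in a few lines of plain Python (a toy illustration, not the real LightningModule/Trainer API):

```python
class LitModule:
    """The module defines *what* happens at each training step."""

    def __init__(self):
        self.w = 0.0  # single learnable parameter

    def training_step(self, batch):
        x, y = batch
        loss = (self.w * x - y) ** 2
        grad = 2 * (self.w * x - y) * x  # hand-derived gradient
        return loss, grad

class Trainer:
    """The trainer owns *how* the loop runs: epochs, updates, devices."""

    def __init__(self, max_epochs, lr=0.01):
        self.max_epochs = max_epochs
        self.lr = lr

    def fit(self, module, data):
        for _ in range(self.max_epochs):
            for batch in data:
                loss, grad = module.training_step(batch)
                module.w -= self.lr * grad  # plain SGD update

# fit y = 3x: the module never mentions epochs, the trainer never
# mentions the loss -- each side can change without touching the other
data = [(1.0, 3.0), (2.0, 6.0)]
module = LitModule()
Trainer(max_epochs=200).fit(module, data)
print(round(module.w, 2))  # 3.0
```

In real Lightning the trainer side is where multi-GPU, precision, and checkpointing live, which is why keeping it out of user code matters.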
And I would say probably, honestly, the new stuff that we just released — LightningWork, LightningFlow, and LightningApp — it's taken us a few years to really think about this and figure out: how do we take those ideas from building models and generalize them to building full end-to-end ML workflows, research workflows, production pipelines, all that stuff? And that's just not an easy thing to do.
We wanted to do it in a way where it felt like Lightning. It has the spirit and the DNA of Lightning, and you feel like you're using Lightning when you're using it. So I'm very proud of that. And that's something that was a team effort.
All of this, by the way, has been a team effort collectively. I think I've seeded some ideas, but there's no way that we would've been here at all without the community and the team here at Lightning specifically.
Lukas:
Yeah. I totally want to talk about the Lightning launch that you just came out with recently. I'm super impressed by what you did there, but I'm curious before we go into that: I remember a moment where I think PyTorch had something called Ignite that was really similar to Lightning, or at least the PyTorch team thought it was similar to Lightning. I'm kind of curious: you were actually working at Facebook, I think... were you working at Facebook at the same time that Facebook was also making a somewhat competitive piece of software to yours? And was that awkward? Did it feel competitive at the time?
Will:
So, two things. One, Ignite is not done by PyTorch and it's not a Facebook product. It's a third-party product where all they're doing is hosting the docs for it. It's not actually built by Facebook or PyTorch; it just seems that way because of the way the docs have been structured. So, that's the first thing.
The second thing is I was a researcher and a student, and I was literally trying to write papers, not build software for machine learning. I wasn't sitting around using tools and looking at stuff, so I had no idea those other tools were around. The ones I'd used were the only ones I literally knew about.
You've been in research; I'm sure there's a ton of stuff where you're like, "Oh, that's cool," but you never use it, because you're focused on your research.
Lukas:
Totally.
Will:
I think it's a pretty normal thing for researchers to be pretty narrowly focused. And I think it wasn't until it got launched that people like Alfredo [Canziani] and everyone else were like, "Oh my God, it's kind of like this." I was like, "Oh, interesting. What is that thing?" And then I looked at it, and I'm like, "Oh, I guess it is kind of like this, but it's got its own DNA."
It's not surprising, though. It happens in research. You have people working in parallel on something because something has happened that unblocks it, so it's going to trigger similar ideas in a lot of people. But when they come out at the end, they're going to be very different things.
My analogy is always: if you and I say, "Hey, let's paint the face of a person," and I describe the face, I bet you and I are going to paint it differently, even though we're trying to do the same thing.

Launching Lightning from Facebook

Lukas:
I guess what caused you to actually start a company around Lightning? What was that journey like?
Will:
Very interesting, because the first adopter of Lightning was Facebook, and that got us enterprise features very quickly. I mean, I was really annoyed because I was literally trying to do my Ph.D.
We have this thing internally called Workplace where people message each other, and I kept getting pinged by the Facebook team — not at FAIR, the actual people building all the fun stuff. And I didn't check this thing. We'd tried to exchange emails. I'm not the best at emails, so I hadn't checked this thing for literally four months.
And then my manager came in and was like, "Dude, you have to check Workplace." I was like, "Why?" And then it's these Facebook teams being like, "Hey, we want to use your thing." I'm like, "Dude, it's a Ph.D. project, why would you want to do that?" And they're like, "No, it's okay. We'll help you make it better." I was like, "Fine."
And so, they took it and started working on it, and we've been super tight with the team since then. But then it was crazy, because then big companies started using it immediately. It was like, someone would submit a PR and they're like, "Hey, can you fix this?" I'm like, "No, I'm not doing," I don't know, "FFT research or whatever you're doing. I don't want to fix that." And they're like, "But I'm at Bloomberg." I'm like, "That's cool. All right. I guess I should help you out."
As a developer, that's the best thing. You're like, "Cool. My stuff is being used for real. That's great." So I think when I had hundreds of these, I was like, "Okay. Well, these people are really struggling with this bigger problem," which is what we just launched, "so let's go ahead and really solve that problem in a meaningful way." But it turned out that you couldn't do it alone, and you needed a ton of money and people and so on. And so that's how we ended up here.
Lukas:
What year was that? 2019?
Will:
Yeah, that was summer of 2019. And then I left Facebook in December 2019. So I started the company January 2020, two months before COVID. Lukas, you've built a few companies, you've been successful and I'm sure you know how hard it is to build during COVID.

Similarities between leadership and research

Lukas:
Well, I mean actually here we are summer 2022. How big is your company?
Will:
We're about 60 people now, all over the world, and I think we've mostly clustered around New York, San Francisco and London, and then we have people everywhere else. I will say one thing that I'm really proud of in the company: again, I'm not from the US, I'm not from Silicon Valley, so I think that that's been the DNA of the company now. We have a ton of people from 20 different countries, and it's amazing because everyone speaks all these languages. It's pretty cool, you feel it's pretty international. So I think for a New York startup, this is great. It's exactly what you want, that melting pot.
Lukas:
That's awesome. What has the experience been like to go from a researcher and developer to suddenly running a significantly large company? Do you find time to write code on your own still?
Will:
Yeah, good question. Maybe I'll ask you this: don't you feel like building a company is kind of like doing research? There are a lot of parallels, no?
Lukas:
I do think there's some parallels, but you go first. Tell me what you think the parallels are.
Will:
So, what are you doing in research? You have a hypothesis, and you're proven wrong most of the time. You've got to just try something quickly and then move on to the next thing, and try ideas until you find something that works, and then you dig into it. That's no different than a company.
The difference is you have to do it through people, which is really hard. It's not just a solo person building. I think people forget this. If you want to build anything meaningful, you have to have a team; you cannot do it alone. At this point, I have to tell you, I just said that Lightning took about five years to go live. If I'd been working with this team, it probably could have gotten there in a year. Because it's a lot faster when you have really smart people around you and you're working together. I don't love this notion of the solo "whatever" who did whatever. That doesn't work, guys. I don't do that.
So it's been amazing. You have to build a company through people, and that's really hard to do: people management, taking a vision and getting everyone to go towards that same vision when they don't even know what the output's going to look like. That's really hard, because you're asking 60 people to suspend disbelief and say, "You know what, fine, we're going for it. And when we get there, we'll see what it is."
You have to trade that off a lot as a leader. And honestly, I think spending the first six years in the military, even though I didn't do all the SEAL training that everyone does and didn't become a full SEAL, the stuff that I did go through — especially leading small teams in training and at the SEAL team — actually did translate really well. It's like: how do you get an aggressive bunch of people to go towards a goal really fast when you have no information and limited resources? It's perfect.

Lessons from the military

Lukas:
Well, that's really cool. Tell me more about that. I'm really curious. What are some of the things that you learned about leadership in the military that you applied to running your company?
Will:
Yeah. I mean, if you show up to BUD/S as a junior officer... I was 20 when I started SEAL training. I got put in charge of a class of about 300 people. That's crazy. You have to be accountable for everything — all their gear, where they are — and it's all 18-, 19-year-olds. They're all getting in trouble out in town, they're all doing really silly things. You're having to deal with a ton of people issues. And you're 20, you're learning on the job.
And then you show up to your first SEAL team and you're put in charge of a team, and those guys have been there for 30, 40 years. They're so much better than you in every possible way. So if you show up acting like, "Hey, I'm here, big, bad boss. I'm going to do whatever," you're done. That's not how it works.
I can't speak for the whole military, but I can say in the SEAL team and special operations you're taught to lead from the front. As an officer, you are supposed to be the fastest runner — or the best swimmer, all of that — because you're always leading from the front. I still carry that here.
So, that's why I'm not coding all the time right now. But I do want the team to be at a specific level, and I can get there because I can push the team. I think it's a lot about that and some mentality that if I'm going through that door, I'm going first and I'm going to be there first always. A lot of those lessons carry over.
There are a bunch of civilian terms for this — whatever leadership is called — but that's been ingrained in me since I was 20, basically.
Lukas:
That's really interesting. Do you think there's any really striking differences about managing a company of mostly highly technical people distributed around the world that you were surprised by that's different than leading a team of 18 and 19-year-olds?
Will:
Yeah, for sure. In the military, it's very dictatorial. You make a decision, and that's it. No one questions it or anything like that. You, of course, take people's input, and everyone has that. But at the end of the day, you say something and it just happens. And there's no second-guessing. In the civilian world, oh my God, there are questions and this and that. So, you have to really learn how to live in that world. It's fascinating. I think the few years that I spent in finance were the best middle ground.
And I actually think a lot of veterans have a hard time adjusting to the civilian world probably for this reason, because the way you do things in the military is just so different. So you can't approach people that way, you have to learn the EQ.
Finance is kind of this hybrid, super-aggressive ground, but you still have to learn how to talk to people. If any veterans are watching this, I would urge you to go into finance first so you get a softer landing, and then go into tech. Because in tech, you're dealing with designers and creatives, and people are very different there.
Lukas:
That's awesome. Do you think you have any role to play — this is a total aside, I'm just curious if you have any thoughts on this — but sometimes I feel like at least in Silicon Valley there's often a lot of friction between military and tech working together. Do you think about that at all? Do you hope that there's military applications of Lightning, and do you think you can play a translation role? Or how do you think about that?
Will:
Yeah. Look, with AI in the military specifically, everyone jumps straight to "autonomous weapons!" And yes, that is an extreme use of it, for sure, and that's not a use that I want to support. I don't think any of us want to support that. Especially having been in some situations where it's pretty clear that you don't want to enable more of that.
But I think what people don't understand is that some of these tools can also be used in positive ways. There are ways where you could, for example — I don't know, I don't even want to get into it because people are going to judge all the parts — but there are ways you can use it still in a good way.
Translation, right? You're in the field and you're meeting someone in a new village and you can't speak to them. How do you do that? A lot of what the military has done during the war has been around winning hearts and minds in Afghanistan and Iraq. And that's really making those connections with villagers, trying to understand what's happening, trying to rebuild countries, and so on. And I think a lot of AI could actually facilitate these things, right?
Casualties. When you have casualties, you need to call something out. Maybe the person can't speak, so translation could help there. So there are some great applications of it, but it's like anything.
Like, yes, can the internet be used to find your long-lost family? Of course it can. But can it be used to traffic people? Yes, it can. So what are you going to do, shut it down? It's hard. There's not a simple answer.

Scaling PyTorch Lightning to Lightning AI

Lukas:
Alright. So tell me about the new Lightning website. What's the best way to talk about it, Lightning the operating system? I'm curious to know how you conceived of it and how you built it. It's such an impressive launch with some very impressive demos. I'd love to know about the process and your vision here.
Will:
Yeah, for sure. If you go to Lightning.ai today, you're going to see the new homepage for the Lightning community. I think the first thing to note is PyTorch Lightning has grown.
The project is no longer called PyTorch Lightning; it's called Lightning now. Because when it was just PyTorch Lightning, it let you do one thing, which is build models. So, that's cool, except that when you build that model, there's a ton of other stuff you have to do around it: you need to wrangle data, and you have feature stores. You need to manage experiments. You need to do a lot of the stuff that you guys are doing: analyze it, understand what's going on.
What we're now enabling the framework to do — the framework is now Lightning — is let you build models. You can still do that. But now, when you want to build research workflows or production pipelines, you can do that within the framework as well, in the Lightning way.
And what we really want to do is allow people to stitch together best-in-class tools. So we're really thinking about it as the glue for machine learning. So if I want to use Weights & Biases "feature X" with this other thing, I should be able to, right?
So really, I think you should think of us like Apple. We're really introducing the iPhone-equivalent so that people can build apps on there — so they can build their own apps and publish them. But these apps are extremely complex workflows; they're not just demos or something like that. These are actual end-to-end production workflows or research workflows that can run in distributed cloud environments, and they stitch together best-in-class tools. So Lightning AI today is really the page where these apps get published. So if you're starting a new machine learning project, you can go there, find something similar to what you're working on, run it on your infrastructure within minutes, and then change the code and off you go.
I think some of the things that I'm super excited about — and you and I have chatted a lot about this — is what are some of those integrations we can do with partners? What are some of the great tools that we can enable, for example, from Weights & Biases there so that people can embed into their apps in really cool ways that probably are not possible today, right? And so it's really around that.
I think I'd like to partner with every single framework and every single tool out there to help them shine and really provide the best capabilities of what they have for the community. So, I think that's what we're shooting for.
Lukas:
How long has this been in the works? It seems like a pretty different vision, as I understand it, from PyTorch Lightning. When it first came out, how did you come to it? And was this always on your mind ever since you started the company?
Will:
Yeah, for sure. That was definitely the vision from day one. It's really hard to build that up front; you really have to do the work for it. But PyTorch Lightning had already started to do a lot of this, and we were some of the first early partners there. When PyTorch Lightning first launched — we have to go back to 2019, May or June, whenever it was — you had frameworks that were running.
And if you wanted to watch your experiments or something, it was really hard to do. You had to integrate something. And so, you had TensorBoard, and I think you guys were probably live by then, I assume. And it was like, no one knew about these things because they weren't easy to use. And so, one of the first things we did was... I personally used TensorBoard, so I used it back then and I was like, "Hey. You know what, I don't want to sort this out myself. Let me just let this thing do it."
We started integrating that in there and then very quickly your users started coming by and saying, "Hey, can we add Weights & Biases?" and so on. And then we came up with these abstractions and then suddenly people could use it implicitly. And that was amazing because it started to stitch together tools.
So, that vision started back then already. And then if you look at the accelerators... so we wrote this API called Accelerator, which lets you train on different hardware. This is back in summer 2020, and it powers all of Lightning, but that's what it is. It allows you to go between CPUs and GPUs and TPUs. And I think we were the first framework to actually let you do that seamlessly.
PyTorch supported XLA for TPUs and supported GPUs, but you had to rewrite your code over and over again. So we introduced, for the first time, the ability to go between GPU and TPU, just like that. And that really changed the game. And so, that's been amazing because that was an integration.
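The core idea here, one training loop with hardware-specific details hidden behind a small interface, can be sketched in plain Python. This is an illustrative toy, not Lightning's actual API; all class and function names below are made up for the sketch.

```python
# Toy version of the "accelerator" idea: the training loop never branches
# on hardware; only the accelerator's hooks differ per device.
# NOT Lightning's real API -- names here are invented for illustration.

class Accelerator:
    """Base class: hardware-specific hooks with no-op defaults."""
    def setup(self, model):
        return model          # a real impl would move weights to the device
    def to_device(self, batch):
        return batch          # a real impl would copy host data to the device

class CPUAccelerator(Accelerator):
    name = "cpu"

class GPUAccelerator(Accelerator):
    name = "gpu"
    # a real implementation would wrap .cuda() calls and transfers here

def train(model_step, batches, accelerator):
    """Run the same loop on any accelerator; swapping hardware is a
    one-argument change rather than a code rewrite."""
    losses = []
    for batch in batches:
        batch = accelerator.to_device(batch)
        losses.append(model_step(batch))
    return losses

# A stand-in "loss" so the loop is runnable end to end:
step = lambda b: sum(b) / len(b)
out = train(step, [[1.0, 2.0], [3.0, 5.0]], CPUAccelerator())
```

The point of the pattern is exactly what Will describes: user code targets the loop, and moving from CPU to GPU to TPU becomes a configuration change instead of a rewrite.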
So it started to become a platform back then. And so, for me it was, "Okay, how can we do more of this?" Except that in the model, you're very limited to just these kinds of things. But when you start talking about feature stores and deployments and all that stuff, you need something a little bit higher level.
Again, I'm lazy and I hate learning new things, so I was like, "Okay, how do we make it just as easy as Lightning, so that if you know PyTorch Lightning, you already know how to build production systems?" And so that's kind of what we released. And the hard part was getting it to exactly be like Lightning. What is that DNA? What does the user experience feel like?
Lukas:
I'm curious how you think about product development and customer feedback. It felt like you created a lot from your own vision. How much of what you do is informed by your gut, and how much of it is coming from a user saying like, "Hey X, Y, Z, could you make something that does this or this or this"? What's your product development process look like?
Will:
Yeah. So I think I'm probably the worst person to ask this because I don't care what anyone is doing. I legitimately don't. I don't look at what people are doing. I don't care. We're going to do what we're going to do, and we're going to do things that I think are interesting.
We're going to basically form a thesis around something that we want to do, and we'll see the behavior and the users of course. But if you only talk to users... We speak to users all the time, by the way, so it's not about that. We take their feedback in. But users are always going to tell you incremental things. They're always going to tell you they want this better. They're never going to tell you they want the iPhone. They're always going to tell you, "Can you make my Blackberry keyboard slide out instead?" or whatever.
So you have to have just a different mentality there where you take things with a grain of salt. And you do take their inputs, but it's really... those inputs are going to usually improve the product, but they're not going to help you create a leapfrog product.
That's really where, again, I just don't care what people are working on. I'm just going to do what I think should be done for machine learning, and that's what we build next. And sometimes we're wrong and sometimes we're right.

Hiring the right people

Lukas:
Do you think it's important to hire people with a machine learning background to do the kind of work that you do? Or do you look for people with more like an operational or engineering or database background?
Will:
First and foremost, I care that people are creative, driven, and interesting in some way. Like, they have interests and they're not just the same cookie cutter persona. That's the first thing.
Then after that, yes, I want you to be good at your thing, whatever your thing is. Now, specifically machine learning, it's nice to have, please, by all means, I hope you know what you're doing with it. If you are on the Lightning team, you'll 1,000% need to know. And every single person on the Lightning team is a Ph.D. or came out of a Ph.D. program, so they're all experts in this stuff.
But everyone else who's around that, I just want you to be really good at your thing. And I don't care how you got that knowledge. I don't care. Remember, I didn't go... Well, I eventually went to fancy schools, but for most of my life I hadn't. And so, I didn't really care about that. So, I think machine learning is not necessarily a deal breaker, it just depends on your particular role. Now, I could be wrong...
Lukas:
How does the Lightning team fit into the broader company team? What's the distinction there?
Will:
So, the Lightning team works on all the open source stuff. And then we have people who work on all the closed source stuff. When you run Lightning apps on your own, you're using all the free stuff. When you run it on the cloud, that's when you use some private, proprietary stuff. You can take a Lightning app, fork it, clone it, even the models and all that stuff, and run them locally. But if you want to run on the cloud, you say [?] cloud.
And then that stuff is now being built by the other people, who are not Lightning team people. And these people are infrastructure people, they're database people, they're from all sorts of walks of life, I guess. And I think that diversity is always better in this world because there's just a lot of unknowns.
And you and I both know this, that ML is evolving, we just don't know what's going to need to be built next. So we have to have a research hat on a little bit.

The future of Lightning

Lukas:
Are there top-of-mind applications that you hope get built on your Lightning platform right away? What are the next things that you're excited about?
Will:
Top-of-mind right now are a few of these key partners that we've been working with for a long time — like you guys — where we want to make the tools just more widely adopted, bring more visibility to them, and have the ability for people to mix and match and more. So it's really about these immediate partners. Some of these include cloud providers, some of these include the hardware makers, and so on. It's people that we've had really good relationships with for a long time. So it's about enabling those tools to work first.
In terms of capabilities, I think that we want to make sure that people have a really good way to do inferencing, for example. So we're partnering with the cloud providers to do that, like the SageMaker team and so on. And then for people who want to do anything with data, I would love to partner with the Snowflakes and the Databricks of the world to enable these things. And then there's other labeling things that people are starting to do as well.
So, I don't know if you guys are doing anything there, but obviously happy to partner in any of these. I think it's those things that are immediately around the model development part. There's a lot more that we can do, but we really want to focus on this part first.
Lukas:
Would you ever work with frameworks that aren't PyTorch? Do you like a scikit integration or XGBoost or anything like that? Is that within scope?
Will:
Yeah, for sure. It's crazy, people use Lightning for all sorts of stuff, but people have actually run sklearn in Lightning. I don't even know how they did that.
Lukas:
That's awesome.
Will:
I was like, "How are you doing this?"
But, yeah, honestly, I would love to integrate all the frameworks. I'm long PyTorch in general, but I don't have anything against TensorFlow, JAX, Keras, or any of these things. I think any partnerships there, we're obviously happy to work with and enable the tools, as well.
Again, I think that we've really evolved from where we were before, to a point where we're saying, "Okay, now that we're able to support a lot more than we could" — before it was just a function of having bandwidth, right — "Now we can support a lot more than we could, we want to do that," and make sure we welcome these partners as well. So yeah, we're happy to work with any framework.
Lukas:
I'm just curious. Why are you long PyTorch over the long-term?
Will:
I think that a lot of these frameworks have converged in functionality, I guess. I haven't gone back and used TensorFlow, and I think it's probably changed quite a bit. We've just done so much work already in PyTorch that I think we're just excited to continue improving that user experience.
I think if Google wanted to partner with these other ones, we'd be happy to do that as well. But I believe that you can't really do everything well, and so, it's a function of having focus as well as a company. And anything in particular in PyTorch, I think it's really become the standard for research and also production nowadays. And I firmly believe that that team has done a really good job at continuing to push the boundaries.
So I think that the energy, the way that the team thinks about things — and how it's approached even doing production workloads and inference — it's just very unique and different. I don't know, I like unique and different thinking, I guess. So I gravitate towards that.
Lukas:
I guess one of the things that I struggle with as we scale our company and our team... we hire all these really creative, smart people that have slightly different points of view and vision, and keeping things aligned and keeping consistency always feels like a lot of work to me. I'm curious how you've dealt with that, if that's been an issue for you, as you've scaled up to 60 people.
Will:
Yeah. I think you always want to take everyone's inputs into account, but you also want to be opinionated, and that's the difference. I think that when everyone just says whatever, and then they'll do whatever they want, then you end up with something that isn't really cohesive. And so, to some extent, you've got to be a little bit of the bad guy and just say, "Hey. You know what, cool, I get it, but we're going to go this way. And that's just the way it is." And it's a lot of these micro decisions that get made.
It's not just me, it's people on the team where I encourage them to be opinionated. And so, it's the same philosophy that we have for Lightning. It's like, "Cool. You don't like subclassing things? Cool. Sounds good. Go use something else. We don't care. This is the way that we think it should be built and that's fine."

Reducing algorithm complexity in self-supervised learning

Lukas:
We always end with two questions and I want to make sure we get to them. So, the second to the last question is, if you had a little more time on your hands, or if you had time to work on something else in ML research broadly, what would it be?
Will:
If I were back to doing just research right now, I'm pretty sure I would've continued on the self-supervised learning routes. I still track that work.
We published a paper about this a year ago, so I'm going to talk about that, but I believe a lot of the things that have been pushed into self-supervised learning — a lot of those advancements — are actually not necessarily being driven by the methods, like negative sample this versus that. I think it's actually being driven by the transforms.
And so, the paper that we published a while back — I would've continued on this line is my answer — the paper that we published a while back showed that we could achieve very similar performance to SimCLR using a plain VAE without any of the fancy tricks. And actually we removed one of the terms of the ELBO loss. And why we could do that is because we took the SimCLR transforms and used them. But then the way that we generated the negative samples was using the transforms, and then you reconstruct the original.
That actually created a really good learning signal. And what that showed me, and showed our group as well, was that it's not about the fancy negative sampling algorithm and whatever thing you're doing with — I don't know, information theory or whatever thing you're coming up with. It's that I think that we're just embedding most of these things into the transforms and the transforms are actually pulling the weight. Which actually is in line with what the data scientists have been saying forever, it's about the data. It is about the data.
So it turns out that we've just pushed all that knowledge into transforms now for images specifically. So, I'm a little bit sad about that, but at a minimum, I think I would probably continue on that route exploring, how can I reduce the complexity of these algorithms? I don't want these tricks. I don't want these weird learning rate schedulers and all this stuff. I want the super simple VAE loss or something super basic that I know why it works and I can pinpoint exactly why it's doing what it's doing.
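To make the "terms of the ELBO" concrete, here is a minimal NumPy sketch of the two standard terms in a Gaussian-VAE objective. The function name and shapes are mine, not the paper's; the transcript doesn't specify which term was removed, so the sketch just separates the terms so the "drop one term, let the SimCLR-style transforms carry the signal" idea is visible.

```python
import numpy as np

def gaussian_elbo_terms(x, x_recon, mu, logvar):
    """The two standard (negated) ELBO terms for a Gaussian VAE.

    In the setup described above, the target x is the ORIGINAL image
    while the encoder sees a SimCLR-style transformed view of it.
    """
    # Reconstruction term: how well the decoder reproduces the target.
    recon = np.mean((x - x_recon) ** 2)
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior;
    # this is always >= 0.
    kl = -0.5 * np.mean(1 + logvar - mu ** 2 - np.exp(logvar))
    return recon, kl

# Tiny synthetic example, just to exercise the two terms:
rng = np.random.default_rng(0)
x, x_recon = rng.normal(size=16), rng.normal(size=16)
mu, logvar = rng.normal(size=4), rng.normal(size=4)

recon, kl = gaussian_elbo_terms(x, x_recon, mu, logvar)
loss_full = recon + kl      # plain VAE objective (negative ELBO)
loss_trimmed = recon        # a variant with one ELBO term removed
```

The simplicity is the point Will is making: if a plain reconstruction objective plus strong transforms matches a heavily engineered contrastive method, the transforms, i.e. the data, are doing the work.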
And I think self-supervised learning has lost its way in that most of these papers are like "brand new paper that does this!" and it's like, "Oh, they changed this one tiny term." And it's like, "Come on guys."

A fragmented ML landscape

Lukas:
Interesting. Well, my last question is: when you look at people that are trying to make machine learning work for real stuff, at companies like Facebook or Bloomberg or anyone, and they're going from "here's an idea of something we want to apply machine learning to" to deployed and working in production, where do you see the biggest bottleneck right now, in summer 2022?
Will:
It's like that meme, the one with expectation versus reality. I think that's what we see all the time.
Lukas:
Yeah. Why though?
Will:
I think a lot of it is just unknown. Like, the thing is so new that you stress test it in a production system and things break and you're like, "Ah, my chatbot is racist," or something. You're like, "Yeah. Well, no one's deployed a chatbot before." So, of course, you're going to learn that lesson.
So, there are a lot of new unknowns we're discovering. But I think a lot of it is the explosion of tooling that's out there and the lack of a standard on how to use that tooling together. So, I think that's a lot of what's holding us back today.
I think there are many ways to solve that problem. I think that we're obviously taking a stab at that with the things that we've just introduced. And so, I honestly think that's a big part of it. Now, I believe that that's only a part of it.
I think that the other ones are this fragmentation. Everyone wants you to go from this, to that, to that, to that, and then use this ONNX thing and then with this thing and that. And it's just like, if we just have a standard and everyone worked together, we can actually do well.
I honestly think there's a super unhealthy, weird competitive thing in ML. Like, guys, this is a massive market. There's a ton of people who are going to pay for this thing. It's not about one or the other tool, everyone is using all the tools together. This unhealthy competition thing is actually causing a lot of these problems. I think actually if the community worked together more and we had better communication and collaboration between frameworks and between open source projects and tools like you guys, then things would be a lot easier. Because we'd be speaking to each other and then some random engineer at Facebook doesn't have to waste six months being like, "Man, if they just did this one thing, it could have been so much easier."

Outro

Lukas:
Awesome. Well, I hope we can find some ways to work together.
Will:
Just think of that one. Just think of that person. Just be like, "I will get you your career back. Don't worry. That's the goal."
Lukas:
Alright. If you're listening, we're rooting for you. We'll make it work for you. Alright. Thanks, Will. Real pleasure. Good talk.
Will:
Yeah. Thanks for having me. This is super fun. And by the way, I'm a big fan of everything you guys are doing. So I appreciate everything you've done for the ML community as well.
Lukas:
Awesome, likewise.
