Clément Delangue, CEO of Hugging Face, on the power of the open source community

Clem explains the virtuous cycles behind the creation and success of Hugging Face, and shares his thoughts on where NLP is heading.
Angelica Pan

Listen on these platforms

Apple Podcasts | Spotify | Google Podcasts | YouTube | SoundCloud

Guest Bio

Clément Delangue is co-founder and CEO of Hugging Face, the AI community building the future. Hugging Face started as an open source NLP library and has quickly grown into a commercial product used by over 5,000 companies.


Show Notes

Topics Covered

0:00 Sneak peek and intro
0:56 What is Hugging Face?
4:15 The success of Hugging Face Transformers
7:53 Open source and virtuous cycles
10:37 Working with both TensorFlow and PyTorch
13:20 The "Write With Transformer" project
14:36 Transfer learning in NLP
16:43 BERT and DistilBERT
22:33 GPT
26:32 The power of the open source community
29:40 Current applications of NLP
35:15 The Turing Test and conversational AI
41:19 Why speech is an upcoming field within NLP
43:44 The human challenges of machine learning


Transcript

Note: Transcriptions are provided by a third-party service, and may contain some inaccuracies. Please submit any corrections to angelica@wandb.com. Thank you!
Clem:
I think through the open source model, you can do things a bit differently with kind of the inspiration of open source for infrastructure and database. With companies like Elastic, MongoDB that have shown that you can, as a startup, empower the community in a way and create a thousand times more value than you would by building a proprietary tool, right?
Lukas:
You're listening to Gradient Dissent, a show about machine learning in the real world, and I'm your host Lukas Biewald. Clem Delangue is CEO and Co-Founder of Hugging Face, the maker of the Hugging Face Transformers library, which is one of the most, maybe the most, exciting libraries in machine learning right now. In making this library, he's had front row seats to all the advances in NLP over the last few years, which have been truly extraordinary. And I'm super excited to learn from him about that. All right, my first question is probably a silly question, because almost anyone watching this or listening to this would know this, but what is Hugging Face?
Clem:
We started Hugging Face a bit more than four and a half years ago, because we've been obsessed with natural language processing, the field of machine learning that applies to text. And we've been lucky to create Hugging Face Transformers on GitHub, which became the most popular open source NLP library, and which over 5,000 companies are using now to do any sort of NLP, right? Information extraction, right? If you have a text, you want to extract information. A platform like Chegg, for example, for homework, is using that to extract information from homework. And you can do text classification. We have companies like Monzo, for example, that are using us to do customer support email classification. They receive a customer support email; which product team does it relate to, for example, is it urgent or not urgent? To many other NLP tasks like text generation for auto-complete, or really any single NLP task that you can think of. And we've been lucky to see adoption not only from companies, but also from scientists, who have been using our platform to share their models with the world and test the models of other scientists. We have almost 10,000 models that have been shared, and almost 1,000 datasets that have been shared on the platform, to help scientists and practitioners build better NLP models and use them in their products or in their workflows.
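[Ed: For readers who want to try the kind of text classification Clem describes, here is a minimal, illustrative sketch using the Transformers pipeline API. The model name is a public sentiment checkpoint standing in for a real support-routing model, which you would fine-tune on your own labels.]

```python
# Illustrative sketch: classify a support message with a public checkpoint.
# A production email-routing model would be fine-tuned on your own categories.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("My card was charged twice, please fix this as soon as possible!"))
# Expected output is something like: [{'label': 'NEGATIVE', 'score': 0.99}]
```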
Lukas:
And so Hugging Face Transformers is the library that's super well-known, right? And then the platform is a place where you can go to use other people's models, and publish your own models. Do I have that right?
Clem:
Yeah, exactly. We have a hybrid approach to building technology. We feel like you need the extensibility of open source, and the practicality of, for example, user interfaces, right? We cover really the full range, meaning that if you're a company, you can do everything yourself from our open source, not talk to us, not even go to huggingface.co, do everything from pip install transformers, right? If you want a bit more help, you can use our hub to discover a new model, find a model that works for you, understand these models. Or even, in a more extreme way, if you're a software engineer, or if you're new to NLP, or even new to machine learning, you can use our training and inference APIs to train and run models. And we're going to host this inference and this training for you, to make it very, very simple, so that you don't have to become an NLP expert to take advantage of the latest state-of-the-art NLP models.
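[Ed: A hedged sketch of the hosted inference route Clem mentions, assuming the public api-inference.huggingface.co endpoint; the bearer token below is a placeholder you would replace with your own.]

```python
# Illustrative sketch: call a model through the hosted Inference API over HTTP,
# so no local model download or GPU is needed.
import requests

API_URL = ("https://api-inference.huggingface.co/models/"
           "distilbert-base-uncased-finetuned-sst-2-english")
headers = {"Authorization": "Bearer YOUR_HF_API_TOKEN"}  # placeholder token

response = requests.post(API_URL, headers=headers,
                         json={"inputs": "I loved this episode!"})
print(response.json())
```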
Lukas:
That's so cool. I mean, I want to zoom in on Hugging Face Transformers first, because it feels like it might be one of the most popular machine learning libraries of all time. I'm kind of curious what you attribute that success to. When did you start it, what were you thinking, and what did you learn along the way?
Clem:
I mean, it may be. I don't know if it's the biggest machine learning open source project. It's definitely the fastest growing, because it's fairly new. We released the first version of it two and a half years ago, which is not a long time ago in the grand scheme of open source, right?
Lukas:
Yeah, for sure.
Clem:
If you look at all the most popular open source projects, you see that they usually need a very long time of maturation, right? In the grand scheme of open source, Transformers is very much still a baby, but it grew really, really fast. It really blew up, with over 42,000 GitHub stars and over a million pip installs a month. I think we have 800 contributors to Transformers. And the main reason why I think it's successful is, to me, because it really bridges the gap between science and production, which is something fairly new that not a lot of open source projects and not a lot of companies manage to do. I strongly believe that machine learning is different from what you could call software engineering 1.0, or software engineering, or computer science. Even if computer science has science in the name of it, it's not a science-driven topic, right? If you look at good software engineers, they don't really read research papers, they don't really follow the science of computer science. Machine learning is very different, it's a science-driven domain, right? It all starts from a couple of dozen kick-ass NLP science teams all over the world that are creating new models like BERT, T5, RoBERTa, all these new models that you've heard of. And I think what we managed to do with Transformers is to give these researchers the tool that they like, to share their models, to test the models of others, to go deep into the internals of the architecture of these models. But at the same time, to create an easy enough abstraction so that any NLP practitioner can literally use these models just a few hours after they have been released by the researchers, right? There's some sort of magic, some sort of network effect, when you bridge the two. We don't understand all the mechanics of it yet, but there's some sort of network effect where each time there's a new model released, the researcher is releasing it within Transformers. People are hearing about it, they're talking about it, they want to use it, they test it in Transformers, they put it in production, it works, so they want to support it more. The scientist is happy that their research is seen, is used, is impactful, and so they want to create more and they want to share more. It's this kind of virtuous cycle that I think allowed us to grow much faster than traditional open source, and that struck a chord on the market and in the field of machine learning.
Lukas:
I guess as an entrepreneur, I'm always kind of fascinated by how these virtuous cycles get started. When you go back two and a half years, when you were just first starting the Transformers project, what was the problem you were trying to solve, and what inspired you to even make an open source library like this?
Clem:
I could probably give you a kind of like a smart thoughtful-
Lukas:
No, no, I want the real answer, tell me what's actually happening.
Clem:
The real truth is that we didn't think much about it. We've been using open source for a while. We've always felt like in this field, you're always standing on the shoulders of giants, of other people in the field before you. We've been used to this culture where when you do science, you publish a research paper, and for research in machine learning, you even want to publish open source with the paper, right? And so since day one at Hugging Face, we've always done a lot of things in the open, sharing in open source. And here for Transformers, it started really simply, with BERT, which was released in TensorFlow. And Thomas, our co-founder and chief scientist, was like, "Oh, it's in TensorFlow, we need it in PyTorch, right?" I think two days after BERT was released, we open-sourced PyTorch BERT. That was literally the first name of the repository. And it grew, people started using it like crazy. And then a few weeks after, I don't remember what model was released. I want to say RoBERTa, but no, RoBERTa was much later. Another model was released, maybe GPT actually, I think it was the first GPT. It was released, and I think same thing, it was really just in TensorFlow, and we were like, "Okay, let's add it." And we felt like, "All right, let's make it so that it's easier for people to try both, because they have different capabilities, they're good at different things." We started thinking about what kind of abstraction we should build to make it easier, and very much like that, it went organically, and at some point researchers were like, "I'm going to release a new model, can I release it within Transformers?" And we'd say, "Okay, yeah, just do that." And they did that, and then like a snowball, it became bigger and bigger, and brought us to where we are now.
Lukas:
That's a really cool story. I didn't realize that you were trying to port models from TensorFlow to PyTorch. I mean, now you work with both TensorFlow and PyTorch, right?
Clem:
Yeah.
Lukas:
Did you feel at the time, I guess, a preference for PyTorch, or why was it important to you two and half years ago to move something to PyTorch?
Clem:
I think the user base was different, right? We've always been passionate about democratization, about taking something a bit obscure, a bit niche, and making it available to more people. We feel like that's how you get the real power of technology, when you take something that is in the hands of just a happy few, and you make it available to more people. That was mainly our goal. There were people who were using TensorFlow, there were people who were using PyTorch. We wanted to make it available to people using PyTorch. We were using PyTorch ourselves extensively, we think it's an amazing framework, so we were happy to make it more available. The funny thing is that, as we got more and more popular, at some point we've seen the other movement, in the sense that people were saying... At some point we were actually named PyTorch Transformers, and we started having a lot of people working in TensorFlow who were like, "Guys, it's so unfair, why can't I just use Transformers if I'm using TensorFlow?" And so that's when we extended to TensorFlow, dropped the PyTorch in the name, and became Transformers to support both. It's been super interesting, because if you look at our integration of PyTorch and TensorFlow, it's more comprehensive, it's more complete than just having half of it that is PyTorch and half of it that is TensorFlow. On the same machine learning workflow, you can do part of it in PyTorch. For example, when you want to work on the architecture side of it, PyTorch is really, really strong, but when you want to do serving, TensorFlow is integrated with a lot of tools that are heavily used in the industry. In the same workflow, you can start building your model in PyTorch, and then use it in TensorFlow within the library. Which we think is pretty cool, because it allows you to take advantage a little bit of the strengths and weaknesses of both frameworks.
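[Ed: A minimal sketch of the PyTorch/TensorFlow interoperability Clem describes: save a checkpoint with the PyTorch classes and reload the same weights with the TensorFlow classes via from_pt=True (the reverse direction uses from_tf=True). Requires both torch and tensorflow to be installed.]

```python
# Illustrative sketch: start on the PyTorch side, serve on the TensorFlow side.
from transformers import AutoModel, TFAutoModel

pt_model = AutoModel.from_pretrained("bert-base-uncased")  # PyTorch model
pt_model.save_pretrained("./my-bert")                      # writes PyTorch weights

tf_model = TFAutoModel.from_pretrained("./my-bert", from_pt=True)  # TensorFlow model
```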
Lukas:
Do you get a chance to use your own software anymore, do you build Hugging Face applications ever at this point, or you're just making these kind of tools for other people?
Clem:
Yeah, we play with them a lot. I think one of our most popular demos ever was something called Write With Transformer, which was a sort of text editor powered by some of the popular models in Transformers. I think something like the equivalent of over 1,000 books have been written with it. It's sort of like the autocomplete you have in Gmail, except much more silly and creative. It works really well when you have the syndrome of the... Can you say that in English? The syndrome of the white page, when you don't know what to write.
Lukas:
Oh, yeah. I don't think we say it like that, but I understand the experience.
Clem:
In French we say "syndrome de la feuille blanche" (the blank page syndrome), when you want to write but you don't know what to write about. It helps you be more creative by suggesting long, interesting text to add to what you've written.
Lukas:
That's really cool. I wanted to ask you, I feel like you have a really interesting lens on all the different architectures for NLP. I guess, are you able to see what the most popular architectures are? Have you seen that change over the last two and a half years?
Clem:
Yeah, we do. We can see the download volumes of the models. It's interesting to see, especially when new models come out, whether they're successful or not, how many people are using them... Something that's been super interesting to us is that the number one downloaded model on the hub is actually DistilBERT, right? A model that we distilled from BERT. But there's also a lot of variety in terms of usage of models. Especially, I feel like over the years they became in a way a bit more specialized, right? Even if they're still general pre-trained language models, more and more, each new model came with some sort of an optimization that made it perform better. Whether it's on short or longer texts, on generation tasks versus classification tasks, multilingual versus monolingual. You start to see more and more diversity based on what people want to do with them, and what kind of strengths and weaknesses they value the most, right? A little bit like what I was talking about between PyTorch and TensorFlow. People are trying not so much to decide which model is the best, which is kind of silly in my opinion, but which model is the best for which task, for which context, and then pick the right tool for the task.
Lukas:
I guess, for someone listening to this who doesn't have an NLP background, could you explain what BERT is, and just what it does, and maybe how DistilBERT differs from it?
Clem:
The whole revolution in NLP started with the seminal paper called "Attention Is All You Need", right? Which introduced this new Transformer architecture for NLP models, and the new generation of models based on transfer learning. BERT was the first and most popular of this new generation of models. And the way they work, in a simplistic way without getting too technical, is that you pre-train a model on a lot of text on one specific task. For BERT, for example, it's mask filling: you give it sentences, you remove a word in the middle of the sentence, for example, and then you train the model on predicting this missing word, right? And you do that on a very large corpus of text, usually a slice of the web, right? And then you get a pre-trained model that has some kind of understanding of text that you can then fine-tune. Hence the name transfer learning, because you can go from one pre-training task to other fine-tuning tasks. You can fine-tune this model, for example, on classification, right? By giving it a couple of thousand examples of a text and a classification, like the customer support emails that I was talking about, "classification: urgent and not urgent", right? And after that, the model is surprisingly good at classifying a new text that you give it based on urgency. And it's going to tell you, for this message there's a 90% chance it's urgent, based on what it learned in the pre-training and in the fine-tuning.
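[Ed: A minimal sketch of the masked-word objective described above, using the fill-mask pipeline with BERT; [MASK] is BERT's mask token.]

```python
# Illustrative sketch: ask BERT to predict a masked word in a sentence.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for candidate in unmasker("Hugging Face started as an open source [MASK] library."):
    print(candidate["token_str"], round(candidate["score"], 3))
```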
Lukas:
For example, with BERT, I guess, you have a model that can fill in missing words. How do you actually turn that into a model that, let's say, classifies customer support messages?
Clem:
With fine-tuning. You fine-tune by adding a layer, and you fine-tune this model to perform on your specific task. And in the longer term, I think that's a very interesting way of doing machine learning, because intuitively you almost feel like it's the right way to do machine learning. What we've seen in the past with machine learning, and especially for startups, is that a lot of them have sold this dream of doing machine learning and getting some sort of data network effects out of it, right? Because there's this assumption that you're going to give more data to the model, and it's going to perform better. And I think that's true, but the challenge has always been that you have more data, and so your model performs incrementally better, but only on what you're able to do already, right? If you're doing time series prediction, maybe you have 1 billion data points, right? And your model performs at 90% accuracy. You add maybe 9 billion, 10 billion additional data points, and your model is going to perform at 90.5% accuracy, right? That's great. I mean, that's a good improvement, that's something you need, but it doesn't give the kind of increased performance that you're really expecting from a typical network effect, in the sense that it doesn't make your result 10X or 100X better than without it. With transfer learning, it's a bit different, because you're not only incrementally improving the accuracy on one task, you also give it the ability to solve more tasks. So you not only increase the accuracy, but you increase the capabilities of what your model is able to do. I won't go into the crazy Musk-type predictions. But if you take Elon Musk and the OpenAI founding story, where he's saying, "We need to bring the whole community together to contribute to something open source for everyone", intuitively you could think that could actually come with transfer learning, in the sense that you could envision a world where every single company is contributing with their datasets, with their compute, with their weights, the machine learning model weights, to build these giant open source models that would be able to do 100X more things than what each of these companies could do alone. I don't know if we're going to get there in the foreseeable future, but I feel like, in terms of concepts, that's something interesting to look at when you think about transfer learning, as opposed to other techniques of machine learning.
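[Ed: A hedged sketch of the fine-tuning step: load the pre-trained encoder with a fresh classification head (AutoModelForSequenceClassification) and train it on a small labeled set with the Trainer API. The two-example dataset and "urgent / not urgent" labels are purely illustrative; a real model needs thousands of examples.]

```python
# Illustrative sketch: fine-tune BERT for urgent vs. not-urgent classification.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = not urgent, 1 = urgent

texts = ["My account is locked and I can't pay!", "How do I change my avatar?"]
labels = [1, 0]  # tiny illustrative set
encodings = tokenizer(texts, truncation=True, padding=True)

class SupportDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts and labels for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="urgency-model", num_train_epochs=1),
    train_dataset=SupportDataset(encodings, labels),
)
trainer.train()  # updates the pre-trained encoder plus the new classification head
```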
Lukas:
I guess, did you have a feeling about OpenAI, not releasing the weights for the early GPT models? Or I guess, any of the GPT models.
Clem:
Yeah. GPT, GPT-2, and I think a couple of versions in between were open-sourced, right? They're in Transformers, and we have a lot of companies using them. Probably more companies are using GPT-2 through Transformers than GPT-3 today. They're a private company, so I totally respect their strategy not to open source the models that they built. They've done an amazing job with GPT-3, it's a great model for everything where you want to do text generation, it's really useful. I'm really thankful for all the work they've done democratizing the capabilities of NLP. As our goal is to democratize NLP, I feel like what they've done is promote it more into the startup community, in a way. A lot of people realized, with the communication around it, that you could do so much more than what we've been doing so far with NLP, which is great. I think it contributed to the development of the ecosystem and put NLP in the spotlight, which has been really great. And we see a lot of companies starting to use GPT-3, and then obviously it's expensive, it's not really extensible, you can't really adapt it for your own use case. It's hard to build some sort of technological competitive advantage when you build on top of a proprietary API, or an API from someone else. We see a lot of companies using GPT-3, and then discovering NLP, and then coming to our tools. And I'm sure it happens the other way around too. Some people start with our tools, our open source, and then they decide to use something a bit more off the shelf like GPT-3, or Google NLP services, or AWS Comprehend. Providing an API for NLP is something these companies have been doing too. I think everyone is part of the same ecosystem that is growing, so that's super exciting.
Lukas:
Do you feel like there's a difference in the GPT approach versus the BERT approach that you were talking about? I mean, GPT has been very high-profile, and the text generation is really impressive. Do you feel like OpenAI is doing something kind of fundamentally different there?
Clem:
Yes. They are both Transformer models, right? They're kind of the same technique, with slightly different architectures, right? For example, where BERT is doing mask filling, GPT is doing language modeling, so next-word prediction. It's a bit different, and that's why the text generation capabilities are so much stronger. It has its limitations too; for example, if you want to do classification, you shouldn't do it with GPT, it doesn't make sense at all. They solve different use cases with slight variations of the architecture. We've had people reproducing GPT. I mean, we've had GPT-2, and a team called EleutherAI, I don't even know how to pronounce it, released GPT-Neo a few days ago, which has the same architecture as GPT-3, just with fewer weights for the moment, but they intend to grow the weights. I think the size of their model is the equivalent of the smaller GPT-3 that OpenAI is providing through an API today. And it works well. It's interesting to see the power of the open source community. I think one of my fundamental convictions is that in a field like NLP, or machine learning in general, the worst position to be in is to compete with the whole science and open source fields.
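[Ed: A minimal sketch of the next-word-prediction objective in action: text generation with GPT-2 from the hub. A GPT-Neo checkpoint such as "EleutherAI/gpt-neo-1.3B" (model id assumed) can be swapped in the same way, at the cost of a much larger download.]

```python
# Illustrative sketch: generate a continuation with a causal language model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("The power of the open source community is that",
                max_length=40, num_return_sequences=1))
```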
Lukas:
Sure.
Clem:
Just because I've been in this position before. Actually, the first startup I worked for, we were doing machine learning for computer vision back in Paris. I'm French, obviously, as you can hear from my accent. But competing against the science field and the open source field on such a fast-moving topic is a difficult position to be in, because you have hundreds of research labs at larger organizations, or at universities. It's not that each one of them is necessarily better than what you can do at a startup, but there are just so many of them that when you can do one iteration, you have a hundred out there doing one iteration too. You can outpace them and be the state of the art for a few days, then someone who started just a few days after you catches up, and then you're not ahead anymore. We've taken a very different approach. Instead of trying to compete with open source and with the science field, we're trying more to empower it, in a way. And I think through the open source model, you can do things a bit differently, with the inspiration of open source for infrastructure and databases, with companies like Elastic and MongoDB that have shown that you can, as a startup, empower the community and create a thousand times more value than you would by building a proprietary tool, right? And that you don't have to capture 100% of the value that you create, right? That you can be okay creating immense value and just capturing 1% of it, to monetize, to make your company sustainable. And that can still make a large public company, like in the case of MongoDB. Both have kept this open source core, but at the same time managed to grow an organization and be sustainable. And I don't see why it should be different for machine learning. We haven't seen a lot of large open source machine learning companies yet. For me, it's more a matter of how early the technology is. It's too early to have large open source machine learning companies, because, I mean, five years ago nobody was using machine learning, but it's going to come. I wouldn't be surprised if in five, ten years, you have one, two, three, four, five, ten massive open source machine learning companies.
Lukas:
I guess, you've had really front row seats to the cutting edge of NLP over the last couple of years. Do you feel like the applications have changed with these models getting more powerful and useful? Are there things you see people doing now that you wouldn't have seen people doing three years ago?
Clem:
Yeah, honestly, I think out of the 5,000 companies that are using Transformers, I mean, it's hard to tell, but we see a lot of them that are using Transformers in production. And I would say that most of them weren't using NLP in production five years ago, right? A lot of these are new use cases that either were impossible before, so the companies were just not doing it, or were performed by humans, right? Moderation, for example, is a good example of that. Customer support classification, as I was saying, is replacing a very manual process. Auto-complete is really, really big in Gmail. It's been my biggest productivity enhancement in the past few months, using Gmail to basically write half of my emails. Now most of the search engines are mostly powered by NLP, by Transformer models. I know Google now is saying that most of their queries are powered by Transformers. Arguably it's the most popular consumer product out there. I think it's changing so many products, the way products are built. I'm really [interested]...and that's why seeing GPT-3 promoting NLP in the startup world is super interesting. I think it's a real game changer when you have companies starting, building products from scratch, leveraging NLP. Because I think you build differently, right? When you start building legal...you can think of basically every company today. It's really fun to think, "What if these companies started today with today's NLP capabilities?" And you'll see that you have so many ideas for them to do things differently. You take DocuSign, right? What if DocuSign, with analysis of documents, started today with NLP? You think Twitter-
Lukas:
Wait, wait, tell me about DocuSign. Because what I do with DocuSign is I get like a message, and then I click sign, and then I sign the thing. What would be different about DocuSign if it started with all the technology available today?
Clem:
I don't know. It would give you so much analysis of the... There would be a "too long; didn't read".
Lukas:
For the contract?
Clem:
Yes, for the contract. Instead of having to read five different five-page-long documents, you would have an automatically generated summary of the document-
Lukas:
I see, I see.
Clem:
-with highlights in green or red. The interesting parts in the document, like when you see, oh, there's the big money shot, that's where they define how much money you're going to make.
Lukas:
Yeah, right.
Clem:
Big green flashing lights. Or, be careful about... when there's a small star that says "Everything that we wrote before is completely...not...it doesn't work in that case", those small conditions would get a big red flashing light, "Be careful, they're trying to screw you here."
Lukas:
I love it.
Clem:
Things like that-
Lukas:
That was so fun. Tell me about if Twitter started with this technology available.
Clem:
What could Twitter do? First, it would do the feed completely differently, right? It would not show you tweets because they're popular, or tweets because they're, I mean, not popular I would say, controversial. But it would show you tweets that you would relate to, tweets that you would be interested in based on what tweets you tweeted before. Hopefully it would be able to moderate things, it would be better at avoiding biases, avoiding violence, inappropriate content, racism, and bad behaviors. What else could it be? I would have wanted an edit button, obviously, but I don't know if NLP would help with that.
Lukas:
A what button?
Clem:
No. This famous thing that everyone has been asking for, for ages, an edit button on-
Lukas:
Oh, edit button, no, yeah, right, right.
Clem:
But it wouldn't be NLP-powered. Let's say if I just started today, I would add that. What else? Do you have any idea of what they would do differently with NLP today?
Lukas:
Well, honestly, I don't know how you feel about this, but when I look at the text generation technology, the NLP technology, and that was the field I actually started in 15 years ago and more, I almost feel like the thing that's intriguing is the lack of applications, for how amazing the technology seems to me. I remember the Turing test was this thing where, if you could converse with the... I forget exactly the framing, but it's like, if you converse with a computer for 10 minutes and you can't tell if it's a human, maybe we have AGI at that point. That seemed so impossible, and now it seems like we'll pass it sometime soon. I mean, there are variants of it, but I feel more and more like computers could probably trick me into thinking that I'm talking to a person with just GPT-3 or another text generation model. But I actually feel like I don't engage with totally new NLP applications yet. And I kind of wonder why that is.
Clem:
I mean, I wouldn't agree with you. I think that usage of it is really everywhere right now. I mean, there are not a lot of products that don't start to use some NLP, right?
Lukas:
Maybe it's just more subtle than I would-
Clem:
Yeah, maybe. It's less in-your-face, in the sense that it hasn't been these big conversational AI interfaces that took over, right? For a very long time, that was kind of the most popular, most mainstream face of NLP, right? People think of NLP as Siri, Alexa, in a way. And it's true that we haven't seen that picking up, right? Chatbots haven't proved to be very good yet, and we're not there yet in the capabilities to really solve real problems. But I think NLP became adopted in a way more sober, way more incremental way, inside existing use cases. You're probably using Google every day, and it's true that maybe you don't see much of a difference between the search results before and now. But the reality is that it's the most mainstream, most used product of all time, that most people are using every day, and it's powered by modern NLP, it's powered by Transformers. But it's not as, maybe the word is "groundbreaking", in terms of experience changes as you could have expected, right? I think one of the challenges of NLP is that because language has been so much of a human topic for so long, it carries all this association with AI, right? And AGI, and almost this machine intelligence. And obviously if you look at all the sci-fi, like "Her", you associate that a little bit with NLP, and that's kind of what you could have expected from NLP. The reality has been more productivity improvements behind the scenes that you don't really feel or see that much as a user, it's true.
Lukas:
Are you optimistic about chat interfaces?
Clem:
I am. I think what most of us got wrong... I mean, we started by building an AI friend, a fun conversational AI, with Hugging Face. When we started Hugging Face, as I was saying, we were obsessed with NLP, and we were like, "Okay, what's the most challenging problem today?" Open-domain conversational AI, building this kind of AI that can chat about everything, about the last sports game, about your last relationship, and really talk about everything. We were like, "That's the most difficult thing, we're going to do that." And it didn't work out. I think what we got wrong, and what most people are getting wrong, is probably the timing, in a way. In the sense that conversation, and especially open-domain conversation, the way we're doing it now, is extremely hard. It's almost the ultimate NLP task, because you need to be able to do so many NLP tasks together at the same time, ranking them. I need to be able, when you're talking to me, to extract information, to understand, to classify your intents, classify the meaning of your sentence, understand the emotion of it, right? If your tone is changing, then it means different things. I think we're going to get to better conversational AI ultimately, I don't know if it's in five years, if it's in 10 years, if it's longer. But I think we're going to get there. It's already solving some more vertical problems, sometimes with customer support chatbots. I think Rasa in the open source community is doing a really great job with that. I think we won't get tomorrow to the AI who you can chat with about everything, what we started Hugging Face with. But ultimately I think we'll get there, and that's when, in terms of user experience, you're going to realize it's different. It's probably going to take much more time than what we were expecting.
Lukas:
Cool. Well, we always end with two questions, I'd love to get those in the last couple of minutes we have. We always ask, what's an underrated topic in machine learning? Or maybe in your case, what's an underrated topic in NLP, something that you might work on if you didn't have a day job?
Clem:
That's a good question. I mean, something that I've been super excited about in the past few weeks is the field of speech. Speech-to-text, text-to-speech. Because I feel like it's been a little bit like NLP a few years ago, it's been kind of relegated to being a slightly boring field with not so many people working on it. And I feel like thanks to a couple of research teams, especially the team of Alexis Conneau at FAIR with wav2vec, you're starting to see new advances actually leveraging Transformer models that are bringing new capabilities. I'm pretty excited about it, I think there's going to be some sort of a resurgence of it and a leapfrog in terms of quality. Not only in English, but what's interesting is that it's also in other languages. We hosted, a few weeks ago, a community sprint at Hugging Face with over 300 participants who contributed speech-to-text models for almost 100 low-resource languages. And so it's been pretty cool to see the response of the community. I think there are going to be more things happening in the coming months in speech, which is going to unlock new use cases. Because if you think that you can combine speech with NLP, you can start to do really cool stuff. We were talking about what if a product were built today; if Zoom was built today with good speech-to-text and NLP, you could do pretty cool stuff too. When I'm saying something cheery, there should be automatic clapping, because otherwise everyone is kind of [muted]. That's the problem with the current Zoom, with everyone muted, when I say something to cheer, I'm the only one cheering. Or when you say "Hoorah!", there should be emoji showers, celebratory emojis, or things like that. I'm excited for speech. If you haven't checked the field lately, you should definitely check it, there are cool things happening.
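[Ed: A hedged sketch of the speech-to-text direction Clem mentions, using a wav2vec 2.0 checkpoint from the hub; "sample.wav" is a placeholder local audio file, and decoding the file requires ffmpeg to be installed.]

```python
# Illustrative sketch: transcribe an audio file with a wav2vec 2.0 model.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
print(asr("sample.wav"))  # returns something like {"text": "HELLO WORLD"}
```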
Lukas:
Very cool. And the final question, and I feel like you're in a unique place to see this, is what's the hardest part, or what are some unexpected challenges, in getting a model from thinking about it to deploying it in production? And I guess you have a unique point of view here, where you have a platform that makes it super easy. Are there still challenges when folks use your stuff? Is there more to do, or does it work out of the box?
Clem:
There are still a lot of human challenges to it, I think, in the sense that a machine learning model does things in a different way than traditional software engineering. And for a lot of companies it's really, really hard to make the transition. For example, the lack of explainability, the fact that it's harder to predict the outcomes of these models and tweak them. It's still really hard to understand and adopt for people who have spent a career in software engineering, where you can really define the outcome that you want to get. From what I'm seeing, a lot of the time the human part, the understanding of machine learning, is the most difficult thing, more than the technical aspect of it. On the technical part, I mean, we've been excited to bring on larger and larger models, which are still difficult to run in production. We've been working a lot with the cloud providers, we announced a strategic partnership with AWS not so long ago, but we're still working heavily with Google Cloud, Azure, and other cloud providers. But bringing these large language models to production, especially at scale, requires a bit of skill and some work. You can get there. I think Coinbase [Ed: Clem meant "Roblox"] has a good article, a good blog post, on how they use one of our models, I think it was DistilBERT from Transformers, for over a billion inferences an hour, if I'm not mistaken. But it's still a challenge and still requires a lot of infrastructure work.
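[Ed: A minimal sketch, not the setup from the blog post Clem references, of the basic pattern for serving a distilled model on CPU: load the model once, batch incoming texts, and run inference with gradients disabled.]

```python
# Illustrative sketch: batched CPU inference with a distilled classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()  # inference mode; the model is loaded once and reused

def classify(batch_of_texts):
    inputs = tokenizer(batch_of_texts, padding=True, truncation=True,
                       return_tensors="pt")
    with torch.no_grad():  # no gradients needed at serving time
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).tolist()

print(classify(["Great episode!", "This bug is infuriating."]))
```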
Lukas:
Awesome. Well, thanks for your time. It was a real pleasure to talk to you.
Clem:
Thanks Lukas.
Lukas:
Thanks for listening to another episode of Gradient Dissent. Doing these interviews is a lot of fun, and it's especially fun for me when I can actually hear from the people that are listening to these episodes. If you wouldn't mind leaving a comment and telling me what you think, or starting a conversation, that would inspire me to do more of these episodes. And also if you wouldn't mind liking and subscribing, I'd appreciate that a lot.