Chris Albon — ML Models and Infrastructure at Wikimedia

Chris talks about machine learning at Wikimedia, from which models they're currently running to where their deployment infrastructure is heading.
Angelica Pan

About this episode

In this episode we're joined by Chris Albon, Director of Machine Learning at the Wikimedia Foundation.
Lukas and Chris talk about Wikimedia's approach to content moderation, what it's like to work in a place so transparent that even internal chats are public, how Wikimedia uses machine learning (spoiler: they run a lot of models to help editors), and why they're switching to Kubeflow and Docker. Chris also shares how his focus on outcomes has shaped his career and his approach to technical interviews.

Connect with Chris:

Listen

Apple Podcasts Spotify Google Podcasts YouTube SoundCloud

Timestamps

0:00 Intro
1:08 How Wikimedia approaches moderation
9:55 Working in the open and embracing humility
16:08 Going down Wikipedia rabbit holes
20:03 How Wikimedia uses machine learning
27:38 Wikimedia's ML infrastructure
42:56 How Chris got into machine learning
46:43 Machine Learning Flashcards and technical interviews
52:10 Low-power models and MLOps
55:58 Outro

Watch on YouTube

Transcript

Note: Transcriptions are provided by a third-party service and may contain some inaccuracies. Please submit any corrections to angelica@wandb.com. Thank you!

Intro

Chris:
When you have small teams the value of ML is you could start to really scale things out because you start to use machines as the assistant to you, right? So you train something manually and then you send it out in the world, and then it does that at scale for you, which is like a superpower. And so I just started going farther and farther down the path of saying, "Hey, we can make this team of 4 people behave like a team of 50 people if we start to use ML more and more."
Lukas:
You're listening to Gradient Dissent, a show about machine learning in the real world. And I'm your host, Lukas Biewald. Chris Albon is the director of machine learning at the Wikimedia Foundation. Before that, he had a number of really interesting jobs: director of data science at Devoted Health, director of data science at Ushahidi, which is an open source non-profit that did mapping, and a project director at FrontlineSMS. He's also a well-known educator on machine learning, the author of Machine Learning Flashcards, Machine Learning with Python Cookbook, and several fantastic machine learning tutorials. I'm super excited to talk to him today.

How Wikimedia approaches moderation

Lukas:
Maybe we'll jump into...there's kind of a theme around, I guess, moderation and truth and security that I'm sure you think about a lot. One question we got from Twitter was basically, someone was wondering if Wikipedia has experimented with tools for moderators or kind of tools for adjudicating disputes. I have to say, I've seen a lot of fighting in the comments on Wikipedia pages, and I'm kind of always impressed that they resolve, but are there special tools or algorithms that you all use?
Chris:
No, and I mean, that's really sort of foundational to Wikipedia, that it is...at the end of the day, it is a human project of humans deciding, like trying to get at the truth. What is the perspective? What's that neutral perspective between the parties? I have to say that after joining the foundation, the new thing that I do is I care less about the Wikipedia page, I really care about what's called the talk page. Every page of Wikipedia has like a separate comment page, where people are constantly discussing over and over again, discussing, debating, finding new information, going back and forth. With things around disinformation, we've definitely been exploring areas of, for example, sock puppet protection — sock puppets being when one person has like lots of accounts — and trying to build models that help predict that. Around dispute resolution...at the end of the day, if you see something on Wikipedia, we really want you to think, "Okay, cool, at the end of the day a human has decided this, a human has made this kind of decision." And so things like algorithmically making those decisions for people, that stuff is anathema to everything here...which is why frankly, people love it so much, right? And which is why doing machine learning in that environment is so interesting, because you are trying to do things at scale with a human in the loop. You just have thousands and thousands and thousands of humans who are willing to help you out.
Lukas:
Totally. Another question along the same lines is, someone was asking, what are some of the most contentious Wikipedia articles and does your team ever get involved to resolve edit wars in any way?
Chris:
There's a lot of contentious pages across many languages. The interesting thing that I think people don't realize a lot is that I work for the Wikimedia Foundation. We are the non-profit organization that helps keep the infrastructure up. We fight legal battles for different Wikipedia communities, but each individual language of Wikipedia manages their own show, with their own rules based on a common set of norms across all Wikipedias, but it is their own show. English Wikipedia has an incredibly elaborate system of dispute resolution, different levels of user access between the admins. There is a full volunteer organization in English Wikipedia that is managing those kind of things, and that's the same with other languages. So for us at the foundation, it is very critical that we actually don't get involved and jump in because our role is sort of the folks who are one step back where like, we'll make the site better for you, we'll make your experience better, we'll recommend things that we think are interesting, we'll highlight, we'll help you work faster as an editor using ML, but we're not going to jump in and say, "Hey, Steve was right and Jason was wrong in this particular article." That is not our role. I think if we played that role, we would be the most hated organization very quickly.
Lukas:
Right. And I guess, someone else was asking: are the implications of ML top of mind? I would imagine it's hard to be really neutral with any kind of tool, right? Do you ever feel like there's implications for how your tooling works, even though you're really just supporting moderators? I would imagine that, for example, subtle changes in how search works might actually really change what content people are seeing, because it's such a high-profile set of webpages.
Chris:
I think probably one of the key foundations of the team is the idea that any kind of ML that we do is not neutral, at the end of the day. Our gold standard, when we are making models, is that the model reflects the training data from the particular community that is served by that model. So for example, French Wikipedia wants a model that predicts the article quality, like if this article is really good or bad, to help editors decide which articles they should really jump in and help in. We want to get that data from the French Wikipedia community, train that model, and then serve it back to the French Wikipedia community, and give that community the ability to actually manage and govern the use of that model in their system. What we're saying is like, "Hey, there is no neutrality here", but we will try to limit our ability to, say, train something on English Wikipedia and then apply it to Vietnamese Wikipedia, by gathering the training data from that original community and then serving back. It's not possible all the time because some models have to be like...you need to be global, scalable, there isn't like enough training data, and that kind of stuff. But that is our gold standard that we go for, and that we've done many, many times over the years.
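Note: To make the per-community pattern Chris describes concrete, here is a minimal sketch: gather labels from one wiki's editors, train on them, and serve the model back to that same wiki. The data, labels, and model choice below are illustrative assumptions, not Wikimedia's actual pipeline.

```python
# A minimal sketch (not Wikimedia's real code) of training an article
# quality model on labels gathered from one community and serving it
# back to that same community.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training examples labeled by French Wikipedia editors.
articles = [
    "Texte d'un article détaillé, bien sourcé et structuré...",
    "Ébauche très courte sans sources.",
]
labels = ["good", "stub"]  # quality classes defined by that community

quality_model = make_pipeline(
    TfidfVectorizer(),                  # bag-of-words features per article
    LogisticRegression(max_iter=1000),  # simple, inspectable classifier
)
quality_model.fit(articles, labels)

# Served back to the community whose labels trained it.
print(quality_model.predict(["Un nouvel article à évaluer."]))
```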
Lukas:
Got it. One other question on this theme of moderation that somebody asked, and I'm kind of curious about, is what's the most common type of spam attack that you deal with, like adversarial problems that you come across on your different properties?
Chris:
Well, I mean, the most common one is someone like putting in like "poop" or swear words randomly into articles. Detecting that through...the community has actually done a great job, because I think people don't realize that the English Wikipedia community and other Wikipedia communities actually have developed their own machine learning models, like as bots that they deploy by themselves with no need from the foundation. We host them, but it is theirs to do whatever they want with. But the most common one is definitely adding swear words. It's something...as you can imagine. The ones that are the most dangerous are definitely the ones where the attackers have a lot of resources. One of the things that you quickly realize when you work here is that all of our models are open source. Everything we do is open source. You can see the whole thing, you can see my internal chat. You can see my ticketing system, like my Jira is totally public. What I'm working on a given day is public. I'm live streaming the work that I'm doing every single other week or something like that. All this is open, and every single article on adversarial...not adversarial machine learning, but adversarial attacks on machine learning systems, it's like, if you have the model or you can actually query the prediction really, really quickly, you can start to figure out how to game the system because you have such exposure. We are exposing ourselves to that all the time, by showing them exactly what's happening with the model, by giving them the training data. And that is the sort of give and take you sit with: okay, how do we watch how other people are behaving in the system in order to detect any kind of problems, while also making it so that...all of our models, you can hit an API for free and just use. Use as much as you want. As long as you don't crash the system, you're good to go. People are using it tons of times, they can download the model, they can download training data, they can run it locally, they can do whatever they want. But of course, there's risk in that, right? Because there's no...people can see your entire hand. It's like playing poker where you're showing your whole hand and they're not showing any of their hand. You're definitely at a disadvantage, but it is a trust-based activity. People who spend hours and hours and hours making changes to the site, writing new articles, finding some new interesting fact and then hunting down where to put it, or sitting on those talk pages and debating and discussing how to exactly phrase a single sentence about some article because it's really important to get that right...that only works if they can come and see my team and say, "Hey, I can see what they're doing. I understand what they're doing. I understand where they're coming from and I can participate in that." That's the only way we have anything, because the worst case scenario would be that people thought that what we were doing was a black box that you just couldn't see and there was some mystery behind what it was. And we were just like, "Oh, just trust us, just trust us." Don't trust us, come and look, come and see, run the code yourself, tell us we're wrong. We're Wikipedia, so we'll definitely invite changes all the time.

Working in the open and embracing humility

Lukas:
What does it feel like working with that level of transparency? I can see how it really must keep you honest around security holes and thinking really carefully around not doing security through obscurity, but like, what's the experience like? I mean, I assume that in your previous roles you didn't live stream your work as you were doing it.
Chris:
Yeah, it's interesting, because I've worked in the non-profit space and the startup space for a long time, and in both of those spaces that I've traditionally worked, even when we were doing open source work, it was sort of off in a corner. Like there weren't really that many people who were paying attention to it, or if you're a startup, it's literally all IP. And so you're like, deep in the bowels of the organization in the back working on some algorithm that people hope will help them raise money or something like that, but no one's going to see it, you're never going to publish it, there's never going to be a paper about it, it's just your secret sauce in the rear. At Wikipedia, because we do everything so open, I have learned to lean in on the idea of being open with a large amount of humility. Just to give a real example, we are going to start releasing model cards. So, an individual page that describes every single model that we host, and we've been sort of making prototypes and experimenting with them. The experiments are public, you can take a look at the experiment page, and you'll sort of like see what's happening. But some of the models are going to look embarrassing, like you're going to look and be like, "Wow, that's a really bad model, I can't believe you put that in production." And we just need to like...that is the only way to go in this scenario, is to just say, "Hey, we are going to be open, we're not going to take offense to something that you say, our model is crappy, come help us fix it." We will lean into all the humility that we can because that is the only way to do this. The only way to do this is just to come in with a huge heaping pile of humility and openness and just let things go. It is weird, and it is different, because when you work on the team, you work on this nexus between machine learning, which a lot of people are interested in, and Wikipedia, which a lot of people are interested in. It's like working under a spotlight, in a sense. I do live streams of myself working and the first few weeks like a hundred people were showing up, and they would just watch me not know something, like not understand how something's working, not understand how my system's working. Another example is there was this bug report that I randomly saw that showed a huge percentage of the traffic of one of our data centers was because of one image, like one image was all the data. It was some flower or something like that. And I was just like, "Oh, that's kind of cool, I'll tweet about it." I tweeted it and then within 24 hours there were like a hundred articles about this flower that was causing all these problems on Wikipedia. People kept on coming to the Phabricator ticket, like the Jira ticket that the engineers were working on to fix it, and it was crashing Phabricator because of so much traffic. They were sending in messages of support and all these comments and ideas of what they thought it was, and the engineers were like, "Just stop, just stop, we think we got it, stop posting comments." You're in the open, you're in the open, you're in the public, and you cannot be defensive with how you do it. Because I mean, if you're really defensive about it, it's probably not a great job for you, so it probably won't be that enjoyable.
Lukas:
Do you think you've had to develop a thicker skin? I know whenever I do anything that's very public, mostly the feedback is positive, but I really feel the negative feedback much more. And I think it causes...any kind of public thing we do, a little tinge of stress. I kind of can't imagine if everything was like public and visible, and people were watching it. Has it changed your mindset at all, or the way you work?
Chris:
Oh yeah. I think when I started, I had a regular thickness of skin. I was like, "I'll do fine in this role. What could you possibly do?" And then you see what happens, right? People don't like what you're working on. People don't think the foundation should exist. People don't think there should be machine learning in it. People think your model is wrong or dumb or stupid, or why would you do it, or like there's this particular problem, or why aren't you working on this other thing, or 10,000 things. And remember, everything we do is public, so someone can post a comment about a ticket from like 2014 and say, "Oh, this is stupid," or whatever. And people can like take your code and say...you get that all the time. It is something that I think everyone on the team just learns to be okay with. I think the best people who do it are the people who just come in and lean into the idea, like "Hey, it's okay." People like what we're doing, for the most part. Some people won't, that's okay, but there will be people who just won't like it and there's nothing to do about that, right? There's no other way to operate. But yeah, there's definitely times where you're like, "Oh, my God, this is brutal. This person really doesn't like me." But, all of that pales in comparison to like the simple fact that... I get up every single morning and people pay me money to work on Wikipedia and all the other projects. That's what I do all day. I sit down and I'm like, "This would be a cool thing to do here, we should work on this, let's change this up," like that's all I do. Just make Wikipedia, work on Wikidata, work on Wiki Commons, all the cool projects for all these people who've volunteered thousands of hours to work on this stuff. My salary is paid by donations, so like people are donating 5, 10 dollars to make my salary, right? That is how I'm working on it. Once you put that more into perspective, you're able to take a lot of heat.

Going down Wikipedia rabbit holes

Lukas:
That makes sense. Do you find yourself getting distracted by the content?
Chris:
Oh, yeah.
Lukas:
I mean, Wikipedia, I find so fascinating. I would think if I was working directly on it... I actually remember my first job, I was writing a search engine, we were practicing on Wikipedia. And I remember, like every time I was editing the...or monitoring the search results, I'd just go down these rabbit holes on whatever topic it was pulling up.
Chris:
It is...it is genuinely hard. And not just the straight content, but all the layers underneath it, because when you start to work on it, you realize all these little decisions that were made around like, "Oh, how do we do licensing? Or like, what are the kind of ramifications of that?" So for example, we have Wikimedia Commons, which is all the images that we have, and it's like, "Oh, there's faces in the images? Why are faces allowed in these? What's the rabbit hole?" And that's like been a huge multi-year discussion by these folks of what to do about that and if that's okay, and that kind of stuff. And then you just look at the talk pages and you look at the discourse, and there's just, there's so much. It is like the classic iceberg diagram. There is so much that's all public, but just not that front page. There'll be a page of like Dalmatian puppies, and then there's like just a huge, massive discussion of licenses and behind the scenes of how to do certain things. I have definitely become very distractible, because...research comes out about really interesting ideas and I'm sort of constantly being pinged by like, oh, there's this cool thing about how to auto translate stuff, or this cool idea of how to detect this particular stuff, like maybe we should work on this, and just trying to keep the team sort of focused on pursuing just a few things to move forward is hard enough. But I definitely do like that I can have Wikipedia open on my browser window forever and it's technically working, even though I'm randomly scrolling Prussian military history or some super duper, duper random topic. And-
Lukas:
Do you have a favorite Wikipedia page or topic that I could look at after this interview?
Chris:
I do, I do.
Lukas:
Tell me.
Chris:
It is called perpetual stew. Perpetual stew is the idea of a pot of stew that never stops cooking. So it is cooked forever. And the idea is you're constantly adding to the pot as you're taking out from the pot. This sounds like a crazy concept when you think about it, that you just have this, like, hundred-year-old stew.
Lukas:
It seems a little disgusting, is it good?
Chris:
See, this is why ... And then the photo is amazing because it has like a whole fish in the photo, which is like someone's thrown a whole fish. It's weird. It's weird, but that is not something that I would ever imagine, but yeah, it's a cool idea. But there's another-
Lukas:
That was a great answer that you just had instantly. We didn't-
Chris:
I spend all day looking at Wikipedia. Literally all my conversations are about Wikipedia, like all the time. The images of it, the different parts of it...So yeah, I definitely have a long list of ones that I think are great. I think some of the ones that I have really appreciated have been the ones that are in the news. I don't think I really appreciated how much work the volunteers do when something is like fast-moving news. I remember during the US presidential election, I was going to the page...there's all these procedures in place. The volunteers all on their own, they lock down the page through this process, only these kinds of edits go through. How do you make changes? How do we do the wording of this kind of stuff? All that, that happens in the moment, live, which is just so cool to watch. Now whenever there's some kind of event, I immediately go to the relevant Wikipedia page and go to the talk page and watch people hash it out to figure out how to word things, which is just so cool.

How Wikimedia uses machine learning

Lukas:
That's awesome. I mean, I think one of the things I was excited to talk to you about was actually the ML infrastructure at Wikipedia, because with a lot of the real world people we talk to, they have to be a little bit cagey or vague about exactly what the problems are with their infrastructure, but you're so open about this stuff that I think we can really get into the nitty gritty.
Chris:
Yeah. We are all open.
Lukas:
Before diving in, this is actually a question that somebody asked, but I think it's a really good one to start with: what are the important ML applications at Wikipedia? You mentioned some of them and you said some of them aren't even run by your team, but just off the top of your head, what are the things going on using ML?
Chris:
We do a lot of models that help editors, that's probably our main body of work. This would be things like, for example, predicting whether a particular edit is...whether we think it's a productive edit or not, or whether we think it's a damaging edit or not. The idea is not to make changes to Wikipedia ourselves, but to flag it for editors in the UI, literally the UI changes so that they can say, "Oh, okay, cool, like I should go deal with this because this edit is probably bad. So I can skip this particular edit, and I can go to this other edit," sort of prioritize work. We also are working on some things that we call structured tasks. The idea is that there are many ways to participate in Wikipedia, and one of the hardest barriers is that you try to get your first edit in and it's like instantly rejected, because it fails some long-established rule about how things should go. And so, one thing we've been doing with structured tasks is like, can we use ML to recommend edits that we think will pass? Sort of like an easy mode. And they might be something simple, like grammar, or they might be like the one we're working on right now, which is links. So like, is this word a link to another article? Should that be true? And so we'll highlight the word and then highlight where we think it should be pointing to and then ask them, is this right or not? And then if they say yeah, it becomes an edit that gets pushed to "Production." Our big focus is to make that editor and reader experience better using ML. There's other things that we do, like we predict the topic of the article, and we look at sock puppet stuff, but the big one is trying to make editors' experience better.
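Note: A hedged sketch of the editor-assist pattern Chris describes, where the model never edits anything itself; its score just flags and reorders a human review queue. Here `damaging_model` stands in for any trained classifier with a `predict_proba` method; the edit dicts and threshold are hypothetical.

```python
# Purely illustrative: use a damaging-edit score to prioritize work for
# human editors. Humans make the actual decision; we only change ordering.

def prioritize_edits(edits, damaging_model, threshold=0.5):
    """Sort edits so the likely-damaging ones surface first for editors."""
    scored = []
    for edit in edits:
        p_damaging = damaging_model.predict_proba([edit["features"]])[0][1]
        if p_damaging >= threshold:
            edit["flag"] = "review first"  # surfaced in the editor UI
        scored.append((p_damaging, edit))
    return [edit for _, edit in sorted(scored, key=lambda pair: -pair[0])]
```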
Lukas:
Do you build separate models for every language, or is this kind of all baked together as a single model?
Chris:
We traditionally do one model per language. Right now I'm looking at a kind of shift, where we end up doing one model per language for every single model that we can, but then doing a language-agnostic model for everything else. You could imagine that for the 300 languages that we would support, there would be a language-agnostic model that would work for all of them, but not as well as a language-specific model where we can have one. Gathering the training data from each individual community is really time consuming. You can't do that 300 times with a really small team. So we're trying to strike that balance where we get global coverage, but believe that the gold standard should be an individual language-based model. It doesn't happen for everything. For example, when recommending whether a word is a link or not for that link recommender I just described, we don't need to have a language-specific model for that, we can take advantage of that. But I know one of the questions that someone asked on Twitter was like, what I'm interested in in NLP, and language-agnostic models are the thing that I'm really interested in. Because when you start to do one model per language, you run into a scalability problem pretty quick, like how do you maintain, with fresh training data, with monitoring, with all that stuff, a huge breadth of languages, well beyond the languages spoken on the team? How do you maintain that? Having some kind of idea of like, okay, cool, let's do like a mix where we'll have like some models that are just across all languages, but our gold standard, whenever we can, is to make like one model per individual language. What we want...the community governs the Wikimedia Foundation, like they're the ones who select members to the board. And then like the board decides what the priorities of the organization are and that trickles down to me. For us, we want communities to feel that they have the power to decide what they want to do with the model. So like, if French Wikipedia is like, hey, we want a model that predicts the edit quality, great, we'll help them get training data and put that model out. If they then decide that they don't want that model anymore, we'll turn it off, right? Because the goal is that we're here to support them and their stuff. They're the ones who are putting up the huge amount of hours and effort and time unpaid to make this stuff, we're just trying to make their lives a little bit better.
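Note: A toy sketch of the routing Chris describes: prefer a per-language model when that community has one, otherwise fall back to a language-agnostic model. The registry is hypothetical, not Lift Wing's real interface.

```python
# Illustrative only: dispatch to a language-specific model where one
# exists, otherwise fall back to a shared language-agnostic model.

class ModelRegistry:
    def __init__(self, language_agnostic_model):
        self.per_language = {}            # e.g. {"fr": fr_quality_model}
        self.fallback = language_agnostic_model

    def register(self, lang, model):
        self.per_language[lang] = model   # the gold standard where possible

    def predict(self, lang, features):
        model = self.per_language.get(lang, self.fallback)
        return model.predict([features])[0]
```

Under this shape, turning a community's model off amounts to deleting its registry entry, after which that wiki's requests quietly fall back to the shared model.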
Lukas:
I would imagine that you have probably more requests than you can really field, how do you prioritize all the requests that come in for different models, and also improving existing models?
Chris:
Yeah. A lot of times what is really hard is distinguishing different types of requests. One of the things that happens a lot is that volunteers have really spiky participation, which is just sort of natural, right? They do a lot of work on something, and then they get a new job, and so they kind of disappear for six months, and then they come and do a lot of participation again, right? That's exactly how volunteering works. Because you're volunteering, you have other things. School starts, you have a new kid, you decide that you're bored of doing it, you take on another hobby, and that kind of stuff. That kind of really spiky participation means that...when I took over the team, we talked about it a lot and we decided that what we wanted to do is that if we ended up hosting anything on the foundation servers, that we will own it. If someone comes in, and really works with us and helps us build a model and that kind of stuff, and then they go off and do something else, we will continue to maintain that model in perpetuity, and keep on running with it. That means that you have to be selective of what you take, because you can't take every single thing that people are asking for if you're going to own everything that comes in. And so there is a process of deliberating what that would be and whatnot. There's other ways that people can host models at the foundation...this is a technical podcast, people are probably familiar with AWS and EC2; we run our own EC2 equivalent, essentially, our own cloud services, where people can actually go and host their own stuff. If they wanted to host their own things on our servers, that's totally fine and they could do it through there. But when it comes to my team, we know that we need to own something, because part of our idea of what it would look like to do community-based, public, ethical ML is ownership, of us saying like, "Hey, we screwed up, this model is bad; we screwed up, this model is harmful." And the only way we can do that is if we actually own the model, we understand how it works, and that kind of stuff. Evaluating models that get submitted, or requests for models and that kind of stuff, is a real challenge, which is unique to the foundation in a way.

Wikimedia's ML infrastructure

Lukas:
How many models are you owning, like running at any given time?
Chris:
We have, I think, 120 models right now. And maybe five that are currently being built. We stopped building new models for quite a while over the last year, because we're switching infrastructures for model deployment, which we could talk about.
Lukas:
Yeah, let's talk about it.
Chris:
There was definitely this moment where we were like...the current infrastructure, which has lasted us a really long time and is sort of what got ML at the Wikimedia Foundation off the ground, is not serving us anymore. We need to go back and figure out what to do. Because of the nuances of the foundation, the foundation is a strong believer in privacy and in open source, which means we don't use cloud-hosted services. We are not on AWS, except for like very, very small things. We're not on Google Cloud compute. We are on our own servers in our own data center, or not our own data center, but in our own racks in the data center. Building out a new model deployment system was literally starting off with like, what are the specs of the servers that you want, like how many sticks of RAM? Just to show you the level, I had conversations about how the racking was going to go. We bought a GPU to try to test if we could use it in our server and I got this photo from the person in the data center, like the Wikimedia Foundation employee in the data center. In the photo, he's trying to install the GPU into the server blade and he can't, it doesn't fit, and he's showing me in the photo that it doesn't fit. Like that's the level of like bare metal up. Which, as a technical challenge is really fun. I've taken a lot of appreciation that the foundation actually cares so much about privacy, that it is unwilling to give up anything. It is very, very thick. It is funny because there's a ton of SREs at the foundation, like most of the tech stuff is by SREs, because you constantly need to have these people maintaining the systems and building the system at that low level. But, yeah.
Lukas:
What are these models? A lot of the questions that we got were actually like, is Wikimedia using deep learning? I guess I should just ask that. But I actually want to be more specific, can you describe what frameworks you're building these models in? What are they like?
Chris:
Yeah. So right now we have a lot of models in scikit-learn. That was sort of the initial set of models, these are the ones that are predicting article quality, and the quality of an edit, or like the topic, that kind of stuff. We've started to move towards more deep learning-based models, particularly around like computer vision and NLP, because there's just big advantages to using that. Right as I joined the foundation, they were setting up some GPUs in...because we have to use our own stack. So, literally installing the GPUs in the machines and starting to work on there. As we move forward, I know we're using fastText for some model, which is that Facebook library. For me, as the person who's herding the cats in this instance, I have become very interested in simple models, because the goal of what we do at the foundation is accessibility. You should be able to understand what we're doing. Not every single person, it's okay if not everyone who doesn't work in ML understands what we're doing, but my goal is that if you see a model that we're using, here's the foundation's model for detecting whether or not this piece of text is a link or something, that you can go to an open source page on GitLab, you can see the code that's plainly documented, you can see the link to the data that you used to train it, you understand what's happening, because it's not so insanely complex that it's impossible to access. And then you can fix it, you can make it better, you can throw in improvements, that's what I want. I want people to see what we're doing. And so I am less interested in the most technical solution, I'm definitely more interested in the sort of practical, like what is the lowest common bar that does it. That said, there are some things, frankly, particularly with NLP, that I feel are just really complex. We were just talking this morning about using BERT to basically replace some of the models where we're currently using scikit-learn, where throwing BERT in there would actually make them better. There is value, there is value in complexity, but it goes back to the idea that I don't want people to think that we have a secret sauce. I want people to think that we're a set of, hopefully, somewhat humble people building out in the open, and you can come and help us and participate and challenge us and ask those questions. And so the more accessible the tools we use, the better. If we end up using a proprietary system to make it...I mean, that would never happen, but the reason that would never happen is you'd never be able to trust us that it was true, it'd just work or not work and you'd have to believe it. We want you to go and dig in. We are moving into deep learning. I actually have a big ask for GPUs...it is really hard to buy GPUs, in case anybody has ever been in that world. It's super hard to do that. We're out there hunting around for GPUs that fit into our servers at this moment.
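Note: For readers unfamiliar with fastText, the Facebook library Chris mentions, here is roughly what supervised training and prediction look like. This is assumed, standard usage of the library, not the foundation's actual model; fastText expects one labeled example per line, prefixed with a `__label__` tag.

```python
# Standard fastText supervised classification, shown for illustration.
import fasttext

# train.txt lines look like: "__label__history The Battle of Hastings ..."
model = fasttext.train_supervised(input="train.txt", epoch=5, wordNgrams=2)

# Returns the top label and its probability for a piece of text.
labels, probs = model.predict("Some article text to classify")
print(labels[0], probs[0])
```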
Lukas:
You mentioned an infrastructure change, can you talk about what prompted that, like what was happening and what infrastructure you moved to?
Chris:
Our system, and how it's run since the beginning, was on what's called ORES, which is our homegrown model management system. Before there was Kubeflow or MLflow, before MLOps was a thing, there were people at the foundation who were building, essentially, those functionalities from scratch. It is 18 servers split across two data centers. One set in Virginia, one set in Texas. There were issues around...one of the things it does is, it is for deploying a very certain type of model, particularly edit quality ones and that kind of stuff. And it's really paired with the training system. So the training system and the deployment system are very, very, very interconnected. Which means that you couldn't add, say, a deep learning model in there, because it wasn't part of the training system, which is also a homegrown system. The big one for me, as sort of the director of the project, was that, because it doesn't use serverless infrastructure, there is a hard memory requirement. So if your model is...I think the machines have 128 gigabytes of memory, each of them. And if your model is two gigabytes, you now only have 126 gigabytes of memory left. No matter how much that model is used, it could be used every single second, it could be used once a month, it is a finite amount of resource, which is very problematic for us, because so many, as we were talking about, so many people come to us and are interested in deploying a model or interested in sort of how we do things, which means that we need to...in order to participate with those people at a real level, we need to not so much care if something is really used or not, right? If someone comes in and they say, "Hey, I have a great idea for this project," and we work on it with them and we create a model and then we deploy it, we need to be fine with it being dormant for months. Maybe it's only used once a year, or maybe it's used all the time, and that's okay. When you reach that finite level of...literally you're running out of RAM, and every single time it's a zero sum game where you're using more and more of the physical RAM to hold the models in memory? It got too far. What I think happened is that this was sort of a pioneer in the space of MLOps. And now what has happened is there are so many great projects out there that are doing MLOps, that there's such a value to switching over. We've moved to setting up what we call Lift Wing, which is a Kubeflow instance on a new Kubernetes cluster that we set up. And Kubeflow is an open source project for MLOps on Kubernetes. There are so many great advantages of that that we've been taking in. For example, the custom libraries. So we had a researcher, he used fastText and didn't tell us, because we just hadn't made that communication. And it was fine, right? He gave us the signal, we'd never seen fastText before, but hey, we'll build the server for it. We'll build the service for it, and it'll run. It means you could run deep learning models or TensorFlow or PyTorch or whatever you want to do in that system. Everything's all nicely Dockerized. We've been Dockerizing our models that we have on ORES, and then just moving the Dockerfile over to the new system. There's way more storage and analytics around how things are working. We want to pair it with a full training suite. Right now we're focused on model deployment, but we want to get to the point where we're doing nightly re-trainings.
That would mean that we could do things like shadow models, so a prediction request comes in, we serve it to two versions of the model, compare the stats of how each is doing, sort of an A/B test, except one of them, I guess the A, actually serves back a prediction to the user. But just a huge amount of taking advantage of that modern infrastructure. And it wasn't...when this was started at the foundation, there just wasn't this infrastructure. And now there is, and so taking a step back and building that out has been really fun. I will completely admit that it is somewhat terrifying to start at a job, look around, and say, "Hey, I think we need to build the infrastructure from scratch," which becomes a planning document, which becomes a budget line, which becomes server specs, which becomes a server box deployed to a data center to plug in, which becomes hiring SREs, which becomes slowly configuring the system, which becomes running through a thousand problems. I mean, right now, where are we right now in that system? Two days ago we got our "Hello, World!", where we served a prediction using the system, which was so cool to see after all that work. That's really the fun part about the foundation is that you're doing something out in the open and you're doing something, frankly, from a technical experience, from bare metal. Like from bare metal all the way up, that's how you're figuring it out. Sometimes you hate your life for it, because you're like, you know what's easy? AWS, AWS is easy. Look at all these wonderful services which they provide people. But at the same time, having the control to own the system from scratch and know that people's privacy is protected, that we have control over everything, where any of the data goes, any of that kind of stuff, which means that people can participate in the projects feeling safe, that they're not going to be exposed because they edited an LGBTQ article or something like that, we have that ability, which is so nice. And it feels so good to have that. But it is going to be a long process of us getting from...we're going to build a second cluster, which we're going to be using mostly for training. In our architecture, we're trying to split up one Kubeflow instance for model serving and keep that with really good uptime and keep that really, really simple. And then we're having a second one, which has access to the data center, which is more like, if it goes down for a day, that's fine, right? And so we could be a little bit more experimental, we can push a little bit farther, we can give more people access to the system, they can come in and break it without any kind of interruption to service, and then move the models between the two, as needed.
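Note: The shadow-model idea Chris sketches boils down to a few lines: the production model answers the request, the candidate model sees the same input, and only the comparison is logged. This is a sketch of the concept under those assumptions, not Lift Wing's actual interface.

```python
# Illustrative shadow serving: the user only ever sees the prod answer.
import logging

logger = logging.getLogger("shadow")

def serve_with_shadow(features, prod_model, shadow_model):
    prod_pred = prod_model.predict([features])[0]          # served to user
    try:
        shadow_pred = shadow_model.predict([features])[0]  # logged only
        logger.info("prod=%s shadow=%s agree=%s",
                    prod_pred, shadow_pred, prod_pred == shadow_pred)
    except Exception:
        # A broken candidate must never take down the serving path.
        logger.exception("shadow model failed; user response unaffected")
    return prod_pred
```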
Lukas:
What's the piece that Kubeflow is doing for you? Is it the swapping in and out of the models? Is that the key thing that's happening?
Chris:
The big part is the resource management. I think that's always been the real value in that, for us, our model usage is really spiky. There's sort of always a hum, a certain amount of noise, of people using the models. And then there'll be someone who wants to know a prediction for every single article on Swahili Wikipedia, and so you get this huge spike. We try very, very hard to not limit people. When we're limiting people's API access, it's because you're going to break the system if you do more. That is really our goal. We're funded by people, so people should be able to use it. Along those lines, being able to maintain that really spiky structure, particularly with models with a broad range of requirements. So like, maybe one requires a GPU and uses TensorFlow, maybe two don't require one and use scikit-learn, and then sort of managing all those resources in an automated fashion is super powerful for us, because it means that we can avoid what we have currently with ORES, which doesn't do that so well. Around a year ago, I had my kid on my lap and I was manually restarting the prod server, using a script I'd written, after a glass of wine, to try to get the server back up. Trying to get around those kinds of issues with something that balances those resources really well. There's other things that we care about at the foundation. Because it's an open source project, and the foundation believes in open source projects, we want to contribute back to them. I think there's lots of nice...on the training side, I'm pretty excited about some of the UI parts of it. For example, Jupyter Notebooks that could connect...that would allow our researchers to actually connect to the database and construct models in the Jupyter Notebook and then push a button to put it in production. Those are some of the things I'm interested in down the road, but just straight resource management is a big deal. It's weird, because the thing about the foundation is the foundation is 500 people, which is like, wow, that's a lot of people, but you're running one of the top 10 websites in the world. The scale is crazy. Trying to do that with what ends up being a small team, when you cut down to the people who are working in the tech department, the people who are working on ML, the people who are working on this particular system. A very small number of people have a lot of responsibility for things, and so automating what you can out to these systems is pretty nice. And leaning on open source projects where other people can help you with issues definitely helps.
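Note: One concrete guardrail behind "we only limit API access when you'd break the system" is a rate limiter. A classic token bucket, sketched here purely for illustration (the rates are made up), absorbs bursts up to a cap while enforcing a long-run average rate.

```python
# Toy token-bucket rate limiter: generous with spikes, firm on averages.
import time

class TokenBucket:
    def __init__(self, rate_per_sec=100.0, burst=500):
        self.rate = rate_per_sec    # long-run average requests allowed
        self.capacity = burst       # short spikes up to this are absorbed
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # serve the prediction
        return False      # refuse only when the spike would hurt others
```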

How Chris got into machine learning

Lukas:
Makes sense. There's another theme of questions I want to make sure that we cover here, just in our crowdsourcing of questions for you, which I'd sort of summarize as...I think people admire the career that you've had, working on really impactful stuff in machine learning. How did you get into machine learning and how have you thought about your career? How do you feel like you've managed to get to all these super interesting projects?
Chris:
My formal training is in quantitative research, actually quantitative social science research. I went to a PhD program that was all about stats, basically. When I was graduating, I knew some people who were working on a Kenyan non-profit and I just joined them, and kind of was working on that. And then from there, you sort of grow a community of people in a social network that you know and people keep on pulling you into other things to work on. I think for me, ML...where the appeal was, and I'm going to anger some statisticians on here. So this is hot take, hot take-
Lukas:
Nice, Gradient Dissent.
Chris:
Yeah. The thing that frustrated me about statistics is I tended to not care about the causal inference side of a lot of things. I cared about the results that were happening, because I was doing a lot of this stuff, as a job, in impact. So I was doing like election monitoring, like helping someone set up an SMS-based electoral monitor. I didn't so much care about the causal relationship behind whether or not someone would send a message in or not, I cared if they did or not, right? I really, really focused on outcomes. When you have small teams, the value of ML is you can start to like really scale things out because you start to use machines, it's like the assistant to you, right? So you train something manually, and then you send it out in the world. And then it does that at scale for you, which is like a superpower. I just started going farther and farther down the path of saying, hey, like we can make this team of 4 people behave like a team of 50 people, if we start to use ML more and more, and keep walking down that and just get more and more complex. As I started doing things at more scale, you sort of move from the modeling side to the engineering side of like, okay, now we have 200 models. How do we make sure every single model is running at all times, that it's totally okay? And how do we do that at scale, and like constantly moving to the next more technical challenge in that range. For me, I feel like I have stumbled into this stuff. But really, it was probably because when I got started, I knew some people who were working at this teeny little tech non-profit in Kenya, and just got to know them. Then they were sort of like, oh, what about this other place? So then I switched places. And then like, hey, what about this other thing, and I joined that. You just sort of go from one thing to another, to another, to another. It's true that some of the people that I worked with 10 years ago on various projects around...environmental projects and that kind of stuff, work at Wikipedia, right? Like work at Wikimedia, like they're still here. There's this like group of people who are working on stuff. It doesn't mean that other people don't come in, and it doesn't mean that it's not a job. It is a job that I go to everyday and I do my job, but you start to see the same faces over and over again as you do this for a while, and people invite you to come and apply for a role or that kind of stuff.

Machine Learning Flashcards and technical interviews

Lukas:
How does that relate to your well-known ML flashcards and tutorials? What prompted you to do those? Do you think it's similar to your focus on outcomes and applications versus the underlying statistics?
Chris:
Yeah, no, completely. I think, and people will...so I make these flashcards. They're hand drawn, they're all about ML concepts. People have come to me over the years and been like, hey, you should really read more books about ML rather than flashcards. And I was like, well, one, I have read a lot of books about ML at this point. But the point of the flashcards has always been one single thing: that ML interviews require a certain amount of rote memorization. There are people that try to throw gotcha questions at you, and I have received those questions, like describe a random forest from scratch and that kind of stuff. Those questions, it's just easier to just memorize them, right? To just sit down and memorize it. Interviews shouldn't be run that way, I totally understand that, we should all get to a better place where that's not happening. But yet it does happen, in most job interviews. For me, I just started making flashcards for it, like what is this concept? What is this concept? What is this concept, right? Like, can I do it, and just looking at the flashcards over and over again. From there, I just sort of developed more and more of them. And then other people got interested in that kind of stuff. But it is about getting that stuff into your brain. It's not something that you can read. If you read a thousand books, you'd probably forget the concepts because there'd just be so many. Instead, these are the concepts that I've run into in interviews and other people have run into in interviews, so memorize it, memorize it and then regurgitate it back up, because you look really cool when you write an equation from scratch or something like that, because you had it in your brain. It goes back to the idea that I am interested in impact. I'm 1,000% interested in impact. A game is being played in an interview where they try to stump you with like, describe gradient descent to me. That's a game, they're trying to throw a trick question at you. Crush it. Memorize the concept and then crush it in the interview. That's it. I wish people didn't throw those kinds of questions, but they do. And so great, I will make flashcards to get past that part. It's less of an issue now because I do more management stuff, so the questions are not so deep. But it definitely was a big part of my career. Especially for me, because people look at me with a social science background, like literally my PhD is in political science. People were like, oh, so you're a terrible coder and a non-technical person, so I'm going to throw you some gotchas in there. And just being able to memorize it and spit it out has been, frankly, a really useful tool.
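Note: For what it's worth, the flashcard-sized answer to that particular gotcha fits in a few lines. A minimal worked example of gradient descent, minimizing f(x) = (x - 3)², whose gradient is 2(x - 3):

```python
# Gradient descent: repeatedly step opposite the gradient.
x, lr = 0.0, 0.1
for _ in range(100):
    grad = 2 * (x - 3)  # derivative of (x - 3)^2
    x -= lr * grad
print(round(x, 4))  # converges toward the minimum at x = 3
```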
Lukas:
When you interview a technical person, now that you're a manager, how do you approach that? How do you avoid gotcha questions? What questions do you ask to get at the competence of somebody's work?
Chris:
I actually really prefer to give people a choice of what they talk about. Some of the questions that I've really liked have been like, "What algorithm can you actually describe in detail? Whatever you want, what's that one that you like? What's that one that you have, this is your go-to?" I like that, because I'm not trying to say, "Hey, in my experience this algorithm is important and therefore, if you don't know that particular one, you're not qualified." It's instead saying, "Hey, I want you to go deep, but you can pick anything that you get to go deep in and let's just jam out about it." I have really, really, really appreciated that, because I have had candidates who come in who have been pretty nervous, and it can come off that they don't know what they're talking about, or something like that, and I'll throw that question to them, and they will just destroy it. They will just go so incredibly deep, and they'll start to geek out on it and they'll start to enjoy it, the whole interview process, because they get to talk about what they know and they light up about it. It is so fun to participate in that, and it shows you that people have this variety of expertise, because they did this particular ML model for four years, and they really, really, really know it. So you can say like, "Okay, that's cool. It'd be fun to work with that person." That's the kind of stuff that I have grown to like, because the fundamental truth about data science is that it's such a broad field. The questions that you get in an individual interview can be all over the place, from deep statistics, like I've had to write a statistical proof at one point, to model production stuff, like MLE, MLOps kind of things, like how would you architect a system to do this, to computer science stuff, to social science stuff, just all over the place. Frankly, I'm amazed that anybody passes these interviews. I've liked giving people the opportunity to dive into wherever they want. If they can't find the place that they really dive into, that's also a signal, right?

Low-power models and MLOps

Lukas:
That makes sense. Well, we're almost out of time and we have two questions that we like to end with, that I think you'll have interesting responses to. The second to last question is, what's an underrated aspect of machine learning that you think people should pay more attention to?
Chris:
Oh, wow. That's an interesting approach. I think the one that I really have started to like a lot is low-power models, so models that don't require...there's one direction that ML is taking, which is bigger and bigger and bigger and bigger models. It's sort of like getting a bigger and bigger, bigger truck, right? You just like, oh, what would be better? Two engines. You know what's better than two engines? 6 engines. You know what's better? 24 engines. I have really started to like, TinyML, like very, very, very small ML that you can run on a Raspberry Pi, and that kind of stuff. I think there's a pureness around it, but there's also like, creativity comes from constraints. Constraining yourself to very, very low resource settings is really interesting. I think it opens up stuff around cheaper smartphones and that kind of stuff, which...it's just a different direction than you're going to get from some of the really cool but huge models that take $24 million to train or something like that.
Lukas:
Totally, yeah. And even a Raspberry Pi is kind of big, I mean, try an Arduino. The final question is, what's the biggest challenge of making these models actually run in the real world? I mean, you're actually responsible for running models, what's the biggest challenge?
Chris:
The biggest one I think that I face is...well, I'll take a step back. When me and you were getting into ML, because we're both slightly older, I don't want to claim that you're old, but you're around my age, ML was just starting off. You could totally join an organization and make any model and run it on your laptop, and it was better than the hard-coded thing that you were using and you were amazing, right? That's no longer the case. Now it's the case that they've had 10 years worth of models that they've made, all in these different settings, all in these different contexts. They're retraining models every single night, and so they have like thousands or tens of thousands of models to deal with. A big part of what I found is hard is, how do you just manage all those models? And this is a real pitch for MLOps. It is hard to manage just all those models all the time and make sure they're all...not broken, not running on old data, not throwing errors; there's dependency management around it. It is difficult to have, in the real world setting, hundreds of models going out all the time. Whether you're at a company or whether you're at the Wikimedia Foundation, it is just hard to do that. It is not a surprise to me that MLOps has become the thing that is really, really helping people in this field out, because it is something that is otherwise just difficult. It's insurmountable to think you could do it yourself, because...it's easy when you have one model, right? You can be like, oh, let me think about this particular hyperparameter deeply after reading a book. It's another thing when it's like, we're going to be training 6,000 models tonight. How do you keep them organized? How do you keep them up? How do you see how they're being used? How do you maintain them? That is a different game, which is where we're going for sure.
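Note: To make the "thousands of models" problem concrete, the nightly-retraining loop Chris alludes to reduces to something like this sketch. The registry structure and `train_fn` are hypothetical; the hard part he points at is everything around this loop: monitoring, dependency management, and error handling at scale.

```python
# A hedged sketch of a nightly job that retrains stale models.
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=1)

def nightly_retrain(registry, train_fn):
    """registry: list of dicts with 'name', 'last_trained', 'data_path'."""
    now = datetime.now(timezone.utc)
    for entry in registry:
        if now - entry["last_trained"] > MAX_AGE:
            entry["model"] = train_fn(entry["data_path"])  # fresh data
            entry["last_trained"] = now
```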

Outro

Lukas:
Awesome. Thanks so much. Great note to end on.
Chris:
Yeah.
Lukas:
Appreciate it, Chris. If you're enjoying these interviews, and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material, and a transcription that we work really hard to produce, so check it out.