Stanford's Polly Fordyce on microfluidic platforms and machine learning

Polly explains how microfluidics allows bioengineering researchers to create high-throughput data, and shares her experiences at the intersection of biology and machine learning.
Angelica Pan

Listen on these platforms

Apple Podcasts Spotify Google Podcasts YouTube Soundcloud

Guest Bio

Polly Fordyce is an Assistant Professor of Genetics and Bioengineering and fellow of the ChEM-H Institute at Stanford. She is the Principal Investigator of The Fordyce Lab, which focuses on developing and applying new microfluidic platforms for quantitative, high-throughput biophysics and biochemistry.

Connect with The Fordyce Lab

Show Notes

Topics Covered

0:00 Sneak peek, intro
2:11 Background on protein sequencing
7:38 How changes to a protein's sequence alter its structure and function
11:07 Microfluidics and machine learning
19:25 Why protein folding is important
25:17 Collaborating with ML practitioners
31:46 Transfer learning and big data sets in biology
38:42 Where Polly hopes biology research will go
42:43 Advice for students

Links Discussed

Transcript

Note: Transcriptions are provided by a third-party service, and may contain some inaccuracies. Please submit any corrections to angelica@wandb.com. Thank you!
Polly:
We make these devices called microfluidic devices, that are kind of like, you can sort of picture the way integrated circuits made it possible to do a lot of electronic computations in a very small footprint, and that kind of led to this revolution in computer science hardware. We make these microfluidic devices that allow us to do fluidic computations in high throughput in very small footprints.
Lukas:
You're listening to Gradient Dissent, a show about machine learning in the real world. And I'm your host Lukas Biewald. Polly is an Assistant Professor of Genetics and Bioengineering at Stanford. Her lab's main focus is on developing and applying new microfluidic platforms to create high-throughput data, which is crucial to making machine learning work in biology and genetics. I'm super excited to talk to her today. Thank you so much for agreeing to this interview, people have been asking us to get more kind of content on the intersection of biology and machine learning. And it's kind of funny, I'll just say, you told me that you didn't know anything about machine learning, but as we've kind of gone around, we've realized that you're well-respected as someone in biology that knows a lot about machine learning. I don't know if I can trust your self-assessment here, but-
Polly:
That's really nice to hear. I feel like we don't know very much about machine learning, but we have been collaborating more and more with experts in machine learning. We're just trying to learn as we go.
Lukas:
Well, it's funny, I've discovered with our pharma customers, and we've been getting a lot of those lately, I started to realize dropping your name actually gives me like a ton of street cred.
Polly:
That's so awesome, that's great to hear.
Lukas:
I guess, I should say, I was friends with you from undergrad. It's a little funny, I mean, it's awesome to watch your career trajectory, and it's exciting to talk to you about your work.
Polly:
It's the same on my part, right? If I tell any of my students that I know you, instantly it's like I'm a Silicon Valley celebrity, right? I'm like at least in close proximity to it, so it goes both ways.
Lukas:
Nice. All right. Well, maybe you could explain kind of at a high level of what your research interests are. You kind of laid it out in the notes, and I tried to do some background research like reading your papers, like I normally do with the machine learning guests, but I found your academic record pretty impenetrable. So I think you can take a big step back with me and sort of explain what you're doing and why it's important.
Polly:
It's really technical. I guess I would say, a couple of examples of the things that I'm interested in are the promise of the Human Genome Project a long time ago. It was this idea that we were going to be able to sequence everybody's genomes. And then we would look at the difference in the sequences of those genomes. And we would instantly be able to say whether a particular mutation in the genome meant that somebody was going to have a particular disease, or maybe they would respond to a particular treatment. And I think the challenge is that the amount of possible variation is really huge. There are so many different variants that we discover, and we still don't really know for the vast majority of variants. For three-quarters of the variants that we've found, we have no idea whether they're likely to have a functional effect.
Lukas:
Sorry, I'm going to start the dumb questions early. You mean variants of genes, different DNA, is that right?
Polly:
Yeah, I mean different letters in the genome, right?
Lukas:
Different letters in the DNA, okay, got it.
Polly:
Different letters in the DNA. And so probably the main thing that my lab is really interested in is trying to figure out...maybe from high school biology, everybody kind of remembers that DNA makes RNA makes protein. And we're pretty good for portions of the genome, the parts of the genome that say what proteins to make. We have a pretty good sense of what RNAs are made, and what proteins they make, kind of. But then what we really don't know is how to predict what those proteins do from the sequence, right? So it's like we have parts of the program, but we just don't really know how to predict what the functional effects are going to be when we make changes.
Lukas:
Right. I mean, isn't it kind of deterministic? Don't you actually know from the DNA what RNA it might make? Or I guess, in biology anywhere you pull out a thread it's more complicated than you think, right?
Polly:
Yeah, it's pretty interesting in that we have a sense of, I guess, I would say so we know for the parts of the genome that actually code for proteins, which is a tiny amount of the genome, a really small fraction. We have a sense of what RNAs are made, but there's way more regulation after that. First just for the RNA, the RNA kind of will loop around and cut itself up to make kind of different variants. And then when we make proteins from that, I think one of the big challenges is figuring out...a protein is a linear sequence that has to fold into a three-dimensional structure, and that three-dimensional structure does something. And I think a great example of where machine learning has had a real impact in biology is AlphaFold 2. It is a great example where there's been this problem for a long time, what three-dimensional structure do linear protein sequences make? And here machine learning algorithms have improved our ability to predict that, but we still don't know what those proteins do when they're folded, or whether they just fold into one conformation or multiple conformations. I think there's a lot more questions like that.
Lukas:
Could you give me an example of one that you do know? Because we know some, right? There's some mechanisms that we understand, right?
Polly:
Yeah, there's some, like in terms of protein folding or in terms of-
Lukas:
Just in terms of the whole sequence. What's the sort of canonical example from high school biology, you have some different letters, so then you're missing a protein and then you have some disease, right? That's sort of my mental model, is that even right?
Polly:
Yeah, there are like a small number of, I guess, initially it's like Mendel with the peas, right? You learn about Mendel with a pea in high school. And it's like, "Oh, depending on what the sequence is, it's either going to be pink flowers or white flowers." And I think people thought that was going to be the case for genes. And there are a small number of genes, like sickle cell anemia is a great example of a gene where we know that this gene, if you have this variant, you're going to have sickle cell anemia. If you don't, you won't. But most traits, whether it's height, or autism, or diabetes, or whatever, are actually... it's sort of like there's a whole collection of thousands of genes that determine whether or not you're going to get a particular disease, and you have a distribution of genes that means you're more or less likely to have a disease. And then that distribution interacts with your environment and what you're exposed to. It's more complicated than Mendel made it seem.
Lukas:
Your research is on the actual kind of physical mechanism that goes from you have more of some kind of protein and then something happens. Is that right?
Polly:
I guess my research, and again, it's like so technical. There's a few different things that I would say my research focuses on. At a basic level, one of the questions that I'm interested in is "When you have changes in the sequence of a protein, or changes in the part of the genome that tell you when and how much to make of that protein, how do those changes alter its function?" I guess, initially I was a physicist, my PhD is in physics. And one of the things that I think is really interesting is that these sequences code for molecules, three-dimensional molecules, and a change in the sequence of that molecule changes the physical forces that it uses to interact with other molecules. That can affect whether a cell lives or dies, whether a fetus lives or dies. It's sort of this interaction at the scale where a change at the level of a molecule can have profound influences for a fetus or a cell. My lab is really interested in how changes in the sequence of a molecule affect its structure and function. I'm not sure if that's like specific-
Lukas:
No, totally, let me see if I can repeat this back. It sounds like you're interested in... The DNA makes RNA, and there's probably some asterisks there, and then the RNA kind of makes a linear sequence of a protein. And it sounds like you're sort of interested in how the changes in the composition, I guess, of that protein sort of change something that happens beyond that.
Polly:
DNA makes RNA makes protein. Proteins then fold into a three-dimensional structure, and they do things in the cell. Sometimes they bind RNA to tell the cell when it should make other genes, they bind other proteins to transmit signals. Proteins are kind of the functional workhorse of what makes stuff happen in your cells. And my lab is really interested in "How do changes in the sequences alter the structure and function of the molecules?" And I guess, I would say sort of two more things. One of the things, our approach is it's a problem of staggering complexity, right? The number of possible amino acid combinations for an average size protein is larger than the number of atoms in the universe. So we're never going to be able to test all possible variants and see what they do. That's just impossible. So we're really interested in trying to figure out, "Can we create libraries in which we systematically vary sequence, which varies these physical properties, and assess the effect on function?", so that we can kind of learn not just a black box relationship between sequence and function, but we can ultimately develop quantitative and predictive models that would allow us to predict not just for the molecules we study, but for all molecules, how sequence changes alter function.
Lukas:
I see. But through kind of like a physical understanding versus... I feel like the machine learning perspective might be to sort of say, "Hey, let's treat this as a black box, possibly, and let's sort of look for patterns here", versus trying to understand the actual physics of what's happening.
Polly:
Exactly.
Lukas:
Interesting.
Polly:
Where we've really loved collaborating with machine learning specialists is, our approach is we develop these tools, we make these devices called microfluidic devices, that are kind of like, you can sort of picture the way integrated circuits made it possible to do a lot of electronic computations in a very small footprint, and that kind of led to this revolution in computer science hardware. For us, what we do is, we make these microfluidic devices that allow us to do fluidic computations in high throughput in very small footprints. Now what we can do is, we can do-
Lukas:
Fluidic computation?
Polly:
Right. Normally if you were going to do an experiment in biology, you sort of picture test tubes, and Petri dishes, and big things. And if you wanted to do a thousand reactions, you need these giant expensive robots. So what we've been doing is we've been using this approach where we can create these tiny devices that instead of using five milliliters of fluid for each reaction, we use about a nanoliter. These devices make it possible to use fewer reagents, so everything is low cost. We can automate things on these devices without the use of expensive robots. And now the main power of these technologies is that they allow us to make a thousand measurements in the amount of time and cost that it used to take to make one in biology. And now I think that that means that we can generate data at a scale that allows us to quantitatively test predictions from our colleagues in ML, right? You all need ground truth. You need some ground truth measurement to assess what's going on. And you can't just have one or two, you need enough that you can do some sort of regression to figure out where is your model successful, and where is it failing? And so our job is to make measurements of a thousand things really quantitatively, where we can interface back and forth with ML people to test those predictions, revise, and refine those models. And hopefully try and use some of these ML predictions to learn new physics. That's what we want to do.
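Editor's note: to make the ground-truth idea concrete, here is a minimal, hypothetical Python sketch of fitting a regression between a model's predictions and measured values, then flagging the points the model explains worst. The data and variable names are illustrative only, not from the Fordyce Lab.

```python
# Hypothetical sketch: compare model-predicted values against ~1,000
# ground-truth measurements (e.g., one per microfluidic chamber) and
# flag the points the model explains worst.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-ins for real data: predicted and measured activities.
predicted = rng.normal(loc=1.0, scale=0.3, size=1000)
measured = predicted + rng.normal(scale=0.15, size=1000)  # noisy "ground truth"

# Simple regression of measured vs. predicted values.
fit = stats.linregress(predicted, measured)
residuals = measured - (fit.slope * predicted + fit.intercept)

print(f"R^2 = {fit.rvalue**2:.3f}")

# The largest residuals mark the measurements the model fails to explain,
# which are often the most scientifically interesting ones to follow up on.
worst = np.argsort(np.abs(residuals))[-10:]
print("Ten least-well-explained measurements:", worst)
```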
Lukas:
That's so cool. What would be something that would happen at that tiny scale? Are you literally putting a protein in there and watching what happens- I mean, can you explain exactly what goes into that?
Polly:
Exactly. Here's two examples of some platforms we've developed. We've been working really closely with Dan Herschlag, and he's like an enzymologist. So one type of protein that we're interested in is enzymes, and enzymes underpin all of our metabolism, right? They make it possible to do chemical reactions that would never happen in the absence of an enzyme. They're important both for our cells, they're the tools people use in modern molecular biology, you use them to make libraries for sequencing. People use them, you use them when you do your laundry, right? Enzymes are the type of things that bust up stains on your clothes. And we still don't really know how the sequence of an enzyme specifies its function. One thing that we can do now is, just like the Moderna vaccine, right? Everybody's sort of heard now we can make this mRNA vaccine, and we can program it to make something that we want. We can create little pieces of DNA, each of which specifies a protein we want to make. We can use a robot so that we spot bits of this DNA in an array. So we have like a thousand little spots, and we know the program encoded by the DNA in each spot. We can take one of these devices that we make, that has little chambers, and align them to the spots. And then there's sort of this magical mixture of all of the stuff that you need to turn DNA into RNA and protein. The companies sell, it's like you just buy this little tube that has the polymerase you learned about in high school biology, the ribosome that makes the protein, all that stuff. We'd push it into these little chamber-
Lukas:
And that fits in a nanoliter? A nanoliter's not huge, it all fits?
Polly:
A nanoliter is like...your hair, a hair strand is like 100 microns. Each of the chambers in these devices is about the diameter of your hair and the height of a 10th of your hair, right? We use like a lot of the machinery that people use for lithography to make these integrated circuits. We use all the same equipment to make these tiny devices. And now we can make a little-
Lukas:
Okay, I can see the integrated circuit analogy.
Polly:
Yeah, exactly. We really do use a lot of the same equipment, except for now, instead of pushing electrons around, we're actually pushing fluid that contains molecules in different ways within these devices. We can make each one of these enzyme variants in each chamber. And now we can quantitatively ask, "When you make this mutation, how does it affect the ability of this enzyme to catalyze the reaction it's supposed to catalyze?" That's an example of one of the things that we do, and the reason why you would want to do it is, this might help us classify variants in the human population for whether or not they're likely to compromise function and cause disease. It could also maybe help us generate new enzymes that eat up environmental waste, or design new enzymes to do things that we want to do. One other example, I guess, of something that we do is, historically when you've looked at a population of cells, let's say from a tumor, we've ground up all those cells and asked, "What's the behavior of that population of cells?" Within all of those cells, maybe there are one or two rare cells that are resistant to a drug. And when we treat a patient with that drug, those one or two cells are going to proliferate and drive treatment failures, right? We need a way where instead of looking at all of the cells mashed up together, we want to be able to profile the cells one by one. Another technology that we're using, that this field of microfluidics allows you to do, is we can actually put every cell in a tiny droplet. Basically a little water in oil, a droplet that serves as a tiny compartment where we can interrogate that cell by itself without looking at all of its neighbors at the same time. And so, again, those droplets are like a nanoliter, right? And we can look at a million cells individually at once in their own little nanoliter compartments.
Lukas:
How do you break up all the cells?
Polly:
Some cells just grow, like blood cells grow by themselves. For solid cells, this is something that actually our collaborators do. I never actually really know how to do this, but you can treat them with enzymes that chew up the stuff that connects them so that they separate, right? If they grow on a surface, you treat them with this enzyme, and then they separate from each other and come into the solution. And then we put them in the bubbles, in the droplets.
Lukas:
In some automated way, I assume?
Polly:
Yeah, I wish I could show you the videos, I could send you-
Lukas:
I know. Send me some videos, we'll put some links to them. That'd be awesome.
Polly:
I'll send you videos of both.
Lukas:
Cool. I mean, I guess it's funny, a really dumb question that I keep being kind of afraid to ask, but I think other people might be feeling... it's like everyone sort of saw the protein folding thing in machine learning. And kind of everybody knows that protein folding is this interesting big problem that a lot of ML people have worked on, but I've always kind of, I guess, I'll ask the question, why is protein folding so important? It seems like it would be really critical to your work. But can't you also just look at the proteins and see what shape they have? Are they literally just that tiny?
Polly:
It's such a good question. These questions are awesome. Yes, they're tiny, right? They're really tiny. And so to see the structure of a protein, you have a few options. Historically people have tried to crystallize them. They've tried to get them to basically form a three-dimensional crystal where they're all in the same shape. And then they've taken them to a giant x-ray beam, right? Like the Stanford Linear Accelerator or other places like this. They've shot x-rays through them, they've looked at the diffraction pattern that they make. Then they apply a bunch of kind of super fancy Fourier transforms, essentially, to take the diffraction pattern and turn it back into a picture of what the protein looks like in 3D. It's really hard, right? You go to talks all the time where a graduate student is like, "I spent five years trying to crystallize this one protein", right? A lot of proteins don't crystallize, it's slow. And the other thing is most proteins don't exist as a single static structure, they're wiggling around all the time. And that wiggling is really important for their job, for how they do their function. More recently, people have started using...cryo-electron microscopy is another way to kind of look at proteins, where you like freeze proteins down on these metal grids. And then you use the super fancy, like $10 million microscopes to look at the individual particles. There's been a real revolution in this in the last several years, basically because image processing algorithms have made it possible to align many different particles and kind of reconstruct what things look like. But that's only suitable for big proteins. You can't really do it for small proteins. The vast majority of proteins don't have crystal structures, or these cryo-EM structures, so we just don't know what they look like. And we've really only looked at some of the ones that fold into these three-dimensional structures. A lot of them are kind of unfolded and we have very few pictures of what they're doing. So for trying to predict, the number of structures we have is just tiny compared to the number of proteins we know about, and the structures are often just a static picture. That's one reason why it's a hard problem, and the reason why you want to know is let's say, you want to design a new drug to target a protein. You kind of need to know that 3D shape, so you can figure out where would you put a drug, and what kind of drug is likely to fit in there, and alter the function of that protein, maybe. I'm not sure if that makes sense.
Lukas:
Yeah, that makes sense. No, that was really helpful, thank you. And I saw this amazing blog post that I think was from more of a computer science perspective on how the Moderna drug works, which is super helpful for me to understand why you would kind of care about...I dunno that's my mental model now.
Polly:
I think Drew forwarded that to me.
Lukas:
Oh, cool. I was like, "Wow, this is so amazing that people could figure this stuff out and then make a certain shape." And then it seemed like they modified it a little bit from the natural one to kind of make the shape better. And I can't believe they figured it out, but it sounds like they figured it out in days.
Polly:
What was really interesting was yeah, Drew was like, "The people who figured this out should get some huge prize." And I think what's really interesting is that, it's been like tens of thousands of people over decades who have made it possible, right? I think for this particular vaccine, there are people that sort of specialized in mRNA vaccine production, that'd be critical. There are people that specialize in coronavirus in general, and spike protein, which is the protein on the surface that we're trying to mimic with these vaccines. But it's really kind of a beautiful example of so many different fields of biology have contributed to that, in terms of thinking about the folding and the structure of RNA to figure out, I mean, both in terms of immunology, what parts of the protein should we be targeting? In terms of thinking about nucleic acid biology, how do we make an mRNA that's going to be pretty stable, right? Some of the modifications that you're talking about made it more stable. Thinking about delivery, how do we wrap it so that it can go into your body and isn't just instantly chewed up by all of the enzymes in your body that are looking for foreign invaders and want to chew them up all the time? It's an amazing triumph of the scientific community, and scientists from so many different fields. It's really exciting, I guess.
Lukas:
Yeah, it seems cool. I guess, I'm kind of curious about your experience collaborating with machine learning practitioners. Can you maybe describe what that's been like, and what... I mean, I remember when I first started working with people in medicine with my last company, it was such a funny kind of cultural mismatch. I remember them telling me, they were doing microscopy and they were like, "We have so much data, we have like 500 people's tumors that have been sliced and stained", or something. And I was just like, "Wait a minute, I'm not sure any of my methods would work with that." They're big files, I guess, but I think I need more than a big file.
Polly:
I mean, I love it. I think it's so pleasurable. I love working with practitioners of machine learning, both because as a field it's moving so fast, right? The things that are possible this year are different than what was possible six months ago, a year ago. It's interesting to think about all of the ways in which algorithms that are developed for figuring out whose face is in a photo can instantly be ported to biology, right? So you can leverage all of the commercial interest in developing something like that towards problems like what we study that are never going to be as commercially viable or interesting, right? So that's really exciting. In terms of the culture mismatch, what's funny, I think is, I'm on thesis committees for a lot of ML students now. And for ML students, what they want is they want their algorithm to have the best AUC by 2%. Even at small...an incremental benefit is good, right? Because it could potentially scale. But for them, any points that are unexplained are like a failure. Whereas for us, that's the most interesting part, right? What do those points that are not explained by the algorithm have in common, and are we discovering new biology or new physics that we hadn't thought about before? It's cool. The mathematical facility that ML practitioners have is astounding. And it's fun where some of the questions, people are like, "I'm sorry, just what is a protein?" Where we're like, "Oh, okay, we can answer that." And then at the same time, I'm looking at that image everybody shows of their neural net, with all the layers, and I have no idea how you would actually implement that. I've seen the picture, I have the picture in my papers, right? But I would never be able to actually do it, I don't even know the first thing about how to set it up. I think that's what's sort of fun about it, is that there's this natural complementarity, but there's so much for each side to learn, that it's always really intellectually engaging.
Lukas:
Do you feel like coming from kind of a physics background, is it maybe disappointing that... I mean, do you worry at all that maybe the only way to explain some of these systems is through kind of a black box technique? I feel like the protein folding thing, it seems like for a long time, I knew that people were, it seemed like they were really trying to just simulate what would happen to the proteins. And I'm not totally up on the latest stuff, but it seemed to me like the approach that worked really well with the AlphaFold was sort of less physics simulation, and more just kind of like observing. Where do you think that goes?
Polly:
For me, what I really think about is, and this is sort of the heart of some of the stuff that we've been doing with Anshul Kundaje's lab is...here's my motivation for why I think we need to eventually know the physical principles. Let's say we were wanting to learn how to create a new ballistic, or fly a new thing. If we just wanted to take this black box approach, it would be like, each time we want to fly a new thing or make a new ballistic, we're just going to make a thousand ballistics. And then we're going to shoot them, and we're going to collect the data, and then we'll train, we'll hold out some of the data, and we'll train a neural net. And now we're going to be able to predict it for that system. The fact that we know the laws of gravity means that we're not restricted to just now working with that system that we've tested a thousand times, we can work with all kinds of systems, because we have this generalizable physical model. I don't think it's necessarily at odds with machine learning approaches. One thing that we think is really exciting is, let's say, you're able to train a neural net on a given dataset, where you have a physical hypothesis about what's going on. We can do a lot of experiments, we can do a thousand experiments, a thousand measurements in parallel each time we run an assay, but that's often not enough to fully characterize the system. If you have a neural net that can predict behaviors, now we can feed it in things where we're systematically varying particular physical parameters to ask what it thinks, right? I think sometimes you can use these black box models as a way to do in silico experiments, at a scale far beyond what you could reach with even the highest throughput-
Lukas:
In silico.
Polly:
Right? Like, you can do millions or billions of in silico experiments. Then choose a thousand that you go back and test with some of our experimental techniques to see what's going on. I think rather than just thinking of neural network predictions as an endpoint, like I'm going to train on the system, and I'm going to predict for the system, can we use them as a tool to uncover generalizable physical principles? To me that's like a really interesting and complementary way to think about those problems.
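Editor's note: a hypothetical sketch of the in silico experiment loop Polly describes: train a model on measured data, screen a large space of unmeasured variants with it, and select a small set to go back and test experimentally. The featurization and model choice here are placeholders, not the lab's actual pipeline.

```python
# Hypothetical sketch: use a model trained on measured data to "run"
# many in silico experiments, then pick ~1,000 candidates to test on-chip.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Stand-ins: featurized sequences (rows) and measured activities from one assay.
X_measured = rng.normal(size=(1000, 20))
y_measured = 2.0 * X_measured[:, 0] + rng.normal(scale=0.1, size=1000)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_measured, y_measured)

# In silico experiments: candidate variants that were never measured,
# e.g., generated by systematically varying a physical parameter.
X_candidates = rng.normal(size=(100_000, 20))
predictions = model.predict(X_candidates)

# Choose the top-predicted candidates to synthesize and measure next.
to_test = np.argsort(predictions)[-1000:]
print(f"Selected {len(to_test)} candidates for experimental follow-up.")
```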
Lukas:
That makes sense. Another question, and I guess, I'm just asking kind of the dumb questions that maybe I'm afraid to ask other people. But when I look at image recognition, and I feel like I've been working on image recognition for two decades. I've sort of seen it go from totally not working to working maybe better than humans in a lot of controlled cases.
Polly:
Particularly in like clinical cases, right? There's a lot of clinical evidence that it can work better than-
Lukas:
Yeah, and you talked about how it's mostly trained off of images online, like really ImageNet was kind of this moment where it started working, where people decided to collect a huge set of data. And then there was this thing called transfer learning, where that's kind of become mainstream, where people take something trained on a big set of data, and then they kind of fine-tune it on a smaller set of data. Do you feel like that is working in biology? Is there an analogy to that where there's some like big data sets that you could train on, and then kind of modify the models to work on smaller datasets? It just seems so clear that's what happened in images. And I don't really know the biology analogies to that.
Polly:
I mean, I think that's definitely...when I go to talks right now, everybody's always using transfer learning. It's because of the fact that it's hard to make measurements, right? Maybe you have one system, and you've characterized it to death, and now you want to know the other system. The ability to train on the system that you've really characterized well and then predict in a different system, that's hugely valuable. I think it's seeing applications all the time in biology. Maybe people have characterized one cell type really, really well. Now there's another cell type, but it costs so much money to characterize a cell type at that depth that now if they can use transfer learning to predict behavior for this novel cell type that hasn't been as well characterized, that's super valuable.
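Editor's note: a minimal, hypothetical PyTorch sketch of the transfer-learning pattern discussed here: keep a network pretrained on a large, well-characterized dataset frozen and fine-tune a small new head on the smaller dataset. The shapes and data are placeholders, not a real biological model.

```python
# Hypothetical sketch: freeze a pretrained backbone and fine-tune only a
# small task head on a new, smaller dataset (e.g., a less-studied cell type).
import torch
import torch.nn as nn

# Pretend this backbone was trained on the large, well-characterized system.
backbone = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)
for param in backbone.parameters():
    param.requires_grad = False  # keep the pretrained weights fixed

# New head for the less-characterized system.
head = nn.Linear(256, 1)
model = nn.Sequential(backbone, head)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy fine-tuning data standing in for the small new dataset.
x_small = torch.randn(200, 128)
y_small = torch.randn(200, 1)

for _ in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(x_small), y_small)
    loss.backward()
    optimizer.step()
```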
Lukas:
Well, I guess there's a lot of commercial interest in biology too, but maybe it's cheaper to classify images. It was interesting that one very motivated professor, Fei-Fei at Stanford, could make this amazing data set that kind of changed the whole field. And I sort of imagined that the same type of thing in biology would be expensive enough to make it complicated and hard. And maybe no one's really motivated to do this as a general works project.
Polly:
I guess, another thing is the ability to crowdsource measurements, right?
Lukas:
Yeah, totally.
Polly:
People are generating images all day long and uploading them and making them publicly available. I think the closest you come to that would be sequencing, right? People are sequencing and people are willingly sharing all of their genomic data with 23andme and ancestry.com and all of these places. That has sort of seen crowdsourced growth, and still not on the scale of images, but a huge amount of data. But I think what's really lacking is...we're getting more and more sequences and that's great, but in the same way that for the images you not only needed the images, but you needed initially to know, "Is this a dog or a cat or an arm or a barbell?", or "What is this?" That's what I think we don't have as much in biology right now. We have all the sequence, but we don't have the functional annotation that goes with it that allows us to make that same sort of progress. And I think to me that's the bottleneck, right? That's the thing that we're trying to solve.
Lukas:
I actually didn't really realize your work was on this. It's so cool that you're...I mean, it seems like actually collecting data at a far bigger scale would be the perfect thing to make the mathematical models work better. So it seems pretty cool.
Polly:
I've been obsessed with...Markus Covert told me about this book, "The Weather Makers", that he said was really great, and so I read it. We both took different things away from it. But part of the book is sort of talking about 100 years ago, people had these kind of primitive atmospheric models where you could have a room full of people, all doing calculations in parallel. They would start calculating, and at the end of 24 hours they had the ability to predict what the weather was going to be in 24 hours in the future. It was like, all of these people calculating could basically just keep pace with time, and it didn't really give you any predictive power. Now, we have these weather models that...you can look 10 days out, and have a pretty good sense of if it's going to rain, if it's going to snow, what's going to happen with the weather. What Markus took away from it is that, you really need to look at an entire system. Like a cell in its entirety, in order to really be able to model, and understand the behavior. What I took away from it was, this progress was really only enabled by the fact that we had weather stations around the world that were recording huge amounts of data, not in relative terms like, "Oh, it's 10% hotter today than it was yesterday, or it's going to rain twice as much today as yesterday." They were recording all of these data in terms of physical constants, like temperature, precipitation, and humidity. And that allowed us to develop these atmospheric models and to test the predictions of physical models, and to develop this predictive power. Our big push, using these technologies, using these microfluidic technologies, they make it possible to shrink biology, and make measurements at a much more rapid pace. We're really interested in trying to say, "Can we do this for biological systems, but can we also always do it in the language of physical constants, right?" There's quantities like energies that reflect how much energy it takes to fold something, and what the energy is when two different molecules come together. And so those are the kinds of quantities we're trying to measure, and I think that ultimately those types of measurements in concert with huge amounts of sequence data and ML algorithms that are seeking to predict the function of different sequences and how changes to the sequences alter function, those kinds of physical constants can be integrated with all of that other stuff to eventually attack these problems that seem intractable now. But so did weather prediction 100 years ago.
Lukas:
I guess, I feel like scientists always hate to answer this question, but I'm sure everybody's thinking when you used that analogy...When you roll this kind of work forward, 10 or 20 years or more, how would it affect like my day-to-day life? Is it like, a lot of diseases get cured, or I mean, what is the ultimate impact of this stuff?
Polly:
A lot of our science is pretty basic, to be unashamed about it. A defense of that is, CRISPR has been this amazing tool, and it came from people studying the mechanisms of bacterial immunity, right? Nobody was looking for things that were necessarily going to transform our ability to engineer genomes, but that's what we found in the course of doing basic scientific research. For me, what would be tangible things that I would hope could come out of some of our research are...I hope that we can characterize functional variants across some of these proteins, so that clinicians can tell their patients, "This mutation that you have..." The measurements we make could improve the algorithms that would allow a clinician to say to a patient, "You have this mutation, and I think it means that we should treat you with this drug." So sort of closing the loop on precision medicine, beyond the most common variants to more rare variants that people have to provide clinically actionable information. For precision medicine, some of the things that we're doing, where we're looking at these individual cells...We have these droplet platforms as well as some other platforms. I'd like those to become something where we could work with actual clinical samples to say, "We looked at the cells from your tumor, and we're able to say that a small fraction of them carried this particular resistance marker", or "We tested the drug sensitivity profile of the cells from their tumor, and so we're recommending this course of action." And then beyond medicine, I think the ability to say, "We've got to design enzymes with particular functions", right? The ability to say, "We've got this toxic chemical that has leaked out of this mine, how are we going to clean it up?" What if in the same way that you can write a program to do certain things, what if we could now specify the DNA that makes the protein that would have the capability of breaking down that compound to be non-toxic? To me, those are the three dreams of the research we're doing. And then I would say the last thing is, as a faculty at a university, my biggest impact will probably not even come from my own work, but I'm training all of these incredibly talented graduate students and post-docs. If one of them acquires the skills that allows them to go off and solve some of these problems or run things, then your function is fulfilled, right? So you're not only trying to do research, and take grant money that the public gave you reluctantly, and turn it into papers and knowledge. You're also trying to be a mentor that allows people to come through your lab to gain the skills that they need to be successful in the future, or in industry, or in research, or as public communicators, or whatever they decide to do.
Lukas:
Wow. I wish I could go back to grad school and work at the Fordyce Lab, that's pretty awesome.
Polly:
I love my lab, my lab is amazing. They're great.
Lukas:
One final question, because I know we have so many kind of students that watch this. If you were kind of starting your career...how would you guide an ML-oriented person early in their career who's really interested in the biological applications? Where would you guide them to kind of look for having a successful career for the next few decades?
Polly:
I guess, are they going to school, or are they not going to school?
Lukas:
Going to school.
Polly:
I would encourage...to me the people that have the most impact, and a lot of my collaborators, my ML collaborators, that I'm almost always in awe of, I think what they're able to do is...Not only are they incredibly proficient at developing novel algorithms, I think that's hard, but there's a lot of people that are good at that, right? That's very competitive to be the absolute best at just straight up algorithm development. I think the ones that really try and engage with the biology, ask all the stupid questions, starting with the very first thing that they don't understand, right? Ask questions all the time, try and get references about the biology literature, or the physics literature, right? I really think that there's so many things in common between machine learning and the way in which it works, and statistical mechanics and thermodynamics, and just the mathematical frameworks of both are really very similar. I think becoming conversant in both of those really positions you where I think it's easier to leverage these really powerful algorithms you're developing, to make the most progress in the field that you're trying to attack.
Lukas:
Awesome. That makes total sense.
Polly:
And the stupid question thing, I really mean it. I mean, when I started my post-doc, I had been trained in physics, and I asked questions where my advisor literally put his head in his hands because he couldn't believe that anybody didn't know this. The students that were training me would be like, "Can you go run this on a gel?" And I was like, "No, because I do not know how to run gels." Which is something that every biologist knows, right? And it was painful, I cried that first six months because I felt so stupid all the time. But I think being willing to ask those incredibly stupid questions over, and over, and over again, it makes that learning curve steep. It's painful, but it makes the learning curve steep, and I think it's the thing to do.
Lukas:
Awesome. Well, thanks for answering my stupid questions. That was a lot of fun.
Polly:
They weren't stupid at all. They were really good.
Lukas:
Thanks for listening to another episode of Gradient Dissent. Doing these interviews is a lot of fun, and it's especially fun for me when I can actually hear from the people that are listening to these episodes. If you wouldn't mind leaving a comment and telling me what you think, or starting a conversation, that would make me inspired to do more of these episodes. And also if you wouldn't mind liking and subscribing, I'd appreciate that a lot.