Alyssa Simpson Rochwerger on responsible machine learning in the real world

From working on COVID-19 vaccine rollout to writing a book on responsible ML, Alyssa shares her thoughts on meaningful projects and the importance of teamwork.
Angelica Pan

Listen on these platforms

Apple Podcasts Spotify Google Podcasts YouTube SoundCloud

Guest Bio

Alyssa Simpson Rochwerger is a Director of Product at Blue Shield of California, pursuing her dream of using technology to improve healthcare. She has over a decade of experience in building technical data-driven products and has held numerous leadership roles at machine learning organizations, including VP of AI and Data at Appen and Director of Product at IBM Watson.


Show Notes

Topics Covered

0:00 Sneak peek, intro
1:17 Working on COVID-19 vaccine rollout in California
6:50 Real World AI
12:26 Diagnosing bias in models
17:43 Common challenges in ML
21:56 Finding meaningful projects
24:28 ML applications in health insurance
31:21 Longitudinal health records and data cleaning
38:24 Following your interests
40:21 Why teamwork is crucial


Transcript

Note: Transcriptions are provided by a third-party service, and may contain some inaccuracies. Please submit any corrections to angelica@wandb.com. Thank you!
Alyssa:
One of the challenges in the healthcare space is often you don't get the answer to did this treatment solve the problem. You either get nothing happened after that, right, or maybe I went to a different doctor or somewhere and you just don't have the data, or maybe I didn't take my meds because I didn't pick them up or whatever else. But there's a lot of challenges in the healthcare space with actually getting good data sets in order to do machine learning.
Lukas:
You're listening to Gradient Dissent, a show about machine learning in the real world, and I'm your host, Lukas Biewald. Alyssa Simpson Rochwerger is an old friend and colleague and an expert on real world AI. She's currently a Director of Product at Blue Shield of California, and, before that, she was VP of AI at Appen and Figure Eight, the company that I founded and ran for a decade and sold to Appen. Before that, she was Director of Product at IBM Watson, where she was an important partner for Figure Eight, so she has over a decade of experience in machine learning, and she's the author of the book Real World AI: A Practical Guide for Responsible Machine Learning, which covers basically everything we talk about here in this podcast every week. So I'm super excited to catch up with her and talk with her today. Tell me about your work on the vaccine. I'm dying to hear about it.
Alyssa:
Sure. So at the end of January, you may have heard, Blue Shield got asked to help with the vaccine rollout in California, and I was privileged enough to get a phone call, I think, the following Saturday from one of our senior executives saying, "Hey, Alyssa, can you come help? What are you doing right now? Can you join our meeting with the state either today or tomorrow?" I said, "Sure, Jeff, you bet." It was supposed to be a two- or three-week thing, and I think this is week 12 where I've completely dropped my day job on the floor and am just helping out the state, and so there's a team of us that has been deployed full-time, and it's been an absolute whirlwind and privilege and really exciting.
Lukas:
So what are you doing practically?
Alyssa:
Yeah. Have you heard of myturn.ca.gov? If you haven't, go get your vaccine, schedule it. So there's a website that everyone in California can get vaccinated through and schedule appointments on. We've been coordinating enhancements to that, working with the 61 different local health jurisdictions in California. Each one has a slightly different set of challenges and opportunities. So, for example, in the Bay Area, we have really low hesitancy rates and a lot of really eager people who are willing to drive three hours to get their vaccine, whereas down in Southern California at the moment, that's where we have more supply and we're starting to experience more hesitancy. There's appointment availability pretty easily, but hard-to-reach communities that are not interested or not able to access the vaccine, so we put a really heavy focus on equity and making sure the people who need the vaccine most get it first and are able to access it. This week, it's all about home-bound populations, people who can't leave their houses. How do you get vaccine to them? When these things come in 1000-plus-dose batches, you've got to thaw them out. Pfizer's a deep cold freezer situation. It can only be at room temperature for so many hours. If you are an ambulance worker going to a home-bound person's house, you need special training to understand exactly how to administer this and how many houses you can go to before the vaccine expires, right? Anyways, so, logistically, super complicated, and so I'm helping on the operations and tech team, so everything from doing data analysis to understand where we should ship vaccine and who we should get it to, to helping onboard providers. I think we've contracted with 3000-plus providers in the state of California, so Kaiser's a massive one, Sutter Health, Dignity, but there's a long tail of much smaller clinics. I think there's over 1500 clinics on MyTurn that are giving out vaccine across the state, so different challenges in Tulare County versus Alpine County versus the Moscone Center in San Francisco and the logistics of making sure everyone in California gets vaccinated.
Lukas:
Wow, and is it-
Alyssa:
So there's a lot to do.
Lukas:
At this point, is it mostly the logistical problem just getting the vaccine to the person that wants the vaccine, or are there other-
Alyssa:
There's a lot of challenges. The three big things that limit getting shots in arms are supply, ability to administer, and willingness. So the first several months of this have been supply-constrained. We only get so much supply from the federal government. The next potential constraint is the ability to administer vaccine, so that was what we, as the third-party administrator for California, focused on really heavily for the first month and a half or so: making sure that we could build up a network of providers who had the logistical capability to receive supply of vaccine and administer it, right? So you need nurses, you need security guards, you need freezers, you need the ability to mass vax or whatever it is. Some of these are mobile clinics going into agricultural communities, some of these are how do you get the word out to people, so all that capacity. Then the last problem is willingness, right? You need people, arms, to put shots into, and so some of that is a hesitancy problem. Some of that is the ability to schedule an appointment, right? So 40% of California speaks Spanish, and then there's a long tail of other languages, Vietnamese, Chinese, Hmong, and how do you address and reach all those communities, not just logistically support them with making an appointment if they want to but also helping them understand that the vaccine is good and safe and they should show up and get an appointment. Curve balls get thrown, like J&J no longer being administered, so that was last week. I think we found out at 6:00 AM or something, and by three hours later, we were able to switch the supply to ... I think there were 8500 appointments in the next 48 hours, and we had to switch to either Moderna or Pfizer for the vast majority and then reschedule a handful of those appointments.
Lukas:
I guess, as a data person, did you have feelings about the J&J decision? Are you even allowed to talk about things like that?
Alyssa:
Oh, I have no insider information. I read the news just like you do. I assume that the really incredible scientists and doctors who have been making this vaccine and diligently testing it and following the quality control protocols ... It's a good thing that they're pausing and reviewing it and looking thoroughly. I have plenty of loved ones who've received the J&J vaccine, and, so far, they've all been good and haven't had any problems, knock on wood, but I'm really glad that everyone's taking it super seriously.
Lukas:
Well said. I guess the main thing that I was planning to talk to you about was the book that you wrote, which is-
Alyssa:
I wrote a book, yeah.
Lukas:
Yeah, you wrote a book. Congratulations.
Alyssa:
Thanks.
Lukas:
Well done. It feels like real world AI is really, as long as I've known your career, what you've been working on, so it does seem like you would be the person to write this book. Actually, I'll say one thing as an aside. I'm always impressed by people that are able to write a book without feedback. Was it a challenging process? How did that go?
Alyssa:
I was voluntold to write a book, which I think I've said before. It was a fascinating process. I had a lot of help, great team. I'm dyslexic, could certainly not have written a book by myself, so-
Lukas:
Are you actually dyslexic?
Alyssa:
Yeah.
Lukas:
Oh, interesting.
Alyssa:
Yeah, extra-
Lukas:
How does that-
Alyssa:
Extra time in school, the whole thing.
Lukas:
Really?
Alyssa:
Yeah.
Lukas:
Wow. Man, working with you, I'd never noticed anything like that. Do you have a hard time reading?
Alyssa:
Dyslexia is an umbrella term. It means a lot of different things for a lot of different people. So my sister and I are both dyslexic, with totally different manifestations. I cannot spell to save my life, as an example. There are always typos and issues in every email I send, and I don't even see it. My sister is an outstanding speller. Math, not her strength. So our issues are different. To be diagnosed as dyslexic, you have to score above average or pretty high in certain categories and then average or below average in other categories, and the delta between those in enough categories is what classifies you as dyslexic.
Lukas:
Interesting.
Alyssa:
So, person to person, you could score high or low in totally different areas.
Lukas:
So I would imagine that would make it even harder to write a book, something that already seems very hard to me.
Alyssa:
Yes. No, so writing the book was a really interesting process, and it took a long time. We started out with an idea of what we wanted to do, organized that into an outline, and then started fleshing out those outlines. We interviewed you, thank you so much for your interview and contributing to it, and then lots of other folks who were willing to share their stories about what it's like to actually build and deploy machine learning-based technology in the real world for real, actual use cases and not BS hype. What's great about the machine learning community is that people are really nice and they want to share their stories and they want to help others, is what I've found really consistently. Not every story were we able or authorized to use publicly and put in a book. There's a lot of lessons learned, and we had to anonymize quite a few. But a bunch we could, so it was awesome. The process is you do an outline, and then you talk through each chapter one by one, and then you go back and reorganize information or content that makes sense in perhaps multiple places. Our editing team knows how to write books, and they do this all day long for a living, so they turned word vomit from Alyssa and Wilson into actual paragraphs and sentences.
Lukas:
I feel like you had a focus, as you do in your career, on ethics and responsible AI. Was it hard to get people to talk about that? It is in the zeitgeist, but I wonder if it's hard to get real world stories of tricky issues.
Alyssa:
Yeah. Easy to talk about at an abstract level, easy to talk off the record with people around lessons learned and challenges, harder to get them to go on record about failures and specifics of those failures in large public companies. Yeah.
Lukas:
Can you talk about some of the failures and what happened, some of the anecdotes that you have in your book?
Alyssa:
Yeah. I'll start with a personal one that I think we've talked about before. When I was at IBM, I was new to machine learning and we were launching a visual recognition system. The API did a very general thing, but we were improving the accuracy of it, and I was new and I was like, "Well, how do you know it's better? How is the accuracy better?" The team settled on the F1 score as a fairly good measure of that, and our F1 score improved, and there was a big delta, and we were excited to launch the next version. A couple days before launch, one of the team members reached out to me and said, "Alyssa, we can't launch this," and I was like, "What are you talking about? It's better. We've all agreed. There's a lot of energy behind this." He sent me an image that I tested against the algorithm, and the tag that came back for that image was the word "loser", and the image itself was a picture of someone in a wheelchair. I was horrified, and I thought that that was terrible bias that we didn't want to encode and we certainly didn't want to launch. It really gave myself and the team a wake-up call to, hey, how could this have gotten into our data when the accuracy is supposed to be better? The aha moment for me as a newbie was, well, stupid, of course, it's the data. The training data and the tags that you've associated with it are the problem. So we had a great team, and we went back and reviewed every single tag, which was thousands and thousands and thousands of tags and millions of images, and we reviewed it by hand as a team. We divided and conquered, and we pulled out quite a few objectionable things that we didn't want to be the public face of IBM. That took time and money and pain, and we were able to relaunch something that contained less of what I would call unwanted bias in that particular system, but IBM's certainly not free from that, and many others have had challenges with visual recognition systems. Particularly, there's been a lot of talk recently about bias in facial recognition systems, so they're tricky to get right.
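For reference, the F1 score mentioned here is the harmonic mean of precision and recall. A minimal sketch, with made-up counts, shows the computation and why an improved aggregate number can still hide individual harmful labels:

```python
# F1 is the harmonic mean of precision and recall; the counts here are made up.
def f1(tp, fp, fn):
    precision = tp / (tp + fp)  # of the tags predicted, how many were right
    recall = tp / (tp + fn)     # of the true tags, how many were found
    return 2 * precision * recall / (precision + recall)

print(round(f1(tp=90, fp=10, fn=20), 3))  # 0.857
# An aggregate score says nothing about which individual tags are wrong,
# so a single harmful label can survive an overall improvement.
```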
Lukas:
It seems hard to fix, but maybe even more concerning is that it was just caught by someone who happened to be trying something. Do you have recommendations on diagnosing these kinds of problems?
Alyssa:
Yeah. So that was a long time ago, and, since then, there's a ton more. But what I always say is, "Be proactive around the biases that you're looking for." There's a handful of biases that are regulated, right? You don't want to have gender bias, you don't want to have racial bias, or other ones. So depending on what your system is deployed for, you're going to want to set up a proactive monitoring system to take a percentage of your real-time public data (do this in testing too, but also once you've launched), siphon it off, and review it, typically with humans manually, or at least set up some alerting if things fall or skew outside of what your normal expectations would be. That usually involves dashboards and a lot of data and looking through tags, but also proactively setting up a feedback mechanism so that people can report things that you didn't hear about or didn't think of, and being able to escalate those quickly, react quickly, and adjust. Hopefully you're able to retrain your model, or remove it, or have a back-up plan that does not include your model if you need to take it down for some reason for an extended period of time to mitigate things that you didn't anticipate.
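A minimal sketch of the kind of sampling-and-alerting loop described here; the stub model, the review queue, and the thresholds are all illustrative assumptions, not any particular production stack:

```python
import random
from collections import Counter

REVIEW_RATE = 0.01  # route ~1% of live traffic to manual human review

def monitor(item, predict, review_queue, live_counts):
    tags = predict(item)
    live_counts.update(tags)               # track the live tag distribution
    if random.random() < REVIEW_RATE:
        review_queue.append((item, tags))  # humans audit this sample
    return tags

def skew_alerts(live_counts, baseline_counts, tolerance=0.5):
    """Flag tags that never appeared in validation or drift beyond tolerance."""
    live_total = sum(live_counts.values()) or 1
    base_total = sum(baseline_counts.values()) or 1
    flagged = []
    for tag, n in live_counts.items():
        expected = baseline_counts.get(tag, 0) / base_total
        observed = n / live_total
        if expected == 0 or abs(observed - expected) > tolerance * expected:
            flagged.append(tag)
    return flagged

# Toy usage with a stub model:
baseline = Counter({"dog": 900, "cat": 800, "person": 700})
live, queue = Counter(), []
stub_predict = lambda item: ["dog"]  # stand-in for the real model
for item in range(1000):
    monitor(item, stub_predict, queue, live)
print(skew_alerts(live, baseline))  # ['dog']: live traffic skews from baseline
```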
Lukas:
But then, I guess, fixing it ... it doesn't seem realistic these days to go through the whole training data set and take out everything objectionable, or maybe it is. On a model that's trained on millions or billions of records, do you have recommendations for how to improve the quality of the training data in a maybe more cost-effective way?
Alyssa:
Even if it's millions or billions, it's where are you getting those millions or billions, and is there selection bias in where you're getting that data from? So take a speech recognition example. The speech recognition systems available in the US today are better at understanding men's voices than they are women's voices, or they are better at understanding people who speak English as their first language versus people who speak English as a second language, and that's largely due to the data that is collected. It's thousands of hours of data collected, but if you're not actively collecting data from the populations that you want to serve, you're going to have a challenge there. So I think, even in aggregate, it's appropriate and quite feasible to think critically around where you're getting your data and whether it reflects the community or the people that you are going to be serving with the model.
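As a rough illustration of that kind of check, here is a hedged sketch that compares a corpus's composition against the population you intend to serve; the groups, counts, and threshold are all invented:

```python
# All groups, counts, and the 5% threshold below are made up for illustration.
corpus = {"english_first": 9200, "english_second": 800}   # hours collected
target = {"english_first": 0.60, "english_second": 0.40}  # population served

total = sum(corpus.values())
for group, want in target.items():
    have = corpus.get(group, 0) / total
    if want - have > 0.05:  # arbitrary "under-collected" threshold
        print(f"{group}: have {have:.0%}, want {want:.0%} -> collect more")
```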
Lukas:
I guess it's a little bit of a different issue maybe, men and women versus English as a first language, English as a second language, at least if you think about ... Well, I don't even know, I guess. I guess there's more people with English as a second language, but you could imagine a case where there's a smaller group of speech patterns, for example. Do you think that you should collect at the ratios that you have, or should you try to over-collect the more rare cases? Do you have thoughts on that?
Alyssa:
I think that's where a team comes in, and I'd ask you ... I'd certainly skew towards over-collecting rare cases, but definitely monitor for those cases to understand how they are performing. I think as a team you need to understand and balance the business priorities because it's not always feasible to collect. So let's say you're trying to deploy audio recognition in a call center, and let's pretend you're, I don't know, Walmart, right? You serve most of the United States, but look at your customers. Do they skew to people who speak English as a first language or people who don't speak English as a first language? Are you going to deploy in your entire call center, or are you going to start with just California or just Texas? Starting to deploy models in a small way is what I find often works best: find a narrow place to apply them, and then scale up as you can prove success and, typically, collect more data. Because let's say you deploy it in Texas, so Texas has a heavy Spanish-speaking population and you get a model that works well. Let's say it's only for handling returns. But if you then want to expand, say, to Georgia, well, a Southern drawl is going to come into play, and that model that you built for Texas is probably not going to work that well for the population in Atlanta, which skews more African-American versus Latino, and that's a different sort of speech pattern. So you could deploy the same model, but you're probably not going to get the same results, and so you need to collect more data, mature it, and tweak it. I think it's less around, okay, how do you start right from the very beginning to try to do everything well, and more like, hey, start small with a specific and narrow business problem, do that well, and then gradually grow and use perhaps different or related data as you grow in order to address those additional needs.
Lukas:
That makes sense, and that just seems like best practice for any case, even setting aside ethical concerns.
Alyssa:
Yeah. I think one of the big, not mistakes perhaps, but challenges coming into machine learning is that there's a lot of hype and everyone thinks you can solve a really big problem with machine learning by magic, and that's just never the case. It's much, much more successful to start narrow and start small and build out, and that's also a good way to address unintended bias: narrowing what you're trying to do narrows the data set that you need, it makes it less expensive, and it allows your pilot to be more successful.
Lukas:
I guess, as you were researching the book, what other sorts of anti-patterns or failures did you see besides those types? What other things did people run into?
Alyssa:
We talked a little bit about the Goldilocks problem, which is trying to pick the right problem to solve, the right size, a narrow problem that's well-suited for AI. I think another challenge is around the team and getting a successful team in place that has the right mix of skills to successfully deploy something to production. This is not a case of a lone data scientist or even a team of data scientists building something. In order to actually get something into production, you need DevOps, you need data engineers, you need a UX designer, typically. You need a product manager, you need regular front-end software engineers and backend engineers. You need a whole team of people that is responsible for actually deploying something into a production environment, and, at many companies, actually getting to production can often be harder than developing the model itself, because you start to run into things like legal and security and risk tolerance. All of those things mean you have to have a back-up plan, you have to understand how you are going to handle unknowns, and you're going to need escalation paths. Putting those sorts of business and technical processes in place often means the business thinking through the implications of the model, or what happens when a decision is made, and what happens if you need to explain why that decision was made to auditors or whoever is going to be scrutinizing this. That conversation, if you don't start early and don't involve those people early in the process, can be a big blocker to launching something to production. So what I encourage folks to do when they're starting out is think broadly around the cross-functional team of people that you want to have on your bench. They need to be diverse, right? The finance people should certainly get involved. Sometimes, HR needs to get involved, and it's not just the engineers.
Lukas:
Is there any specific stories you can share where legal came in at the end and blocked something or HR or finance wasn't involved and then the project couldn't launch, even though the ML model was working well?
Alyssa:
So I'll use the Amazon one for HR, right? There's a very public story around how Amazon was trying to use machine learning to predict who was going to get hired at Amazon or who would be really strong candidates for jobs there. I don't know if it was HR that blocked it at the 11th hour, but they found it to be not serving the HR professionals and their goals because it was super biased, and it was biasing against women pretty heavily. So that's a scenario where the model was working very well, I think, at the beginning, at predicting who would be strong candidates, but they weren't considering some other goals that were really important to Amazon, like hiring a diverse employee base. So that's potentially a case where the training data wasn't appropriate, or I'm not sure exactly what went wrong behind the scenes there. Perhaps you know those people. I don't. But those are the types of things where legal or HR can say, "Hey, you know what, can't do this." I know Uber has also had challenges in terms of making sure that the escalation path for support tickets, which they use machine learning to classify, appropriately routes the right tickets in the right way to the right level of severity, and in scrutinizing that process. Because if you miscategorize something that's really urgent, that's potentially a legal challenge for the company.
Lukas:
I guess channeling what our audience asks me about all the time, I'm curious if you have suggestions for an ML practitioner who wants to work on something meaningful or wants to work for a company that really embodies responsible or ethical AI. Do you have any suggestions in what they might look for in the interview process or before that or maybe even companies that you think do this really well?
Alyssa:
Sure. I recently got out of the AI business, and I got into healthcare, which has a lot of well-meaning mentors and people I admire scratching their heads, being like, "You left all these lucrative job opportunities on the floor to go into an insurance company. Alyssa, are you out of your mind?" Maybe I am, but, in terms of what I looked for, I followed the money when I was making that decision, and I don't mean personally. I mean follow how the money flows in the business or the organization. So I chose to work at Blue Shield, which is a nonprofit organization, and the incentives for the company are to cover more people in California with health insurance at a lower cost. By law, we cannot charge more for premiums. If we accidentally take in more money than we pay out in healthcare, we have to give it back to the people of California, which, this year, because of the pandemic, the models were all over the place and wrong, and we ended up giving a lot of money back to our subscribers. So, for me, understanding how a company makes money and what drives the business will ultimately drive some of the models that take place. If you look at Facebook or you look at Google or you look at Amazon, these are for-profit companies. Facebook makes its money on advertising, and so they have some of the most sophisticated advertising models in the world for getting the right content in front of the right person. For me, that wasn't something that I wanted to spend my time doing. There's a lot of awesome people that work there, but it's not for me, and I decided to go in a different direction. I think it can be really hard for people to take a really hard look at where they want to spend their time and their day-to-day and what problems they want to think about. I'm feeling really fortunate to be thinking about how to get more vaccines in arms. There's not much machine learning that's going into that, frankly. It's spreadsheets and pretty basic data analysis, but I'm thrilled to be spending my time doing it, and I hope that Blue Shield can work on some cool, interesting machine learning problems in other areas.
Lukas:
I guess I wanted to ask you about that. It does seem to me like there are a lot of really interesting ML applications in this field. One of the Blue Cross, Blue Shields ... I think they separate by state, but I think one of them is actually a Weights & Biases customer, and I think Figure Eight had some customers in that realm. Do-
Alyssa:
Yeah. There's tons of interesting use cases in health insurance.
Lukas:
Can you tell me about some of the use cases in health insurance?
Alyssa:
Sure. A simple one that you know super well, Lukas, is around looking at healthcare data. If you're looking in aggregate at thousands or millions of people, you're trying to understand what patterns in a patient's record over their lifetime can be indicative of good outcomes, right? For example, I've been having carpal tunnel challenges from working at home and not moving nearly enough, and I went to the doctor, and they prescribed some steroids and some physical therapy and whatever else. But if you look a few months later, my hand was still bothering me. That didn't really work that well. So are there patterns that you can look at, at a population level, to recommend particular courses of treatment that work? From a machine learning perspective, if, and this is a big if in healthcare, if you have a good training data set that's cleaned and well-organized and that you're able to access, you could look at large outcomes like that and say, "Hey, did Alyssa need follow-up after that? Did we have to spend more money on healthcare? Was her problem solved or not?" That's actually one of the challenges in the healthcare space: often you don't get the answer to did this treatment solve the problem. You either get nothing happened after that, right, or maybe I went to a different doctor or somewhere and you just don't have the data, or maybe I didn't take my meds because I didn't pick them up or whatever else. But there's a lot of challenges in the healthcare space with actually getting good data sets in order to do machine learning. So that's one use case. There are other use cases that I would call simpler, like chat, where people try to file claims or have billing issues, and being able to respond faster to people and make our call center agents more efficient with their time by automatically answering tier one support issues, like "I lost my password" or whatever else, and being able to handle that in a lot of different languages. For example, some machine learning can support those types of use cases.
Lukas:
I guess, how real is this? Is ML chat used today? Like if I went to the Blue Shield website, would I interact with a chat bot?
Alyssa:
Yeah, we're rolling out chat. I don't know what you'd see if you went today. I'd have to get back to you. It's not my particular area of ownership. But we certainly have chat, I think, also for the providers. I've learned a ton about insurance. It's an interesting space because you have customers that are members, like you buy health insurance from us or you get it through your employer, but we also have doctors who interact with the insurance company for lots of different reasons, so that's the providers, and then, also, the employers or brokers or HR people, and all those people need help. So I'm pretty sure our chat is rolled out for employers and brokers and providers. I'm not sure if it's for members. Then, also, we certainly have it internally as well, so if I need something as an employee, I can use our virtual assistant internally to order a new mouse or get provisioned access to a system or whatever else for IT support. That's actually been a really successful use case for us.
Lukas:
The health record stuff seems so evocative, right? I would love to be able to do a deep data analysis on my own health record and-
Alyssa:
Yeah, if you could get it. Ask your wife.
Lukas:
If I could get it, yeah, and look out into the future. Maybe these do exist. Would you say that your employer is currently doing analysis of health records to forecast what might happen to people?
Alyssa:
Yeah. Absolutely. We look at population health, and not just us; we work with other companies who perhaps do some of this analysis, and then we actually consume the insights from those analyses, and we work with a lot of different partners. There's a platform we call Wellvolution, so I'll give you an example. We work with one company that has done a lot of analysis around kidney disease, so people who are on dialysis, and getting good outcomes there. They figured out, "Hey, here is the right way to treat kidney dialysis patients that has better outcomes," and so we encourage and steer our patients towards this particular program because it's proven that it has better outcomes than perhaps treating it without this program. So that's an example where we try to recognize patients who have a particular diagnosis or condition and then encourage them to use the programs that have the best outcomes.
Lukas:
I see. So it'd take one hypothesis and just test it based on...
Alyssa:
No, not that I'm aware of. Maybe there are other people that I don't know about. But it's more around, if you look at the population, the big things are the same things, right? It's diabetes. It's hypertension. These are the big things that impact our population, and so if you can encourage people to shift what are often lifestyle habits to things that are going to be more successful, you can have better outcomes. But as it turns out, it's not easy. It's a lot easier said than done to get people to take their health seriously, and some people don't, right? Some people are like, "It's just not a priority for me to change my lifestyle to be healthier," and other people are super, super eager to do it, and then there's a bunch of us probably that fall somewhere in between on that spectrum, and we're willing to make certain accommodations or changes in our lives and others we're not. So how do you use different tools or different approaches for different populations to move them into healthier lifestyles? Because if you take a step back, at least at Blue Shield, it's not that we want to pay less money in healthcare costs. We want to get everyone healthier, because healthcare as a force in the macro US economy is an incredibly inefficient expense that we simply can't afford. It represents a huge percentage of our spending as a country, and it's not sustainable, and so we need to find ways to get our healthcare costs down as an industry overall because it's just not something our economy can support.
Lukas:
What have you been working on, before the vaccine and post leaving Appen? What projects have you been-
Alyssa:
I was working on this longitudinal healthcare record problem. We launched a pilot, which I'm super excited about, and so a certain percentage of our members can actually get their longitudinal patient record with every provider's data, if we have access to it, so that's a big if. But if you've submitted a claim to Blue Shield, or your provider participates in one of the statewide networks, we have that data. In California, the network is called Manifest Medex, and they have thousands of providers that send data, and by provider I mean doctor. So if you're a large healthcare institution, you may participate in this, and then we show it to you as a member, and you can look at your record. Then we also recommend things that perhaps you haven't done, so if you haven't gotten your flu shot or if you haven't been to your annual check-up or you're overdue for a cancer screening or something like that, we'll say, "Hey, Alyssa, you haven't gotten your pap smear this year. Go get it done." You can interact with us, and you can say, "Oh, actually, I already did it, and you just don't have the data," or, "Thanks, let me set a reminder to go get that done." So that's my baby, my project that I was working on before I got pulled into the vaccine work.
Lukas:
That's so great. If I'm a member of Blue Shield, could I use it?
Alyssa:
Yeah. It's rolled out to, I think, about 50,000 people right now, and we are working on it and hopefully going to ramp it up to more Blue Shield members in the future.
Lukas:
That's so cool. I would think, for myself, there are things that no ML algorithm would be needed to tell me would make me healthier.
Alyssa:
Yeah, a lot of it is pretty simple, right, like, "Hey, you did or you didn't do this." It doesn't require machine learning. But what has required machine learning-type thinking in this project is, frankly, data-cleaning. We may have multiple records of the same medication for the same member, right? Like, I get prescribed birth control every single month, and I have multiple prescriptions assigned that overlap with each other, so if you look at the last 10 years, if I'm displaying that to me as Alyssa Simpson, I don't want a list of 10 years' worth of every medication I've ever been prescribed. I want you to group it logically by the brand name or the medication type, and so that is a data-cleaning machine learning exercise around grouping medications together, because one pharmacy may have reported it with slightly different wording or dosage or something versus another pharmacy or another doctor. In organizing that information, machine learning and natural language processing could be super useful.
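A minimal sketch of that kind of grouping, assuming toy records and a couple of hand-written normalization rules; a real pipeline would likely map names to canonical drug codes (e.g., RxNorm) rather than regexes:

```python
import re
from collections import defaultdict

def normalize(name):
    name = name.lower()
    name = re.sub(r"\b\d+(\.\d+)?\s*(mg|mcg|ml)\b", "", name)  # strip dosage
    name = re.sub(r"\b(tab|tablet|capsule|oral)\b", "", name)  # strip form words
    return " ".join(name.split())

# Invented records: the same drug reported with varying wording and dosage.
records = [
    {"drug": "Sumatriptan 50 mg tablet", "filled": "2021-01-03"},
    {"drug": "SUMATRIPTAN 100MG TAB",    "filled": "2021-02-05"},
    {"drug": "Sumatriptan oral",         "filled": "2021-03-02"},
]

grouped = defaultdict(list)
for r in records:
    grouped[normalize(r["drug"])].append(r)

for drug, fills in grouped.items():
    print(drug, "->", len(fills), "fills")  # one row per medication, not per fill
```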
Lukas:
I guess I should say, and we were joking about this, but my wife runs a company called PicnicHealth that does a lot of this stuff, so-
Alyssa:
Which does a bang-up job, by all accounts.
Lukas:
Yeah, it does. In my unbiased opinion, it's fantastic at doing this kind of work.
Alyssa:
I've heard. I've heard they're really good at that, yeah.
Lukas:
I guess, why do you think that these health records end up so hard to structure?
Alyssa:
Ask your wife. She knows way better than I do. From my limited understanding, I think it's because the healthcare system in the United States is just really, really fragmented and there are so many different entities in the data chain. I'll use a personal example from this week. I get headaches, and a doctor prescribed me a new medication. I have had headaches for a long time, so I've cycled through all the normal ones that someone would use, and this one is an expensive medication and it's out of the bounds of normal, and she prescribed it to me a week ago Monday. My pharmacy followed up with me that same day saying, "Hey, we're working with your doctor and your insurance to get this covered, and we'll get it out to you." A couple days go by, I still don't have my medication. I followed up, and they say they're working on it, blah, blah, blah. But the number of different entities that have to touch or approve this end to end, from my doctor and me having a conversation and her prescribing it, to me getting it, is, I'm not joking, probably 10 different systems, right? It has to go from the electronic medical record that my doctor is using into an intermediary system that sits between the doctor's office or the hospital and the insurance company, and so there's a third party in between that processes what's called prior authorizations. Then the insurance company has to ... We don't directly integrate with that particular third party, and so we have to do some data moving around in order to get it to the right person in our system to approve that. Then it has to go back to the doctor's office, but then there's this pharmacy over here that hasn't been involved in any of this so far, and there's a bunch of systems that they use in between. The short answer is there's a lot of different systems involved, and they don't all talk to each other very successfully, and the data gets manipulated and changed, and there are different standards and different data systems. Even though there are standards around healthcare data, and I think they go into effect in California in July in terms of being mandated for certain narrow use cases, there's just not a ton of structure for these different data types, and so they've evolved in different ways. Even with the electronic medical record, we're dealing with this in the vaccine world: how does your doctor know that you've been given a vaccine? Well, that's kind of a challenge, because let's say you got it at Walgreens. They may have taken your insurance and then maybe they submitted a claim to your insurance, maybe they didn't if you didn't have insurance, but they are not reporting it back to your doctor's system. Anyway, there's a lot of different software systems being used, and there are no standards, whereas if you look at different countries that have more nationalized healthcare systems, there are one or two systems, and so there's just a lot less fragmentation. In California, there's 8000 different providers and there's 10 major electronic medical record systems, three of which are really big, but there's a long tail for the rest of them. A place like Walgreens or Safeway doesn't use an electronic health record system. They are a pharmacy. They use pharmacy systems, which are different than the hospital systems. So that's a long answer to your question, but it's the basic data.
Lukas:
Yeah. It's funny. I could see how years of working in ML would prepare you well for the American healthcare system.
Alyssa:
But it's basic data problems. It's not particularly sophisticated machine learning problems. It's data hygiene.
Lukas:
Right, right. Although it seems like that's the problem everywhere, right?
Alyssa:
Yeah, that's the problem everywhere. Yeah, exactly.
Lukas:
Were there other surprises going from, I guess, a start-up to an insurance company? How similar is your job doing product there?
Alyssa:
I think product management is similar no matter where you do it. It's always balancing stakeholders and priorities, and the day-to-day is certainly different in different types of companies, but I think fundamentally my skillsets are the same and the job I do is roughly the same. I think the problem space is really different, and the excitement I get around it is really different. To answer your question earlier around how people navigate into working on problems that they really want to work on and really love: follow where your interests are. I'm thrilled to get into the weeds of doing data munging, and I personally wrote, I think, 500 different data validation rules for prescription organizing, looking through hundreds of records of different types of prescriptions and figuring out how to organize some basic data hygiene rules. That was super fun, and I was thrilled to do it. It was painful work, and, certainly, I have other skills, but I was real excited about the problem we were solving, which was launching a cogent experience for my friends and family who are members of Blue Shield, being able to look at their longitudinal patient record and not have all this messy duplication in the data we're showing them, and so ... Sorry, I got a little off-track, but-
Lukas:
No, no. I totally, totally relate to what you're saying, and I think that's incredibly good advice.
Alyssa:
When I was at Appen and Figure Eight, some of the problems we worked on were super interesting and awesome and others weren't as close to my heart. We were optimizing advertising dollars or whatever else, and those are things that I just get less excited about, personally.
Lukas:
Totally. Well, I guess we always end with these questions, but it's funny because they're so relevant to your book, because what we want to talk about on this podcast is really just making ML work in the real world. But I want to ask them to you and get your take from all of the research that you've really done, and maybe get as specific as you can, but what do you think is an underrated aspect of machine learning that you think people should pay more attention to than they currently are?
Alyssa:
I think teamwork. We talked a little bit about this, but it's really teamwork. I think there's a misconception that machine learning work is pretty solitary and you can teach yourself to do it or you can do it by yourself on a laptop or whatever, but it takes a team to deploy anything functional that matters, and it takes a lot of different skillsets. For the team to work together successfully, it's really around the best practices of any team functioning successfully and has less to do with machine learning, but I think that often gets overlooked because there's a lot of focus on the technology and the right hard skills and the right technical systems. I think it's really easy to overlook the team dynamics of getting people to work together well, whether that's quality engineers or data folks or project managers or designers or scrum masters. You need a team of people who trust each other. We certainly have plenty of those problems at Blue Shield, or on any team that I've ever worked on, where people don't necessarily trust each other and they may be critical of others' work or they may have communication challenges or whatever. Particularly when remote, some of those things are harder to smooth over, but the successful machine learning teams that I've worked with have high trust, they have high collaboration and cooperation with a diverse group of people, and they welcome outside ideas and people who are willing to roll up their sleeves and get dirty.
Lukas:
If you think of ML practitioners that you've worked with, for someone that's listening to this, is there any resources that you'd point them to to become a better team member? Has there been a book that you've read or an article that's helped you with this?
Alyssa:
One of the books that was recommended to me that I really like around teamwork is called Turn the Ship Around, and it's a book that goes behind the scenes of a nuclear warship that was being deployed, and it was written by the captain of that ship. He came in, and he took over the ship, and it was a low-performing team, but, at the end of the day, it was a nuclear ship, and I'm going to totally botch all of the military stuff and get it completely wrong, but really important to do it well, can't screw it up.
Lukas:
Yeah, yeah, totally.
Alyssa:
The team hadn't been collaborating well, and he goes behind the scenes and talks about his time literally turning the ship around to get it ready for deployment, to go back out to doing whatever it's supposed to be doing, but it couldn't leave the harbor until it passed all its safety checks and the team was functioning better. They had been working with this top-down approach, everyone covering their own butt and not necessarily thinking critically about what they were being asked to do and how to do it better for the right outcomes. Anyway, I love this book, and I think it applies in business and all sorts of different settings, particularly machine learning, because what machine learning projects are being asked to do is often high stakes. The problems are big, and they're important, and they're worthy of solving, but they can also have pretty dangerous or negative consequences if they're not done well, and so I like this book's analogy to a nuclear warship, because it's an important problem and it requires a huge team of people collaborating towards the right outcomes.
Lukas:
Wow, I love it. Oh, I'm going to read that book.
Alyssa:
I'll send it to you.
Lukas:
Awesome. The question that we always end with is ... And this is what you spent, I think, most of your career on, so I'm curious what you think is the biggest thing here. But we always ask what's the biggest challenge of making machine learning work in the real world or where there's specific pitfalls where you see machine learning projects fail.
Alyssa:
Yeah. We certainly talk a lot about this in our book. I think a few major areas are not having the right team, not having the right problem, not having the right data, and ... I don't know. I could go on. I think those three are probably the big ones. The data to me is often the long pole.
Lukas:
I guess, for more, read the book.
Alyssa:
For more, read the book, yeah.
Lukas:
We'll put a link to it, and, yeah, you should read it. Thank you so much. I really appreciate it.
Alyssa:
Thanks, Lukas. It's a pleasure to be here, as always, with you.
Lukas:
Thanks for listening to another episode of Gradient Dissent. Doing these interviews is a lot of fun, and it's especially fun for me when I can actually hear from the people that are listening to these episodes. So if you wouldn't mind leaving a comment and telling me what you think or starting a conversation, that would inspire me to do more of these episodes, and, also, if you wouldn't mind liking and subscribing, I'd appreciate that a lot.