Kathryn Hume — Financial Models, ML, and 17th-Century Philosophy

Kathryn explains how the Royal Bank of Canada is using machine learning, and explores what Descartes and Newton might have thought about ML.
Angelica Pan

About this episode

Kathryn Hume is Vice President Digital Investments Technology at the Royal Bank of Canada (RBC). At the time of recording, she was Interim Head of Borealis AI, RBC's research institute for machine learning.
Kathryn and Lukas talk about ML applications in finance, from building a personal finance forecasting model to applying reinforcement learning to trade execution, and take a philosophical detour into the 17th century as they speculate on what Newton and Descartes would have thought about machine learning.

Connect with Kathryn

Listen

Apple Podcasts Spotify Google Podcasts

Timestamps

0:00 Intro
0:54 Building a personal finance forecasting model
10:54 Applying RL to trade execution
18:55 Transparent financial models and fairness
26:20 Semantic parsing and building a text-to-SQL interface
29:20 From comparative literature and math to product
37:33 What would Newton and Descartes think about ML?
44:15 On sentient AI and transporters
47:33 Why causal inference is under-appreciated
49:25 The challenges of integrating models into the business
51:45 Outro

Links

Watch on YouTube

Transcript

Note: Transcriptions are provided by a third-party service, and may contain some inaccuracies. Please submit any corrections to angelica@wandb.com. Thank you!

Intro

Kathryn:
We would love to go in, have the business partner say, "Here's my task, here's my baseline performance. Let's see if this is viable. If you can increase performance upon this baseline by X%," or whatever. I don't think I've ever seen an instance where that happens in real life.
Lukas:
You're listening to Gradient Dissent, a show about machine learning in the real world, and I'm your host, Lukas Biewald. Kathryn Hume runs Borealis AI, which is the AI arm of the Royal Bank of Canada. She works on a slew of machine learning applications that we get into. She has a background in comparative literature and speaks Latin, and that's surprisingly relevant to the work that she does and to our conversation. To find out why, you're going to need to listen to this one.

Building a personal finance forecasting model

Lukas:
All right. Why don't we start with what you work on? You seem like you have like a really interesting job in an interesting organization. Maybe you could describe it and talk about what a day in the life is like?
Kathryn:
Yeah, for sure. So I lead up a group called Borealis AI, which is the machine learning research lab for the Royal Bank of Canada. For those listening in the States or outside of Canada, you might not know of it, but it's actually the largest bank in Canada, and I think it's the ninth largest bank in the world, so it's a pretty big shop. There's 90,000 employees in the company. Borealis was founded in 2016 as just the ML research center for the bank. Day to day in my team, I think like many other ML shops, we learned over the years that it takes more than just scientists to really make production ML systems work. We've got a good group of machine learning scientists. We have ML engineers who do a lot of the work in building...taking the code from the scientists and really building out production ML systems. We have product managers who do what product managers do, figure out what we should build and really collaborate to make sure we ship things on time. And then we have a group of business development experts who work with our business partners. In the bank there's mini-markets, if you will. There's the retail bank, which works with people like you and I, so checking accounts, savings accounts. There's wealth managers who help manage people's assets. And then there's capital markets, which is sort of the institutional investing side. They partner with those various teams and help us find ML use cases.
Lukas:
Could you describe what those use cases are?
Kathryn:
Yeah, for sure. I'll talk about some of the products that we've worked on and various use cases that we see. So it's a broad variety, I'd say, of applications. If we go into the retail bank, things that are helpful for people like you and I, one of our recent applications was a cashflow forecasting application for day-to-day customers. Probably there's lots of customers out there who have missed a bill payment or potentially overdrawn their account and gotten fees from the bank for having insufficient funds in their account. We built something that's trying to predict upcoming payments in the next seven days. We stuck with a target of about a week out to give people reminders that say, "Hey, this thing is coming. You might want to either pay it now or take different kinds of actions in managing, moving money from savings to checking, et cetera, to be able to cover those expenses."
Lukas:
Wow. Is that really live? That really works?
Kathryn:
Yeah, it's live.
Lukas:
I've never gotten a message like that from my bank. Maybe I should switch banks.
Kathryn:
It's live as of...maybe a month ago we went into production? So it's really new.
Lukas:
Wow. How cool.
Kathryn:
But, yeah. That's one of the latest ones we put out, and it's an interesting ML problem because these...if we think about ML models, they're going to take a series of time series data, things that you've paid in the past and then try to generalize a prediction, "What's that going to look like in the future?"
Lukas:
But if you have something, say it's your electricity bill or your phone bill...my phone provider decides every once in a while that they're going to increase my rate arbitrarily, and I don't know why this is coming, but it just happens. And it goes from $75 a month to $85 a month, and I have to just pay that extra fee.
Kathryn:
One of the cool things in our model was, we needed to use an attention mechanism, so as to basically not overgeneralize that trend, right? It shouldn't see that it's 75, 75, 75, 85, 85, and then imagine that three months later, it's going to go to 95. We had to sort of correct for those kinds of stepwise changes that a provider might make, that an algorithm might incorrectly generalize as a trend. It was one of the many micro-nuances of actually making this thing work.
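To make that step-change issue concrete, here's a small illustrative sketch (not the production model; the weighting scheme is invented for illustration, whereas a real attention layer learns its weights) contrasting a naive trend extrapolation with a recency-weighted, attention-style estimate on the $75-to-$85 example.

    import numpy as np

    history = np.array([75.0, 75.0, 75.0, 85.0, 85.0])   # monthly phone bill after a one-time step change
    months = np.arange(len(history))

    # Naive approach: fit a straight line and extrapolate -- the one-time $10 jump
    # gets read as an ongoing upward trend.
    slope, intercept = np.polyfit(months, history, 1)
    print("trend forecast, 3 months out:", slope * 7 + intercept)      # ~94, keeps climbing

    # Attention-style approach: weight past observations, here simply by recency,
    # so the forecast settles near the new $85 level instead of extrapolating.
    scores = months.astype(float)                    # stand-in for learned attention scores
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the history
    print("recency-weighted forecast:", weights @ history)             # ~84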
Lukas:
That's cool. Or your energy bill might go up in the winter or I guess in the summer if you're in a hot place, right?
Kathryn:
Yeah. There's a lot with seasonality for sure. Yeah. You can't just take the average trend over a year or else you're going to end up with... I don't know. Somewhere in the midpoint in September when it's actually... It would tend to be more stepwise. Yeah.
Lukas:
Can you talk about how you came to make that? I mean, I guess I always assumed that banks kind of liked charging really high overdraft fees and would want you to maybe do that, but I guess that's wrong. How did you kind of come up with that as...working with the business to know the business wants that, and then also realizing that's a feasible ML problem that you could actually solve?
Kathryn:
I was actually quite proud of this. At the Royal Bank of Canada, there's a little bit of...it might have to do with the Canadian banking mindset, but there's...part of it is be a profitable institution, but then equal in part is, "Be a good citizen, be good to the Canadian citizens." Slightly different than in some of the US environment in that the Canadian population is very...it's not underbanked. "Underbanked" is a term that we use for people who are not within the recognized banking system, so not a Chase customer or a Bank of America customer, or a Morgan Stanley customer. I think in the US, I don't know exactly what the statistics are, but it's something like 30-ish percent of the population is actually not using a bank, a registered bank. They use things like payday loans and sort of on-the-side type banking products. But in Canada, I think it's 98% of the population actually is in the main banking community, which then has implications for sort of social responsibility from these pretty large institutions because you've got the whole population represented. I found it quite promising that the executives...the business executives making the decisions felt that it would be better for customer loyalty to provide a service that helped versus do this sort of nickel-and-diming on these kinds of fees. Finding the use case, it's always an iteration. I think in the ideal, coming from the academic ML world, we would love to go in, have the business partner say, "Here's my task, here's my baseline performance. Let's see if this is viable. If you can increase performance upon this baseline by X%," or whatever. I don't think I've ever seen an instance where that happens in real life. It's more like, "Hey, we have this idea. What do you guys think?" And then we're like, "All right. Let's play around with some of the data and see what we can find," and see if there's a there there. We'll come and we'll say, "All right, we think this is the task," and then it's like, "What timeframe do you need?" When we originally started this, I thought cashflow forecasting was six months out. Then our partners were like, "No, no, no. One week." And I was like, "Okay. Well, that helped." So there was sort of iterations on just narrowing down the scope of the prediction, what qualified as decent performance. I often find with the business, the preference is it's accurate every time.
Lukas:
That would be the preference.
Kathryn:
Yeah. The preference is always, "Actually there's no machine learning involved and this is just a rules-based system that works like clockwork." So then it's sort of iterations where it's, "All right. Well what if this edge case, the prediction is off? How might that impact customer experience?", this sort of iterative negotiation to get to the point of, "Yeah. We're comfortable with this as a starting point." Then there is the selling and pitching and telling the story, getting the various people involved to get it to market. I'd say with our group too — since we're a machine learning team, but we partner with other groups in the bank that do design, that do just a lot of those sort of ticking the boxes around all the business processes — there's a lot of back-and-forth in stakeholder management to really get something live as well.
Lukas:
What ended up being the level of accuracy that you could get with that sort of seven-day prediction if I'm going to overdraw my account?
Kathryn:
It varied per payment type. If you had a pre-approved payment, like your Spotify subscription or whatever, it was quite high accuracy. Also the day, the day that the payment would come out was quite high. We had a multi-objective supervised learning algorithm that predicted...the one task was "How much?" And the second task was "When?" With those, it could get pretty high. I think within a three-day range we were at...I don't even remember. I think it goes down to 88 or 89% if it was...no, sorry. Within three days, it was up to 98. And if it were the exact day, it was more like 88, 89. So with those pre-approved, it was quite high. Once we got into things like loan payments, anything that's sort of an arbitrary e-transfer, like a Venmo payment, those are harder because there's just not a lot of predictive...there's more variability. There's not that kind of just standardization. So, yeah. It varied per payment type, but sometimes it was as high as 98 and then it would skew down to the high 80s.
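As a rough illustration of what a multi-objective setup like that can look like (a sketch under assumptions, not RBC's actual architecture or features), here is a tiny two-head model: a shared sequence encoder with one head regressing the amount ("how much") and one classifying the day within the next week ("when"), trained on a combined loss.

    import torch
    import torch.nn as nn

    class PaymentForecaster(nn.Module):
        def __init__(self, n_features: int = 8, hidden: int = 64, horizon_days: int = 7):
            super().__init__()
            self.encoder = nn.LSTM(n_features, hidden, batch_first=True)  # shared encoder over past transactions
            self.amount_head = nn.Linear(hidden, 1)             # task 1: how much
            self.day_head = nn.Linear(hidden, horizon_days)     # task 2: which day in the next week

        def forward(self, x):                      # x: (batch, seq_len, n_features)
            _, (h, _) = self.encoder(x)
            h = h[-1]                              # final hidden state
            return self.amount_head(h).squeeze(-1), self.day_head(h)

    model = PaymentForecaster()
    x = torch.randn(32, 24, 8)                     # 32 customers, 24 past transactions, 8 made-up features
    amount_true = torch.rand(32) * 100
    day_true = torch.randint(0, 7, (32,))

    amount_pred, day_logits = model(x)
    loss = nn.MSELoss()(amount_pred, amount_true) + nn.CrossEntropyLoss()(day_logits, day_true)
    loss.backward()                                # one joint objective trains both heads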
Lukas:
You were using attention, a model that included attention? It sounds like a lot of machinery for this kind of problem. Did that really matter? Did it do much better than a simpler baseline?
Kathryn:
Yeah. It's a great question. I posed the same question to the team, being like, "Wow, this is a lot of machinery for this kind of problem." I think it came down to, again, the seasonal variability. You'd imagine that it's something where it's...it seems like it could just be a pretty standard approach. But, just with these things that creep up, like seasons, like variation in payment time per individual...some people have set up automated stuff and the minute it goes, it comes. Other people, they haven't, so they do it manually and there will be these lags in when they pay, so that has implications not only on when this thing is due, but on how that impacts their balance. Once you get into the details, it becomes more messy and needs more machinery.

Applying RL to trade execution

Lukas:
Can you describe some of the other applications? That seems like such a surprising and cool one, but what are kind of the main bread-and-butter bank applications of ML?
Kathryn:
I want to talk about another cool one.
Lukas:
Oh, yeah.
Kathryn:
I want to do another cool one. I'll do a cool one first. This one is really artful because you have to scope it down really small, but it's cool. It's using reinforcement learning for trade execution. Here's the problem. Imagine you're a big hedge fund and you trade every day, right? You just come into the equities markets and you place millions of orders of some sort of stock and you trade it through the day. The question at hand is, you come in, you decide that you want to execute a million orders of Google shares over the course of the day. The stock market opens at nine, closes at 4:30. The question is, "How do you distribute that order optimally throughout the day so as to achieve your desired return targets?" There's a common historical algorithmic approach to solving this problem, which is called a VWAP algorithm. VWAP stands for Volume Weighted Average Price; it's the average price of the stock throughout the day, weighted by the volume that's traded, as the name suggests. I don't know when these kinds of algorithms came into being, but I think they date back to the 90s or something, so it's-
Lukas:
Sorry to interrupt. That seems like it would be a number. How is that an algorithm?
Kathryn:
It's a curve. It's a number, but it's a number...if you trace it throughout the day, it will change slightly, right? The algorithm that we took was, "What trades do you place to hit that number?"
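For concreteness, here's a toy calculation (hypothetical prices and volumes, not the production system) showing how the VWAP "curve" is just the running volume-weighted average price as trades accumulate through the day.

    # VWAP = sum(price_i * volume_i) / sum(volume_i), tracked cumulatively.
    prices  = [100.0, 100.5, 101.0, 100.8, 100.2]   # hypothetical trade prices through the day
    volumes = [500,   300,   200,   400,   600]     # shares traded at each price

    cum_pv, cum_v = 0.0, 0
    for t, (p, v) in enumerate(zip(prices, volumes)):
        cum_pv += p * v
        cum_v  += v
        print(f"t={t}  running VWAP = {cum_pv / cum_v:.3f}")
    # An execution algorithm tries to place its own child orders so that the average
    # price it actually achieves stays close to this curve.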
Lukas:
Oh, I see.
Kathryn:
I won't go into the complexities of how the limit order system works, but effectively think, "Do I sell, hold or buy the stock?" It's not exactly that, but I think for the sake of...we don't have to go into this arcane detail. You could buy or sell or hold, right? So this is where reinforcement learning comes in. You have a sequence of a couple of kinds of actions.
Lukas:
Okay.
Kathryn:
You might sell, you might buy, but yeah, you're trading stocks throughout the day.
Lukas:
What would the simple algorithm be? Because you don't know what the stock price is going to be, right?
Kathryn:
You don't know what the stock price is going to be, but the simple algorithm, from what I understand, is you've got historical price curves. How the stock performed yesterday, two weeks ago, a month ago, et cetera. You'll use that to make a guess on how much you should buy at a certain time of the day in order to achieve your target goals, the target money-making goals that you have for the day. The algorithm basically releases (buys) or holds, right? Or doesn't release. And it partitions that at timestamps: "At 9:00 AM, I'm going to do X, at 9:15." What you don't want to do is — say if it's a large order, a million orders of stock — if at 9:00 AM you say, "Buy a million," that has a big impact on the market price and it will sort of shake things off, right?
Lukas:
I see.
Kathryn:
You try to dose it. Just a little bit at a time, without disrupting the ship, right? With keeping the market relatively stable. Because you're one of the participants, but there's going to be however many others on the exchange at that time. So what we did here is we said...we were trying to hug that number, the price number across these time stamps. What we said is, "Can we use reinforcement learning to optimally distribute, dose our buy decisions to stay as close as possible to that curve?" The value function was basically just, minimize the distance. We can observe that number.
Lukas:
Sorry. The curve is the volume?
Kathryn:
The curve is the average price weighted per volume.
Lukas:
It's the price weighted by the volume. Okay. And you want the curve to do what?
Kathryn:
What you want to do, there's a trading strategy where you want to hit that curve. You want to sell or buy in such a way that the price-
Lukas:
It matches.
Kathryn:
-that you arrive at matches that curve. Exactly.
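A highly simplified way to frame that objective in code (an illustrative sketch under assumptions; the real system's state, action space, and reward are certainly richer): penalize the gap between the average price your own child orders achieve and the market VWAP over the same window.

    def execution_reward(our_prices, our_volumes, market_prices, market_volumes):
        """Negative absolute gap between our achieved VWAP and the market VWAP."""
        our_vwap = sum(p * v for p, v in zip(our_prices, our_volumes)) / sum(our_volumes)
        mkt_vwap = sum(p * v for p, v in zip(market_prices, market_volumes)) / sum(market_volumes)
        return -abs(our_vwap - mkt_vwap)

    # At each timestep the agent chooses how much of the remaining parent order to
    # release (buy a slice, or hold), observes the fill prices, and receives this
    # reward signal; dumping the whole order at once would move the price and hurt it.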
Lukas:
Does it work?
Kathryn:
Yeah. The cool thing is it works quite well, even when there's a lot of volatility in the market. In March of 2020, when COVID hit, it was just much more volatile than the stock market normally was. What was nice is, it adapted. I don't know the exact time it took to adapt and nonetheless get superior trading returns. It may have taken a day. It may have taken a couple of weeks. I'm not sure on that timeframe, but I know that it adapted much better than a standard trading algorithm would.
Lukas:
I guess you're constantly retraining the algorithm then?
Kathryn:
Constantly retraining the algorithm. A different team — not our team now, but the team that now owns that algorithm — is working on adapting the task to different kinds of trading styles. Not this "Hug that VWAP curve" that I described, but there's other approaches and strategies that one could take when trading, and they're retuning it to see if it could work there. There's a lesson though in reinforcement learning, in that it's not...you can't just scale it to a new use case. It requires significant effort to write a new algorithm that will work with a different task.
Lukas:
Right, right. Interesting. What are the other kind of important applications to you?
Kathryn:
Yeah. So other, more bread and butter applications.
Lukas:
Or other cool ones, I guess if you've got other ones you want to talk about.
Kathryn:
There's another cool one we could talk about down the line that's a little less related to banking. I like to think about it this way. What does a bank do? A bank takes in money at one rate — you put money in your checking and savings account — and it loans out money at a different interest rate and it makes money on the spread, right? That's kind of basically what a bank does. Historically, when banks have used models, statistical models, to decide how much they should lend to a given customer, there's plenty of background models that are using linear regressions, et cetera, to do this. But there's a lot of opportunities to upgrade some of those decisions using ML with more data, different types of data. But the basics of, "Who should we give a loan to? How much should that loan be? What is the risk that the person will...that we incur that the person might default on this loan? If they do default, when should we call them?" Ranking that ordered queue. There's process optimization. We have our call center. Very often, we go into... especially today, we have digital banking. You go on, your password doesn't work. Something happens, you're stuck. You have to call somebody. There's a lot of applications in call center automation. The conversational AI work, automating some of that queue, rank-ordering queues. Whose call should we take first to approach this? Banks often have a series of products. Which product offering do we send to which customer next? Those are kind of more standard industry problems. I think those exist everywhere; they're not unique to banking. It's sort of the next-best product offering optimization.
Lukas:
Do you work on all of those problems? You and your team?
Kathryn:
We don't work on all of those problems. No. We do a lot of work in credit, but there's other teams in the bank who work on various other data science problems like this.

Transparent financial models and fairness

Lukas:
Well, tell me about credit. That's something I don't know a lot about, but I do know that we've had various sort of mortgage crises that I think were, at least...the publicly available information seemed to indicate that it was too much machinery leading to bad decisions. Do you think that's accurate? How do you think about machine learning in that context?
Kathryn:
I'll speak on this from the perspective of somebody who was not a banking expert in the 2007, 2008 credit crisis. This whole collateralized debt obligations thing, right? There's the tip of the spear, which is two people deciding to...let's say, I decide to lend you 10 bucks, and I think about whether or not you're going to give that back to me. And I say, meh. If you don't pay it back, it's also okay. All the way to, "I've got a set of mortgages and I'm a different institution who is going to hedge a strategy on some other institution's mortgages, and I don't have insight into the quality of them." That's when you get into these sort of layered risk management strategies. In terms of ML's engagement here, I think the big thing is the regulators have caught up. There's always...some action happens and then activities, and then there's been a lot of regulatory oversight since 2007, 2008, to try to protect the economy by putting some limits on banks. There's a thing in Canada, we call it the CET1 ratio, which is a ratio that's used to manage the liquidity that a bank has, this sort of overall cashflow over risk-weighted assets. These are something like a mortgage, right? The asset that a bank might hold that has some risk associated with it. A bank has to manage that ratio in such a way that if something bad happens, there's still relative stability. If you think about adding ML into this mix...let's say we were to use ML to calculate the risk factor in the denominator of that one equation. They want a lot of transparency and explainability, right? There's a lot of governance oversight that's like, "We're not just going to put in a black box neural network and see what happens." There's a high need to select models for those kinds of use cases that are quite transparent and auditable and where you can clearly understand how an input feature is leading to an output.
Lukas:
What kinds of models do you end up using?
Kathryn:
It varies per use case and context. The cashflow one that I talked about is a deep LSTM. There's an LSTM backbone also in the reinforcement learning one for trade execution. Two, sometimes. I'm not sure if my team has done much, but there's a lot of decision trees. There's a lot of XGBoost models for some of the credit work. We have a governance tool that we've built that is optimized for decision trees, because there's a lot of models in the bank that use those.
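As a loose illustration of why tree-based models are attractive for this kind of governance (a toy sketch with made-up features, not a production credit model), the learned rules of a shallow decision tree can be printed out and reviewed line by line:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))                   # hypothetical features: income, utilization, tenure
    y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)   # toy "repaid" label

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    # The full rule set is human-readable, which is part of what makes it auditable.
    print(export_text(tree, feature_names=["income", "utilization", "tenure"]))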
Lukas:
This is a single tree or a boosted set of trees?
Kathryn:
It varies per use case again.
Lukas:
It seems like probably a lot of applications, but mortgages specifically has kind of a long history of at least racial inequality. How do you think about that? Are you able to look at the models and get some sense if they're being fair? How do you even define what fairness would mean?
Kathryn:
Yeah, great question. We haven't done any work on mortgage predictions in particular, but we have done some work with credit and we do fairness testing. There's a lot of fairness tests prior to putting a model into production. At the bank, there's a group called Enterprise Model Risk Management, and there is...it's interesting. I don't actually know if there's a preference for individual- or group-level fairness testing. I do know that there is a tool we've built that focuses on individual fairness.
Lukas:
Sorry, what would that mean? Individual fairness versus group fairness?
Kathryn:
If you've got two groups where a group is defined by some similarity on a feature — let's take the example of race — so you've got the black group and the white group. Group-level fairness is going to be, "Is the error rate on the black group proportionate to the error rate on the white group for some prediction task?" If you go into individual-level fairness: if you have a set of features that are similar to my set of features, then if I get a $5,000 loan, you too get a $5,000 loan. So we have tools, but I still believe there's a decent amount of subjective interpretation that goes into, "What aspects are we trying to calibrate as 'fair'?"
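To make the distinction concrete, here's a toy sketch of the two notions (definitions vary, and this is illustrative only, not the bank's testing tool): a group-level check compares error rates across groups, while an individual-level check asks whether near-identical feature vectors receive the same decision.

    import numpy as np

    def group_error_rate_gap(y_true, y_pred, group):
        """Group-level notion: compare error rates between two groups (0/1 membership)."""
        err = (y_true != y_pred).astype(float)
        return abs(err[group == 0].mean() - err[group == 1].mean())

    def individual_consistency(features, decisions, radius=0.1):
        """Individual-level notion: do near-identical feature vectors get the same decision?"""
        agree, pairs = 0, 0
        for i in range(len(features)):
            for j in range(i + 1, len(features)):
                if np.linalg.norm(features[i] - features[j]) < radius:
                    pairs += 1
                    agree += int(decisions[i] == decisions[j])
        return agree / pairs if pairs else 1.0

    # Hypothetical toy data: true labels, model predictions, and a 0/1 group indicator.
    y_true = np.array([1, 0, 1, 1, 0, 0])
    y_pred = np.array([1, 0, 0, 1, 0, 1])
    group  = np.array([0, 0, 0, 1, 1, 1])
    print(group_error_rate_gap(y_true, y_pred, group))   # gap between the two groups' error rates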
Lukas:
Yeah. I mean, sometimes it seems to me with machine learning, it forces us to be more clear about what we mean by fairness and that can...just the way it's easier to kind of quantify the unfairness sort of leads to a lot of debate, right? I mean, how do you account for features that are correlated with group fairness, right? It seems always challenging. What does it mean to really prove that your model is being completely fair? It seems like a hard thing to rigorously define, although I'm sure a lot of...I mean, we should get people who have thought about it deeply on this podcast, for sure.
Kathryn:
The last I checked, there were 21 current interpretations, technical interpretations of what fair means.
Lukas:
Is there a list somewhere? Do you have a link that you could give us?
Kathryn:
Yeah, I can definitely find it. There's a paper from...this is from 2019, so maybe it's been...or something like that. But, yeah. I can definitely send a link after this call. I've seen things like...at the bank, there was one with these proxy correlations, where you might want to say we don't want to discriminate by gender. It was one for a business loan, but they kept in the business code type. It was restaurant, retail, manufacturing, blah, blah, blah. And one of them was beauty and spas. As it happens, a very high percentage of beauty and spa owners in Ontario are women, so there's this proxy encoded. They sneak up all of the time, right? If you really dig into it, you can keep going and uncover these potentially unfair variables, so.

Semantic parsing and building a text-to-SQL interface

Lukas:
Interesting. We found some other interesting applications that you've talked about or your team's talked about, like a text-to-SQL database interface. Would you want to talk about that at all?
Kathryn:
We built this tool called...we called it ALANN, which was "A Listening Answering Neural Network," I think is what the acronym stood for in the beginning. It's a text-to-SQL interface. Basically, a user comes in, poses a question like, "Find the highest-rated stocks in my portfolio" or something like that. And the system takes that query, goes into an SQL database, and — one — parses from a natural language utterance into something that's a little bit more structured so it looks like a SQL field, and then — two — can actually go and compute the operation and output an answer. So, "Google is the stock that has the highest-rated portfolio" or whatever it was that I said as the potential question.
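Here's the kind of mapping such an interface produces, sketched with an invented schema and query (ALANN's actual output format isn't described in the episode):

    # Illustrative only: the (utterance -> structured query) pair a text-to-SQL system
    # learns to produce. Table and column names are invented for this example.
    example = {
        "utterance": "Find the highest-rated stocks in my portfolio",
        "sql": (
            "SELECT ticker, analyst_rating "
            "FROM portfolio_holdings "
            "ORDER BY analyst_rating DESC "
            "LIMIT 5;"
        ),
    }
    # A semantic parser maps the free-form question to the structured form, the database
    # executes it, and the answer is rendered back in natural language.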
Lukas:
Right, right. How did you frame that even as a machine learning problem? How did you get training data? Did you view it as an NLP, like a sequence-to-sequence model type thing? Or how did you think about that?
Kathryn:
Yeah. It's a great question. The person on the team who built it, Yanshuai, would be better equipped to answer it than I, but it was framed that way. Framed as a sequence-to-sequence mapping problem. When we started the application, Transformers hadn't really taken off yet. Midway they had, and it ended up being sort of this, "How can we adapt Transformers to very small datasets?" Because we have very small...there's close to no training data mapping a natural utterance to extremely structured pseudo-SQL. So we built this, we kind of bootstrapped this pseudo-SQL database. I had a bunch of labelers come in and be like, "Yes, this is what..." It was sort of a pick list. It was like, "If you say this question, does it mean X, Y, or Z?" They labeled the pick list and we had that as our bootstrap training dataset and decided on the application because there's a lot of SQL databases in the bank and in a lot of large enterprises. Often you've got a handful of folks who are the analysts who are called upon to go and do these queries and find answers. They'll build dashboards, like a Tableau-type dashboard, for sort of commonly posed questions. They're FAQs where it makes sense to automate. Every month, you see the chart. But our original hypothesis was there's probably lots of long-tail questions that it doesn't make sense to program, but that it would be really nice...but you also don't want to have to call in the data analyst to do the work. Can we just have people ask those questions to the tool?
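The pick-list bootstrapping she describes might look roughly like this (a hypothetical item; the actual labeling interface and schema aren't described in detail):

    # Hypothetical illustration: labelers see a question plus candidate structured
    # interpretations and choose the right one.
    labeling_item = {
        "question": "Which branch had the most new accounts last quarter?",
        "candidates": [
            "SELECT branch, COUNT(*) FROM new_accounts GROUP BY branch ORDER BY COUNT(*) DESC LIMIT 1",
            "SELECT COUNT(*) FROM new_accounts",
            "SELECT branch FROM branches",
        ],
        "label": 0,   # index of the interpretation the labeler picked
    }
    # The chosen (question, structured query) pairs become the seed training set for
    # the sequence-to-sequence parser.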
Lukas:
Interesting.

From comparative literature and math to product

Lukas:
Switching gears a little bit, I was hoping to hear a little bit about your career and how you came to this really interesting job. You talk about coming up through humanities — although I think you do have a math degree, which is a kind of a technical side of humanities — and then you did grad school in comparative literature, right? Which is a little bit of an interesting switch, although I had a couple friends in college that did math and comp lit, but I was always struck by that. I wonder if you could talk about what you were thinking at the time and how that informs your work today?
Kathryn:
I'm glad you noticed that I also have math background, because people often are like, "How does literature and then machine learning...?" And I'm like, "Yes, but I did do a lot of work in linear algebra," so at least I can imagine functions. It's a great question. I wish I had a master plan, but I didn't have a master plan. I actually intended originally to be a physics and philosophy major. Those were the things that interested me most. I was kind of a klutz in the lab. I really didn't like the lab, so I was like, "You know what? None of this physics stuff. I'm going to do the part where you don't have to go into the lab and just do math." I always loved humanities and I spent my junior year abroad in Paris, and I didn't have to take any math courses because I had enough sort of standing credits. So I took courses in philosophy, film, literature, and I really loved it. I decided to change my major my fourth year in college, and instead of just doing math, do a double major in math and comp lit. The good thing about comp lit is that it's kind of...well, the good and bad thing. The bad thing is that it kind of lacks identity as a discipline. It's kind of a grab bag of it used to be...imagine you take a theme like "love", and then you say, "How do the French write about it? How do the Germans write about it?" And you find these sort of cultural role overlaps, which was the comparison. As the discipline has evolved, it's kind of become...some people focus on philosophy and literature, some people do cultural studies, some people do rigorous sort of history of a national literature. The ambiguity was good for somebody like me, because it was like, "Sure, you want to do math, history of philosophy, history of literature, languages, semiotics? Great! Great place for you." I went into it. I really liked languages and I thought it provided a lot of freedom to explore. I wrote my dissertation on 17th-century epistemology, basically what was knowledge at the time, and focused on Descartes, Leibniz, Newton. Sort of the old, dead white guys, and-
Lukas:
Classic math guys.
Kathryn:
Classic math guys. Exactly. Yeah. I know a lot about 17th-century math that's not really as relevant today.
Lukas:
Oh. Tell me some stuff about 17th-century math.
Kathryn:
Favorite things in 17th-century math. It's the dawning of calculus, right?
Lukas:
Yeah.
Kathryn:
You've got Newton. Newton in particular is really, really fascinating. Leibniz and Newton, both of them. Leibniz was...he had this thing called "Cogitationes caecae," or "blind thought." He really thought that basically we could just let the symbols do all the work and it doesn't matter if we can visually represent some mathematical concept or if it really has a tie to the real world. It was just, "Let's go calculate stuff." With that sort of focus on formalism, he did a lot of...he had a lot of development of thinking about infinitesimal ratios and some of the mechanisms that go into making differentiation and integration possible, that just kind of worked. Newton, on the flip side, kind of started off more on this formal track, but then he was influenced by a bunch of traditional focus on Greek math that was really prominent in 17th-century England. There, they were like, "You have to visualize. It all comes back to geometry." Geometry started with farmers out trying to measure distances in a field and it needs to be grounded. He grappled a lot with thinking about the gap between a limit and zero, right? You see that through the Principia. I wrote a paper at one point on his notion of...he called them first and last ratios, which were basically proto-limits. He kind of held himself back because he was really so focused on keeping things tangible, which I found really interesting between the two of them. So, yeah. One 17th-century math tidbit.
Lukas:
Did you continue this line of research in grad school? Or was it something else?
Kathryn:
I continued the line of research on 17th-century math and philosophy in grad school, wrote a dissertation that five people have read on this topic. Then afterwards basically, with comp lit...word to the wise for any listeners who decide to be comp lit grad students, there's not a lot of comp lit departments — there's a lot of national language departments — and there's not a lot of availability.
Kathryn:
I think if I had been able to become a philosophy, history of philosophy, professor, I probably would have stayed an academic. But I was sort of prepared to be a French literature, 18th-century professor, and I was like, "I don't know if that's really me." There's not a lot of jobs. So it's like, "Do I go to Nebraska and fight for my assistant professorship? Or do I go into tech?" I was out at Stanford, so I just decided to switch careers. What of the humanities training is still with me, besides having arcane knowledge that not many people want to talk about? But I'm glad you do. Normally it's a liability for me at work because I get feedback on performance reviews that are like, "Kathryn's really great, but sometimes she goes off on these philosophical digressions and we're not really sure why." But I think one thing that I've brought with me is...I trained as an intellectual historian in grad school. If you're a philosopher, often today you're evaluating arguments for, "Is this right? Is this true?" And then there's people who come in and say, "Well, there's no such thing as truth in the first place and everything's relative." I think as an intellectual historian, I didn't care if Descartes was right about the motion of planets in space. I was really interested in understanding what he thought he was thinking. Why this? What was he reading? What was happening around the time? Sort of saying, "All right. I'm reading this as a 21st-century reader and I'm coming with all of my prejudices and predispositions of thinking like somebody who's on the internet and viewing the world in a certain way and thinks that universal gravitation is second nature," but for him it was not. I think there was a lot of training in suspending disbelief, ensuring that one didn't bring in one's own subjective predispositions and really understanding a foreign thinker. I actually think that's really good training for product management. I think it's good training for executive work. You're constantly in situations...like with a customer, it's not, "Here's how I want to use my cashflow forecasting app." There's going to be a distribution of millions of customers who are totally different from me. I guess I'm always approaching problems from the perspective of, "I'm not going to assume that there's one right answer, and I'm not going to assume that this person thinks similarly to me or comes from a similar place," and I think that's been really good training in doing product work eventually.
Lukas:
I mean, you didn't just study any comp lit, it's very like, different technical points of view. I feel like you see that in ML too, right?

What would Newton and Descartes think about ML?

Lukas:
I had a boss who always said he preferred to hire biologists over physicists, and I think what he meant by that is he liked people that didn't really try to figure out the underlying structure of models, but just examined them from the outside of what they do, right? Take this kind of open mind, "We're not going to make assumptions." But then I think about Newton actually, and it seems to me like...you tell me, actually. It seems like Newton made this leap into a lot of structure. He must have wanted to put an underlying structure on the world really badly to come up with such an amazing structure. Do you think Newton doing ML, it would have driven him nuts that we have this point of view of looking at the models from the outside and just examining what they do and maybe not worrying about exactly how they work and making them more and more complicated?
Kathryn:
Yeah. That's a great question. I think there probably would be aspects of ML that would have driven Newton crazy. There's other aspects where I think there's some kinship or precedent thinking. I'm influenced here by one of my dear friends and mentors, a man named George Smith. He's a professor at Tufts who...if you really want to know about Newton, talk to George. He's the guy. For 25 years he's taught this course on basically how Newton changed the standards for high-quality evidence, and he really knows a lot on this topic. One of the things I learned from him is that Newton always assumed that the system that he was trying to model was infinitely more complex than the deductive mathematical model that he could apply to it. There's a lot in the Newtonian scientific paradigm that's like, "All right. We're going to put this hypothesis out there, or this deductive model. Then we're going to make observations and there's going to be a gap between what we observe and what we've modeled. The progress of this paradigm is to continuously watch that gap and close it when possible by refining our mathematical model, but sometimes realize where it's just completely off the mark and we might need to sort of shift our thinking." To that extent, I think there's some...there's more affinity with sort of the ML mindset than with a traditional rules-based computer programming mindset, or even the GOFAI-type mindset, right? As long as we can articulate the structure of the thinking, we can model the world.
Lukas:
What is a GOFAI-type mindset?
Kathryn:
"Good Old Fashioned AI", expert systems.
Lukas:
Nice. I didn't know that acronym.
Kathryn:
It's always been a plight of mine. Definitely spent a lot of my time in comp lit working on rationalist, hyper-structured, 17th-century thinkers and drove my comp lit colleagues crazy because I came from the math background and my papers were proofs versus more exploratory. I envy the ML mindset too, because I think coming from more of that "always trying to prove things," it's not always the best approach to running a company either.
Lukas:
Do you think Descartes would have had a different point of view on ML?
Kathryn:
This is another loose analogy, but basically this whole...you know the famous, "I think therefore I am", "Cogito, ergo sum". He phrased it that way in 1637, "A Discourse on the Method," which is like, "Here's my method," and then he runs it through three examples, one of which being the geometry. One was on-
Lukas:
Sorry. Could you explain what that means? I've heard that a zillion times, but I don't think I know the implication of "I think, therefore I am".
Kathryn:
When he first stated it, what he was trying to do was big, bold 17th-century work. Prove that God exists, A, but then, B, put forth a new way of thinking and doing science that was cleaner and upon which one could actually feel like they had sort of...they could believe these statements and propositions of truths versus the predecessors, which were always citing the ancients. It's like, "Why is something true? It's true because Aristotle said it was true," versus "It's true because I have used logic to come to a propositional type of truth." When he was starting his "Let's prove that God exists," he says, "Well, where do I start? Why don't I start by proving that there's some clear point that I can stand upon where I know this is what truth looks like." And so the "Cogito, ergo sum," was that point where basically he's like, "No matter how hard I try, if I try to pretend I don't exist, there's got to be somebody there doing that thinking, therefore I must exist." It's kind of this proof by contradiction. There's got to be some voice there. What's interesting is he rewrote this. In his second attempt at it, he got rid of the thinking. So he didn't say "cogito". He just said, "I am. I exist." And then he said, "If you want to understand how this truth works," he didn't use these words, but my paraphrase, "Go sit in a room and meditate for days." Do it, repeat it, and do this for 30 days, and eventually you will have trained your mind to think clearly. I looked at that and I was like, "Well, that's different. That's not quite what I thought Descartes was about." Go sit in a room and repeat things until you train your mind to think that way? That was really interesting to me. This is a loose analogy, but I think there's something similar to supervised learning, when it's like, "Is this a dog? Is this a cat?" It's just like, "Show me 50 examples" and repeat until you've established the input, output pattern. That's not really there, but I think it's kind of there, and I think it's interesting that there's sort of this intellectual foundation for supervised learning in Descartes.
Lukas:
Although it seems like with Descartes, there's maybe no input if you're meditating?
Kathryn:
There's no input besides your own training your mind to rewire. It's like, "Rewire your mind to think this way."
Lukas:
I see, I see. Interesting.

On sentient AI and transporters

Lukas:
Do you then have thoughts on AI being sentient? Do you have opinions on things like the Turing test? Or...what are those classic ones that you learn in your first AI class on the room with the person in it, and the book's doing Chinese or something?
Kathryn:
Yeah, the Chinese...the John Searle Chinese room argument? To be honest, not really. I find the Turing test interesting conceptually, but I struggle with the arguments that are this sort of singularity type arguments, like computation is rising and the models are more complex and these models are going to get to the point where they come into consciousness. I just don't really see it. Do you? I don't know.
Lukas:
Well, we have this existence proof with humans, right? That seems like consciousness kind of comes from some type of process. It seems to me like unless you think there's something really like God in there, in the physics somehow, there must...it sort of must come from increasingly complicated computation, right?
Kathryn:
Yeah, for sure. I sort of fall into the materialist camp. While I've spent a lot of time with the 17th-century philosophers, I don't share the sense of the soul and there's a God in there that makes things different. Fair enough, fair enough. But it's interesting. There's still that gap between...I don't know if it's the plasticity of our neurons, the fact that there's just thousands of billions, trillions of very plastic processes going on in there. Or if it's like a Dan... I don't know if you know Dan Dennett-type argument where the self is basically a user illusion, right? In the same way that we interface with...I'm looking with you on a Zoom screen right now. My iOS operating system, which makes it easy for me to engage with the computer versus seeing the nitty-gritty insides, maybe it is a useful illusion that we've gained through evolution, but that it's not really real. I kind of buy that argument. Basically consciousness is a red herring, is kind of what that argument would be.
Lukas:
Are you the kind of person that would get into a transporter that would disintegrate your body and reassemble it somewhere else? Would you feel like that's a safe thing to do? There's some very strong opinions on different sides of that at Weights & Biases.
Kathryn:
That's a good question. I've never really thought about that. I might change now that I have a son, now that I'm a mom. If you met me a year ago, maybe I'd say, "Yeah, sure." But now it's almost like, "Would that mean that there's some implication on my relationship to my child?" I'm not sure I want that.
Lukas:
That's funny. I have a small daughter too, but I think I've...my whole life, I would just never get in that machine. To me, it seems incredibly unsafe. I don't know if I could justify it, but I just would not do that.

Why causal inference is under-appreciated

Lukas:
All right. We always end with two questions that are a little more on the practical side, but this is really fun. So one is, I guess what's a topic in ML that you think is understudied or underappreciated?
Kathryn:
I don't know if it's understudied, but I think it has been underappreciated and now is becoming more appreciated, but this is causal inference. The Judea Pearl-type work. This is something we're starting to really look into at the bank because of this need for sort of more interpretable models. And lots of conditional probabilities where if we could understand what happens in one variable and how that relates to another variable, it would be really useful for sort of macroeconomic modeling. It's a topic that a person like me is going to be interested in because it's philosopher candy as well. There's lots of interdisciplinary approaches to this problem. What is a cause? If we can even really define it well, and it's represented formally in machine learning in one particular way. I think it's going to be interesting over the next couple of years to see these sort of traditional causal inference methods interacting with deep learning and the deep learning community, so that's one thing that we're... I'm personally excited about, but also Borealis is looking into these days.
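For readers who want a feel for the core idea, here's a tiny simulation (illustrative only, not Borealis's work): a confounder drives both X and Y, so the observational relationship between X and Y overstates what actually happens when you intervene on X, which is exactly the kind of distinction Pearl-style causal inference formalizes.

    import numpy as np
    rng = np.random.default_rng(0)

    n = 100_000
    z = rng.normal(size=n)                       # confounder (e.g., income)
    x = z + rng.normal(size=n)                   # "treatment" influenced by z
    y = 2 * z + 0.5 * x + rng.normal(size=n)     # outcome: true causal effect of x is 0.5

    obs_slope = np.cov(x, y)[0, 1] / np.var(x)          # naive regression of y on x
    print(f"observational slope ~ {obs_slope:.2f}")      # ~1.5, inflated by the confounder

    x_do = rng.normal(size=n)                            # do(X): set x independently of z
    y_do = 2 * z + 0.5 * x_do + rng.normal(size=n)
    do_slope = np.cov(x_do, y_do)[0, 1] / np.var(x_do)
    print(f"interventional slope ~ {do_slope:.2f}")      # ~0.5, the true causal effect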
Lukas:
Interesting. Super cool. Is there a paper that you could point people to if they were interested in learning more about?
Kathryn:
Yeah, yeah. I'll send a link after. There's a paper recently. I know Elias Bareinboim, who is one of...he was one of Pearl's students and he's at Columbia, he was a co-author. Yoshua Bengio was a co-author, and then there's two others that I know as well. That's all about sort of deep learning and causal inference. That's probably a great place to start.

The challenges of integrating models into the business

Lukas:
Cool. The final question we always ask — and you've seen so many different applications, I think you'll have a really interesting perspective on this — is basically kind of going from wanting to build a model for some purpose and kind of getting it deployed in production and actually doing that purpose. Where do you feel like there's the biggest painful... Or most painful bottlenecks?
Kathryn:
Yeah. There's often a lot. It's hard to say the most painful. I think at the highest level, it's really deep integration into the full business process. This is really coming also from an "enterprise ML perspective" versus a sort of "ML for software" company. I've seen tons of projects fail where you might have a good...given a task, build a model. If it's just handed over to the business without considerations of, "All right. Where does the production data sit? How do we get that data from that environment to the environment where our model sits to do inference?" There's always questions on just the timeframe, or is this batch monthly, weekly, real-time? I've seen stuff where there is...we think we can easily do a batched output, it's just monthly. Output is set up for predictions that are going to go into some call center list. But there's some nuance in the process where the third week of every month, they do this to the data and that's going to mess this up, and so it's always in the details of what that full flow will look like. Then the third, with the business process is, "All right. Now you've output the prediction, but how does the process change?" If people just use it and they continue to do what they're doing, I don't think you're really taking advantage of, "Now that we have this, we can shift our approach." Let's say it's a call center automation thing. We can shift the number of people we have on staff at a given time. We can collect the following new data to improve the process in some way. I think you have to think about it holistically, right? In terms of, "What's the end result? Where does it sit? How do you measure it?" That's kind of all of the production ML pipeline, but I actually think it's all there and it all matters, so.

Outro

Lukas:
Spoken like someone who's done a bunch of production pipelines, I think. Thank you. Thank you very much. That was really fun. If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description, where you can find links to all the papers that are mentioned, supplemental material, and a transcription that we work really hard to produce, so check it out.