
Finding Which Lemons are Lemons: Assessing and Optimizing Business Value With ML

Inspired by our recent MLOps course, here's a deeper dive into creating business value with machine learning and assessing whether a project is feasible in the first place


Background

Assessing and tuning ML projects to maximize business value is not something that’s part of most data science training. However, it can be an essential skill for career success as a data scientist. In fact, we discussed this very topic in Lesson 3 of the Weights & Biases MLOps Course. In this article, I’ll walk you through a concrete example of assessing business value for a hypothetical project.
There are many ways machine learning projects can add value to a business. These different types of machine learning projects require different methodologies for assessing business value. During my career, I have encountered two main kinds of projects:
1) Optimizing an existing business process. This usually involves at least partially automating a process that people currently perform manually. It may also involve making an already automated system more efficient, with no human involvement at all. Examples include:
  • Forecasting sales to allocate inventory more accurately and reduce waste while fulfilling demand
  • Recommending better products for customers to buy
  • Predicting fraudulent behavior to flag for human review
2) Machine learning is the product. This is when the model's outputs are sold directly to customers, or when machine learning is expected to transform a product category into something completely new. Examples of this include:
  • Generating art from natural language, like DALL-E mini.
  • Self-driving cars
  • Models that help developers code with better auto-completion, like GitHub Copilot.
It should be noted that there are project types other than the ones listed above. However, I believe these two categories are the kinds you are most likely to encounter as an applied machine learning practitioner. In this blog post, we will walk through an example of the first kind of project, where you are asked to optimize an existing business process.
Hypothetical Scenario: Darek owns a lemon farm. For quality control, Darek employs a sizable workforce of produce inspectors. But there’s one consistent quality control problem that has always plagued Darek: mold.
Moldy lemons are bad, not only because customers lose confidence in the quality of the product but also because Darek must refund the lemons and associated shipping costs. Furthermore, due to labor shortages and other market conditions, it has been increasingly difficult and expensive to find produce inspectors.
Thankfully, the farm keeps digital records of each individual lemon as it is processed for packaging, including labels for those lemons that are randomly inspected. Given the presence of all this data, Darek turns to you, the data scientist, and asks you if it is possible to automate some of this process.
None of these lemons are moldy, but all of them were created by Stable Diffusion

Assessing the Feasibility of the Project

In my experience, business partners often assume that machine learning is the answer to their problems when it may not be, which makes it tempting to jump straight into an ML project and start writing code. Enthusiasm for ML can also lull data scientists into kicking off a project prematurely; the request can be music to their ears: they get to be the hero the business needs to save the day.
But the reality is: picking the right projects to work on and assessing business value up front is critical for long-term career success as a data scientist or ML practitioner.
My advice is to maintain healthy skepticism and approach problems objectively, no matter how excited those around you may be about machine learning. You can find a great set of exhaustive questions to ask to get you oriented before you even embark on a data science project here.
For our hypothetical problem, we will want to collect data that helps us assess the feasibility of the proposed project. We want to think critically about the problem from the outset:

1) What is causing mold in the first place? Is there anything that can be done to mitigate or reduce mold?

  • Is there data that shows the prevalence of mold over time? Are there seasonal or other factors that we can find in the data with clues on what may be causing mold?
  • What kind of spoilage rate is common at other lemon farms?

2) What are all the costs associated with the current process?

  • Cost of labor (lemon inspectors) -- and to what extent could we reduce it?
  • Cost of false negatives (not catching mold) -- and what is the efficacy (precision, recall) of the current process in identifying mold?
  • Cost of false positives (you say it's mold, but you're wrong)
  • How many lemons are sold on average per month?
  • Roughly, what percentage of lemons have mold?

3) What are reasonable targets for improvements to the current process?

  • Do we have any benchmarks, studies, or other hypotheses that suggest what a reasonable accuracy target could be?

4) What would be the costs associated with an ML approach?

  • The incremental cost of data scientists, tools, and infrastructure.
  • Can we assume these costs will be amortized over many projects?
Notice how the first question has nothing to do with ML at all! It is vital to understand the problem you are trying to solve as holistically as possible, including whether it can be solved in an entirely different way.
Armed with your data skills, you can often understand business processes and spot problems in ways other people cannot. In many cases, I have found a root cause for the business problem that made ML unnecessary. In other cases, I have unlocked additional business value that nobody expected because I took the time to understand the problem deeply.
Finally, doing this exercise will give you important domain knowledge that will help you communicate with stakeholders and build a better solution. In this scenario, let’s assume that there are no external processes that can be changed to reduce mold materially and that the mold that is occurring is typical amongst lemon farms of this type.

Collecting Our Data

Now it’s time to collect the relevant data. Below is the data we have collected in this scenario. In practice, you might have to make educated guesses for some of these numbers with your business partners. This process may take some time; however, it is well worth it so you can properly assess the feasibility of the project.
  1. Number of lemons sold per month: 1.5 million
  2. Cost of labor: $100,000 per month
  3. Percentage of lemons with mold: 15%
  4. Current false negative rate: 27%
  5. Target false negative rate: 20%
  6. Cost of false negative per lemon: $1.50
  7. Cost of false positive per lemon: $0.25
  8. Current false positive rate: 3%
  9. Cost of Data Science time & infrastructure: $15,000
Your stakeholders should be able to help you obtain estimates or relevant data for the metrics above. One notable exception is the last item: the cost of data science time & infrastructure. For example, if the data, labels, or tooling you need are not present, the time and infrastructure required to obtain them may make the project infeasible. You may also want to confirm with your stakeholders whether this cost can be amortized over many future projects.
Below is a video that explains how I would go about using this information for a feasibility study. To follow along, you will want to download this Excel spreadsheet.
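If you prefer to sanity-check the numbers in code before opening the spreadsheet, here is a minimal Python sketch of the kind of calculation involved. The structure of the comparison, and the simplifying assumption that ML changes only the false negative rate while everything else stays constant, are mine rather than part of the course material:

lemons_per_month = 1_500_000
mold_rate = 0.15
cost_fn = 1.50       # refund + shipping for a moldy lemon that slips through
cost_fp = 0.25       # value lost when a good lemon is wrongly discarded
fn_rate_current, fn_rate_target = 0.27, 0.20
fp_rate_current = 0.03
ds_cost = 15_000     # data science time & infrastructure

moldy = lemons_per_month * mold_rate
clean = lemons_per_month * (1 - mold_rate)

def monthly_error_cost(fn_rate, fp_rate):
    "Expected monthly cost of missed mold plus wrongly discarded lemons."
    return moldy * fn_rate * cost_fn + clean * fp_rate * cost_fp

current = monthly_error_cost(fn_rate_current, fp_rate_current)
target = monthly_error_cost(fn_rate_target, fp_rate_current)  # assume the FP rate is unchanged

print(f"Current error cost per month:   ${current:,.0f}")
print(f"Projected error cost per month: ${target:,.0f}")
print(f"Monthly savings vs. ML cost:    ${current - target:,.0f} vs. ${ds_cost:,.0f}")

The spreadsheet in the video may account for additional factors, such as reduced labor costs, so treat this as a back-of-the-envelope check rather than the full analysis.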



Building & Evaluating the Model

After you have determined that the project is feasible and have reviewed your assumptions with business stakeholders, you can proceed with building the model! You must partition your evaluation data carefully and select a good metric for model evaluation. You should also perform error analysis on actual data to diagnose and catch problems with your approach early on.
We discussed these concepts in our MLOps course (sign up here for the on-demand video).

Tuning the Model for Business Value

Now that you have a model, the next step is to evaluate its potential business value and calibrate how you will use it to make decisions. For example, at which probability will you decide to throw a lemon away? Depending on your costs for false positives vs. false negatives, it may make sense to calibrate your decision threshold accordingly rather than arbitrarily setting it at 0.5.
One convenient way to do this is to use your model’s predictions on the validation set to simulate how much incremental value your model will create. For this reason, it can be convenient to log these predictions to Weights & Biases for quick retrieval.
Here is code that you might use to log predictions to Weights & Biases:
import wandb
import pandas as pd

# Assumes `learn` is a trained fastai Learner and `run` is an active wandb run (from wandb.init()).
# Get validation-set predictions along with the inputs and the decoded class predictions.
inp, preds, targs, out = learn.get_preds(with_input=True, with_decoded=True)

# wandb.Image expects channels-last images, so permute from (C, H, W) to (H, W, C)
imgs = [wandb.Image(t.permute(1, 2, 0)) for t in inp]
pred_proba = preds[:, 1].numpy().tolist()   # predicted probability of the positive (mold) class
targets = targs.numpy().tolist()            # ground-truth labels
predictions = out.numpy().tolist()          # decoded (argmax) predictions

preds_df = pd.DataFrame(list(zip(imgs, pred_proba, predictions, targets)),
                        columns=['image', 'probability', 'prediction', 'target'])

run.log({'predictions_table': wandb.Table(dataframe=preds_df)})
run.finish()
Likewise, here is example code that you might use to retrieve those predictions:
import wandb
import pandas as pd
from fastcore.all import Path

def val_pred_table(run_id, entity='wandb_course', proj='lemon-project'):
    "Get the prediction table on the validation set for the lemon project."
    api = wandb.Api()
    path = api.artifact(f'{entity}/{proj}/run-{run_id}-predictions_table:v0').download()
    preds = (Path(path)/'predictions_table.table.json').read_json()
    # Each row of the logged table is [image, probability, prediction, target];
    # keep only the predicted probability and the ground-truth label.
    return pd.DataFrame([{'pred': p, 'label': t} for _, p, _, t in preds['data']])

if __name__ == '__main__':
    preds = val_pred_table('zxum4ef0')
    preds.to_excel('Validation_Predictions.xlsx')
The next step is to load your predictions into a spreadsheet and fill in your assumptions, just as in the feasibility exercise. This lets you use your model's predictions to simulate different decision thresholds, assess the potential business value, and choose an optimal threshold.
Here’s what a spreadsheet of this kind might look like:


In the video below, I walk you through this spreadsheet simulation and explain what all the pieces mean. To follow along, you will want to download this Excel spreadsheet.
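If you would rather run the same kind of simulation in code, here is a minimal sketch that scores a single decision threshold against the validation predictions retrieved with val_pred_table above. The helper name, the assumption that label 1 means mold, and the idea of measuring savings relative to the current process are my own choices, not part of the course material:

import pandas as pd

def value_at_threshold(preds: pd.DataFrame, threshold: float,
                       cost_fn=1.50, cost_fp=0.25,
                       baseline_fn_rate=0.27, baseline_fp_rate=0.03) -> float:
    "Estimated per-lemon savings vs. the current process at a given decision threshold."
    flagged = preds['pred'] >= threshold   # lemons the model would discard
    moldy = preds['label'] == 1            # assumes class 1 corresponds to mold

    fn_rate = ((~flagged) & moldy).sum() / max(moldy.sum(), 1)
    fp_rate = (flagged & ~moldy).sum() / max((~moldy).sum(), 1)

    # Use the validation set's mold prevalence as a stand-in for production.
    mold_share = moldy.mean()
    model_cost = mold_share * fn_rate * cost_fn + (1 - mold_share) * fp_rate * cost_fp
    baseline_cost = mold_share * baseline_fn_rate * cost_fn + (1 - mold_share) * baseline_fp_rate * cost_fp
    return baseline_cost - model_cost      # positive means the model saves money

preds = val_pred_table('zxum4ef0')         # helper from the retrieval snippet above
print(value_at_threshold(preds, threshold=0.3))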




Profit Curves

One useful visualization for calibrating your decision threshold is the profit curve, which plots the estimated incremental business value at different thresholds. Below is the profit curve for the lemon example:

We can see that setting the decision threshold too low or too high results in negative business value. If the threshold is too low, nearly every lemon is flagged as moldy; if it is too high, almost none are. Either extreme would negatively impact the business. In this case, the optimal threshold is somewhere around 0.3, which reflects the fact that false negatives are much more expensive than false positives.
Making profit curves can also be a good debugging step for spotting mistakes in your simulation. You can generate a profit curve by plotting the marginal business value at different thresholds. In the above example, I constructed the graph manually by adjusting the threshold in the simulation in increments of 0.1. You can also construct profit curves in code (a sketch is shown below); however, I suggest starting with a spreadsheet so you can iterate quickly.
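If you do go the code route, a short sketch like the one below can sweep the threshold and plot the estimated value at each step. It reuses the value_at_threshold helper and preds DataFrame from the earlier sketch, so the same caveats about my assumptions apply:

import numpy as np
import matplotlib.pyplot as plt

thresholds = np.arange(0.0, 1.01, 0.05)
values = [value_at_threshold(preds, t) for t in thresholds]   # per-lemon savings at each threshold

plt.plot(thresholds, values, marker='o')
plt.axhline(0, color='grey', linewidth=0.8)   # break-even line
plt.xlabel('Decision threshold')
plt.ylabel('Estimated savings per lemon vs. current process')
plt.title('Profit curve for the lemon mold model')
plt.show()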

Conclusion

It should be noted that this is a toy example meant to illustrate what assessing and calibrating an ML project for business value might look like. We assume that the validation data is representative of what you will see in the future, and that costs are stable. These are big assumptions, and you will have to judge whether more sophisticated simulations or financial models are appropriate for assessing business value.
Still: my advice is not to overcomplicate things if possible. I believe that the exercise of going through this process is valuable because it forces you to become fluent in the domain and communicate better with stakeholders. Furthermore, I have found that being good at expressing the business value of your projects can often differentiate senior from junior data scientists. If you want to learn more, I highly suggest looking at lesson 3 of this course.