
A Gentle Introduction to Machine Learning Models

Have you ever wondered what a machine learning model really is? In this beginner-friendly article, we'll introduce the concepts of ML algorithms and models.
This piece would not have been possible without the incredibly valuable help and feedback from Bryan Bischof and Stacey Svetlichnaya. Thank you!

Overview

This article is a high-level and beginner-friendly introduction to machine learning models.
Some basic knowledge of linear algebra, geometry, or statistics may be helpful, although not required; this article does not contain any mathematical implementation details and is written for readers newer to or unfamiliar with machine learning.
We start by explaining what an algorithm is: a machine learning model is a type of algorithm, so understanding the concept of an algorithm is fundamental to understanding what a model actually is.
Then, we introduce the three main machine learning paradigms — supervised learning, unsupervised learning, and reinforcement learning — along with a brief overview of commonly used machine learning models.
As a bonus, we include a quick introduction to deep learning and deep learning models.
Here's a table of contents in case you'd like to skip ahead to a particular section. Happy reading!

Table of Contents

  • What Is an Algorithm?
  • How Are Algorithms Used?
  • What Is a Machine Learning Model?
  • What Is the Difference Between a Machine Learning Model and an Algorithm?
  • Types of Machine Learning Models
  • What Is Supervised Learning?
  • What Is Unsupervised Learning?
  • What Is Reinforcement Learning?
  • A Quick Introduction to Deep Learning and Deep Learning Models
  • Summary

What Is an Algorithm?

Broadly speaking, an algorithm is a sequence of steps that describes how to accomplish some goal, like completing a task or solving a problem. It takes some input(s), and via the specified list of instructions, produces some output(s).
A cookbook recipe is an algorithm ("Prepare these ingredients in this order, and create some desired food item"), as is deciding what time to set your morning alarm ("Take the start time of the event and subtract the time it takes to get ready and commute").
This graphic from Wikipedia’s "Flowchart" article illustrates a simple algorithm for deciding what to do with a lamp that doesn’t work.
That being said, most people use the term "algorithm" when the goal is mathematical in nature, such as "Put the elements of this list into the desired order", and the steps to achieve that goal are similarly computational.
For example, imagine that you're writing all of the whole numbers, or integers, from 1 to 100. Although you probably don't think of it in this way, you likely do this algorithmically:
  1. Write the starting integer, which in this case is 1
  2. Write the next biggest integer, aka the previous integer plus one
  3. Continue Step 2 until you reach the final integer, which in this case is 100
  4. Stop writing any more integers
In a computational context, the word “algorithm” generally refers to an algorithm expressed in a programming language like Python or C++. This type of implemented-in-code algorithm can then be executed by a computer. An algorithm expressed in plain language is typically referred to as pseudocode.
The blocks below are an example of an algorithm for deciding what to say in response to a user-given number, implemented in pseudocode and Python respectively. According to this algorithm, if the number is 5, 6, 7, 8, 9, or 10, it’s just right! Otherwise, the number is too high or too low.
Pseudocode
Instruct the user to enter a number

If the number is larger than 10
then say "Too high :("
else, if the number is less than 5
then say "Too low :("
otherwise
say "Just right!"
Python
# Ask the user for a number, converting their text input to an integer
num = int(input("Enter a number: "))
if num > 10:
    print("Too high :(")
elif num < 5:
    print("Too low :(")
else:
    print("Just right!")
Throughout this article, we'll use the word "algorithm" in both the broad sense and the programming sense. However, don't get too stuck on what exactly is or isn't an algorithm! There's actually no single formal definition of an "algorithm", and the term can be and is used in a wide variety of contexts.

How Are Algorithms Used?

Much of computer science — and by extension machine learning — is understanding how to frame a human goal as a computational goal that a computer can work towards by executing the instructions of an algorithm.
For example, there may be many possible ways to get from Point A to Point B, but what is the best one?
You can frame this question as a search problem, and use Dijkstra's algorithm to find the shortest path between these two locations. Of course, in real life, the shortest path is not necessarily the quickest path, and the quickest path is not necessarily the “best” path.
Nevertheless, route planners like Google Maps do their best to suggest optimal directions, which they identify via some combination of multiple algorithms that calculate or predict different factors of a transit route, including physical distance, traffic, and the user’s personal preferences.
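To make the search-problem framing concrete, here's a minimal sketch of Dijkstra's algorithm in Python; the toy road network, its node names, and its edge weights (travel times) are made up for illustration.
Python
import heapq

def dijkstra(graph, start):
    # Shortest known distance from start to each node; start is 0 away from itself
    distances = {node: float("inf") for node in graph}
    distances[start] = 0
    # Priority queue of (distance, node), so we always expand the closest node next
    queue = [(0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances[node]:
            continue  # We already found a shorter path to this node
        for neighbor, weight in graph[node].items():
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor))
    return distances

# A tiny made-up road network: edge weights are travel times in minutes
graph = {
    "A": {"B": 5, "C": 2},
    "B": {"D": 4},
    "C": {"B": 1, "D": 7},
    "D": {},
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 3, 'C': 2, 'D': 7}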

What Is a Machine Learning Model?

A machine learning model is an algorithm whose parameters have been determined, or trained, via some statistical learning process. In practice, a model is a computer program that can perform certain types of tasks without explicit human instruction.
In traditional programming, the developer (a human) writes a program by specifying a set of rules that a computer must follow when executing the program. In machine learning, the practitioner trains a model by telling a computer to execute a learning algorithm over some data until the learning algorithm — via the computer (a machine) — produces a set of rules that meets some criteria.
This set of rules is a machine learning model.
This diagram is based on Sebastian Raschka's blog post "How would you explain machine learning to a software engineer?"
Machine learning and machine learning models are incredibly powerful because the training process enables a machine learning model to generalize what it has learned and solve problems without being explicitly programmed to do so. A model can even learn how to generate new data; DALL-E 2, Midjourney, and Stable Diffusion are diffusion models that went viral in 2022 for their impressive art-generating abilities.

What Is the Difference Between a Machine Learning Model and an Algorithm?

A common misconception is that a machine learning model is inherently different from an algorithm. In fact, a machine learning model IS an algorithm: specifically, one whose parameters have been trained via some statistical learning process.
That being said, a machine learning model is different from a learning or optimization algorithm, the general term for the statistical learning process that trains a model.
Here's an example of the difference between linear regression (a learning algorithm) and a linear regression model (a machine learning model), with a code sketch after the list:
  • Linear regression: an algorithm, but not a model
    • Input: A dataset that contains examples of a potential relationship between a dependent variable and one or more independent variables
    • Output: The coefficients of the linear equation that best approximates a relationship between the inputs and outputs (aka a linear regression model)
  • A linear regression model: an algorithm and a model
    • Input: One or more independent variables
    • Output: A prediction of a dependent variable
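As a minimal sketch of this distinction in code (using scikit-learn, with made-up data): calling .fit() runs the learning algorithm, and the fitted object is the model itself, which then makes predictions via .predict().
Python
from sklearn.linear_model import LinearRegression

# Made-up dataset: one independent variable (X) and a dependent variable (y)
X = [[1], [2], [3], [4]]  # inputs
y = [3, 5, 7, 9]          # outputs (here, exactly y = 2x + 1)

# Running the learning algorithm: this step is "linear regression" the algorithm
model = LinearRegression().fit(X, y)

# The fitted object is the model: its learned coefficients define the line
print(model.coef_, model.intercept_)  # [2.] and 1.0

# Using the model: predict the dependent variable for a new input
print(model.predict([[10]]))  # [21.]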
In practice, it's common to use the same term to refer to both the learning algorithm and the machine learning model, i.e., the term "linear regression" may refer to the technique of finding the best-fit line, or to the fitted line (the actual model) itself.
Also, keep in mind that in some contexts, it may be more useful to talk about the general technique, or learning algorithm, for training a particular type of model. For example, linear regression is an algorithm for training linear regression models and ultimately trains each model in more or less the same way (at a high level) every time.
In other contexts, it might be more useful to talk about the model and the type of task(s) it performs, since there may be multiple ways of training a particular type of model. For example, decision trees predict a target value in more or less the same way (at a high level), but the decision trees themselves may have been trained using different learning algorithms.

Types of Machine Learning Models

There are three main machine learning paradigms: supervised learning, unsupervised learning, and reinforcement learning.
  • Supervised learning: The process of using labeled data to learn relationships between features of the data, in order to make predictions about unseen or future data.
  • Unsupervised learning: The process of using unlabeled data to learn relationships between features of the data, in order to extract meaningful information from the data.
  • Reinforcement learning: The process of using actions within an environment to learn properties of the environment, in order to determine the best actions to take.
Other types of learning include paradigms that are a combination of supervised and unsupervised learning, like semi-supervised learning and self-supervised learning, and broad concepts, like transfer learning and online learning.

What Is Supervised Learning?

In supervised learning, the machine learning algorithm is provided with a dataset that consists of inputs and their desired outputs, or labels. The goal of the algorithm is to model the relationship between inputs and outputs as closely as possible.
These labels serve as the machine learning algorithm’s “ground truth”, the reality that the algorithm tries to model. If the labels are bad — in the sense that they do not reflect the reality that you are actually trying to model — then the machine learning model that the algorithm produces will also be “bad”.
This is generally referred to as “garbage in, garbage out”, the idea that poor quality inputs (bad labels) produce poor quality outputs (bad predictions).
For example, if you’re studying for an upcoming exam, you might take some practice tests and then check your answers. However, imagine that the answer key is wrong, and sometimes tells you that you’ve answered a question incorrectly when in fact you’ve answered it correctly, and vice versa. This answer key would have very bad labels!

Examples of Supervised Learning Tasks

The relationship that a supervised learning algorithm models generally falls into one of two categories: regression or classification.
Broadly speaking, the goal of regression is to predict some numeric and measurable value, like the price of a stock tomorrow or the probability that a customer will buy a certain item. The goal of classification is to predict some categorical value, like whether this sample of breast tissue is cancerous or benign, or what breed of dog this animal is (Note: this animal might not be a dog at all! In this case, “not a dog” would be a good category to predict).
That being said, other types of supervised learning algorithms exist, such as linear discriminant analysis (LDA), a supervised dimensionality reduction algorithm. Dimensionality reduction, the process of transforming high-dimensional data into lower-dimensional data, is normally considered an unsupervised learning task.

What Is the Difference Between Regression and Classification Algorithms?

Regression and classification are two different types of predictive tasks. Regression predicts the value of a continuous variable, which has infinitely many possible values. Classification predicts the value of a categorical variable, which has finitely many possible values.
  • Example of a regression problem: Predicting the monthly rent of an apartment (the output), given factors like its square footage, location, age, and amenities (the inputs).
  • Example of a classification problem: Identifying whether an email is spam or not. In other words, identifying whether an email (the input) belongs to the "is spam" class (a possible output) or to the "is not spam" class (the only other possible output).

Common Supervised Learning Algorithms and Models

This section briefly explains some common supervised learning algorithms and models.

Linear Regression

Linear regression is a supervised learning algorithm used for regression problems. The goal is to identify the line (or, with more features, the hyperplane) that best captures the relationship between the features within a specified dataset, in order to predict new values.

Logistic Regression

Logistic regression is a supervised learning algorithm used primarily for classification. The goal is to identify the logistic curve that best predicts the probability that an input belongs to some class, which is then used to map the input to an actual class.
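As a minimal sketch using scikit-learn (the hours-studied data is made up): predict_proba exposes the probability from the logistic curve, and predict maps that probability to a class.
Python
from sklearn.linear_model import LogisticRegression

# Made-up data: hours studied (input) and whether the exam was passed (label)
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]  # 0 = failed, 1 = passed

model = LogisticRegression().fit(X, y)

# The logistic curve yields a probability, which is then mapped to a class
print(model.predict_proba([[3.5]]))  # roughly [[0.5, 0.5]] for this point
print(model.predict([[3.5]]))        # the most probable class: 0 or 1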

K-Nearest Neighbors

k-nearest neighbors, or k-NN, is a supervised learning algorithm used primarily for classification problems. The goal is to predict the probability that a data point belongs to a certain class, based on which class(es) the data points near it belong to.
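Here's a minimal k-NN sketch using scikit-learn; the 2D points and their color labels are made up:
Python
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: two features per point, each point labeled with a class
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = ["red", "red", "red", "blue", "blue", "blue"]

# k = 3: each prediction is based on the 3 nearest labeled neighbors
model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(model.predict([[2, 2]]))  # ['red'], since its 3 nearest neighbors are red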

Decision Trees

Decision trees are supervised learning models used for classification and regression problems. A decision tree learns rules that “branch” off into different predictions based on the features of a data point, in order to predict some value of a new data point.
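Here's a minimal decision tree sketch using scikit-learn; the housing features and labels are made up:
Python
from sklearn.tree import DecisionTreeClassifier

# Made-up data: [square footage, has a yard (0/1)] -> type of home
X = [[500, 0], [700, 0], [1500, 1], [2000, 1]]
y = ["apartment", "apartment", "house", "house"]

# The tree learns branching rules based on the features
# (e.g. a square-footage threshold that splits apartments from houses)
model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[1800, 1]]))  # ['house']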

Naive Bayes

Naive Bayes classifiers are supervised learning models used for classification problems. A naive Bayes model uses Bayes’ theorem to calculate the probability that a data point belongs to each possible class, in order to identify the most probable class.
The “naive” part refers to the assumption that the features of each data point are completely independent, which is unlikely to be true in real life. Nevertheless, this assumption makes certain computational aspects much easier, and naive Bayes models often perform well in practice.
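Here's a minimal sketch using scikit-learn's GaussianNB, one common naive Bayes variant; the animal measurements are made up:
Python
from sklearn.naive_bayes import GaussianNB

# Made-up data: [height in cm, weight in kg] -> species
X = [[20, 4], [25, 5], [60, 25], [70, 30]]
y = ["cat", "cat", "dog", "dog"]

model = GaussianNB().fit(X, y)
print(model.predict([[65, 28]]))        # ['dog'], the most probable class
print(model.predict_proba([[65, 28]]))  # the probability for each class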

Support Vector Machines (SVMs)

Support vector machines (SVMs) are supervised learning models used for classification. An SVM identifies the hyperplane that best separates the different classes within a dataset; a new data point is then classified by identifying which side of the hyperplane (aka which class) it falls on.
The SVM algorithm can be applied to regression problems, in which case it is generally called Support Vector Regression (SVR).
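Here's a minimal SVM sketch using scikit-learn, with made-up, well-separated points:
Python
from sklearn.svm import SVC

# Made-up data: two well-separated groups of 2D points
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel finds a separating hyperplane (a straight line in 2D)
model = SVC(kernel="linear").fit(X, y)
print(model.predict([[2, 2], [7, 8]]))  # [0 1]: one point on each side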

What Is Unsupervised Learning?

In unsupervised learning, the machine learning algorithm is provided with a dataset that does not contain any labels or desired outputs. The goal of the algorithm is to identify properties of the data without the "supervision" of a known outcome.
Unsupervised learning algorithms often aim to group similar data points together or simplify high-dimensional data into fewer dimensions. These tasks are called clustering and dimensionality reduction, respectively.
Clustering and dimensionality reduction often go hand-in-hand: it's generally desirable to reduce the number of dimensions, or features, within a high-dimensional dataset before attempting to group data points by features. This is due to the curse of dimensionality, the observation that high-dimensional data has properties that make it difficult to work with efficiently unless it is first transformed into fewer dimensions.

What Is the Difference Between Clustering and Dimensionality Reduction?

Clustering and dimensionality reduction are ways of identifying patterns within data.
Clustering groups data points together, according to some defined measure of relatedness.
Dimensionality reduction simplifies high-dimensional data into fewer dimensions.
  • Example of a clustering problem: Identifying streaming platform users with similar viewing patterns (the outputs), given user information like minutes watched per day and total viewing sessions per week (the inputs).
  • Example of a dimensionality reduction problem: Simplifying a dataset that contains 13 different attributes of different wines into two dimensions, such that the dataset can be visualized on a 2D scatterplot.

Common Unsupervised Learning Algorithms and Models

This section briefly explains some common unsupervised learning algorithms and models.

K-Means Clustering

k-means clustering is an unsupervised learning algorithm used for clustering problems. The goal is to partition data points into a pre-specified k number of clusters, with each data point belonging to the cluster with the nearest center.
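Here's a minimal k-means sketch using scikit-learn; the points are made up and k is set to 2:
Python
from sklearn.cluster import KMeans

# Made-up, unlabeled 2D points that form two rough groups
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]

# k = 2: partition the points into two clusters around the nearest centers
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # e.g. [0 0 0 1 1 1]: each point's cluster
print(kmeans.cluster_centers_)  # the two cluster centers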

Principal Component Analysis (PCA)

Principal component analysis (PCA) is an unsupervised dimensionality reduction algorithm. The goal is to compute a dataset’s principal components (PCs), new features derived from the original features. Typically, only the first two to three PCs are kept, allowing the dataset to be remapped into two or three dimensions.
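Here's a minimal PCA sketch using scikit-learn, echoing the wine example above, where 13 features are reduced to 2:
Python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA

# The classic wine dataset: 178 wines, each described by 13 features
X = load_wine().data

# Keep only the first 2 principal components, so the data fits on a 2D scatterplot
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)                      # (178, 2): 13 features reduced to 2
print(pca.explained_variance_ratio_)   # share of variance each PC captures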

t-Distributed Stochastic Neighbor Embedding (t-SNE) & Uniform Manifold Approximation and Projection (UMAP)

t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) are unsupervised dimensionality reduction algorithms that operate very similarly, but with subtle and important differences.
In both algorithms, the goal is to calculate the similarity between points in the original dataset, and then identify a lower-dimensional (typically 2D) projection of the dataset that preserves those similarities as much as possible.
However, t-SNE and UMAP differ in a few key ways, including how the two algorithms measure distance (how “close” or “far” one point is from another), how the two algorithms create the initial lower-dimensional projection, and how the lower-dimensional projection is then optimized.
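Here's a minimal t-SNE sketch using scikit-learn (UMAP is available separately via the umap-learn package); the digits dataset stands in for any high-dimensional data:
Python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 8x8 images of handwritten digits, flattened into 64 features per image
X = load_digits().data

# Project into 2D while preserving local similarities between points
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)
print(X_2d.shape)  # (1797, 2): 64 dimensions reduced to 2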
For a more detailed dive into t-SNE, UMAP, and their differences, check out Andy Coenen and Adam Pearce’s blog post, “Understanding UMAP”.

What Is Reinforcement Learning?

In reinforcement learning, there is an agent that receives reward signals as it interacts with a given environment. The goal of a reinforcement learning algorithm is to train the agent such that it learns behaviors that maximize its cumulative reward signals.
Reinforcement learning (RL) approaches generally fall into one of two categories: model-based or model-free. In model-based RL algorithms, the agent has access to a "model" of the environment: some function that predicts how the environment responds to actions (the next states and rewards they lead to). In model-free RL algorithms, the agent does not have access to such a model, and thus has incomplete information about the environment.
The world of RL is large and complicated, and even a brief overview of common RL algorithms would require additional explanations of the many underlying terms and strategies.
For a deeper dive into reinforcement learning, check out Mukilan Krishnakumar's article, "A Gentle Introduction to Reinforcement Learning With An Example".

A Quick Introduction to Deep Learning and Deep Learning Models

Deep learning is a subfield of machine learning that focuses on building artificial neural networks (also known simply as neural networks), a type of machine learning algorithm loosely inspired by the neural circuits in animal brains.
As we have explored, supervised, unsupervised, and reinforcement learning are learning paradigms that differ in the task that a machine learning algorithm performs:
  • Supervised learning: Learning properties of data using labeled data
  • Unsupervised learning: Learning properties of data using unlabeled data
  • Reinforcement learning: Learning properties of an environment via interactions and rewards
The term “deep learning”, however, refers to algorithms that are structured and implemented as a neural network with many layers. Deep learning models can perform supervised, unsupervised, and reinforcement learning tasks.
A neural network is organized into layers of artificial neurons (also called “perceptrons” or simply “neurons”), where the neurons of a layer are connected to the neurons of another layer, in the sense that the output(s) of a neuron in one layer become (part of) the input to a neuron in another layer.
In a feedforward network (the simplest kind), these connections go in one direction, from the first to the last layer of the network via some number of "hidden" layers in between. The first and last layers are known as the input and output layers of the network, respectively.
This graphic from Wikipedia’s article on artificial neural networks illustrates the structure of a neural network, with each circle representing a neuron, each circle color representing a different layer, and each arrow representing a connection from one neuron to another.
Like any algorithm, a neural network (and each of its layers) takes some number of inputs, follows some list of instructions, and produces some number of outputs. And, like any machine learning model, a neural network is trained by optimizing the values of its parameters: the weights attached to the connections between its neurons.
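To make the layered structure concrete, here's a minimal sketch of a feedforward pass implemented with NumPy; the layer sizes and random weights are made up for illustration (a real network would learn its weights via training).
Python
import numpy as np

rng = np.random.default_rng(0)

# A tiny feedforward network: 3 inputs -> 4 hidden neurons -> 2 outputs.
# Each layer's parameters are a weight matrix and a bias vector.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

def forward(x):
    # Each hidden neuron sums its weighted inputs, then applies a nonlinearity
    hidden = np.maximum(0, x @ W1 + b1)  # hidden layer (ReLU activation)
    return hidden @ W2 + b2              # output layer

print(forward(np.array([0.5, -1.0, 2.0])))  # two output values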
Deep learning, however, sometimes takes this process to an unprecedented and incomprehensible-to-humans scale.
For example, OpenAI's GPT-3 is a well-known deep learning model that takes a text prompt as input and produces a continuation of that prompt as output. GPT-3 is famously large: it has 96 layers and roughly 175 billion trainable parameters in total. 🤯
There’s a lot more that could be said about deep learning, but we’ll save a more in-depth explanation for another article. Stay tuned!

Summary

We've now reached the end of this gentle (but long) introduction to algorithms, machine learning models, core machine learning paradigms (supervised learning, unsupervised learning, reinforcement learning), and common machine learning models.
Thanks for reading!
A note from the author: Hi, I'm Angelica, a technical writer at Weights & Biases — we make tools for machine learning. If you enjoyed this article, consider following us on Twitter or YouTube. Thanks!