A Gentle Introduction to Supervised Learning

In this article, we explore the subject of supervised learning in detail, including what it is, how it works, and what it's most-frequent applications are.
Mostafa Ibrahim
Created on December 17|Last edited on February 28
Comment
﻿﻿Supervised learning is a paradigm of machine learning which has myriad applications, from tumor detection to preventing hate speech on social media. 
In this article, we'll explore the subject of supervised learning to fully understand what is, how it works, and how it is used by machine learning practitioners. 
Here's what we'll cover: 
Table of ContentsWhat Is Supervised Learning?Types of Supervised LearningPopular Supervised Learning AlgorithmsWhat Is Supervised Learning Used For?Supervised Learning in Computer VisionConclusion
﻿
﻿
Let's get started! 
What Is Supervised Learning?Supervised learning is a subset of machine learning where a model is trained on labeled (inputs) data with the resulting supervised model able predict different outcomes (outputs). The idea behind supervised learning was initially extracted from the human learning process. 
For example, think about how a mother might teach her child how to differentiate between two a cat and a dog. She might show her child images of both animals to teach them to differentiate between each. If she just shows her child the images and stops there (i.e. without informing the child which animal is which) that would be an example of unsupervised learning. On the other hand, if she identifies each image and tells her child if it's a cat or a dog, this is what we call supervised learning.
﻿Source﻿﻿﻿
Even though there are tons of supervised machine learning algorithms, all of them learn from labeled or predefined images (in this case, a photo with the label "cat" or "dog"). 
Types of Supervised Learning
Regression Regression is a type of supervised learning used to predict continuous values, a.k.a.  numerical variables with an infinite range. 
A classic example of a regression model is predicting the value of real estate. Such a model can predict any possible price value with no limitations. It can even predict the price to the nearest 30th decimal place if required.
ClassificationHeavily used in machine learning, supervised classification models categorize data points into discrete values such as classes or categories. 
An example of a classification model would be the one we teased above: classification of cats and dogs. Here, a model would be trained on thousands of labeled images of both animals, ideally, a fairly equal ratio of both classes. The model is then capable of predicting only one out of the two results (cat or a dog), returning a discrete value. 
In real-life scenarios, classification models predict boolean values, such as whether an event occurred (true) or not (false). It's important to note classification models aren't necessarily binary: you could train the cat/dog model above to identify birds or vegetables or anything else too. You'd just need to make sure you had sufficient training data. 
Popular Supervised Learning Algorithms
Linear Regression
﻿
﻿Source﻿
Linear regression is perhaps the most basic machine learning model. As the name implies, a linear regression model builds a linear relationship between the input and output features. Going back to our previous example, imagine the X-axis image above being time and the Y-axis being home prices. The linear regression model would predict the value of a home at any given time in the future.
While linear regression offers great simplicity, it cannot predict complex, nonlinear relationship patterns. A nonlinear relationship, in this case, implies that it isn't always the case that the Y-feature will always be either increasing or decreasing along the X-feature at all times.
Linear Regression Equation(Source)
The linear regression algorithm utilizes the above formula, which gives a coefficient/weight B to every variable/input X.
Polynomial regression can be considered as the improved and enhanced version of linear regression, as it provides better handling for complex problems. In such a model, a higher polynomial relationship is created between the features. Linear and polynomial regressions return continuous values.
Polynomial Regression Equation (Source)
A similar formula is used for the polynomial regression algorithm, but in this case, a variable can have exponential power. 
Logistic Regression
﻿Source﻿﻿﻿
Logistic regression is actually a classification model and not a regression one. Logistic regression models return a discrete value used in classifying objects into different classes. Logistic regression follows a similar algorithm to linear regression, but unlike linear regression, logistic regression uses a logistic function in order to draw the best-fit curve.
Logistic Regression Equation(Source)
Logistic regression utilizes the above formula in order to classify given points. In the above formula, the left-hand side determines the probability that an event does occur, while the right-hand side is the linear regression equation.
Support Vector MachinesA support vector machine (SVM) is a supervised machine learning algorithm capable of handling both regression and classification problems. However, it is more commonly used for classification rather than regression. 
Initially, the SVM algorithm plots all the data points onto a single graph. Then, the algorithm splits the data points into multiple categories by drawing a best-fit line.
﻿Source﻿﻿﻿
In cases where two or more features are utilized, the SVM algorithm will plot the data into a high-dimensional graph. Since a 2D line won't be sufficient to split the data anymore, a hyperplane will be required.
﻿Source﻿
In a 2D support machine vector model, the hyperplane equation will be defined as w.x+b=0w.x+b=0w.x+b=0﻿. In the above case, if a point returns a negative value, then it belongs to the left group (blue), and if it returns a positive value, then it belongs to the right group (purple).
K-Nearest Neighbors
﻿Source﻿
The K-nearest neighbor (KNN) algorithm is a supervised classification algorithm that utilizes a more straightforward approach for classifying its data points. The model plots all the available data points into a graph and clusters (classifies) each data point into a given class depending on its labeled value. The user then identifies an integer value for the variable K. 
When the model evaluates the class to which a new point belongs, it will plot the new point into the pre-existing graph and search for the nearest K neighboring points to the newly plotted point. The point will then be identified depending on the class with the most points that are nearest.
K-Nearest Neighbors Distance Equation(Source)
The K-nearest neighbor algorithm utilizes the above formula in order to calculate the distance between the newly inserted point and all other points. Note that there are other variations of the KNN algorithm, which add a given weight to each point depending on the actual distance. This means that if a KNN (k=2) with two different class points is the nearest to the newly inserted point, the closest point will have a higher weight settling the tie.
Decision TreesA decision tree is a supervised classification machine learning model. As the name may imply, the decision tree model utilizes an inverted tree data structure in order to categorize its data. The root node holds the most generic value. As we move deeper down the tree, classes start to be less generic and more specific till we reach the leaf nodes (the final level of nodes in a tree). The leaf node value will be the value predicted by the model. 
Say we want to classify an animal. A decision tree might begin with whether it's a mammal or reptile or insect, then branch into more granular categories. By answering a specific feature about the given animal, we can then choose which tree branch to move along. It's worth noting that decision trees can also be used for regression problems, in which the tree will return a continuous variable.
Neural Networks
﻿Source﻿
Initially inspired by the human brain, a neural network model is a deep learning technique that uses a network-like structure in order to predict its final value. Neural network models are very versatile as they can be used for regression and classification problems.
A simple neural network will consist of around four layers, in which each layer receives data as its input and returns an output, which is then fed into the next layer. The final value of the last layer is expected to be a single value. A neural network is a more complex machine learning process, it utilizes a given amount of nodes, levels, and weight for each node.
Neural Networks Loss function(Source)
A loss function optimizes the model’s performance by comparing the target value and the predicted output. The main goal of the model is to minimize the loss between those two values, resulting in the model giving a more accurate result.
What Is Supervised Learning Used For?
Supervised Learning in Natural Language Processing﻿Natural language processing (NLP) is a machine learning subfield focusing on the translation and conception of the human language.
While we have both supervised and unsupervised NLP models, both models are usually trained through data in textual or audio formats. 
Some NLP problems we solve this way: 
Semantic (or Sentiment) Analysis﻿Semantic analysis is the process of identifying the meaning (sentiment) behind a given sentence. 
A simple example here is the sentence "I hate it when it rains." Such a sentence holds a negative (sentiment) representation. A model can deduce such a thing from given keywords, such as the word "hate."
﻿Source﻿﻿﻿﻿﻿
Contrary to what you might believe, semantic analysis is heavily used in many industries. Take famous social media sites as an example. Facebook uses semantic analysis on posts to detect and prevent hate speech. 
Spam Email Filtering
﻿Source﻿
By training on thousands of already predefined spam emails, machine learning models can learn to predict if newly sent emails fall under the spam category. The model might identify spam emails by focusing on keywords such as "you won," "receive your gift," etc. 
It goes without saying that no spam filtering is one hundred percent accurate, as some spam emails might still pass to the user. But these models are quite advanced and have been around for a while now. Your inbox would be unrecognizable (and pretty miserable) without them. 
Language TranslationWith the most famous translation provider being Google with its well-known Google translate, translation apps nowadays are capable of translating up to 100 different languages from all over the world. The model is trained on two or more languages in which each language contains pre-defined word meanings. 
Supervised Learning in Computer Vision﻿Computer vision is another example of an artificial intelligence field that utilizes supervised learning. Computer vision enables computers to retrieve meaningful data from visual representations such as images and videos. Actions are then perceived depending on the received insights. A few (non-exhaustive) examples of computer vision supervised learning use cases: 
Tumor Detection and Diagnosis
﻿Source﻿
Falling under the image analysis category, tumor diagnosis is a life-saving implementation of AI. An image analysis model is trained on tens of thousands of already predefined cancer images. For each image fed to the model, a clear tumor marking is identified, making our model a supervised one.
Image diagnosis can be used to detect a single disease or to detect and differentiate between multiple diseases, for example, different tumor types. 
Autonomous Driving
﻿Source﻿
An autonomous driving vehicle is one capable of sensing its environment without any human interaction, such a vehicle can then take action to simulate human-like driving abilities. 
The main challenge with autonomous driving is providing vehicles with human-like vision. Enabling self-driving vehicles to detect and differentiate between different objects such as other vehicles, pedestrians, roads, traffic lights, and so on. 
To train such a model, a diverse variety of video segments with the previously mentioned objects already predefined are required. For example, a given video segment would be perceived from a driver's point of view, whereas vehicles are identified with a given color, such as yellow. 
By contrast, pedestrians are identified with a different color, such as orange, and bicycles with red, and so on.
﻿Source﻿
ConclusionSupervised learning is one of the most widely used subfields of machine learning out there. Whether helping us diagnose cancerous tumors, filtering spam emails out of our mail inbox, or to identify and eliminating hate speech from social media sites from all over the internet, with the correct model and data, supervised machine learning has already changed the world for the better. And it will continue to do so for decades to come. 
﻿
Add a comment
Tags: Articles, Domain Agnostic, NLP, Computer Vision, Beginner
Iterate on AI agents and models faster. Try Weights & Biases today.